Image2Video

See example prompts in the gallery.

Pricing for video generation.

Overview

Astria Image2Video supports a wide catalog of image-to-video, text-to-video, reference-to-video, and motion-control video models. Pick a video_model and pass a video_prompt describing the motion; Astria renders the first frame using the prompt's image stage (the chosen tune + text) and animates it.

Form fields

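Pass these alongside text on POST /tunes/:id/prompts. In multipart requests each field is nested under prompt (for example prompt[video_model]), as the curl examples on this page show: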

video_model (required)

The model to animate with. See the enum table below.

video_prompt (required)

Describes camera movement, scene, and object interactions. Should not include the token or LoRA used by the image prompt.

Example: <lora:1533312:1.0> ohwx woman hiking in the alps (in text) + Woman looking at the camera, smiling, puts hands on her hips, confident (in video_prompt).

video_duration (optional)

Integer seconds. Allowed values depend on the chosen model — see the table.

video_first_frame (optional)

Multipart image upload. When provided, it replaces the image-stage render, and text is no longer required.

video_last_frame (optional)

Multipart image upload for first+last keyframe models.

input_video (optional)

Multipart video upload. Required for motion-control models.

aspect_ratio (optional)

Forwarded to the video model when it supports an aspect-ratio knob (e.g. HappyHorse text-to-video, Kling, Wan, LTX). For image-to-video the aspect ratio is derived from the input image.

Example

curl -X POST -H "Authorization: Bearer $API_KEY" \
https://api.astria.ai/tunes/$TUNE_ID/prompts \
--form-string 'prompt[text]=<lora:1533312:1.0> A highly detailed image of thoughtful ohwx woman exploring a hidden urban garden' \
-F 'prompt[video_model]=seedance_v15_720p' \
-F 'prompt[video_prompt]=Woman looking at the camera, smiling, puts hands on her hips, confident' \
-F 'prompt[video_duration]=5' \
-F 'prompt[aspect_ratio]=16:9'
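
When the keyframes are uploaded directly, text can be left out. The following is a minimal sketch of that variant, assuming the upload fields nest under prompt[...] the same way prompt[input_video] does in the motion-control example. The image paths and the video prompt are placeholders. The arguments are collected in an array and printed so they can be reviewed before anything is sent.

```shell
#!/usr/bin/env bash
# Sketch: image-to-video from uploaded keyframes instead of an image-stage render.
# Assumption: upload fields nest under prompt[...] like the other form fields.
# The image paths are placeholders.
TUNE_ID="${TUNE_ID:-123}"

keyframe_args=(
  -X POST
  "https://api.astria.ai/tunes/${TUNE_ID}/prompts"
  -F 'prompt[video_model]=seedance_v15_720p'
  -F 'prompt[video_prompt]=Slow push-in as the woman turns toward the camera'
  -F 'prompt[video_duration]=5'
  -F 'prompt[video_first_frame]=@/path/to/first.png'
  -F 'prompt[video_last_frame]=@/path/to/last.png'
)

# Dry run: list the arguments one per line. To send the request, run:
#   curl -H "Authorization: Bearer $API_KEY" "${keyframe_args[@]}"
printf '%s\n' "${keyframe_args[@]}"
```

seedance_v15_720p is used here because the capability list names seedance_v15_* among the first+last keyframe models; omit the video_last_frame line for models outside that group.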

Text-to-video

Models in the text-to-video group accept a video_prompt with no first frame and no text — the model generates the video directly. Currently: seedance2_*, happyhorse_720p, happyhorse_1080p.

curl -X POST -H "Authorization: Bearer $API_KEY" \
https://api.astria.ai/tunes/$TUNE_ID/prompts \
-F 'prompt[video_model]=happyhorse_720p' \
-F 'prompt[video_prompt]=A lone explorer walks across endless dunes at sunrise, cinematic.' \
-F 'prompt[video_duration]=5' \
-F 'prompt[aspect_ratio]=16:9'

Motion control

Motion-control models replay the motion from input_video while restyling the subject. Models: kling30_motion_control, kling30_motion_control_pro, wan_animate_720p, dreamactor_m2, happyhorse_motion_control.

curl -X POST -H "Authorization: Bearer $API_KEY" \
https://api.astria.ai/tunes/$TUNE_ID/prompts \
-F 'prompt[text]=ohwx man <faceid:123>' \
-F 'prompt[video_model]=kling30_motion_control_pro' \
-F 'prompt[video_prompt]=match the dance moves' \
-F 'prompt[video_duration]=10' \
-F 'prompt[input_video]=@/path/to/reference.mp4'

Models

Cost is the per-prompt charge in cents at the base duration listed for the model; the charge for longer videos scales linearly with duration. _audio variants generate an audio track.

| video_model | base cost (¢) | video_duration (s) |
| --- | --- | --- |
| seedance_480p | 10 | 2–12 |
| seedance_v15_720p | 14 | 4–12 |
| seedance_v15_audio_720p | 29 | 4–12 |
| seedance2_fast_480p | 60 | 4–15 |
| seedance2_fast_720p | 140 | 4–15 |
| seedance2_480p | 120 | 4–15 |
| seedance2_720p | 280 | 4–15 |
| seedance2_1080p | 450 | 4–15 |
| wan22_720p | 43 | 5 |
| wan22_fast_480p | 6 | 5 |
| wan22_fast_580p | 8 | 5 |
| wan22_fast_720p | 11 | 5 |
| wan25_720p | 53 | 5, 10 |
| wan26_720p | 53 | 5, 10, 15 |
| wan26_1080p | 79 | 5, 10, 15 |
| wan27_720p | 55 | 5, 10, 15 |
| wan27_1080p | 83 | 5, 10, 15 |
| wan_animate_720p | 44 | 10 |
| ltx23_720p | 17 | 5, 10, 15, 20 |
| ltx23_1080p | 22 | 5, 10, 15, 20 |
| kling25 | 39 | 5, 10 |
| kling30_standard | 92 | 3–15 |
| kling30_standard_audio | 139 | 3–15 |
| kling30_pro | 123 | 3–15 |
| kling30_pro_audio | 185 | 3–15 |
| kling30_4k | 263 | 3–15 |
| kling30_motion_control | 277 | 10 |
| kling30_motion_control_pro | 370 | 10 |
| cinematic_video | 84 | 5, 10, 15 |
| dreamactor_m2 | 29 | 10 |
| happyhorse_720p | 77 | 3–10 |
| happyhorse_1080p | 132 | 3–10 |
| happyhorse_motion_control | 154 | 10 |
| veo31_fast_720p | 85 | 4, 6, 8 |
| veo31_fast_audio_720p | 126 | 4, 6, 8 |
| veo31_fast_1080p | 85 | 4, 6, 8 |
| veo31_fast_audio_1080p | 126 | 4, 6, 8 |
| veo31_fast_4k | 264 | 8 |
| veo31_fast_audio_4k | 308 | 8 |
| veo31_lite_720p | 44 | 4, 6, 8 |
| veo31_lite_audio_720p | 44 | 4, 6, 8 |
| veo31_lite_1080p | 71 | 4, 6, 8 |
| veo31_lite_audio_1080p | 71 | 4, 6, 8 |
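
The linear scaling rule can be illustrated with a small helper. This is only a sketch: estimate_cost_cents is a hypothetical function, the base duration is passed in by the caller because the table does not single it out, and rounding fractional cents up is an assumption rather than documented billing behavior.

```shell
#!/usr/bin/env bash
# Sketch: estimate the per-prompt charge in cents under linear scaling.
# Assumptions (not stated by the pricing table):
#   - the caller knows the model's base duration and passes it in
#   - durations at or below the base are charged the base cost
#   - fractional cents are rounded up
estimate_cost_cents() {
  local base_cost=$1 base_duration=$2 duration=$3
  if (( duration <= base_duration )); then
    echo "$base_cost"
  else
    # Ceiling division keeps the arithmetic in integers.
    echo $(( (base_cost * duration + base_duration - 1) / base_duration ))
  fi
}

# A 39-cent model with an assumed 5 s base duration, rendered for 10 s:
estimate_cost_cents 39 5 10   # prints 78
```

Treat the result as a budgeting estimate; the charge reported by the API is authoritative.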

Capability matrix

  • Text-to-video (no first frame required): seedance2_*, happyhorse_720p, happyhorse_1080p.
  • Multi-reference images: seedance2_*, happyhorse_720p, happyhorse_1080p.
  • First+last keyframe (video_last_frame): seedance_v15_*, seedance2_*, wan21_*, wan26_*, wan27_*, wan_fast_*, ltx23_*, kling*, veo31_*, hailuo*.
  • Motion control (requires input_video): kling30_motion_control*, wan_animate_720p, dreamactor_m2, happyhorse_motion_control.
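
A client can use these groupings to decide which inputs to send before building a request. The helper below is a hypothetical sketch, not part of the API: it matches a video_model name against the glob patterns from the text-to-video and motion-control lists above and treats every other name, including unknown ones, as image-to-video.

```shell
#!/usr/bin/env bash
# Sketch: map a video_model name to the input it needs, using the glob
# patterns from the capability lists. Hypothetical helper, not part of the API.
video_model_group() {
  case "$1" in
    kling30_motion_control*|wan_animate_720p|dreamactor_m2|happyhorse_motion_control)
      echo "motion_control" ;;   # requires input_video
    seedance2_*|happyhorse_720p|happyhorse_1080p)
      echo "text_to_video" ;;    # video_prompt alone is enough
    *)
      echo "image_to_video" ;;   # fallback: needs text or video_first_frame
  esac
}

video_model_group kling30_motion_control_pro   # prints motion_control
video_model_group seedance2_fast_720p          # prints text_to_video
video_model_group wan26_720p                   # prints image_to_video
```

The motion-control branch is checked first so that happyhorse_motion_control is not confused with the happyhorse text-to-video models.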

Backwards compatibility

The pre-2026-04 syntax, which embeds flags inside text, is still accepted; the server promotes those flags to the dedicated fields:

<lora:1533312:1.0> ohwx woman hiking in the alps --video --video_model seedance_v15_720p --duration 5 --video_prompt "Woman looking at the camera, smiling, confident"

New integrations should prefer the dedicated form fields.
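
For comparison, the legacy prompt above might be rewritten with dedicated fields roughly as follows. This is a sketch under one assumption: that the legacy --duration flag corresponds to video_duration, which the field names suggest but the text does not state. The arguments are collected in an array and printed rather than sent.

```shell
#!/usr/bin/env bash
# Sketch: the legacy flags rewritten as dedicated form fields.
# Assumption: the legacy --duration flag maps to prompt[video_duration].
TUNE_ID="${TUNE_ID:-123}"

# --form-string sends the value literally. With -F, curl reads a value that
# starts with '<' from a file of that name, which breaks a leading <lora:...>.
migrated_args=(
  -X POST
  "https://api.astria.ai/tunes/${TUNE_ID}/prompts"
  --form-string 'prompt[text]=<lora:1533312:1.0> ohwx woman hiking in the alps'
  -F 'prompt[video_model]=seedance_v15_720p'
  -F 'prompt[video_duration]=5'
  -F 'prompt[video_prompt]=Woman looking at the camera, smiling, confident'
)

# Dry run: list the arguments one per line. To send the request, run:
#   curl -H "Authorization: Bearer $API_KEY" "${migrated_args[@]}"
printf '%s\n' "${migrated_args[@]}"
```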