Seedance 2.0 Reference-to-Video with LLM Prompt
seedance
text to video
video generation
Nodes & Models
LLM_floyo
Seedance20FastReferenceToVideoFal_floyo
VideoToFrames
LoadImage
WorkflowGraphics
VHS_VideoCombine
Cinematic video generation with Seedance 2.0 Fast, paired with an LLM prompt builder.
Upload up to nine reference images. Write your scene in plain prose, or use labeled sections (visual, dialog, audio, camera, style). The LLM rewrites it into Seedance's exact prompt format. Then Seedance 2.0 generates the video with native audio.
Default output is 8 seconds at 720p, 16:9. The video is re-encoded as an H.265 MP4 with the audio track preserved.
How do you use Seedance 2.0 reference-to-video?
Upload your reference images. Write a scene in prose or labeled sections like [VISUAL], [DIALOG], [AUDIO], [CAMERA], [STYLE]. The LLM converts your input into Seedance's format with proper dialog quoting, layered audio cues, and reference image tags. Set duration, resolution, aspect ratio, and audio toggle. Run.
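To make the labeled-section option concrete, here is a minimal Python sketch that assembles scene text in that shape. The label names come from the description above; the `build_scene_input` helper, the one-section-per-line layout, and the sample scene are assumptions for illustration only, not part of any Floyo or Seedance API.

```python
# Labels as listed in the workflow description. The order is an assumption.
SECTION_ORDER = ["VISUAL", "DIALOG", "AUDIO", "CAMERA", "STYLE"]


def build_scene_input(sections: dict) -> str:
    """Join the non-empty sections into labeled lines, one section per line.

    Hypothetical helper: it only produces text you could paste into the
    prompt field. The LLM node still does the actual rewrite.
    """
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"Unknown section labels: {sorted(unknown)}")
    lines = []
    for label in SECTION_ORDER:
        text = sections.get(label, "").strip()
        if text:  # skip sections the writer left out
            lines.append(f"[{label}] {text}")
    return "\n".join(lines)


scene = build_scene_input({
    "VISUAL": "The swordswoman from the first reference stands on a rooftop at dusk.",
    "DIALOG": 'She says, "We end this tonight."',
    "CAMERA": "Slow push-in, then a tilt up to the sky.",
})
print(scene)
```

Sections you leave out are simply omitted, which matches the idea that plain prose and partial labeling are both acceptable inputs.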
Reference images: Up to nine slots. The LLM tags them as [Image1], [Image2], [Image3], and so on in the final prompt so Seedance knows what to anchor on. Use them for characters you want to keep consistent, environments you want to match, or lighting moods you want to carry through. Want one strong reference? Use a single image and leave the rest empty. Want a multi-element scene? Spread your references across slots and tell the LLM which is which in your prompt.
Prompt: Two ways to write it. Plain prose works fine: describe the scene, the motion, the mood. Or use labeled sections for tighter control: [VISUAL] for the scene, [DIALOG] for any spoken lines, [AUDIO] for sound design, [CAMERA] for the move, [STYLE] for the visual treatment. The LLM handles the formatting Seedance expects, including dialog quote rules and safe rewrites of risky language.
LLM model: Gemini 2.5 Flash by default. It is fast and good at structured prompt rewriting. Swap to a stronger model if your prompts have heavy dialog or layered audio that needs more careful handling.
Duration: 8 seconds is the default and the sweet spot for cinematic shots. Want a quick gesture or beat? Try 5s. Want a longer establishing sequence? Push to 10s. Note that Seedance pacing tightens as duration drops, so short clips feel snappier.
Resolution: 720p is the default. Use 1080p for final delivery and 480p for fast iteration when you're testing prompts.
Aspect ratio: 16:9 for cinematic and wide compositions. 9:16 for vertical short-form. 1:1 for square posts.
Generate audio: On by default. Seedance 2.0 generates audio natively, and the LLM layers ambient sound, foreground effects, and music tone into the prompt for you. Turn it off if you're scoring the clip yourself in post.
Seed: Randomize to explore variations of the same prompt. Lock a seed when you want to compare different prompt edits without the motion drifting between runs.
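The settings above can be summarized as a small checklist. The following Python sketch collects the defaults and options named in this guide and flags values outside them. The `RunSettings` class and its field names are hypothetical (they are not the workflow's actual node inputs), the option lists contain only the values mentioned here (the workflow may accept others), and requiring at least one reference image is an assumption based on the workflow being built around image anchoring.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values mentioned in this guide; the workflow itself may accept more.
DURATIONS_S = (5, 8, 10)
RESOLUTIONS = ("480p", "720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "1:1")
MAX_REFERENCE_IMAGES = 9


@dataclass
class RunSettings:
    """Hypothetical container; field names are illustrative, not Floyo's."""
    reference_images: List[str] = field(default_factory=list)
    duration_s: int = 8            # default per the guide
    resolution: str = "720p"       # default per the guide
    aspect_ratio: str = "16:9"     # default per the guide
    generate_audio: bool = True    # on by default
    seed: Optional[int] = None     # None stands for "randomize"

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means nothing was flagged."""
        issues = []
        count = len(self.reference_images)
        if not 1 <= count <= MAX_REFERENCE_IMAGES:
            issues.append(f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {count}")
        if self.duration_s not in DURATIONS_S:
            issues.append(f"duration {self.duration_s}s is not one of {DURATIONS_S}")
        if self.resolution not in RESOLUTIONS:
            issues.append(f"resolution {self.resolution!r} is not one of {RESOLUTIONS}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"aspect ratio {self.aspect_ratio!r} is not one of {ASPECT_RATIOS}")
        return issues


# Fast iteration pass: short clip, low resolution, random seed.
draft = RunSettings(reference_images=["hero.png"], duration_s=5, resolution="480p")
# Final pass: defaults for duration and aspect ratio, 1080p, locked seed.
final = RunSettings(reference_images=["hero.png"], resolution="1080p", seed=1234)
print(draft.problems(), final.problems())
```

Keeping a draft and a final configuration side by side mirrors the suggested loop: iterate at 480p, then lock the seed and switch to 1080p for delivery.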
What is Seedance 2.0 reference-to-video good for?
Cinematic short clips where visual continuity matters. Reference images keep characters, environments, and art direction locked while Seedance handles motion and audio. Strong fit for anime sequences, film-style action shots, music videos, concept trailers, and any 5-to-10-second beat where mood and look outweigh long narrative.
Best for cinematic moments built around a specific look. Anime fight sequences, atmospheric establishing shots, character-driven beats, action cuts with a specific lighting mood. The LLM prompt builder helps when you want layered audio and clean camera language but don't want to memorize Seedance's exact format.
Skip this for long-form narrative or talking-head dialog scenes. With clips topping out around 10 seconds (8 by default), there is little room for storytelling, and Seedance's audio is better at ambient sound and music tone than it is at extended dialog. For multi-shot sequences, generate clips one at a time and edit them together.
Also skip this if you have no reference images and want pure text-to-video. The reference-to-video setup is built around image anchoring. A plain text-to-video workflow will be faster and simpler.
FAQ
What's the difference between Seedance 2.0 reference-to-video and text-to-video? Reference-to-video uses your uploaded images as visual anchors for characters, environments, and style. Text-to-video generates everything from the prompt alone. Reference-to-video gives you tighter control over what your subject and scene look like. Text-to-video gives Seedance more freedom to interpret.
How many reference images can Seedance 2.0 take? Up to nine images, three videos, and three audio clips. This workflow uses the nine image slots, and most shots only need one or two image references. The LLM tags them in order as [Image1], [Image2], [Image3], and so on, so Seedance knows which reference plays which role.
Does Seedance 2.0 generate audio natively? Yes. Audio generation is built in and on by default in this workflow. The LLM layers your audio cues (ambient base, sound effects, music tone) into the prompt so the output has a complete sound bed. Turn it off if you plan to score the clip yourself.
What durations does Seedance 2.0 Fast support? Common options are 5, 8, and 10 seconds. Eight is the default and the best balance of motion and pacing. Shorter clips feel snappier. Longer clips give more breathing room for camera moves and atmosphere.
Why does the workflow use an LLM before Seedance? Seedance 2.0 has specific formatting rules: dialog needs straight double quotes with speaker tags, audio cues belong in a specific spot, camera moves draw from a fixed vocabulary. The LLM handles all of that automatically and applies safe rewrites if your prompt has risky language. You write naturally; the LLM produces the prompt Seedance wants.
How do you run Seedance 2.0 online? You can run Seedance 2.0 online through Floyo. No installation, no setup. Open the workflow in your browser, upload your references, write your scene, and hit run.
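Two of the formatting rules mentioned in the FAQ above (straight double quotes for dialog, and reference tags numbered in order) are simple enough to sketch in Python. In the workflow the LLM node does this for you, so the code below only illustrates what those rules look like. The helper names are hypothetical, and the `Mara:` speaker-tag style in the sample line is an assumption, since the exact speaker-tag syntax is not spelled out here.

```python
# Map typographic quotes to their straight equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones; all other text is unchanged."""
    return text.translate(CURLY_TO_STRAIGHT)


def image_tags(count: int) -> list:
    """Return [Image1]..[ImageN] in slot order for up to nine references."""
    if not 0 <= count <= 9:
        raise ValueError("this workflow describes at most nine reference images")
    return [f"[Image{i}]" for i in range(1, count + 1)]


# Sample dialog line typed with curly quotes, as many editors produce.
line = straighten_quotes("Mara: \u201cWe end this tonight.\u201d")
print(line)
print(image_tags(3))
```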