5-Layer Prompt Stack
No matter how creative the idea is, Seedance will still drift into slop if the prompt does not match the way the model reads camera, lighting, motion, and constraints. In practice, it behaves less like a generic text box and more like a compact film set that expects directed shot language. If you only give it plain prose, the subject gets soft, the camera starts guessing, and style words sit on top without really controlling the shot.
This page focuses on the prompting system itself, not account setup or access. The goal is simple: make it clear how Seedance reads a shot, which words tend to stabilize output, and how to structure prompts so subject identity, motion, framing, and quality constraints reinforce each other instead of fighting each other.
Why slop keeps happening
Most weak Seedance generations come from prompt architecture problems, not idea problems.
- If the subject is abstract, identity and silhouette drift quickly.
- If the action is really just mood description, the result behaves like a still image.
- If subject motion and camera motion are mixed together, the model has to guess what is supposed to move.
- If the prompt stacks vague words like
cinematic,beautiful, orepic, it adds flavor without really directing the shot. - If the prompt has no constraints, jitter, flicker, bent limbs, stretching, and identity drift show up fast.
The core point is that Seedance reacts more like a model reading shooting instructions than a model answering a search query.
What this framework is for
Seedance works better when you think of it as a multimodal production setup rather than a plain text-to-video box. In one generation, you can combine:
- Up to
9image references - Up to
3video references - Up to
3audio references - A text prompt
Image references are useful for character sheets, moodboards, product shots, or storyboard panels. Video references are strong for camera motion, choreography, and pacing. Audio references help lock direction for music, voiceover, or sound effects.
Results are usually more stable when @Image, @Video, and @Audio each have a clear job. If you are typing text only, you are using only part of what makes the system strong.
Core structure
Subject -> Action -> Camera -> Style -> ConstraintsOfficial material may present a broader formula, but in practice this 5-layer structure is one of the most reliable ways to keep Seedance outputs coherent. The order matters for a reason.
Subjectlocks identity first.Actiontells the model that the shot should move instead of sit like a still.Camerafixes framing and point of view.Styleadds flavor later so it interferes less with motion.Constraintsclose the gaps with guardrails.
Even before you tune anything else, just keeping these layers separate usually improves consistency.
1. Subject
Subjects get weak fast when they stay abstract. Even a short line works better when it contains specific visual anchors.
Bad: a woman
Good: a young woman with brown hair
Best: late 20s woman, tight dark curls at ear length, small silver hoop in left ear, fitted black turtleneck, neutral expression- One subject per generation is the safest default.
- Two can work when they are separated spatially and labeled clearly, such as
@Character_Aand@Character_B. - Three or more often increases identity confusion sharply.
- If face consistency matters, define at least a few stable features such as hair, clothing, age range, expression, or accessories.
The job of the subject layer is not poetic writing. Its job is to leave the model with reusable visual anchors.
2. Action
Action should describe visible movement, not mood. Use present tense and keep each shot focused on one major motion.
Bad: she looks happy and is enjoying the sunset
Good: she slowly turns toward the camera, breeze lifting the hem of her skirt, eyes narrowing against the lightThe most important rule is to separate subject motion from camera motion.
Bad: spinning camera around a dancing person
Good: the dancer spins slowly, camera holds fixed framing- The subject line says what the person or object does.
- The camera line says how the shot observes that action.
- One major action per shot is the safest default.
- Verbs like
turns,walks,sprints,leans,reaches, andjumpsusually land better than mood-heavy phrasing.
That separation alone usually makes shot interpretation more stable.
3. Camera
This is one of the biggest quality levers in Seedance. Use one primary camera move per generation, and describe rhythm, direction, and control rather than technical camera-sheet details.
Common camera keywords
- Static shots:
fixed,locked-off,static wide,locked tripod - Moving shots:
push-in,dolly in,pull-out,dolly out,pan left,pan right,tracking shot,orbit,arc,360 orbit,aerial,drone shot,handheld,crane up,crane down,gimbal,steadicam walk,whip pan,dolly zoom,rack focus
Speed modifiers
- Very slow:
imperceptible,barely - Safe defaults:
slow,gentle,gradual - Stable motion:
smooth,controlled - Stronger motion:
dynamic,swift
fast is one of the riskiest words in the box. If something needs to move fast, let one element move fast and keep the rest stable.
When you need a compound move, write it in time order instead of stacking everything together.
start: slow dolly-in, then: gentle pan right for the final 2 secondsAdditional practical rules:
- Prefer shot rhythm over camera specs like
f-stop,ISO, or lens math. handheldadds energy, but it also raises the chance of visible instability, so pair it with constraints.whip pan,dynamic, andswiftcan work, but they fail loudly when the rest of the shot is not controlled.
4. Style
Style works best when lighting comes first. In Seedance, lighting has outsized influence on perceived quality. A word like cinematic is usually too vague on its own. Specific lighting, color, and film texture cues land more reliably.
Useful lighting and style modifiers
golden hourrim lightdramatic rim lightsoft key from 45 degreesovercast daylightbacklit silhouette at sunsetvolumetric fogchiaroscuro35mm16mmanamorphic lens flare
This kind of bundle tends to work better:
cinematic film tone, 35mm, warm golden lightingBy contrast, words like glow, glimmer, and glints can introduce flicker artifacts, so they are worth using carefully.
The real job of the style layer is not decoration. It is to define light direction, tone, and surface texture.
5. Constraints
Constraints are the last safety layer. They matter most when you want a shot to feel less synthetic or when character consistency is critical.
Base constraint set
avoid jitteravoid bent limbsavoid identity driftavoid temporal flickerno distortionno stretchingmaintain face consistency
Common quality suffix
sharp claritynatural colorsstable pictureno blurno ghostingno flickering
Constraints are usually cleanest at the very end of the prompt. For character shots, avoid bent limbs and maintain face consistency are close to default safety settings.
Time-coded multi-shot prompts
When you want several beats inside a short clip, time coding is the clearest structure. This is also one of the areas where Seedance can feel noticeably stronger. Give each segment its own subject, action, camera, and lighting state, then specify transitions explicitly.
(0-3s) late 20s woman, tight dark curls, black turtleneck. She stands still, then slowly turns toward camera. Fixed framing. Golden hour rim light. Avoid jitter, maintain face consistency.
hard cut to
(3-7s) she breaks into a sprint through shallow water, coat lifting behind her. Side tracking shot, slow then controlled acceleration. Overcast daylight, natural colors. Avoid temporal flicker, no distortion.
seamless morph into
(7-12s) she jumps onto a rooftop ledge and holds for one beat. Gentle push-in. Backlit sunset silhouette. Sharp clarity, stable picture.- Give each shot one clear job.
- Use transitions like
hard cut toorseamless morph intoinstead of leaving the handoff ambiguous. - Re-state camera and lighting state in each segment rather than assuming the model will carry them perfectly.
- It is usually better to split motion into 2-4 second chunks than to overload a single segment.
@ reference system
The most believable results usually come from clearer reference roles, not just more files. In practice, 6-12 references can work well when every file has one job.
@Image1 = first frame@Image2 = last frame@Video1 = camera motion reference@Video2 = pacing reference@Audio1 = background music@Audio2 = voiceover
If you use many references, stability usually improves when each one owns a single responsibility.
Reusable template
Single shot
[Subject]. [Action]. [Camera]. [Style]. [Constraints].Example
late 20s woman, tight dark curls at ear length, fitted black turtleneck, neutral expression. She slowly turns toward the camera, breeze lifting the edge of her coat. Gentle push-in, fixed eye-level framing. Golden hour rim light, cinematic film tone, 35mm. Avoid jitter, avoid identity drift, maintain face consistency, sharp clarity, natural colors.Multi-shot
(0-3s) [Subject]. [Action]. [Camera]. [Style]. [Constraints].
hard cut to
(3-6s) [Subject]. [Action]. [Camera]. [Style]. [Constraints].Quick checklist
- Is the subject visually specific enough to stay stable
- Is the action written as visible motion instead of mood
- Is there only one main camera move
- Is style defined through lighting and texture rather than vague praise words
- Are the constraints placed at the end
- If references exist, does each file own one role
Keep these rules
- Never mix subject motion and camera motion in the same instruction.
- Pick one camera move and start from
slow,gentle, orcontrolled. - Replace vague mood words with specific lighting and film texture.
- Put constraints last so they can catch jitter, distortion, and identity drift.
- If you use references, give each file exactly one role.
Those five habits alone reduce a large amount of output variance from the same idea. Seedance does not reward magic wording nearly as much as it rewards clear shot architecture. In practice, that means the real skill is keeping Subject -> Action -> Camera -> Style -> Constraints intact from start to finish.