5-Layer Prompt Stack

No matter how creative the idea is, Seedance will still drift into slop if the prompt does not match the way the model reads camera, lighting, motion, and constraints. In practice, it behaves less like a generic text box and more like a compact film set that expects directed shot language. If you only give it plain prose, the subject gets soft, the camera starts guessing, and style words sit on top without really controlling the shot.

This page focuses on the prompting system itself, not account setup or access. The goal is simple: make it clear how Seedance reads a shot, which words tend to stabilize output, and how to structure prompts so subject identity, motion, framing, and quality constraints reinforce each other instead of fighting each other.

Why slop keeps happening

Most weak Seedance generations come from prompt architecture problems, not idea problems.

If the subject is abstract, identity and silhouette drift quickly.
If the action is really just mood description, the result behaves like a still image.
If subject motion and camera motion are mixed together, the model has to guess what is supposed to move.
If the prompt stacks vague words like cinematic, beautiful, or epic, it adds flavor without really directing the shot.
If the prompt has no constraints, jitter, flicker, bent limbs, stretching, and identity drift show up fast.

The core point is that Seedance reacts more like a model reading shooting instructions than a model answering a search query.

What this framework is for

Seedance works better when you think of it as a multimodal production setup rather than a plain text-to-video box. In one generation, you can combine:

Up to 9 image references
Up to 3 video references
Up to 3 audio references
A text prompt

Image references are useful for character sheets, moodboards, product shots, or storyboard panels. Video references are strong for camera motion, choreography, and pacing. Audio references help lock direction for music, voiceover, or sound effects.

Results are usually more stable when @Image, @Video, and @Audio each have a clear job. If you are typing text only, you are using only part of what makes the system strong.

Core structure

Subject -> Action -> Camera -> Style -> Constraints

Official material may present a broader formula, but in practice this 5-layer structure is one of the most reliable ways to keep Seedance outputs coherent. The order matters for a reason.

Subject locks identity first.
Action tells the model that the shot should move instead of sit like a still.
Camera fixes framing and point of view.
Style adds flavor later so it interferes less with motion.
Constraints close the gaps with guardrails.

Even before you tune anything else, just keeping these layers separate usually improves consistency.

1. Subject

Subjects get weak fast when they stay abstract. Even a short line works better when it contains specific visual anchors.

Bad: a woman
Good: a young woman with brown hair
Best: late 20s woman, tight dark curls at ear length, small silver hoop in left ear, fitted black turtleneck, neutral expression

One subject per generation is the safest default.
Two can work when they are separated spatially and labeled clearly, such as @Character_A and @Character_B.
Three or more often increases identity confusion sharply.
If face consistency matters, define at least a few stable features such as hair, clothing, age range, expression, or accessories.

The job of the subject layer is not poetic writing. Its job is to leave the model with reusable visual anchors.

2. Action

Action should describe visible movement, not mood. Use present tense and keep each shot focused on one major motion.

Bad: she looks happy and is enjoying the sunset
Good: she slowly turns toward the camera, breeze lifting the hem of her skirt, eyes narrowing against the light

The most important rule is to separate subject motion from camera motion.

Bad: spinning camera around a dancing person
Good: the dancer spins slowly, camera holds fixed framing

The subject line says what the person or object does.
The camera line says how the shot observes that action.
One major action per shot is the safest default.
Verbs like turns, walks, sprints, leans, reaches, and jumps usually land better than mood-heavy phrasing.

That separation alone usually makes shot interpretation more stable.

3. Camera

This is one of the biggest quality levers in Seedance. Use one primary camera move per generation, and describe rhythm, direction, and control rather than technical camera-sheet details.

Common camera keywords

Static shots: fixed, locked-off, static wide, locked tripod
Moving shots: push-in, dolly in, pull-out, dolly out, pan left, pan right, tracking shot, orbit, arc, 360 orbit, aerial, drone shot, handheld, crane up, crane down, gimbal, steadicam walk, whip pan, dolly zoom, rack focus

Speed modifiers

Very slow: imperceptible, barely
Safe defaults: slow, gentle, gradual
Stable motion: smooth, controlled
Stronger motion: dynamic, swift

fast is one of the riskiest words in the box. If something needs to move fast, let one element move fast and keep the rest stable.

When you need a compound move, write it in time order instead of stacking everything together.

start: slow dolly-in, then: gentle pan right for the final 2 seconds

Additional practical rules:

Prefer shot rhythm over camera specs like f-stop, ISO, or lens math.
handheld adds energy, but it also raises the chance of visible instability, so pair it with constraints.
whip pan, dynamic, and swift can work, but they fail loudly when the rest of the shot is not controlled.

4. Style

Style works best when lighting comes first. In Seedance, lighting has outsized influence on perceived quality. A word like cinematic is usually too vague on its own. Specific lighting, color, and film texture cues land more reliably.

Useful lighting and style modifiers

golden hour
rim light
dramatic rim light
soft key from 45 degrees
overcast daylight
backlit silhouette at sunset
volumetric fog
chiaroscuro
35mm
16mm
anamorphic lens flare

This kind of bundle tends to work better:

cinematic film tone, 35mm, warm golden lighting

By contrast, words like glow, glimmer, and glints can introduce flicker artifacts, so they are worth using carefully.

The real job of the style layer is not decoration. It is to define light direction, tone, and surface texture.

5. Constraints

Constraints are the last safety layer. They matter most when you want a shot to feel less synthetic or when character consistency is critical.

Base constraint set

avoid jitter
avoid bent limbs
avoid identity drift
avoid temporal flicker
no distortion
no stretching
maintain face consistency

Common quality suffix

sharp clarity
natural colors
stable picture
no blur
no ghosting
no flickering

Constraints are usually cleanest at the very end of the prompt. For character shots, avoid bent limbs and maintain face consistency are close to default safety settings.

Time-coded multi-shot prompts

When you want several beats inside a short clip, time coding is the clearest structure. This is also one of the areas where Seedance can feel noticeably stronger. Give each segment its own subject, action, camera, and lighting state, then specify transitions explicitly.

(0-3s) late 20s woman, tight dark curls, black turtleneck. She stands still, then slowly turns toward camera. Fixed framing. Golden hour rim light. Avoid jitter, maintain face consistency.
 
hard cut to
 
(3-7s) she breaks into a sprint through shallow water, coat lifting behind her. Side tracking shot, slow then controlled acceleration. Overcast daylight, natural colors. Avoid temporal flicker, no distortion.
 
seamless morph into
 
(7-12s) she jumps onto a rooftop ledge and holds for one beat. Gentle push-in. Backlit sunset silhouette. Sharp clarity, stable picture.

Give each shot one clear job.
Use transitions like hard cut to or seamless morph into instead of leaving the handoff ambiguous.
Re-state camera and lighting state in each segment rather than assuming the model will carry them perfectly.
It is usually better to split motion into 2-4 second chunks than to overload a single segment.

@ reference system

The most believable results usually come from clearer reference roles, not just more files. In practice, 6-12 references can work well when every file has one job.

@Image1 = first frame
@Image2 = last frame
@Video1 = camera motion reference
@Video2 = pacing reference
@Audio1 = background music
@Audio2 = voiceover

If you use many references, stability usually improves when each one owns a single responsibility.

Reusable template

Single shot

[Subject]. [Action]. [Camera]. [Style]. [Constraints].

Example

late 20s woman, tight dark curls at ear length, fitted black turtleneck, neutral expression. She slowly turns toward the camera, breeze lifting the edge of her coat. Gentle push-in, fixed eye-level framing. Golden hour rim light, cinematic film tone, 35mm. Avoid jitter, avoid identity drift, maintain face consistency, sharp clarity, natural colors.

Multi-shot

(0-3s) [Subject]. [Action]. [Camera]. [Style]. [Constraints].
hard cut to
(3-6s) [Subject]. [Action]. [Camera]. [Style]. [Constraints].

Quick checklist

Is the subject visually specific enough to stay stable
Is the action written as visible motion instead of mood
Is there only one main camera move
Is style defined through lighting and texture rather than vague praise words
Are the constraints placed at the end
If references exist, does each file own one role

Keep these rules

Never mix subject motion and camera motion in the same instruction.
Pick one camera move and start from slow, gentle, or controlled.
Replace vague mood words with specific lighting and film texture.
Put constraints last so they can catch jitter, distortion, and identity drift.
If you use references, give each file exactly one role.

Those five habits alone reduce a large amount of output variance from the same idea. Seedance does not reward magic wording nearly as much as it rewards clear shot architecture. In practice, that means the real skill is keeping Subject -> Action -> Camera -> Style -> Constraints intact from start to finish.

MINventory

Explorer

5-Layer Prompt Stack

5-Layer Prompt Stack

Why slop keeps happening

What this framework is for

Core structure

1. Subject

2. Action

3. Camera

Common camera keywords

Speed modifiers

4. Style

Useful lighting and style modifiers

5. Constraints

Base constraint set

Common quality suffix

Time-coded multi-shot prompts

@ reference system

Reusable template

Single shot

Example

Multi-shot

Quick checklist

Keep these rules

Graph View

Table of Contents

Backlinks