Subject and Action
This page has been rebuilt with unique real photographic references instead of abstract diagrams.
The subject-and-action layer gives an AI video prompt its first readable structure: who is in frame, what they are doing, and how that behavior is expressed.
subject
subject is the main entity in the frame. You usually need to lock the subject before you decide on lens, framing, or lighting.
- Prompt fragment: an elegant man with a laptop, subject isolated, editorial portrait
- Real reference: Elegant man with a laptop (Unsplash)
action
action describes what the subject is physically doing right now. A still pose, a walk, and a run create very different energy even before camera movement is added.

- Prompt fragment: woman running along the waterfront, dynamic action, candid movement
- Real reference: Woman running
expression
expression is the emotional signal on the face. It quickly shifts genre, tone, and audience perception, even when the rest of the setup stays the same.
- Prompt fragment: thoughtful expression, eyes looking off frame, calm mood
- Real reference: Elegant woman by storefront (Unsplash)
gesture
gesture covers the smaller body-language layer, especially hands, wrists, and arm direction. It matters a lot in product shots, demonstrations, and intimate detail shots.
- Prompt fragment: delicate hand gesture holding a candle jar, intimate close detail
- Real reference: Hands (Unsplash)
interaction
interaction is the visible relationship between people, or between a person and an object. It creates context and narrative much faster than a single isolated figure.
- Prompt fragment: young people in conversation, natural interaction, warm social energy
- Real reference: Young people in conversation (Unsplash)
pose
pose is the full-body arrangement: weight distribution, leg angle, shoulder direction, and silhouette. It is one of the fastest ways to shape character presentation.

- Prompt fragment: full-body pose, one leg forward, relaxed editorial stance
- Real reference: Man standing in court (Unsplash), CC0, cropped
Summary
The main practical rule here is not to collapse all of these into one vague prompt phrase.
- subject: who is visible
- action: what they are doing
- expression: what emotion the face carries
- gesture: how the hands and arms behave
- interaction: what relationship is visible
- pose: what the full-body silhouette is doing
In real prompting, subject + action + expression is often enough to stabilize the scene before you add lens, lighting, or movement.
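One way to keep these layers separate in practice is to store each one in its own field and join them only at the end. The Python sketch below is illustrative only: the `SubjectActionLayer` class, its `to_prompt` method, and the comma-joined output format are assumptions made for this example, not part of any video model's API, and the fragment wording is loosely adapted from the examples above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SubjectActionLayer:
    """Hypothetical container: one field per layer described on this page."""

    subject: str
    action: Optional[str] = None
    expression: Optional[str] = None
    gesture: Optional[str] = None
    interaction: Optional[str] = None
    pose: Optional[str] = None

    def to_prompt(self) -> str:
        # Join the non-empty layers in declaration order, comma-separated.
        # The separator is an assumption; adjust it to your tool's conventions.
        parts = [getattr(self, f.name) for f in fields(self)]
        return ", ".join(p.strip() for p in parts if p and p.strip())


# Core layers only: subject + action + expression.
core = SubjectActionLayer(
    subject="a woman on the waterfront",
    action="running, dynamic action",
    expression="thoughtful expression, eyes looking off frame",
)

print(core.to_prompt())
```

Layers left unset are skipped, so the same structure works for the three-part core and for a fuller description that also sets gesture, interaction, or pose.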