The Ultimate Midjourney Prompt Formula (With Examples)
Affiliate disclosure: some links below are affiliate links. If you sign up through them, captainsmeta may earn a small commission at no extra cost to you.
Most Midjourney prompts read like Google searches: “cat in space.” Then people wonder why the output looks like everyone else’s. The pros write prompts that read like art direction — and the difference in output is enormous.
There’s no magic. There’s a structure. Once you internalize it, you stop “trying random words” and start directing an image. Here’s the six-layer formula and exactly how to use it.
The formula
[Subject] + [Action / Pose] + [Setting] + [Lighting] + [Camera / Lens] + [Style + Finish]
That’s it. Stack the layers in roughly that order, be specific in each, and your output quality improves noticeably, without changing tools or paying more.
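If you keep your prompts in a script or notes file, the formula maps onto a simple ordered join. Here’s a minimal sketch in Python. The names `LAYER_ORDER` and `build_prompt` are made up for illustration, and the code only assembles text that you paste into Midjourney yourself; it doesn’t talk to Midjourney in any way.

```python
# Hypothetical helper: these names are illustrative, not part of any Midjourney tooling.
LAYER_ORDER = ["subject", "action", "setting", "lighting", "camera", "style"]


def build_prompt(layers: dict[str, str]) -> str:
    """Join the six layers in formula order, separated by commas."""
    missing = [name for name in LAYER_ORDER if name not in layers]
    if missing:
        raise ValueError(f"missing layers: {missing}")
    return ", ".join(layers[name].strip() for name in LAYER_ORDER)


# "cat in space", rewritten as six layers of art direction.
prompt = build_prompt({
    "subject": "a ginger cat in a small white spacesuit",
    "action": "floating toward a porthole",
    "setting": "inside a cramped space station module",
    "lighting": "hard rim light from the porthole",
    "camera": "close-up, 85mm portrait lens, shallow depth of field",
    "style": "cinematic, 35mm film grain",
})
print(prompt)
```

Because the dictionary is keyed by layer name, the order you type the layers in doesn’t matter; the join always follows the formula.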
Layer 1: Subject (who/what)
Be specific. Not “a woman” → “a woman in her 30s, short dark hair, focused expression, wearing a navy wool coat.” Detail in the subject = a defined image instead of a generic one. (For ongoing characters, lock the subject across images with the workflow in How to Get Consistent Characters in Midjourney.)
Layer 2: Action / pose
Give the subject one clear action. “Reaching for a coffee cup,” “leaning against a doorway, looking out a window,” “running through wet streets at night.” One action beats vague atmosphere — Midjourney renders single intentions far better than chaotic scenes.
Layer 3: Setting
Where this happens, with sensory detail. “A quiet café with brass fixtures and afternoon sun through tall windows.” “A neon-soaked Tokyo alley with rain on the pavement.” Setting words do enormous lifting.
Layer 4: Lighting (the secret sauce)
This is where “cinematic” actually lives. Real lighting language: golden hour, blue hour, soft diffused window light, hard rim light, neon glow, volumetric god rays, overcast. Just adding a precise lighting phrase often saves a flat image.
Layer 5: Camera / lens
This is what separates a snapshot from cinema. Medium shot, close-up, wide establishing shot, low-angle, over-the-shoulder, 35mm lens, 50mm, 85mm portrait lens, shallow depth of field, anamorphic. Camera words shape framing and feel as much as content.
Layer 6: Style + finish
Land the look: cinematic, editorial, documentary, painterly, retro film, hyperrealistic, 35mm film grain, teal-and-orange color grade, soft contrast. Reuse the same style words across a set so images feel related.
Worked example
Stack all six layers:
- Subject: a woman in her 30s, short dark hair, focused expression, wearing a navy wool coat
- Action: walking briskly down a wet sidewalk
- Setting: empty downtown street at dawn, glass storefronts reflecting light
- Lighting: cool blue-hour light, soft mist
- Camera: medium tracking shot, 35mm lens, shallow depth of field
- Style: cinematic, slight film grain, muted teal-and-amber grade
Glue it into one prompt:
“A woman in her 30s with short dark hair and a focused expression, wearing a navy wool coat, walking briskly down a wet sidewalk on an empty downtown street at dawn, glass storefronts reflecting light, cool blue-hour light with soft mist, medium tracking shot, 35mm lens, shallow depth of field, cinematic, slight film grain, muted teal-and-amber grade.”
Compare that to “woman walking in the city.” Same image idea — completely different result.
The cheat sheet
| Layer | What to specify (examples) |
|---|---|
| Subject | Age, build, hair, expression, clothing, defining feature |
| Action | One clear verb + posture |
| Setting | Specific place + sensory detail |
| Lighting | Golden hour, soft side-light, rim light, neon glow |
| Camera | Wide / medium / close, 35mm/50mm/85mm, depth of field |
| Style | Cinematic, editorial, painterly, film grain, color grade |
How to iterate well
- Generate with the full 6-layer prompt.
- Identify the one thing that’s off (flat lighting? wrong framing? wrong mood?).
- Change only that layer and regenerate.
- Repeat until it sings.
Changing one variable at a time is how you learn what Midjourney responds to. Change five and you learn nothing.
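If you track prompt versions in code, you can make the one-variable rule explicit. The sketch below is one possible way to do it, not a required workflow: the `Prompt` class and `changed_layers` helper are hypothetical names, and the code only builds and compares text.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Prompt:
    # Field order matches the formula, so joining the fields gives the prompt.
    subject: str
    action: str
    setting: str
    lighting: str
    camera: str
    style: str

    def text(self) -> str:
        return ", ".join(getattr(self, f.name) for f in fields(self))


def changed_layers(old: Prompt, new: Prompt) -> list[str]:
    """Names of the layers that differ between two prompt versions."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


v1 = Prompt(
    subject="a woman in her 30s, short dark hair, navy wool coat",
    action="walking briskly down a wet sidewalk",
    setting="empty downtown street at dawn",
    lighting="overcast",
    camera="medium tracking shot, 35mm lens",
    style="cinematic, slight film grain",
)

# The lighting looked flat, so change only that layer and regenerate.
v2 = replace(v1, lighting="golden hour, hard rim light")

assert changed_layers(v1, v2) == ["lighting"]  # exactly one variable moved
print(v2.text())
```

If `changed_layers` ever returns more than one name, you’ve changed too much to know which edit made the difference.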
Common problems and fixes
- Generic, flat result → add specific lighting + camera layers. They lift average prompts the most.
- Faces/hands distorted → pull back to a medium or wide shot; tight close-ups put the most pressure on fine detail.
- Ignored instructions → trim the prompt; lead with the most important layers (Subject + Lighting + Camera).
- Inconsistent images in a series → reuse the same Lighting, Camera, and Style words across every prompt.
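The last fix, reusing the same words across a series, is easy to enforce if you generate the prompts from one shared block. A rough sketch, assuming hypothetical names (`SHARED_LOOK`, `series_prompts`) and scenes invented for illustration:

```python
ORDER = ["subject", "action", "setting", "lighting", "camera", "style"]

# One look, written once and reused for every image in the set.
SHARED_LOOK = {
    "lighting": "cool blue-hour light, soft mist",
    "camera": "medium shot, 35mm lens, shallow depth of field",
    "style": "cinematic, slight film grain, muted teal-and-amber grade",
}

SUBJECT = "a woman in her 30s, short dark hair, navy wool coat"

# Only the scene-specific layers vary from prompt to prompt.
scenes = [
    {"subject": SUBJECT, "action": "walking briskly down a wet sidewalk",
     "setting": "empty downtown street at dawn"},
    {"subject": SUBJECT, "action": "reaching for a coffee cup",
     "setting": "a quiet café with brass fixtures"},
    {"subject": SUBJECT, "action": "leaning against a doorway, looking out a window",
     "setting": "a dim apartment hallway"},
]


def series_prompts(scenes: list[dict[str, str]], shared: dict[str, str]) -> list[str]:
    """Merge each scene with the shared look and join in formula order."""
    prompts = []
    for scene in scenes:
        layers = {**scene, **shared}
        prompts.append(", ".join(layers[name] for name in ORDER))
    return prompts


for p in series_prompts(scenes, SHARED_LOOK):
    print(p)
```

Every prompt in the set ends with the identical lighting, camera, and style phrases, so a typo or a stray synonym can’t creep into one image and not the others.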
When to break the formula
You can omit layers — for example, a tight studio portrait might not need a setting beyond “clean gradient background.” The formula isn’t a checkbox list; it’s a menu of the levers that matter. Pull the ones that lift this specific image.
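If you assemble prompts in code, omitting a layer just means skipping it in the join. A minimal sketch, with `build_partial_prompt` as a hypothetical helper name and the portrait values made up for illustration:

```python
from typing import Optional

ORDER = ("subject", "action", "setting", "lighting", "camera", "style")


def build_partial_prompt(**layers: Optional[str]) -> str:
    """Join whichever layers were supplied, in formula order, skipping empty ones."""
    unknown = set(layers) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    return ", ".join(layers[name] for name in ORDER if layers.get(name))


# A tight studio portrait: no action, and a setting that is only a backdrop.
portrait = build_partial_prompt(
    subject="a woman in her 30s, short dark hair, focused expression",
    action=None,
    setting="clean gradient background",
    lighting="soft diffused window light",
    camera="close-up, 85mm portrait lens, shallow depth of field",
    style="editorial, soft contrast",
)
print(portrait)
```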
The bottom line
A great Midjourney prompt isn’t a sentence — it’s six stacked layers of art direction: Subject, Action, Setting, Lighting, Camera, Style. Be specific in each, iterate one variable at a time, and reuse style words across a set. Internalize the formula and you stop hoping for good outputs — you direct them.
👉 Next: keep characters on-model with How to Get Consistent Characters in Midjourney, and grab ready prompts in 50 Copy-Paste Prompts for Stunning Social Media Graphics.