The Ultimate Midjourney Prompt Formula (With Examples)

May 29, 2026

Affiliate disclosure: some links below are affiliate links. If you sign up through them, captainsmeta may earn a small commission at no extra cost to you.

Most Midjourney prompts read like Google searches: “cat in space.” Then people wonder why the output looks like everyone else’s. The pros write prompts that read like art direction — and the difference in output is enormous.

There’s no magic. There’s a structure. Once you internalize it, you stop “trying random words” and start directing an image. Here’s the six-layer formula and exactly how to use it.

The formula

[Subject] + [Action / Pose] + [Setting] + [Lighting] + [Camera / Lens] + [Style + Finish]

That’s it. Stack the layers in roughly that order, be specific in each, and your images come out noticeably more deliberate and less generic, without changing tools or paying more.
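If you keep prompts in a script or spreadsheet, the stacking step can be sketched in a few lines of Python. This is only an illustration of the formula: `LAYER_ORDER` and `build_prompt` are hypothetical names, not part of Midjourney, and the comma-joined output is plain text you would paste into the prompt box yourself.

```python
# Layer names follow the formula above; they are illustrative, not a Midjourney API.
LAYER_ORDER = ["subject", "action", "setting", "lighting", "camera", "style"]


def build_prompt(layers: dict[str, str]) -> str:
    """Join the non-empty layers, in formula order, into one comma-separated prompt."""
    unknown = set(layers) - set(LAYER_ORDER)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    # Skip layers that are missing or blank, so a layer can be omitted on purpose.
    parts = [layers[name].strip() for name in LAYER_ORDER if layers.get(name, "").strip()]
    return ", ".join(parts)


# The dict can be written in any order; the output always follows the formula.
prompt = build_prompt({
    "style": "cinematic, slight film grain",
    "subject": "a woman in her 30s, short dark hair, navy wool coat",
    "action": "walking briskly down a wet sidewalk",
    "lighting": "cool blue-hour light, soft mist",
})
print(prompt)
```

Setting and camera are left out here on purpose, and the function simply skips them.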

Layer 1: Subject (who/what)

Be specific. Not “a woman” → “a woman in her 30s, short dark hair, focused expression, wearing a navy wool coat.” Detail in the subject = a defined image instead of a generic one. (For ongoing characters, lock the subject across images with the workflow in How to Get Consistent Characters in Midjourney.)

Layer 2: Action / pose

Give the subject one clear action. “Reaching for a coffee cup,” “leaning against a doorway, looking out a window,” “running through wet streets at night.” One action beats vague atmosphere — Midjourney renders single intentions far better than chaotic scenes.

Layer 3: Setting

Where this happens, with sensory detail. “A quiet café with brass fixtures and afternoon sun through tall windows.” “A neon-soaked Tokyo alley with rain on the pavement.” Setting words do a lot of the heavy lifting.

Layer 4: Lighting (the secret sauce)

This is where “cinematic” actually lives. Real lighting language: golden hour, blue hour, soft diffused window light, hard rim light, neon glow, volumetric god rays, overcast. Just adding a precise lighting phrase often saves a flat image.

Layer 5: Camera / lens

This is what separates a snapshot from cinema. Medium shot, close-up, wide establishing shot, low-angle, over-the-shoulder, 35mm lens, 50mm, 85mm portrait lens, shallow depth of field, anamorphic. Camera words shape framing and feel as much as content.

Layer 6: Style + finish

Land the look: cinematic, editorial, documentary, painterly, retro film, hyperrealistic, 35mm film grain, teal-and-orange color grade, soft contrast. Reuse the same style words across a set so images feel related.

Worked example

Stack all six layers:

Subject: a woman in her 30s, short dark hair, focused expression, wearing a navy wool coat
Action: walking briskly down a wet sidewalk
Setting: empty downtown street at dawn, glass storefronts reflecting light
Lighting: cool blue-hour light, soft mist
Camera: medium tracking shot, 35mm lens, shallow depth of field
Style: cinematic, slight film grain, muted teal-and-amber grade

Glue it into one prompt:

“A woman in her 30s with short dark hair and a focused expression, wearing a navy wool coat, walking briskly down a wet downtown sidewalk at dawn, empty glass storefronts reflecting light, cool blue-hour light with soft mist, medium tracking shot, 35mm lens, shallow depth of field, cinematic, slight film grain, muted teal-and-amber grade.”

Compare that to “woman walking in the city.” Same image idea — completely different result.

The cheat sheet

Layer	Example words
Subject	Age, build, hair, expression, clothing, defining feature
Action	One clear verb + posture
Setting	Specific place + sensory detail
Lighting	Golden hour, soft side-light, rim light, neon glow
Camera	Wide / medium / close, 35mm/50mm/85mm, depth of field
Style	Cinematic, editorial, painterly, film grain, color grade

How to iterate well

Generate with the full 6-layer prompt.
Identify the one thing that’s off (flat lighting? wrong framing? wrong mood?).
Change only that layer and regenerate.
Repeat until it sings.

Changing one variable at a time is how you learn what Midjourney responds to. Change five and you learn nothing.
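One way to keep yourself honest about the one-variable rule is to store each attempt as a set of layers and check what changed between attempts. The sketch below assumes prompts are kept as plain dictionaries; `change_one_layer` and `changed_layers` are hypothetical helper names, not anything Midjourney provides.

```python
def change_one_layer(layers: dict[str, str], name: str, new_value: str) -> dict[str, str]:
    """Return a copy of the layers with exactly one layer replaced."""
    if name not in layers:
        raise KeyError(f"no such layer: {name}")
    updated = dict(layers)  # copy, so the previous attempt stays untouched
    updated[name] = new_value
    return updated


def changed_layers(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List the layer names whose text differs between two attempts."""
    return [name for name in before if before[name] != after.get(name)]


attempt_1 = {
    "subject": "a woman in her 30s, short dark hair, navy wool coat",
    "action": "walking briskly down a wet sidewalk",
    "lighting": "overcast",
    "camera": "medium shot, 35mm lens",
}

# The lighting looked flat, so only the lighting layer changes.
attempt_2 = change_one_layer(attempt_1, "lighting", "cool blue-hour light, soft mist")
print(changed_layers(attempt_1, attempt_2))
```

If the printed list ever holds more than one name, the comparison between the two generations no longer tells you which change mattered.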

Common problems and fixes

Generic, flat result → add specific lighting + camera layers. They lift average prompts the most.
Faces/hands distorted → pull back to a medium or wide shot; tight close-ups put fine detail front and center, where errors show most.
Ignored instructions → trim the prompt; lead with the most important layers (Subject + Lighting + Camera).
Inconsistent images in a series → reuse the same Lighting, Camera, and Style words across every prompt.

When to break the formula

You can omit layers — for example, a tight studio portrait might not need a setting beyond “clean gradient background.” The formula isn’t a checkbox list; it’s a menu of the levers that matter. Pull the ones that lift this specific image.

The bottom line

A great Midjourney prompt isn’t a sentence — it’s six stacked layers of art direction: Subject, Action, Setting, Lighting, Camera, Style. Be specific in each, iterate one variable at a time, and reuse style words across a set. Internalize the formula and you stop hoping for good outputs — you direct them.

👉 Next: keep characters on-model with How to Get Consistent Characters in Midjourney, and grab ready prompts in 50 Copy-Paste Prompts for Stunning Social Media Graphics.

Frequently asked questions

Does this formula work in other generators too?

Yes, the structure works in any generator such as Flux, Ideogram, or DALL-E. Each tool weighs words slightly differently, so test and adjust.

How long should a prompt be?

Long enough to be specific, short enough to stay focused. Trim filler words, and if a layer does not change the result when removed, it was not doing anything.
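The "remove a layer and see if anything changes" test can be set up mechanically. The sketch below only generates the comparison prompts; judging the images is still up to you. `ablations` is a hypothetical helper name, and the layers are assumed to live in a plain dictionary.

```python
def ablations(layers: dict[str, str]) -> dict[str, str]:
    """Map each layer name to the prompt built with that one layer removed."""
    return {
        removed: ", ".join(text for name, text in layers.items() if name != removed)
        for removed in layers
    }


layers = {
    "subject": "a woman in a navy wool coat",
    "lighting": "cool blue-hour light",
    "style": "slight film grain",
}

variants = ablations(layers)
for removed, prompt in variants.items():
    print(f"without {removed}: {prompt}")
```

Generate each variant alongside the full prompt; a layer whose removal makes no visible difference is a candidate for trimming.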

Do I need to know real photography terms?

Just a handful — shot types, depth of field, and common lighting and lens phrases. It is the highest-ROI vocabulary you can pick up for AI image work.

What about negative prompts?

Use them sparingly to remove a specific recurring problem like text or watermarks. Over-using negatives can confuse the model, so always lead with the positive prompt.
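In Midjourney specifically, negatives are written with the `--no` parameter at the end of the prompt, followed by a comma-separated list. A small sketch of keeping them last and optional follows; `with_negatives` is a hypothetical helper name, and other generators handle negatives differently or not at all.

```python
def with_negatives(prompt: str, unwanted: list[str]) -> str:
    """Append a Midjourney-style --no parameter after the positive prompt, if needed."""
    items = [item.strip() for item in unwanted if item.strip()]
    if not items:
        return prompt  # no negatives: leave the positive prompt untouched
    return f"{prompt} --no {', '.join(items)}"


base = "a quiet café with brass fixtures, soft diffused window light, 50mm lens"
print(with_negatives(base, ["text", "watermark"]))
print(with_negatives(base, []))
```

The positive prompt always comes first, and the negative list stays short enough to target one recurring problem.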