Your First AI Image: What to Say and How to Say It

You have picked a model. Maybe Nano Banana, maybe Midjourney, maybe something else. You are looking at a text box. You type “a cat.” You get a cat. It is fine. It is not what you wanted.

The gap between “a cat” and “a photograph of a marmalade cat asleep on a windowsill in afternoon light, dust motes visible in the sun, soft focus background of a garden” is not technical skill. It is observation. It is knowing what you actually see in your head and putting it into words.

This guide teaches that skill.

The one rule

Describe what a camera would see, not what you want to feel.

“A peaceful scene” is a feeling. “A woman reading on a park bench under a large oak tree, dappled sunlight, autumn leaves on the ground, golden hour” is what a camera would see. The feeling comes from the details, not from naming the feeling.

Every good prompt follows this principle. You do not tell the model what emotion to create. You describe the scene that would create it.

Start with five things

Every image has five things worth describing. You do not need all five every time, but the more you include, the closer the result will be to what you pictured.

1. The subject

Who or what is in the image. Be specific.

Weak: “a person”
Better: “a woman in her 60s with silver hair and reading glasses”
Even better: “a woman in her 60s with silver hair pulled back, wire-rimmed reading glasses, wearing a cream linen shirt”

The model cannot see your imagination. The more specific you are, the less it has to guess.

2. The setting

Where are they? What surrounds them?

Weak: “outside”
Better: “sitting on a wooden bench in a park”
Even better: “sitting on a weathered wooden bench in a small London park, surrounded by mature oak trees, a gravel path behind her”

Settings create context. A portrait in a studio feels different from a portrait in a kitchen. Name the place.

3. The light

Light changes everything. It is the single most impactful thing you can describe.

“Golden hour” — warm, low sun, long shadows. The light of the last hour before sunset.
“Overcast” — soft, even, no harsh shadows. Flattering for portraits.
“Window light” — directional and soft. One side bright, the other in gentle shadow.
“Harsh midday sun” — high contrast, strong shadows. Dramatic but unflattering for skin.
“Neon” — coloured, artificial, urban. Pink and blue reflections.
“Candlelight” — warm, dim, intimate. Soft orange glow.

You do not need photography terms. “The kind of light you get right before sunset” works just as well as “golden hour.” Describe it however it comes to you.

4. The mood

Not as an emotion — as a visual quality.

“Quiet and still” → sparse composition, soft colours, negative space
“Busy and alive” → lots of detail, bright colours, movement
“Dark and cinematic” → deep shadows, rich colours, dramatic framing
“Clean and minimal” → white space, simple geometry, few colours

The mood tells the model how much to put in and what palette to lean toward.

5. The style (optional)

How do you want it to look? Like a photograph? A painting? An illustration?

“A photograph” — realistic, detailed
“A film photograph” — slight grain, softer colours, a specific era feel
“An editorial photograph” — sharp, intentional, magazine-quality
“A watercolour painting” — soft edges, visible brushwork, muted tones
“A digital illustration” — clean lines, flat colours, graphic

If you do not specify, many models lean toward a clean, realistic photographic look, though the default varies from model to model. That is often fine.
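
If you like to think in code, the five things above can be sketched as a small template. This is only an illustration, not part of any model's API: the ImagePrompt class, its field names, and the "A photograph" default are all assumptions made for this example. It shows one way to turn the five parts into full sentences rather than a list of keywords.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """Hypothetical container for the five things: subject, setting, light, mood, style."""
    subject: str
    setting: str = ""
    light: str = ""
    mood: str = ""
    style: str = "A photograph"  # style is optional; this default is an assumption

    def to_text(self) -> str:
        # Open with the style and subject, then add the setting if one was given.
        opening = f"{self.style} of {self.subject}"
        if self.setting:
            opening += f", {self.setting}"
        sentences = [opening + "."]
        # Light and mood each get their own sentence, and are skipped when empty.
        if self.light:
            sentences.append(f"The light is {self.light}.")
        if self.mood:
            sentences.append(f"The mood is {self.mood}.")
        return " ".join(sentences)


prompt = ImagePrompt(
    subject="a woman in her 60s with silver hair pulled back and wire-rimmed reading glasses",
    setting="sitting on a weathered wooden bench in a small London park",
    light="warm and low, the last hour before sunset",
    mood="quiet and still",
)
print(prompt.to_text())
```

The result is plain text that you paste into whatever text box your model gives you. Leaving a field empty simply drops that sentence, which matches the idea that you do not need all five every time.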

Three prompts, walked through

Prompt 1: The portrait

A photograph of a man in his 30s with short dark hair and a trimmed beard, standing in a sunlit doorway. He is wearing a dark green knit jumper. The light is coming from behind him, creating a rim light around his shoulders and hair. The background is a blurred interior — warm, lived-in, out of focus. The mood is calm and approachable.

Why it works: Subject (specific person), setting (sunlit doorway with a blurred interior behind), light (backlit rim light), mood (calm and approachable), style (a photograph). Five things.

Prompt 2: The product

A coffee bag standing upright on a weathered wooden table. The bag is matte black with a large white label that reads “ORIGIN” in uppercase letters. Behind it, a ceramic pour-over and a white mug, slightly out of focus. The lighting is soft and directional from the left, like a window. The overall feel is minimal and editorial — like a spread in a design magazine.

Why it works: Subject (coffee bag with specific label), setting (wooden table with props), light (soft from the left), style (editorial, magazine). The props give the model context for the scene.

Prompt 3: The landscape

A wide view of a Scottish highland valley in early autumn. Muted greens and amber heather. A narrow river winding through the centre. Low clouds sitting just above the hilltops, not raining but heavy. The light is flat and overcast — no shadows, no drama. The mood is meditative and remote. No people, no buildings, no roads visible.

Why it works: Setting (specific place and season), colour (muted greens and amber), light (overcast, flat), mood (meditative), and crucially — what is not in the image (no people, no buildings). Telling the model what to exclude is as important as telling it what to include.

The iteration loop

Your first result will be close but not right. That is normal. Here is how to refine:

If the composition is wrong: “Move the subject to the left third of the frame” or “Zoom in closer — just head and shoulders.”

If the colours are off: “Warmer tones — more amber, less grey” or “Desaturate everything slightly, like a faded film photograph.”

If there is too much going on: “Simplify the background — I want more negative space” or “Remove the extra objects on the table.”

If the light is wrong: “Make the light softer — less contrasty, more overcast” or “Add a stronger light from the right side.”

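Each refinement is a sentence. You are having a conversation with the model, much like vibe coding with Claude Code: describe what you see that is wrong and what you want instead. If your tool does not keep a conversation going, add the sentence to your original prompt and generate again.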
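
For tools that take a single prompt rather than a running conversation, the loop above can be sketched in a few lines. This is a minimal illustration under that assumption. The refine function is a hypothetical helper, not part of any image tool, and it only joins text together. You still generate the image yourself after each round.

```python
def refine(prompt: str, notes: list[str]) -> str:
    """Fold refinement sentences back into a prompt (hypothetical helper)."""
    cleaned = []
    for note in notes:
        note = note.strip()
        if not note:
            continue  # ignore empty notes
        if note[-1] not in ".!?":
            note += "."  # make each note a complete sentence
        cleaned.append(note)
    return " ".join([prompt.strip()] + cleaned)


base = "A coffee bag standing upright on a weathered wooden table."
round_one = refine(base, ["Warmer tones, more amber, less grey"])
round_two = refine(round_one, ["Simplify the background, I want more negative space"])
print(round_two)
```

Each round adds one sentence, so you can see which change made the difference and remove it again if it made things worse.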

Common mistakes

Listing keywords instead of describing. “Cat, cute, fluffy, aesthetic, 4K, trending on ArtStation” is keyword soup. The model does better with a sentence: “A fluffy ginger kitten curled up on a velvet cushion, soft focus, warm afternoon light.” Sentences give structure. Keywords do not.

Being too vague about the subject. “A beautiful woman” gives the model nothing specific. What does she look like? What is she wearing? What is she doing? Where is she? Specificity is kindness.

Forgetting the light. The same subject in the same setting looks completely different at golden hour versus harsh midday. Always describe the light. It is the highest-impact thing you can add.

Over-specifying technical details. You do not need “shot on Canon EOS R5 with 85mm f/1.4 at ISO 400.” Those details can help with specific models, but for beginners they add complexity without proportional benefit. Describe the look you want instead: “shallow depth of field with the background blurred” or “everything in sharp focus from front to back.”

Where to go from here

Once you can describe what you see, you are ready for more:

AI Image Models in 2026 — which model is best for which job
The AI Creative’s Toolbox — platforms that give you access to multiple models, including Flora Fauna where you can try 80+ models on one canvas
Nano Banana Prompt Pack — 25 tested prompts with breakdowns, if you want to see exactly how the structure works at a higher level (member content)
The Photographer’s Prompt Guide — when you are ready to go deep on perspective, lens choices, and film aesthetics (member content)

What Is Vibe Coding — the same principle applies: describe what you want, the AI handles the rest
Prompting Claude Code: 10 Before-and-Afters — the same craft applied to building things

Art & Algorithms publishes guides, tutorials, and prompt packs at the intersection of art and code. Subscribe for the full archive.