Six models. Each genuinely good at something different. The landscape shifts monthly, but in April 2026, these are the ones that matter and the jobs they are best at.

The quick version

If you want the answer before the explanation:

  • Highest arena score: GPT Image 1.5. Leads the Artificial Analysis arena (Elo 1,265); best instruction-following.
  • Best value + 4K resolution: Nano Banana 2. Near-parity quality at half the cost; native 4K; 14 reference images.
  • Cinematic aesthetic: Midjourney V7. Dramatic compositions no other model matches; personalisation profiles.
  • Photorealism + anatomy: Flux 2. Skin texture, fabric detail, anatomical precision; strong with camera prompts.
  • Text in images: Ideogram 3.0. Built specifically for legible, styled text in images.
  • Full control, self-hosted: Stable Diffusion 3.5 / Flux 2 Klein. Open weights, no per-image cost, deepest customisation ecosystem.
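The list above is effectively a lookup table. As a minimal sketch of that decision logic (the category keys are invented here for illustration, not an official taxonomy):

```python
# The article's quick-reference recommendations as a lookup table.
# Keys are illustrative labels, not any vendor's terminology.
BEST_FOR = {
    "instruction following": "GPT Image 1.5",
    "value and 4k": "Nano Banana 2",
    "cinematic aesthetic": "Midjourney V7",
    "photorealism": "Flux 2",
    "text in images": "Ideogram 3.0",
    "self-hosted control": "Stable Diffusion 3.5 / Flux 2 Klein",
}

def pick_model(need: str) -> str:
    """Return this article's recommendation for a normalised need string."""
    return BEST_FOR[need.strip().lower()]
```
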

Now the detail.


Nano Banana (Google)

The name started as a joke. A Google DeepMind product manager submitted their model anonymously to a public arena at 2:30 in the morning and needed a pseudonym. “Nano Banana” stuck when the model topped the leaderboard. Google eventually embraced it.

There are two versions that matter:

Nano Banana 2 (built on Gemini 3.1 Flash Image) is the everyday workhorse. Fast — 4 to 6 seconds at standard resolution — and surprisingly capable. It has a unique feature called Image Search Grounding: during generation, it retrieves real images from Google Search and uses them as visual context. This noticeably improves accuracy for real-world subjects like landmarks and brand logos. It accepts up to 14 reference images (10 object + 4 character) in a single generation. Output goes up to 4K.

Nano Banana Pro (built on Gemini 3 Pro Image) trades speed for polish. Richer textures, more natural lighting, better spatial composition. Slower — 10 to 20 seconds at standard resolution — but the results look like they came from professional design software. Accepts up to 11 reference images (6 object + 5 character). Same 4K ceiling. Pro has text-based Search Grounding (pulling factual information from Google Search) but not the Image Search Grounding that Nano Banana 2 has.

The difference: Pro wins on absolute quality. Nano Banana 2 is 3 to 5 times faster, half the price, has Image Search Grounding, and accepts more reference images. For most creative work, start with Nano Banana 2 and move to Pro when the brief demands the extra polish.

Pricing: Via Google’s API, roughly $0.067 per image (NB2) or $0.134 per image (Pro). Free access through Google AI Studio with limits.
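At those rates the NB2-vs-Pro tradeoff is easy to price out for a batch job. A small sketch using the per-image figures quoted above (illustrative only; API prices change without notice):

```python
# Per-image prices quoted in this article; check Google's current
# rate card before budgeting real work.
PRICE_PER_IMAGE = {
    "nano-banana-2": 0.067,
    "nano-banana-pro": 0.134,
}

def batch_cost(model: str, n_images: int) -> float:
    """Estimated USD cost of generating n_images with the given model."""
    return round(PRICE_PER_IMAGE[model] * n_images, 2)
```

A 500-image campaign comes to about $33.50 on Nano Banana 2 versus $67 on Pro, which is why the article suggests reserving Pro for briefs that need the polish.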

Where to use it: Google AI Studio (free tier available), Vertex AI API, the Gemini app, or multi-model platforms like Flora Fauna.


Flux (Black Forest Labs)

Flux is the photorealism model. Built by the team behind Stable Diffusion, Black Forest Labs released Flux 2 in November 2025 and raised $300 million the following month.

The current lineup:

  • Flux 2 Max — highest quality, includes web-grounded generation
  • Flux 2 Pro — production-grade, the one most professionals use
  • Flux 2 Flex — optimised for text rendering in images
  • Flux 2 Dev — open weights on Hugging Face, non-commercial licence
  • Flux 2 Klein (4B and 9B) — small, fast models for consumer hardware

What Flux does best: Photorealism. Skin texture, fabric detail, anatomical accuracy, product photography. If you need an image that could be mistaken for a photograph, Flux is the first choice.

Where it falls short: No consumer-facing interface. The best models are API-only. Running Flux locally requires serious hardware — the full model needs around 33 GB of VRAM, and even with quantisation you need at least 16 GB. The open-weights models have non-commercial licences (except Klein 4B, which is Apache 2.0).

Pricing: Credit-based through the BFL API. Roughly $0.10 per image for Flux 2 Pro. Klein starts at $0.014. Also available through hosted platforms like Replicate, Fal.ai, and Flora Fauna.


Midjourney

Midjourney V7 has been the default model since June 2025. The platform is no longer Discord-only — there is now a proper web interface at midjourney.com.

What Midjourney does best: Art. Cinematic compositions, dramatic lighting, emotional depth, editorial illustration, game concept art. No other model consistently produces images with this level of aesthetic intentionality. V7 introduced personalisation profiles that learn your visual preferences over time.

Where it falls short: Text rendering in images is unreliable — misspellings and distortions persist. Hands and anatomy, while improved from V6, still produce occasional errors. Character consistency degrades past 3 to 5 images in a series. And there is no free trial — you pay before you see results.

There is also the aesthetic question: Midjourney outputs have a distinctive cinematic polish. Beautiful, but recognisable. If the brief calls for raw, documentary, or deliberately imperfect visuals, this aesthetic can work against you.

Pricing: Basic $10/month, Standard $30/month, Pro $60/month, Mega $120/month. Annual billing saves 20%.

Access: midjourney.com (web) and Discord. No public API — Midjourney is not available on third-party platforms or multi-model tools.


GPT Image 1.5 (OpenAI)

DALL-E is gone. OpenAI deprecated DALL-E 3 on March 4, 2026. GPT Image 1.5, introduced in December 2025, is their current model.

It currently leads the Artificial Analysis Text-to-Image Arena with an Elo rating of 1,265 — the highest of any tested model. It also leads the Image Editing Arena.

What GPT Image does best: Following complex instructions. Multi-step compositions, spatial reasoning, in-image text. It is the most reliable model for “generate exactly what I described.” The multi-modal integration means it can reason about images, not just generate from text.

Where it falls short: The outputs have a characteristic commercial polish — clean, professional, but sometimes visibly artificial. Resolution caps below what Google offers. Complex scenes with several faces tend to render individual faces poorly. Sequential character consistency is improving but not yet reliable enough for comic strips or storyboards.

Pricing: $0.009 per image (low quality) to $0.133 per image (high quality) through the API. Also available in all ChatGPT paid plans.


Ideogram 3.0

Released March 2025 and still current. Ideogram’s entire identity is text in images.

What Ideogram does best: Generating legible, styled, accurate text within images. Posters, social media graphics, advertisements, book covers — anything where words need to appear inside the image and look intentional. Version 3.0 added style references (upload up to 3 reference images), batch generation from CSV files, and inpainting.

Where it falls short: Portrait rendering. Skin textures can appear unnatural, proportions can be inconsistent. For non-text-heavy scenes, other models outperform it on photorealism and artistic quality. Complex fantasy compositions can be unpredictable.

Pricing: Free tier available (roughly 40 images per day). Plus at $15/month, Pro at $48/month.

Access: ideogram.ai (web), API available. Also available on Flora Fauna.


Stable Diffusion

Still relevant — but for a specific reason. Stable Diffusion’s advantage in 2026 is not raw output quality (Flux and GPT Image surpass it on benchmarks). Its advantage is control.

SD 3.5 is the current version. Fully open weights with a community licence — free for any use unless your company earns over $1M annually. SDXL (the previous generation) remains the most widely deployed version due to its ecosystem depth: tens of thousands of fine-tuned checkpoints, LoRAs, and ControlNets on Civitai and Hugging Face.

What Stable Diffusion does best: Customisation. If you need a model fine-tuned to your specific art style, trained on your product catalogue, or configured for a niche aesthetic that no general model handles — Stable Diffusion is the only practical choice. No per-image costs. Full control over the generation pipeline.

Where it falls short: The learning curve is real. ComfyUI’s node-based interface is powerful but intimidating. Local running requires a dedicated GPU. And the gap between SD 3.5’s base output quality and the leading proprietary models is visible without fine-tuning.

Pricing: The model is free. Hosted access through platforms like Replicate or Flora Fauna charges per-image.


What the benchmarks say

The Artificial Analysis Text-to-Image Arena ranks models by blind user preference votes:

  1. GPT Image 1.5: Elo 1,265, roughly $0.13 per image
  2. Nano Banana 2: Elo 1,258, roughly $0.07 per image
  3. Nano Banana Pro: Elo 1,215, roughly $0.13 per image
  4. Flux 2 Max: Elo 1,201, roughly $0.07 per image
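Those Elo numbers are closer than they look. Under the standard Elo model (400-point logistic scale), a rating gap translates directly into an expected head-to-head preference rate:

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Probability that model A's image is preferred over model B's,
    under the standard Elo formula (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))
```

The 7-point gap between GPT Image 1.5 (1,265) and Nano Banana 2 (1,258) works out to roughly a 51/49 split in blind preference votes: a statistical edge, not a visible one.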

Midjourney and Ideogram do not participate in this arena, so their absence is not a quality judgment. The arena favours photorealism and prompt adherence. Midjourney’s artistic strengths and Ideogram’s text rendering are not well captured by this methodology.

How to choose

The honest answer: most of these models can do most things. The difference is in the details — and in how you prompt them.

What actually matters for commercial work:

Resolution. Nano Banana (both Pro and 2) outputs at up to 4K natively. Flux 2 reaches approximately 4 megapixels (roughly 2000x2000). GPT Image 1.5 caps at 1536px on the long edge. Midjourney V7 produces approximately 1K native with 2x upscaling available. Ideogram 3.0 generates up to 1536px natively. If you need print-ready or large-format work, resolution matters more than any benchmark score — and right now Nano Banana leads.
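Resolution translates directly into maximum print size. A quick sketch of the arithmetic, using the common 300 DPI print convention (a rule of thumb, not a property of any model):

```python
def max_print_width_inches(pixels: int, dpi: int = 300) -> float:
    """Largest print width at the given DPI before visible softening."""
    return pixels / dpi

def pixels_needed(print_inches: float, dpi: int = 300) -> int:
    """Pixels required along one edge for a print of the given size."""
    return int(print_inches * dpi)
```

A 4K long edge (3840 px) supports a print of roughly 12.8 inches at 300 DPI, while a 1536 px image tops out at about 5 inches — which is why the 4K ceiling matters for large-format work.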

Reference images. Nano Banana 2 accepts up to 14 reference images (10 object + 4 character) in a single generation. Nano Banana Pro accepts up to 11 (6 object + 5 character). Midjourney has character references and personalisation profiles that encode your aesthetic preferences. Flux has IP-Adapter workflows via ComfyUI. If your work requires consistency across a series — product shots, character designs, brand assets — reference support is the differentiator, not raw quality.

Prompting for realism. Any of the top models can produce photorealistic output with the right prompt engineering. A widely used community technique: referencing specific film stocks (Kodak Portra 400, Fuji Pro 400H), camera types (iPhone 15 Pro, Leica M11), and shot descriptions (shallow depth of field, golden hour, handheld). This works because these terms connect to large bodies of training data with those aesthetic characteristics. Flux and Midjourney are the most community-tested for camera-optical references. No model maker officially documents this as a feature — it is practitioner knowledge, not a guarantee.
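The technique is just structured prompt composition. A minimal sketch (the helper and its defaults are illustrative examples of the community technique, not parameters any model documents):

```python
def realism_prompt(
    subject: str,
    film: str = "Kodak Portra 400",
    camera: str = "Leica M11",
    shot: str = "shallow depth of field, golden hour, handheld",
) -> str:
    """Compose a photorealism prompt from a subject plus camera-optical cues.
    Defaults are community-tested examples, not official model parameters."""
    return f"{subject}, shot on {camera}, {film} film, {shot}"

prompt = realism_prompt("street portrait of a baker outside her shop")
```

Keeping the camera-optical cues in one place makes it easy to swap film stocks or shot styles across a whole batch without rewriting every subject line.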

Consistency at scale. Generating one beautiful image is easy. Generating fifty that look like they belong to the same campaign is hard. Midjourney’s personalisation profiles help here. Nano Banana’s reference image support helps. Stable Diffusion’s LoRA fine-tuning is the most controllable option. There is no single winner — it depends on your pipeline.

Text in images. If you need legible, styled text inside the image — a poster, a social graphic, an advertisement — Ideogram 3.0 is still the most reliable. Nano Banana and GPT Image have improved, but Ideogram was built for this.

The practical approach: Most working creatives do not pick one model. They use two or three depending on the brief. Midjourney for mood and concept, Nano Banana for production-quality 4K output, Ideogram when text is involved. If you find yourself switching between platforms constantly, a multi-model workspace like Flora lets you access Nano Banana, Flux, GPT Image, Ideogram, Stable Diffusion, and dozens more from one canvas. Try Flora — 25% off for 12 months →

Copyright and commercial use is its own topic — licensing terms differ across models and change without notice. We will cover this in a dedicated guide.


Art & Algorithms publishes guides, tutorials, and prompt packs at the intersection of art and code. Subscribe for the full archive.