Amazing Images with AI Image Generation
AI image generation is near-perfect, very fast, and low-cost or even free. Enjoy!
AI Image Generation is Getting Amazing
We should take a moment to appreciate how far AI has come. AI image generation has rapidly improved in recent years, revolutionizing the way we create and manipulate visual content. In March 2023, Midjourney broke new ground with Midjourney V5, creating photo-realistic image outputs beyond what was possible before. I wrote an article on prompting Midjourney at the time.
Despite this progress, there were limitations. AI was ‘finicky’ and depended on prompt specifics; artifacts, such as six-fingered hands, persisted; diffusion models were slow; and text couldn’t be rendered accurately. Midjourney V5 was well ahead of the alternatives at the time, but none were perfect.
Today the landscape of AI image generation is very different. There are many leading AI image generation models, all much better than before. They are extremely capable and useful to professionals and consumers alike, and they are augmented by AI-enhanced features in traditional graphics editors. Users have ample choices.
The AI Image Generation Leaders
Midjourney has advanced to Midjourney V6.1, and at least five other top AI image generation models are competitive: OpenAI’s DALL-E 3, Stability AI’s Stable Diffusion 3, Ideogram 2, Google’s Imagen 3, and Black Forest Labs’ FLUX.1. We cover each of these below.
DALL-E 3
DALL-E 3, released by OpenAI in late 2023, offers enhanced image quality and more accurate interpretations of text prompts over its predecessor, DALL-E 2. DALL-E 3 is particularly noteworthy for its ability to handle nuanced and specific requests, accurately depicting complex scenes, specific art styles, or even abstract concepts.
Access: DALL-E 3 is available from OpenAI on ChatGPT and via their API. It’s also available in Microsoft Copilot, where you can generate high-quality images for free and then use the Microsoft Designer interface, with its AI-enhanced editing capabilities, to further adjust or fix images.
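For readers who want to try the API route, here is a minimal sketch of what a DALL-E 3 request might look like using only Python's standard library. It assumes the endpoint and field names of OpenAI's public Images API as commonly documented (they may change), and the helper names build_image_request and generate_image_url are my own, not part of any OpenAI library.

```python
import json
import urllib.request

# Assumed endpoint of OpenAI's Images API; check the current docs before relying on it.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, api_key: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a POST request asking DALL-E 3 for one image."""
    payload = {
        "model": "dall-e-3",
        "prompt": prompt,
        "n": 1,        # one image per request
        "size": size,  # e.g. 1024x1024, 1792x1024, or 1024x1792
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_url(prompt: str, api_key: str) -> str:
    """Send the request and return the URL of the generated image."""
    request = build_image_request(prompt, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the default response format, where each result carries a "url" field.
    return body["data"][0]["url"]
```

Calling generate_image_url("An oil painting of a lighthouse at dusk", your_api_key) should return a link to the finished image, assuming a valid paid API key. Separating the request builder from the network call makes it easy to inspect exactly what is being sent.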
Midjourney V6.1
Midjourney’s latest version is 6.1, released last month. One strength of Midjourney has been its ability to create stunningly artistic and imaginative images, making it favored by artists and designers for its strong aesthetic qualities.
Their latest versions have made images more coherent and added greater control, including personalization (personal user preferences) and style reference.
For even more control in generation, Midjourney recently added an image editor so users can clean up, embellish and adjust generated images to their liking.
Midjourney operated solely through Discord for a long time, and you can still access it there, but they finally made the Midjourney website fully available. You’ll need a paid account to access Midjourney and its features.
Stable Diffusion 3
Stable Diffusion 3 is the latest iteration of the popular open Stable Diffusion AI image generation model, building upon its predecessors with significant improvements in generating text and adhering to prompts.
SD3 comes in three models: SD3 Medium, a 2B parameter model; SD3 Large, an 8B parameter model; and SD3 Large Turbo, an 8B model with faster inference.
Stable Diffusion has long stood out as an open alternative, allowing LoRA fine-tunes to customize its models. However, Stability AI failed to monetize its AI models, so Stable Diffusion 3 is not fully open like prior versions. Rather, Stable Diffusion 3 Medium is available for download for non-commercial use only.
Stability AI offers paid access to the Stable Assistant chatbot, which bundles Stable Diffusion 3 with image tools.
Black Forest Labs’ FLUX.1
FLUX.1, released in August 2024, is the newest AI image generation model, but the team at Black Forest Labs includes veterans of the Stable Diffusion team, so they aren’t new to cutting-edge models. They bring some qualities of Stable Diffusion to FLUX.1, which excels in prompt following, visual quality and detail, efficiency, and text rendering.
FLUX.1 comes in three variants: FLUX.1 pro is their highest-quality closed version; FLUX.1 dev is open weights, good quality, and available to try on fal.ai or via download for non-commercial use; FLUX.1 schnell is their fastest variant and is released under the Apache 2.0 license, so it’s fully open for download.
FLUX.1 is available via API and through partners like Replicate, fal.ai, and Mystic. Grok 2 on X also gives access to FLUX.1 image generation; it is available to X Premium users, which costs $8 a month. One benefit is that X Meme Lord Elon isn’t as tight on censorship, so you can make your own satire images.
As noted in our recent weekly report, you can train LoRA fine-tunes of the open weights FLUX.1 models (dev or schnell) for personalized or niche AI image generation.
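To give a feel for why LoRA fine-tunes are practical for hobbyists, here is a small sketch of the general low-rank adaptation idea: instead of retraining a full weight matrix, you train two thin matrices whose product is added to it. This is a toy illustration in plain Python, not how FLUX.1 or any specific trainer implements it, and the layer width (3072) and rank (16) in the usage note are hypothetical numbers chosen only for the arithmetic.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # LoRA trains B (d_out x rank) and A (rank x d_in); the update is B @ A.
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters touched when fine-tuning the whole weight matrix."""
    return d_out * d_in


def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def apply_lora(weight, b, a, scale=1.0):
    """Return weight + scale * (b @ a), leaving the original weight untouched."""
    delta = matmul(b, a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

With the hypothetical numbers above, lora_param_count(3072, 3072, 16) is 98,304 trainable values versus 9,437,184 for the full matrix, about 96 times fewer. That gap is why a personal style or subject can be trained on modest hardware and shared as a small add-on file.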
Ideogram 2
The recently released Ideogram 2 represents a significant advance, improving on Ideogram’s already excellent text rendering while also improving realism, prompt adherence and control. They’ve added color palette control, more realistic style capabilities, and a “Design style” that boosts the accuracy of text in generated images.
Ideogram 2 with Design style is particularly useful for creating marketing materials, social media content, and brand assets. This capability in typography and design elements makes it a valuable tool for graphic designers and marketers.
I’m a fan of Ideogram’s magic prompt and remixing capabilities, and I’m also a fan of their pricing. The free tier is enough for occasional users. Their basic tier costs only $7/mo and includes editing, upscaling and up to 1600 images a month.
Google Imagen 3
Imagen 3, the latest version of Google's text-to-image AI model, leverages Google's depth in AI research and vast data resources to produce highly detailed and contextually accurate images. Imagen 3 shines in generating photorealistic and coherent images with exceptional detail, and can also adhere to prompts in a variety of styles.
However, Google hampers it with cautious guardrails. I got “We couldn’t create what you asked for” when requesting a Saint George oil painting. Blocking Taylor Swift dancing or the Pope playing basketball I can understand, but why block Saint George and his mythical battle? It also blocked the “The Girl with the Pearl Earring” prompt that made our cover image.
The dark side to how good AI image generation has gotten is that in the AI era, you literally can no longer believe your eyes. Misinformation, AI influencers, and deepfakes are ever more pervasive; all images online are suspect. Imagen 3 is good enough to create deepfakes, so Google locks it down, hard.
Imagen 3 is still excellent for artistic and photorealistic renditions of nature and (generic) people. Imagen 3 is available in Gemini and Google’s AI Test Kitchen.
Conclusion - It’s All Good
AI image generation is improving rapidly, with new models and tools arriving at a fast pace.
Each AI image generation model offers unique strengths: DALL-E 3's adherence to complex prompts, Midjourney’s high quality and artistry, FLUX.1's efficiency and versatility, Ideogram 2's text and logo capabilities, and Google Imagen 3's photorealistic output. Yet all of them are excellent in common ways:
Text-to-image models are high quality, with photorealism at your fingertips.
Prompting is more forgiving and easier, with re-prompting.
Rendering of text has gone from terrible to terrific, in particular with Ideogram 2 and FLUX.1.
Image generation is more controllable, with style references, color palettes, and personalization methods.
Image-to-image and image remixing are added creative options to get desired outputs.
LoRA-based fine-tunes are available for open AI image generation models, enabling image models for particular topics, styles or genres.
AI image generation is now high-quality, cheap, accessible, and fast. I cannot recommend any specific image generation model because they are all good.
Professional graphic artists and heavy users may need a paid account, but occasional users can get by with the free AI image generators available on Copilot, Gemini, Ideogram, Grok, and ChatGPT. Keep a variety on tap depending on your needs. Enjoy!
Great overview/update, thank you. I have been especially frustrated by the weird non-words that AI (especially DALL-E) puts into images. I'll have to check out a couple of these newer ones to see how they handle it.