
I heard a lot about the new image generation models last week. So, I tested to see what’s improved. I gave the prompt below to various image generation models – old and new.
A Calvin and Hobbes strip. Calvin is boxing Hobbes, with a dialog bubble from Calvin, saying “Bring it on!”
Stable Diffusion XL Lightning

Stable Diffusion XL Base

Dall-E API

Runway ML

ImageGen 3

Dall-E 3 API

Ideogram 2.0

Flux.dev via Fal.ai

ChatGPT Plus

A few observations:
- Text generation has come a long way. The newer models have little problem generating clear text.
- Flux.1 seems to be the better of the newly released models
- But OpenAI’s ChatGPT seems to create as good an output as Flux.1
On the last point, it’s noteworthy that Dall-E-3 (the engine behind ChatGPT) gives a poor result. Clearly, prompting makes a difference. Here’s how ChatGPT modified my prompt to Dall-E-3.
A comic strip style image featuring Calvin, a young boy with spiky hair, standing in a playful boxing stance with oversized boxing gloves. He looks determined as he says ‘Bring it on!’ in a speech bubble. Facing him is Hobbes, a tall and slightly bemused tiger, also in a mock boxing pose with a gentle smile, as if humoring Calvin. The scene is set in Calvin’s backyard, typical of a Calvin and Hobbes comic, with a simple and uncluttered backdrop.
But just as clearly, prompting is far from the primary driver. Here’s the result of the above prompt on the Dall-E 3 API. The model ChatGPT is using behind the scenes seems to be a significant improvement over Dall-E 3.

The same detailed prompt does extremely well on ImageGen 3, though.

Update: 6 Oct 2024. Here’s what I get with meta.ai.

Update: 8 Oct 2024. Here’s what I got with Flux 1.1 Pro with the short prompt. (The detailed prompt gave me an error: “NSFW content detected in image. Try running it again, or try a different prompt.”)

Update: 4 Dec 2024. With Amazon Nova Canvas, here’s what the detailed prompt gave me.

Comments