DALL-E 2 vs Stability AI’s Stable Diffusion XL – Comparison of Image Generation Capabilities

Let’s talk about AI-powered image generation, namely Stable Diffusion XL and DALL-E 2.

Stability AI’s cutting-edge models, like Stable Diffusion XL, offer accuracy and realism in image generation. On the other hand, DALL-E 2 by OpenAI stands out for its ability to create original, lifelike images, though with a touch more comic-like features.

This article compares the image-creation capabilities of DALL-E 2 and Stable Diffusion XL across several use cases and gives tips on prompting for image creation.

Use Cases:

  1. Photorealistic Portraits
  2. Complex Landscapes
  3. Imaginative Illustrations
  4. Object Manipulation
  5. Producing Stylized Artwork

Use Case 1: Photorealistic Portraits

When comparing Stable Diffusion XL and DALL-E 2 for photorealistic portraits, I found that Stable Diffusion XL excelled at capturing subtle details and producing realistic images. DALL-E 2, on the other hand, provided more imaginative and creative results.

Prompt 1 for Use Case 1 (Photorealistic Portraits):

Create a photorealistic portrait of a middle-aged woman with medium length brown hair, wearing glasses, and a floral patterned blouse. Ensure the background is blurred but recognizable as a cozy living room setting with a bookshelf and a plant. Pay attention to details such as facial features, wrinkles, and lighting to enhance realism. The final image should evoke a sense of warmth and familiarity.

Result DALL-E 2

Style: undefined

Result Stable Diffusion XL 1.0

Style: undefined

Prompt 2 for Use Case 1 (Photorealistic Portraits):

Create a high-definition photograph of a realistic human face with intricate details such as skin texture, facial features, and expression.
Ensure the image exhibits accurate color reproduction, lifelike shading techniques, and a seamless blend of light and shadow for a photorealistic look. The final output should closely resemble a professional portrait, showcasing capabilities in creating realistic imagery.

Result DALL-E 2

Style: Natural

Result Stable Diffusion XL 1.0

Style: Photographic

Result Comparison

I am very impressed with the capabilities of both AI systems. Overall, I find Stable Diffusion’s results more photorealistic, while DALL-E 2 generally produces more creative, slightly more comic-style images.

Given a long prompt that spelled out the photorealistic requirements, DALL-E 2 was quicker to misinterpret it. One result showed a multicolored face that bore no resemblance whatsoever to a realistic photograph. Likewise, when asked for a face “with intricate details such as skin texture, facial features and expression”, DALL-E 2 did not return the desired lifelike photo.

So for photorealistic results, Stable Diffusion is the better choice.

Use Case 2: Complex Landscapes

When it comes to generating complex landscapes, I found that Stable Diffusion’s results are more suitable for realistic settings, while DALL-E 2’s output is ideal for imaginative compositions. Consider your project’s needs when choosing between the two.

Prompt for Use Case 2:

Generate a complex landscape image incorporating various elements such as mountains, rivers, and forests. The image should be detailed and realistic, showcasing your ability to create intricate scenes. Please ensure that the composition is well-balanced and visually engaging, demonstrating your capabilities in generating complex visual content.

I expect the image to display a high level of creativity and artistic flair, capturing the essence of a stunning natural landscape. Pay attention to the color palette, lighting, and textures to create a compelling and visually appealing result.

Result DALL-E 2

The image shows a surreal representation of a landscape with vibrant colors and abstract shapes.

Result Stable Diffusion XL 1.0

The image displays a detailed and realistic depiction of a forest scene with impressive clarity.

Result Comparison

It is evident that both AI systems generated visually appealing images, but with distinct differences.

DALL-E 2 produced a more abstract and artistic interpretation of the landscape, focusing on vibrant colors and abstract shapes.

Stable Diffusion XL created a highly detailed and realistic representation of a forest scene with an emphasis on clarity and precision.

The better AI system for this use case depends on the specific requirements. If the goal is to generate artistic and abstract images, DALL-E 2 is more suitable. However, if the aim is to create realistic and detailed visuals, Stable Diffusion XL is the preferred choice.

Use Case 3: Imaginative Illustrations

As you can probably expect by now, in the case of imaginative illustrations, I found that DALL-E 2 excelled at creating surreal and dreamlike images, while Stable Diffusion XL focused more on detailed artwork.

Prompt for Use Case 3: Imaginative Illustrations

Generate an awe-inspiring illustration that captures the essence of a whimsical dream. The scene should depict a surreal world filled with vibrant colors, where a majestic unicorn stands in the center, its flowing mane merging with the clouds above. Let the landscape be adorned with lush greenery, blooming flowers, and shimmering waterfalls, all intertwined in an ethereal tapestry of enchantment.

Result DALL-E 2

Result Stable Diffusion XL 1.0

Result Comparison

If the goal is to create abstract and artistic interpretations, OpenAI’s DALL-E 2 might be the preferred choice. It produced a wonderful illustration that followed the prompt relatively accurately, with one exception: the prompt asked for the unicorn’s mane to merge with the clouds above, which DALL-E 2 did not do.

If the aim is to generate more realistic transformations or representations, Stable Diffusion XL can be considered more suitable. Note, however, that the unicorn it created has three hind legs; extra limbs are one of the more common mistakes in AI-generated images.

On the subject of accuracy to the prompt: generally, you will get better results from shorter, less precise prompts. The more detailed and specific your prompt is, the harder it is for the AI system to translate it into a cohesive image that still makes sense. You will also find that, especially if you already have the desired result in your head, it will most likely take a few attempts to generate a suitable image.

Use Case 4: Object Manipulation

When examining Stable Diffusion XL and DALL-E 2 for object manipulation, I found that DALL-E 2 could manipulate a more diverse set of objects with intricate details.

On the other hand, Stable Diffusion XL provided more stable outputs with consistent quality.

Prompt for Use Case 4: Object Manipulation

Generate two images for comparison.
The first image should depict a car with unusual modifications, such as oversized wheels or unconventional design features.
The second image should showcase the same car in a more realistic setting, such as a city street or highway.

Please pay attention to the level of detail and realism in both images, and highlight any unique characteristics that set them apart. Finally, assess the accuracy and visual appeal of each image, and provide an analysis of how well the object manipulation was executed by the AI.

Result DALL-E 2

Result Stable Diffusion XL 1.0

Result Comparison

DALL-E 2 not only produced a result with more vibrant colors and sharper details in the image, but also followed the prompt better.

Stable Diffusion XL, on the other hand, generated a visually impressive image, but with more subtle colors and a softer overall look, and it rendered the realistic setting required in the last part of the prompt poorly.

In conclusion, in this case DALL-E 2’s result may be preferred for its high level of detail and vivid colors.

Use Case 5: Producing Stylized Artwork

As a graphic designer, I used to create artwork for the movie industry regularly. Any artwork shown in an advertisement, movie or similar needs the artist’s clearance for such use, so it is usually easier, and above all cheaper, for the company to employ a designer to create an artwork that looks like, say, a Picasso, and hang it in the million-dollar mansion they are designing as a set.

Since the rise of AI image creation tools, such inquiries have dropped to zero. With tools such as DALL-E 2 or Stable Diffusion, the art director can pretty much ask an assistant to generate the desired image and upscale it to a resolution sufficient for print. So let’s see which tool I would recommend.

Prompt for Use Case 5: Producing Stylized Artwork

Generate a watercolor painting in the style of Pablo Picasso depicting a vibrant and whimsical cityscape with geometric buildings and expressive brushstrokes. Ensure that the painting captures the essence of Picasso’s cubist style through the use of fragmented forms, vibrant colors, and bold lines.

Result DALL-E 2

Result Stable Diffusion XL 1.0

Result Comparison

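| Feature           | DALL-E 2        | Stable Diffusion |
|-------------------|-----------------|------------------|
| Overall quality   | Lower quality   | Higher quality   |
| Composition       | More chaotic    | More balanced    |
| Color             | Less vibrant    | More vibrant     |
| Brushstrokes      | Less expressive | More expressive  |
| Detail            | Less detail     | More detail      |
| Accuracy to style | Less accurate   | More accurate    |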

Overall, Stable Diffusion generated a painting that was more accurate to the style of Pablo Picasso and of higher quality than the painting generated by DALL-E 2.

Stable Diffusion’s painting had a more balanced composition, more vibrant colors, more expressive brushstrokes, and more detail. It also captured the essence of Picasso’s cubist style through the use of fragmented forms, vibrant colors, and bold lines.

Conclusion: Stable Diffusion is the better AI system for this use case.

Overall Evaluation of Both Platforms

The choice between the two AI systems ultimately depends on the specific requirements and preferences of the image generation task. Here is a list of the strengths I have found in each of these image generation tools:

DALL-E 2 strengths:

  • Visually appealing and detailed images
  • Capable of generating realistic faces and expressions
  • Especially good at creating “comic style images”
  • Creative results

Stable Diffusion XL 1.0 strengths:

  • Good quality output, capturing textures and intricacies
  • Produces images approaching realistic photography
  • Capable of generating realistic faces and expressions
  • Better at following prompts accurately

Lastly, you might want to consider the dimensions you want your results to have. Here are the dimensions the two AI systems can generate:

Dimensions

DALL-E 2:

  • Square: 1024×1024 pixels
  • Portrait: 1024×1333 pixels
  • Landscape: 1333×1024 pixels

Stable Diffusion:

  • 1024×1024
  • 1152×896
  • 896×1152
  • 1216×832
  • 832×1216
  • 1344×768
  • 768×1344
  • 1536×640
  • 640×1536
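
If you already know the format you need (say, a 16:9 banner), you can pick the closest of the Stable Diffusion sizes listed above by comparing aspect ratios. The following is only a minimal sketch: the function name `closest_sdxl_size` is made up for this example, and it assumes each size is given as width by height.

```python
import math

# Stable Diffusion sizes as listed above, assumed to be (width, height) in pixels.
SDXL_SIZES = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def closest_sdxl_size(target_width, target_height):
    """Return the listed size whose aspect ratio is nearest to the target's."""
    if target_width <= 0 or target_height <= 0:
        raise ValueError("width and height must be positive")
    target_log_ratio = math.log(target_width / target_height)
    # Compare on a log scale so that 2:1 and 1:2 are treated symmetrically.
    return min(
        SDXL_SIZES,
        key=lambda size: abs(math.log(size[0] / size[1]) - target_log_ratio),
    )


print(closest_sdxl_size(1920, 1080))  # a 16:9 target
print(closest_sdxl_size(1080, 1920))  # a 9:16 target
```

Generating at the nearest listed size and cropping or scaling afterwards tends to be simpler than trying to describe the format in the prompt itself.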

Image resolution: Keep in mind that you can always use Stability AI’s Clipdrop Image Upscale to upscale or downscale your image if you need a different resolution.
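
To get a feeling for how much upscaling a print job needs, you can work out the required pixel count from the print size. The sketch below is an illustration only: the 300 DPI default is a common rule of thumb for print and not a figure from this comparison, and the function names are invented for the example.

```python
import math


def required_pixels(print_width_in, print_height_in, dpi=300):
    """Pixel dimensions needed for a print of the given size in inches.

    The 300 DPI default is an assumed rule of thumb, not a fixed requirement.
    """
    return math.ceil(print_width_in * dpi), math.ceil(print_height_in * dpi)


def upscale_factor(image_size, print_size_in, dpi=300):
    """Smallest scale factor at which the image covers the print in both directions."""
    needed_width, needed_height = required_pixels(*print_size_in, dpi=dpi)
    image_width, image_height = image_size
    return max(needed_width / image_width, needed_height / image_height)


# A 1024x1024 result printed at 20 x 20 inches needs 6000 x 6000 pixels.
print(required_pixels(20, 20))
print(upscale_factor((1024, 1024), (20, 20)))
```

A factor well above what your upscaler offers in one pass is a hint to generate at the largest available size first, or to plan for a smaller print.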

AI Image Creation with Promptmate.io

In Promptmate it is easy to switch between different AI systems: simply use the drop-down menu to choose from all integrated AI systems, which you can find here.

You can also build prompt chains, for example using Claude or ChatGPT to help you create and refine a prompt and then trying out various image creation tools for the best results. Furthermore, you can immediately add more steps, for example to upscale your result or remove its background as needed. Try it out!
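
Conceptually, a prompt chain just feeds the output of one step into the next. The sketch below illustrates that idea in plain Python and is not Promptmate’s actual API: `run_chain` and the three step functions are hypothetical placeholders that return strings, where a real chain would call a language model, an image generator and an upscaler.

```python
from typing import Callable, List

Step = Callable[[str], str]


def run_chain(initial_input: str, steps: List[Step]) -> str:
    """Pass the output of each step to the next one and return the final result."""
    result = initial_input
    for step in steps:
        result = step(result)
    return result


# Hypothetical placeholder steps; real ones would call external services.
def refine_prompt(idea: str) -> str:
    return f"A detailed watercolor painting of {idea}, soft lighting"


def generate_image(prompt: str) -> str:
    return f"image({prompt})"


def upscale(image: str) -> str:
    return f"upscaled({image})"


final = run_chain("a lighthouse at dusk", [refine_prompt, generate_image, upscale])
print(final)
```

Keeping each step as its own function makes it easy to swap one image tool for another while leaving the rest of the chain untouched.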

Summary

This article provides insights into the differences, strengths and weaknesses of the AI Image Generation tools DALL-E 2 and Stable Diffusion XL and identifies how the results differ in quality, style and other aspects.