Here’s how OpenAI’s DALL-E 3 could leapfrog the competition


Today's leading generative AI models for images all rely on diffusion. OpenAI presents an alternative that is significantly faster and could power new models like DALL-E 3.

DALL-E 2, Stable Diffusion, and Midjourney use diffusion models, which gradually synthesize an image from noise during generation. The same iterative process is used in audio and video models.

While diffusion models produce great results and can be controlled via text prompts, they are comparatively slow and require anywhere from 10 to 2,000 times more computing power than GANs. This hinders their use in real-time applications.
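The compute gap comes from the sampling loop: a diffusion model calls its (large) denoising network once per timestep, while a GAN generator produces a sample in a single forward pass. A minimal sketch of that difference, with trivial stand-in functions in place of real neural networks (the step count and scaling are arbitrary illustrations, not values from any actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    # Stand-in for one pass of a diffusion denoiser; a real model
    # would be a large neural network conditioned on timestep t.
    return x * (t + 1) / (t + 2)

def gan_generator(z):
    # Stand-in for a GAN generator: one forward pass, no iteration.
    return np.tanh(z)

z = rng.standard_normal((64, 64))

# Diffusion: the network is evaluated once per timestep.
x = z
num_steps = 50
for t in reversed(range(num_steps)):
    x = denoise_step(x, t)

# GAN: a single network evaluation yields the sample, which is why
# diffusion costs roughly num_steps times more compute in this sketch.
y = gan_generator(z)
```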

OpenAI is therefore developing a new class of generative AI models called consistency models.


Consistency models are designed to combine the advantages of diffusion models and GANs

According to OpenAI, consistency models support fast, one-step image synthesis, but also allow for few-step sampling, “to trade compute for sample quality”. Consistency models can therefore produce useful results even without an iterative process, and could soon be suitable for real-time applications.
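The two sampling modes can be sketched as follows: a consistency model maps a noisy input at any noise level directly to a clean sample in one call, and few-step sampling alternates re-noising with further model calls to buy quality with compute. The model function and noise schedule here are toy stand-ins, not OpenAI's actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency_fn(x, t):
    # Stand-in for a trained consistency model f(x_t, t) -> x_0:
    # it maps a noisy sample at any noise level t directly to a clean sample.
    return x / (1.0 + t)

T = 80.0  # maximum noise level (illustrative)

# One-step sampling: a single model call turns pure noise into a sample.
x_T = rng.standard_normal((32, 32)) * T
sample = consistency_fn(x_T, T)

# Few-step sampling: re-noise and denoise at decreasing noise levels,
# trading extra compute for higher sample quality.
for t in [40.0, 10.0, 2.0]:
    noisy = sample + rng.standard_normal(sample.shape) * t
    sample = consistency_fn(noisy, t)
```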

OpenAI’s consistency models learn from an iterative process to skip it later if needed. | Picture: OpenAI

Like diffusion models, they also support zero-shot data editing, e.g. for inpainting, colorization, or super-resolution tasks. Consistency models can either be distilled from a pre-trained diffusion model or trained in isolation. According to OpenAI, consistency models outperform many GANs in one-step generation, as well as all other non-adversarial single-step generative models.
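The distillation route can be sketched as follows: the student is trained so that its outputs on two adjacent points of the teacher diffusion model's trajectory agree, with the target computed by a slowly updated copy of the student. Everything below is a toy stand-in (a single scalar parameter instead of a network, a fake one-step ODE solver), meant only to show the shape of the training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_ode_step(x, t, t_next):
    # Stand-in for one step of a pre-trained diffusion model's ODE solver,
    # moving x from noise level t to the slightly lower level t_next.
    return x * (t_next / t)

def student(theta, x, t):
    # Toy "consistency model": one scalar parameter instead of a network.
    return theta * x / (1.0 + t)

theta, theta_ema = 1.0, 1.0
for _ in range(100):
    t, t_next = 10.0, 9.0
    x_t = rng.standard_normal(16) * t
    x_next = teacher_ode_step(x_t, t, t_next)
    # Consistency loss: the student's outputs at two adjacent points of the
    # teacher's trajectory should match (the target uses an EMA copy of theta).
    diff = student(theta, x_t, t) - student(theta_ema, x_next, t_next)
    grad = 2 * np.mean(diff * x_t / (1.0 + t))  # d(loss)/d(theta)
    theta -= 0.01 * grad
    theta_ema = 0.99 * theta_ema + 0.01 * theta
```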

The consistency model (bottom row) can directly generate an image, the diffusion model only shows noise for now. | Picture: OpenAI

The company conducted all testing on relatively small networks and image datasets, for example, training a neural network to synthesize cat images. All models were released by the company as open source for research purposes.

A new generative AI architecture for DALL-E 3 and video synthesis?

According to the authors, there are also striking similarities to techniques used in other AI fields, such as deep Q-learning from reinforcement learning or momentum-based contrastive learning from self-supervised learning. "This offers exciting prospects for cross-pollination of ideas and methods among these diverse fields," the team says.

In the months leading up to the release of DALL-E 2, OpenAI had published several articles on diffusion models and finally presented GLIDE, a very impressive model at the time. So the research on consistency models could be an indication that OpenAI is looking for new and more effective generative AI architectures that could, for example, enable a much faster DALL-E 3 and be used for real-time video generation.
