
What is a Diffusion Model?
A diffusion model is a type of generative AI model that creates data (typically images) by learning to reverse a gradual noising process. It starts with pure random noise and iteratively refines it into a coherent output, guided by a text prompt or other conditioning signal.
Why It Matters
Diffusion models power the current generation of AI image generators: Stable Diffusion, DALL-E 3, Midjourney, and Flux. They produce higher-quality, more controllable images than earlier approaches like GANs, and have expanded into video (Sora, Runway), audio, and 3D generation. Understanding diffusion is essential for anyone working with AI-generated visual content.
How It Works
Diffusion models operate in two phases:
Forward process (training):
- Take a clean image from the training set
- Gradually add Gaussian noise over many steps until it becomes pure noise
- Train a neural network (typically a U-Net or transformer) to predict and remove the noise at each step
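The forward process can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard DDPM-style linear noise schedule; the names `betas` and `alpha_bar` follow common convention and are not specified in the text above.

```python
import numpy as np

T = 1000                                  # number of noising steps
betas = np.linspace(1e-4, 0.02, T)        # noise variance added per step (assumed schedule)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # cumulative fraction of original signal remaining

def add_noise(x0, t, rng=np.random.default_rng(0)):
    """Jump directly to step t: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # the network is trained to predict eps from (xt, t)

x0 = np.ones((8, 8))                      # stand-in for a clean training image
xt, eps = add_noise(x0, t=T - 1)
# At the final step almost no signal remains: alpha_bar[-1] is near zero,
# so x_T is effectively pure Gaussian noise.
```

A convenient property shown here: because Gaussian noise composes, training can sample any step `t` directly rather than noising one step at a time.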
Reverse process (generation):
- Start with pure random noise
- The trained model predicts and removes a small amount of noise at each step
- After many denoising steps (typically 20–50), a coherent image emerges
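The sampling loop above can be sketched as follows. This is a simplified DDPM-style update, not any particular library's API; `predict_noise` is a hypothetical stand-in for the trained U-Net or transformer, and the schedule values are assumed.

```python
import numpy as np

T = 50                                    # denoising steps at generation time
betas = np.linspace(1e-4, 0.02, T)        # assumed noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def predict_noise(x, t):
    # Placeholder for the trained network: returns zeros so the loop
    # runs end to end. A real model predicts the noise in x at step t.
    return np.zeros_like(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))           # start from pure random noise
for t in reversed(range(T)):
    eps_hat = predict_noise(x, t)
    # Subtract the predicted noise for this step (DDPM mean update).
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:                             # re-inject a little noise, except at the last step
        x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
```

Each iteration removes only a small amount of noise; the image emerges gradually over the full loop rather than in one shot.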
Latent diffusion (used by Stable Diffusion) improves efficiency by operating in a compressed latent space rather than pixel space, reducing computation by orders of magnitude.
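A back-of-envelope calculation shows where the savings come from, assuming Stable Diffusion's typical 8× spatial downsampling and 4-channel latents (figures commonly cited for its autoencoder, not stated in the text above):

```python
# Elements the denoising network must process per step:
pixel_elems = 512 * 512 * 3    # raw RGB image tensor
latent_elems = 64 * 64 * 4     # 8x-downsampled, 4-channel latent tensor

print(pixel_elems // latent_elems)  # prints 48
```

Every one of the 20–50 denoising steps operates on roughly 48× fewer elements, which is why latent diffusion runs on consumer GPUs where pixel-space diffusion at the same resolution would not.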