Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Hey everyone,

Text diffusion models have now reached GPT-2-level quality, and the paper behind this result won an ICML 2024 Best Paper award. These models could be strong competitors to current LLMs like ChatGPT, offering unique features like accepting a prompt at any position in the sequence and generating multiple tokens at once (see the sketch below).
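
Roughly, generation starts from a fully masked sequence, pins the prompt tokens wherever they belong, and repeatedly lets the model fill in the remaining blanks in parallel. Here is a minimal sketch of that idea; the `denoiser` callable, the mask-token id, and the confidence-based unmasking schedule are illustrative placeholders, not the exact procedure from the paper:

```python
import torch

# Hedged sketch of prompting from arbitrary positions with parallel decoding.
# `denoiser` stands in for a trained text diffusion model that maps a partially
# masked sequence to per-position logits over the vocabulary.
def infill(denoiser, prompt_tokens, prompt_positions, seq_len, mask_id, steps=16):
    x = torch.full((seq_len,), mask_id, dtype=torch.long)
    x[prompt_positions] = prompt_tokens  # the prompt can sit anywhere, not just the prefix

    for step in range(steps):
        masked = (x == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        logits = denoiser(x)                    # (seq_len, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        conf, sampled = probs.max(dim=-1)       # greedy choice per position

        # Commit several of the most confident masked positions at once,
        # so multiple tokens are produced per model call.
        n_commit = max(1, masked.numel() // (steps - step))
        chosen = masked[conf[masked].topk(n_commit).indices]
        x[chosen] = sampled[chosen]
    return x
```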

Though the concept has potential, the challenge lies in the heavy investments already made in GPTs and autoregressive models. Switching to diffusion models might be tough for tech companies because of the cost and time it would take for the new approach to catch up.

For more details, I made an explainer video: https://youtu.be/K_9wQ6LZNpI

Thanks for sharing! I doubt diffusion models will scale way beyond autoregressive models, but I'm hopeful for improved quality/compute tradeoffs.

Scaling has little to do with whether the model is autoregressive or not: recent text diffusion models also use a transformer backbone.

Diffusion models can be guided more effectively and predictably than autoregressive models, which is their main strength.

They use techniques like classifier guidance and classifier-free guidance for control.
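
For context, classifier-free guidance boils down to scoring the noisy sequence twice, with and without the conditioning signal, and extrapolating toward the conditioned prediction. A minimal sketch, assuming a model that accepts an optional `cond` argument (the function names and the `guidance_scale` knob are illustrative, not any specific API):

```python
import torch

def guided_logits(model, x_noisy, cond, guidance_scale=2.0):
    logits_cond = model(x_noisy, cond=cond)    # conditioned prediction
    logits_uncond = model(x_noisy, cond=None)  # unconditioned prediction

    # Extrapolate toward what the conditioning signal adds on top of the
    # unconditional behaviour; a scale above 1 strengthens that push.
    return logits_uncond + guidance_scale * (logits_cond - logits_uncond)
```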

Interesting point! Diffusion models do hold promise, but getting the industry to move away from established autoregressive models indeed won't be easy.

However, if diffusion LLMs prove more efficient, there's a chance we'll see some exciting developments ahead.