Text diffusion models have now reached GPT-2-level quality, and a paper on them even won an ICML 2024 Best Paper award. These models could become strong competitors to current LLMs like ChatGPT, offering features such as accepting prompts at any position and generating multiple tokens at once.
Though the concept has potential, the challenge lies in the heavy investments already made in GPTs and other autoregressive models. Switching to diffusion models might be tough for tech companies because of the cost and time diffusion models would need to catch up.
Thanks for sharing! I doubt diffusion models will scale way beyond auto-regressive models, but I’m hopeful for improved quality/compute tradeoffs.
Interesting point! Diffusion models do hold promise, but, indeed, getting the industry to move away from established autoregressive models won’t be easy.
However, if diffusion LLMs prove more efficient, there’s a chance we could see some exciting new developments in the future.