The popularity of ChatGPT and Midjourney has made the diffusion model, the technology behind the latter, the foundation of the "generative AI" revolution.
It is highly sought after by researchers in the industry, and its popularity far exceeds that of the GAN, which once took the world by storm.
Just as diffusion models were at the height of their power, some netizens suddenly made a high-profile announcement:
The era of Diffusion models is over! Consistency models are crowned king!
What on earth is going on?!
It turns out that OpenAI released a blockbuster paper, "Consistency Models," back in March, and released the model weights on GitHub today.
Paper address: https://arxiv.org/abs/2303.01469
Project address: https://github.com/openai/consistency_models
"Consistency Model" in training speed It subverts the diffusion model and can "generate in one step", completing simple tasks an order of magnitude faster than the diffusion model, and using 10-2000 times less calculations.
So, how fast is that? Some netizens estimated it as generating 64 images at 256×256 resolution in about 3.5 seconds, which works out to
18 images per second!
Moreover, one of the main advantages of the new model is that it achieves high-quality samples without any "adversarial training."
This research was written by Ilya Sutskever, a student of Geoffrey Hinton (one of the Turing Award "Big Three") and a main driver of AlexNet, together with Mark Chen and Prafulla Dhariwal, the scholars who developed DALL-E 2, so you can imagine how hard-core the content is. Some netizens even said that the "consistency model" is the future research direction, and that one day we will look back and laugh at the diffusion model.
So, will the diffusion model disappear?
Faster, stronger, and no adversarial training needed
At present, the paper is still a preliminary version, and the research is ongoing.
In 2021, OpenAI CEO Sam Altman wrote a blog post discussing how Moore's Law should apply to everything.
Some time ago, Altman publicly talked about artificial intelligence on Twitter, saying that AI is making "leapfrog" progress: "A new version of Moore's Law may soon appear, with the amount of intelligence in the universe doubling every 18 months."
To others, Altman’s optimism may seem unfounded.
But the latest research conducted by the team led by OpenAI’s chief scientist Ilya Sutskever provides strong support for Altman’s claim.
2022 has been called the inaugural year of AIGC, because so many of its models are built on the diffusion model.
As it gained popularity, the diffusion model gradually displaced the GAN and became the most effective image generation model in the industry today; DALL-E 2 and Google's Imagen, for example, are both diffusion models.
However, the newly proposed "consistency model" has been shown to output content of the same quality as a diffusion model in a shorter time.
This is because the "consistency model" uses a single-step generation process similar to that of a GAN.
In contrast, a diffusion model uses a repeated sampling process that gradually removes noise from an image.
Although impressive, this method relies on hundreds to thousands of steps to achieve good results, making it not only expensive to run but also slow.
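As a rough illustration of why this matters, here is a minimal sketch (not OpenAI's implementation; `denoise_step` and `consistency_fn` are hypothetical stand-ins for trained networks) contrasting the two sampling loops:

```python
def diffusion_sample(denoise_step, x_T, timesteps):
    """Iterative diffusion sampling: one network call per timestep."""
    x = x_T
    for t in timesteps:         # often hundreds to thousands of steps
        x = denoise_step(x, t)  # each call removes a little noise
    return x

def consistency_sample(consistency_fn, x_T, T):
    """Consistency-model sampling: a single network call maps noise to data."""
    return consistency_fn(x_T, T)
```

The cost difference is simply the number of network calls: hundreds or thousands for the loop above versus one for the consistency model.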
The iterative generation process of a diffusion model consumes 10 to 2,000 times more compute than the "consistency model," and even slows down inference during training.
The power of the "consistency model" lies in its ability to trade off sample quality against compute when necessary.
In addition, the model can perform zero-shot data editing tasks such as image inpainting, colorization, and stroke-guided image editing.
Zero-shot image editing with a consistency model trained by distillation on LSUN Bedroom 256×256
The "Consistency Model" also converts data into noise when using mathematical equations and ensures that the resulting output is consistent for similar data points, thereby enabling them to smooth transition.
This type of equation is called "Probability Flow Ordinary Differential Equation" (Probability Flow ODE).
The study named such models "consistency" models because they maintain this self-consistency between input data and output data.
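In the paper's notation, this self-consistency can be written down directly (a sketch of the definition from the paper, where $f_\theta$ is the consistency function, $\{\mathbf{x}_t\}$ is a trajectory of the probability flow ODE, and $[\epsilon, T]$ is the time range):

```latex
f_\theta(\mathbf{x}_t, t) = f_\theta(\mathbf{x}_{t'}, t')
  \quad \text{for all } t, t' \in [\epsilon, T],
\qquad
f_\theta(\mathbf{x}_\epsilon, \epsilon) = \mathbf{x}_\epsilon .
```

In other words, every point along the same noising trajectory maps back to the same origin, which is exactly what makes single-step generation possible.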
These models can be trained in either distillation mode or standalone mode.
In distillation mode, the model distills knowledge from a pre-trained diffusion model, enabling it to generate in a single step.
In standalone mode, the model does not depend on a diffusion model at all, making it a completely independent generative model.
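To make the distillation idea concrete, here is a heavily simplified training-step sketch (a sketch only, not the official API: `student`, `ema_student`, and `teacher_ode_step` are hypothetical placeholders, the noising rule `x_t = x0 + t * noise` is one common convention, and the paper uses richer distance metrics such as LPIPS rather than plain MSE):

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(student, ema_student, teacher_ode_step,
                                  x0, t, t_next):
    """One consistency-distillation step: the student must map two adjacent
    points of the same ODE trajectory to the same clean output."""
    noise = torch.randn_like(x0)
    x_t = x0 + t * noise                         # push clean data to noise level t
    x_t_next = teacher_ode_step(x_t, t, t_next)  # one ODE step with the frozen teacher
    with torch.no_grad():
        target = ema_student(x_t_next, t_next)   # target from an EMA copy of the student
    pred = student(x_t, t)
    return F.mse_loss(pred, target)              # pull both predictions together
```

In standalone ("consistency training") mode, the frozen teacher is dropped and the ODE step is estimated from data alone, which is what makes the method fully independent of diffusion models.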
It is worth noting that both training methods do away with "adversarial training."
Admittedly, adversarial training does produce more robust neural networks, but the process is roundabout: it introduces a set of misclassified adversarial examples and then retrains the target network on them with the correct labels.
As a result, adversarial training also slightly reduces the prediction accuracy of a deep learning model, and it may even bring unexpected side effects in robotics applications.
Experimental results show that the distillation technique used to train the "consistency model" outperforms the distillation techniques used for diffusion models.
The "Consistency Model" achieved the latest state-of-the-art FID scores of 3.55 and 6.20 on the CIFAR10 image set and ImageNet 64x64 data set, respectively.
This amounts to the quality of a diffusion model at the speed of a GAN: the best of both worlds.
In February, Sutskever posted a tweet suggesting that:
"Many people believe that great AI progress must include a new 'idea.' But that's not the case: many of AI's greatest advances came in the form of a familiar, humble idea that, executed well, became incredible."
The latest research proves exactly that: refining an old concept can change everything.
Author introduction
As the co-founder and chief scientist of OpenAI, Ilya Sutskever needs no further introduction; just take a look at this group photo of "top performers."
(far right of the picture)
Yang Song (Song Yang) received a bachelor's degree in mathematics and physics from Tsinghua University and a master's and doctorate in computer science from Stanford University. He has also interned at Google Brain, Uber ATG, and Microsoft Research. As a machine learning researcher, he focuses on developing scalable methods for modeling, analyzing, and generating complex, high-dimensional data. His interests span generative modeling, representation learning, probabilistic reasoning, AI safety, and AI for science.
Mark Chen is the head of OpenAI's multimodal and frontier research division, and he is also the coach of the US Computing Olympiad team. He earned a bachelor's degree in mathematics and computer science from MIT and worked as a quantitative trader at several proprietary trading firms, including Jane Street Capital. After joining OpenAI, he led the team that developed DALL-E 2 and introduced vision into GPT-4. He also led the development of Codex, participated in the GPT-3 project, and created Image GPT.
Prafulla Dhariwal is a research scientist at OpenAI, working on generative models and self-supervised learning. Before that, he was an undergraduate at MIT, studying computing, mathematics, and physics. Interestingly, the finding that diffusion models can beat GANs at image generation came from his 2021 NeurIPS paper.
OpenAI open-sourced the code of the consistency model today.
Finally back to "Open" AI.
Faced with so many crazy breakthroughs and announcements every day, netizens asked: should we take a break, or speed up?
Compared with diffusion models, this will save researchers a significant amount of model training cost.
Some netizens also suggested future use cases for the "consistency model": real-time editing, NeRF rendering, and real-time game rendering.
There is no demo yet, but one thing is certain: a big speedup in image generation is always a win.
We upgraded directly from dial-up to broadband.
Brain-computer interface, plus ultra-realistic images generated in almost real time.