Many people know that AlphaGo, which defeated Lee Sedol, Ke Jie, and other top Go players, went through three iterations: the first-generation AlphaGo Lee, which defeated Lee Sedol; the second-generation AlphaGo Master, which defeated Ke Jie; and the third-generation AlphaGo Zero, which beat both of its predecessors.
AlphaGo’s playing strength grew from generation to generation, and behind that growth lies a clear trend in AI technology: the steadily increasing role of reinforcement learning.
In recent years, reinforcement learning has undergone another "evolution", and people call this "evolved" form deep reinforcement learning.
But deep reinforcement learning agents suffer from low sample efficiency, which greatly limits their application to practical problems.
Recently, many model-based methods have been designed to address this problem, and learning inside the imagination of a world model is among the most prominent.
However, while nearly unlimited interaction with a simulated environment sounds appealing, the world model must remain accurate over long periods of time.
Inspired by the success of Transformers in sequence modeling tasks, Vincent Micheli, Eloi Alonso, and François Fleuret of the University of Geneva introduced IRIS, a data-efficient agent that learns inside a world model composed of a discrete autoencoder and an autoregressive Transformer.
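To make that architecture concrete, here is a minimal, hypothetical PyTorch sketch of the two components. The class names, layer sizes, and token counts are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DiscreteAutoencoder(nn.Module):
    """VQ-VAE-style tokenizer: turns a frame into a short sequence of discrete tokens."""
    def __init__(self, vocab_size=512, n_tokens=16, dim=128):
        super().__init__()
        # a real encoder would be convolutional; a linear layer keeps the sketch short
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, n_tokens * dim))
        self.codebook = nn.Embedding(vocab_size, dim)
        self.n_tokens, self.dim = n_tokens, dim

    def encode(self, frames):  # frames: (B, 3, 64, 64)
        z = self.encoder(frames).view(-1, self.n_tokens, self.dim)
        # quantize: snap each position to its nearest codebook entry
        dists = torch.cdist(z, self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1))
        return dists.argmin(-1)  # (B, n_tokens) integer token ids

class WorldModel(nn.Module):
    """Autoregressive Transformer over token sequences: predicts next tokens and reward."""
    def __init__(self, vocab_size=512, dim=128):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.next_token_head = nn.Linear(dim, vocab_size)
        self.reward_head = nn.Linear(dim, 1)

    def forward(self, tokens):  # tokens: (B, T) frame-token ids interleaved with actions
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.transformer(self.token_emb(tokens), mask=causal)
        return self.next_token_head(h), self.reward_head(h)
```

The policy is then trained on imagined rollouts sampled autoregressively from such a model, so the amount of real environment interaction stays small.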
On the Atari 100k benchmark, with the equivalent of only two hours of gameplay, IRIS achieved a mean human-normalized score of 1.046 and outperformed humans on 10 of the 26 games.
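For reference, the human-normalized score commonly used on Atari benchmarks rescales a raw game score so that 0 corresponds to random play and 1 to the human reference. The snippet below shows the arithmetic with made-up numbers:

```python
def human_normalized_score(agent: float, random: float, human: float) -> float:
    """0.0 = random play, 1.0 = the human reference score."""
    return (agent - random) / (human - random)

# illustrative numbers only: an agent beating the human reference scores above 1
print(human_normalized_score(agent=1200.0, random=100.0, human=1000.0))  # ~1.22
```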
Previously, LeCun had said that reinforcement learning will lead to a dead end.
Now it seems that Vincent Micheli, Eloi Alonso, François Fleuret, and their colleagues are integrating world models with reinforcement learning (more precisely, deep reinforcement learning), and the bridge connecting the two is the Transformer.
When it comes to artificial intelligence technology, what many people think of first is deep learning.
In fact, although deep learning is still active in the field of AI, many of its problems have been exposed.
The most commonly used form of deep learning today is supervised learning, which can be understood as "learning with reference answers": data must be labeled before it can be used for training. Yet a large amount of available data is unlabeled, and labeling is very expensive.
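As a minimal illustration of "learning with reference answers" (using scikit-learn; the data here is made up), note that training cannot even start without the label vector y:

```python
from sklearn.linear_model import LogisticRegression

X = [[1.0, 0.2], [0.3, 0.9], [0.8, 0.1], [0.2, 0.7]]  # features
y = [1, 0, 1, 0]  # human-provided labels: the "reference answers"

clf = LogisticRegression().fit(X, y)  # fit() requires y; unlabeled data is unusable here
print(clf.predict([[0.9, 0.2]]))
```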
So much so that, in response to this situation, some people joke that "there is only as much intelligence as there is human labor behind it."
Many researchers, including some well-known experts, are reflecting on whether deep learning has gone "wrong".
So, reinforcement learning began to rise.
Reinforcement learning is different from supervised and unsupervised learning: an agent learns by continuous trial and error, with the AI rewarded or punished according to the outcome of each trial. This is the method DeepMind used to build its various board-game and video-game AIs. Believers in this path hold that, as long as the reward incentives are set correctly, reinforcement learning will eventually create a true AGI.
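This loop is easy to show concretely. Below is a minimal tabular Q-learning sketch on a made-up corridor environment; everything here, including the environment, is illustrative:

```python
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]  # the agent's value estimates
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def step(state, action):
    """Toy environment (hypothetical): action 1 moves right; the last state pays reward 1."""
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    for _ in range(20):
        # trial and error: mostly exploit current knowledge, occasionally explore
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # the reward/punishment signal drives the value update
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state
```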
But reinforcement learning also has problems. In LeCun’s words, “reinforcement learning requires a huge amount of data to train the model to perform the simplest tasks.”
So reinforcement learning and deep learning were combined to become deep reinforcement learning.
In deep reinforcement learning, reinforcement learning is the skeleton and deep learning is the soul. What does this mean? The main operating mechanism of deep reinforcement learning is essentially the same as that of reinforcement learning, except that a deep neural network is used to carry out the process.
What’s more, some deep reinforcement learning algorithms simply bolt a deep neural network onto an existing reinforcement learning algorithm to obtain a new deep reinforcement learning algorithm. The very famous DQN algorithm is a typical example.
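A hedged sketch of the DQN idea in PyTorch: classic Q-learning's lookup table is replaced by a neural network, and past transitions are replayed from a buffer. A real DQN also uses a separate target network and episode-done flags, omitted here for brevity; all sizes are illustrative:

```python
import random
from collections import deque

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # state dim 4, 2 actions
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_buffer = deque(maxlen=10_000)  # experience replay, a key DQN ingredient
gamma = 0.99
# elsewhere in the loop: replay_buffer.append((state, action, reward, next_state))

def train_step(batch_size=32):
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states = map(torch.tensor, zip(*batch))
    # the Q-learning target, computed with the network instead of a table
    with torch.no_grad():
        target = rewards + gamma * q_net(next_states).max(dim=1).values
    q = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```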
Transformers first appeared in 2017, proposed in Google’s paper “Attention Is All You Need”.
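The core operation that paper introduced is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal NumPy rendering of that published formula (the random inputs are placeholders):

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of the values

seq_len, d_k = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```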
Before the emergence of the Transformer, progress in AI on language tasks had lagged behind other fields. “Natural language processing has been somewhat of a latecomer to this deep learning revolution that’s happened over the past decade,” says Anna Rumshisky, a computer scientist at the University of Massachusetts Lowell. “In a sense, NLP was lagging behind computer vision; the Transformer changed that.”
In recent years, the Transformer has become one of the main highlights of progress in deep learning and deep neural networks. It is mainly used for advanced applications in natural language processing; Google, for example, uses it to enhance its search engine results.
Transformers quickly became the leader in applications such as word recognition that focus on analyzing and predicting text. They sparked a wave of tools like OpenAI’s GPT-3, which can be trained on hundreds of billions of words and generates coherent new text.
Currently, the Transformer architecture continues to evolve into many different variants and to expand from language tasks into other domains. For example, Transformers have been used for time-series forecasting, and they are the key innovation behind DeepMind’s protein structure prediction model AlphaFold.
Transformers have also recently entered computer vision, where they are slowly replacing convolutional neural networks (CNNs) in many complex tasks.
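The usual recipe (ViT-style; the sizes below are common illustrative defaults, not any specific model) is to cut an image into patches and embed each patch as a token, so a standard Transformer can process the image like a sentence:

```python
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)                  # (batch, channels, H, W)
patch_size, dim = 16, 768

# extract non-overlapping 16x16 patches: 14*14 = 196 tokens of size 3*16*16
patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
patches = patches.reshape(1, 3, -1, patch_size, patch_size).permute(0, 2, 1, 3, 4)
tokens = patches.flatten(2)                          # (1, 196, 768)

embed = nn.Linear(3 * patch_size * patch_size, dim)  # linear patch embedding
print(embed(tokens).shape)                           # torch.Size([1, 196, 768])
```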
Regarding this research, one netizen commented: "Please note that the two hours is the length of footage collected from the environment; training on a GPU takes a week."
Some also questioned: so the system learns on a particularly accurate latent world model? Does the model require no pre-training?
In addition, some felt that the results are not a groundbreaking breakthrough: "It seems that they just trained the world model, a VQ-VAE, and an actor-critic, all from a replay buffer of those 2 hours of experience (and about 600 epochs)."
Reference: https://www.reddit.com/r/MachineLearning/comments/x4e4jx/r_transformers_are_sample_efficient_world_models/