Everyone has encountered an age-old problem.
It’s a Friday night and you’re trying to pick a restaurant to eat at, but you don’t have a reservation. Should you wait in line at your favorite restaurant that’s packed with people, or try a new restaurant in the hope of discovering some tastier surprises?
The latter does have the potential to lead to pleasant surprises, but this kind of curiosity-driven behavior comes with risks: the food at the new restaurant you try might turn out to be worse.
Curiosity is the driving force behind AI's exploration of the world, and examples abound: autonomous navigation, robot decision-making, optimizing detection results, and so on.
In some cases, machines use "reinforcement learning" to accomplish a goal: the AI agent learns iteratively as good behaviors are rewarded and bad behaviors are punished.
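To make that reward-and-punishment loop concrete, here is a minimal sketch in the style of textbook tabular Q-learning. The MIT work uses deep reinforcement learning on game pixels, so every name below is an illustrative placeholder, not the authors' code:

    from collections import defaultdict

    q_table = defaultdict(float)   # learned value of each (state, action) pair
    alpha, gamma = 0.1, 0.99       # learning rate and discount factor

    def q_update(state, action, reward, next_state, actions):
        # Rewarded behavior raises the value of (state, action);
        # punished (negative-reward) behavior lowers it.
        best_next = max(q_table[(next_state, a)] for a in actions)
        target = reward + gamma * best_next
        q_table[(state, action)] += alpha * (target - q_table[(state, action)])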
Just like humans facing the restaurant dilemma, these agents must balance the time spent discovering better actions (exploration) against taking the actions that led to high rewards in the past (exploitation).
Curiosity that is too strong distracts the agent from making good decisions, while curiosity that is too weak means the agent may never discover them.
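A classic toy version of this dial is epsilon-greedy action selection, where epsilon plays the role of curiosity: set it too high and the agent rarely exploits what it knows; set it too low and it may never stumble onto better actions. Again, this is a generic sketch, not the paper's method:

    import random

    def choose_action(state, actions, q_table, epsilon):
        if random.random() < epsilon:
            return random.choice(actions)                       # explore: try something new
        return max(actions, key=lambda a: q_table[(state, a)])  # exploit: take the known best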
To give AI agents "just the right amount" of curiosity, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) created an algorithm that overcomes the problem of agents being so "curious" that they are distracted from the task at hand.
The algorithm they developed automatically increases curiosity when it is needed and suppresses it when the agent already receives enough supervision from the environment to know what to do.
Paper link: https://williamd4112.github.io/pubs/neurips22_eipo.pdf
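The method in the linked paper (EIPO) is formulated as a constrained optimization problem; the hedged sketch below only conveys the general idea of automatically scaling an intrinsic "curiosity" bonus, with the adaptation rule invented purely for illustration:

    def combined_reward(r_extrinsic, r_intrinsic, beta):
        # Total learning signal: the environment's own reward plus a scaled curiosity bonus.
        return r_extrinsic + beta * r_intrinsic

    def adapt_beta(beta, recent_return, previous_return,
                   step=0.05, beta_min=0.0, beta_max=1.0):
        # Heuristic stand-in for the paper's mechanism: more curiosity when
        # extrinsic progress stalls, less when rewards are already flowing.
        if recent_return <= previous_return:
            return min(beta_max, beta + step)
        return max(beta_min, beta - step)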
After testing on more than 60 video games, the algorithm succeeded at exploration tasks across the difficulty spectrum, whereas previous algorithms could handle only the easy or only the hard ones. This means AI agents can use less data to learn decision rules that maximize reward.
"If you have a good grasp of the exploration-exploitation trade-off, you can learn the correct decision rules quickly; anything less requires a lot of data, and that can mean suboptimal medical treatments, lower website profits, and robots that never learn to do the right thing," said Pulkit Agrawal, one of the study's leaders, a professor at MIT and director of the Improbable AI Laboratory.
Curiosity is hard to explain even from a psychological perspective; the underlying neurological principles of this challenge-seeking behavior are still not fully understood.
Reinforcement learning strips the emotion out of this process, pruning the problem down to its bare essentials, but the technical implementation is quite complex.
Essentially, an agent should only be curious when supervision is too scarce to tell it what to try; when enough supervision is available, it should dial its curiosity down.
In the test tasks, an agent runs around the environment looking for rewards and performs long sequences of actions to reach a goal, which makes video games a logical test platform for the researchers' algorithm.
In experiments with games such as "Mario Kart" and "Montezuma's Revenge", the researchers divided the games into two categories:
One is an environment with sparse supervision, where the agent receives little guidance; these are the "hard" exploration games. The other is an environment with denser supervision; these are the "easy" exploration games.
Suppose in "Mario Kart", just remove all rewards, you don't know when an enemy kills you. You don't get any rewards when you collect a coin or jump over a pipe. The agent is only told at the end how it performed. This is a sparsely supervised environment, which is a difficult task. In this kind of task, algorithms that stimulate curiosity perform very well.
If instead the agent is in a densely supervised environment, with rewards for jumping pipes, collecting coins, and killing enemies, the best performer is an algorithm with no curiosity at all, because rewards arrive so often that simply following the process yields plenty of them without any extra exploration.
Using a curiosity-encouraging algorithm in that setting makes learning very slow: a curious agent may try to speed through the level in different ways, wander around, and visit every corner of the game. These detours are fun, but they don't help the agent win the game and collect rewards.
As noted above, reinforcement learning has generally paired curiosity-stimulating algorithms with sparsely supervised (hard) tasks and curiosity-suppressing algorithms with densely supervised (easy) tasks; the two could not be mixed.
This time, the MIT team's new algorithm performed well no matter what the environment was.
Future work may return to a question that has delighted and troubled psychologists for years: finding an appropriate measure of curiosity, since no one really knows the right way to define it mathematically.
Zhang-Wei Hong, a doctoral student at MIT CSAIL, said:
By improving exploration algorithms, you can tune them to the problem you care about. We need curiosity to solve challenging problems, but on some problems curiosity can degrade performance. Our algorithm removes the burden of tuning the balance between exploration and exploitation.
Problems that previously took a week to solve can now yield satisfactory results within a few hours using the new algorithm.
He co-authored the new paper on this work with Eric Chen '22, a master of engineering student at MIT CSAIL.
Deepak Pathak, an assistant professor at Carnegie Mellon University, said:
"Intrinsic reward mechanisms like curiosity are fundamental for guiding agents to discover useful, diverse behaviors, but this should not come at the expense of doing well on the given task. This is an important problem in AI, and this paper provides a way to balance the trade-off. It will be very interesting to see how such methods scale from games to real-world robotic agents."
Alison Gopnik, Distinguished Professor of Psychology and Associate Professor of Philosophy at the University of California, Berkeley, pointed out that one of the biggest challenges in current AI and cognitive science is how to balance exploration and exploitation: the former is the search for information, the latter the pursuit of reward.
"This paper uses impressive new technology to automate this work, designing an agent that can systematically balance curiosity about the world and desire for rewards, making AI intelligent The body has taken an important step towards becoming as smart as real children," he said.
References:
https://techxplore.com/news/2022-11-bad-ai-curious.html
https://www.csail.mit.edu/news/ensuring-ai-works-right-dose-curiosity