What is "SIMA," Google DeepMind's general-purpose AI agent for 3D virtual environments? [CEDEC 2024]

On August 21, 2024, the session "SIMA: Developing General-Purpose AI Agents Using Video Games" was held at the game developers conference "CEDEC 2024."

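In this session, Mufarek, head of the company's technology strategy and AI R&D data strategy department, gave an overview of "SIMA" (Scalable Instructable Multiworld Agent), Google DeepMind's general-purpose AI agent for 3D virtual environments. He covered the training method that uses games, the lessons and challenges that came out of the research, and the direction of future work.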

Google DeepMind and its gaming DNA

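Mufarek began by describing Google DeepMind's mission as "building AI responsibly to benefit humanity," that is, developing AGI (artificial general intelligence) that is useful and can be used safely to solve real-world problems. He then introduced the research carried out over nearly 15 years. It started with board games and simple Atari games and led to the development of reinforcement learning algorithms inspired by neuroscience and an understanding of how the brain works.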

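In addition, by applying the knowledge gained from these projects, a combination of the company's AI models "AlphaProof" and "AlphaGeometry 2" reportedly reached silver-medal level on the problems of the 2024 International Mathematical Olympiad. It was also mentioned that results like these are being used in Google's generative AI "Gemini."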

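SIMA uses games in its research because most of its members, including Mufarek himself and Google DeepMind CEO Demis Hassabis, are former game developers. "Games are in our DNA," he said, adding that SIMA research and game development have more in common than people might think.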

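Mufarek described the process shared by research and game development as follows. If you "form a hypothesis and go through trial and error," you eventually "discover an important piece with great potential." However, "at some point that piece stops working, and you end up in a state where you don't even know why it happened, or why it worked in the first place." What follows is a long, repetitive, and demanding process of "discovering all the ways that don't work," but with enough patience, resources, confidence in the original hypothesis, and persistence, you find a solution. From there everything accelerates, meshes well, and comes together.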

History of AI research using games

Mufarek says that games have long contributed to the advancement of AI research and will continue to drive it forward. Specifically, games provide AI research with "rich, dynamic, and complex environments to interact with and learn from," "scalable and reproducible experiments," and "controlled and safe testing."

As for rich, dynamic, and complex environments, games pose challenges such as solving puzzles while moving through a virtual space, strategizing against opponents, and adapting to changing situations, which resemble the wide range of situations found in the real world. He explained that such environments therefore help AI models develop advanced problem-solving and decision-making abilities that carry over to varied situations.

As for scalable and reproducible experiments, he noted that researchers can easily create instances of game environments, run many simulations simultaneously, and use the vast amounts of data collected to train and evaluate AI models. Experiments can also be replicated consistently, which supports the reliability and validity of research results.

As for controlled and safe testing, he showed that evaluating an AI model in a variety of virtual situations helps identify potential flaws and limitations and improve algorithms without the risks of real-world testing. This is particularly important for applications such as self-driving cars and medical diagnostics, where errors can have serious consequences.

Examples were also shown of AI research that actually advanced through games between 2010 and 2024, a period in which reinforcement learning and deep learning improved dramatically. In the early 2010s, Google DeepMind worked on learning algorithms using Atari games, which led to DQN (Deep Q-Network). The result was an algorithm that demonstrated superhuman performance when playing over 50 Atari games.

In the mid to late 2010s, Microsoft developed "Project Malmo," an AI training project built on "Minecraft." In addition, OpenAI's AI training platform "Universe" offered a highly general-purpose interface, which made it possible to use games for research at scale.

In the late 2010s, the AI system "OpenAI Five" appeared for "Dota 2," and DeepMind's AI agent "AlphaStar" reached top-player level in "StarCraft II." AI had begun to win even in complex games. Mufarek explained that work in this period focused on a single environment with a customized action space, and that bespoke research platforms were built by modifying the game's source code and implementing special APIs for the AI agent.

In 2017, Google announced the "Transformer" machine learning model, which broadened the versatility of AI. Through chatbots built on large language models (LLMs), it became possible to summarize dialogue, write poetry, and analyze data. With further generalization, AI can now also generate images, audio, and video.

However, Mufarek pointed out a limitation of such large-scale AI models: they are not embodied, so they exist only in the digital realm and cannot act in the physical one. To use AI in the physical domain, it must be given a body with physical sensors, as in SoftBank's Pepper and Waymo's self-driving cars.

The next chapter of AI research: SIMA

According to Mufarek, DeepMind has pursued the SIMA research in order to overcome the limitations of AI models described above. The goal is to "develop an AI agent that can be conditioned by language." In other words, the aim is an AI agent that not only plays games autonomously but also carries out what humans ask of it in natural language.

The hypothesis set up to achieve this goal is that "if an AI agent can learn something in one environment and use that skill to do something in another environment, the generalization of AI will advance." In other words, rather than preparing a dedicated AI agent for each game title, a single AI agent should carry over skills such as character and camera control from earlier games, much as a human does when picking up a new game.

To this end, DeepMind partnered with several game companies to build a training portfolio for AI agents. Specifically, the AI agent was trained on recordings of humans playing games such as "No Man's Sky," "Valheim," "Teardown," and "Goat Simulator." SIMA was reportedly realized by pairing this gameplay with text-based instructions.

SIMA training

The construction of SIMA's training pipeline was also introduced. According to Mufarek, the first step is onboarding games and research environments, after which SIMA can play the game just as a human does, without access to source code or special APIs.

Onboarding of games and research environments is done in cooperation with each game's developer. This clarifies who is responsible for how the data used in the game and in the SIMA project is handled.

According to Mufarek, the SIMA project required a diverse and non-violent training portfolio. The team therefore selected a variety of titles, covering visually natural, industrial, realistic, and science-fiction settings as well as both first-person and third-person perspectives. The titles also include open-world and sandbox elements so that SIMA can take varied actions through complex mechanics.

SIMA uses a general-purpose interface, reportedly with the aim of producing a general-purpose AI agent. SIMA receives goals and instructions from humans as text written in natural language and interprets them in real time. Then, just like a human, it plays the game using a controller or keyboard and mouse.
Mufarek explained that with such a general-purpose interface, SIMA can be dropped into any game without customization.
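
To make the idea of a general-purpose interface concrete, here is a minimal sketch of what such a contract might look like: text and screen pixels in, human-style inputs out. All names (`Observation`, `Action`, `Agent`, `ForwardAgent`) and field layouts are hypothetical illustrations, not SIMA's actual code, and the toy agent uses keyword matching where the real system uses a learned model.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Observation:
    """What the agent receives each step (field layout is an assumption)."""
    frame: bytes       # raw screen image
    instruction: str   # natural-language goal from a human


@dataclass(frozen=True)
class Action:
    """Human-style input: keys held down plus relative mouse motion."""
    keys: frozenset = frozenset()
    mouse_dx: int = 0
    mouse_dy: int = 0


class Agent(Protocol):
    """Any game can host an agent that satisfies this one method."""
    def act(self, obs: Observation) -> Action: ...


class ForwardAgent:
    """Toy stand-in: holds W when told to move forward, otherwise idles."""

    def act(self, obs: Observation) -> Action:
        if "forward" in obs.instruction.lower():
            return Action(keys=frozenset({"w"}))
        return Action()
```

Because nothing in this contract refers to a game's internal state or API, the same agent object could in principle be pointed at any title that renders to a screen and accepts keyboard and mouse input.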

Two methods were used to create SIMA's training data. In the first, a single person plays the game, then watches the recording and annotates key moments in natural language. In the second, people work in pairs: one gives instructions in natural language and the other follows them, and the gameplay video is recorded and annotated.
Combining these recordings with the corresponding keyboard and mouse input data yields the SIMA dataset.
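
One way to picture a record in such a dataset is sketched below. The class and field names (`Step`, `Episode`, `collection_mode`) are assumptions made for illustration; the session did not describe the actual storage format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    frame_index: int   # position in the recorded gameplay video
    keys: List[str]    # keys held on this frame
    mouse_dx: int
    mouse_dy: int


@dataclass
class Episode:
    game: str
    instruction: str       # annotation or partner's instruction, as text
    collection_mode: str   # "solo_annotated" or "instructor_player" (hypothetical labels)
    steps: List[Step]


def to_training_pairs(episode):
    """Yield ((frame_index, instruction), action) pairs for imitation learning."""
    for s in episode.steps:
        yield (s.frame_index, episode.instruction), (tuple(s.keys), s.mouse_dx, s.mouse_dy)
```

Each pair couples what the human saw and was asked to do with what the human actually pressed, which is the supervision signal that imitation learning needs.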

These datasets cover skills needed for SIMA's gameplay, such as "creating objects" and "driving a car" in a game. Collected across all titles, the total number of skills is huge, yet still not enough for the SIMA project.
Mufarek said that the higher the quality of the data and annotations, the more they help improve SIMA, and that the team will continue these efforts.

Once the dataset is ready, SIMA's training can finally begin. The technique used is "conditioned behavioral cloning," in which the agent learns by imitating human play.
At its core is an architecture built on pre-trained models. Because Gemini did not yet exist when it was developed, it uses Classifier-Free Guidance (CFG) to give verbal instructions priority over visual input, which reportedly helped the agent follow natural-language instructions well.
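
The two ideas in this passage can be sketched in a few lines. The code below uses the generic textbook form of classifier-free guidance, `uncond + scale * (cond - uncond)`, and the standard behavioral cloning loss (negative log-probability of the human's action). Whether SIMA uses exactly this formulation is an assumption, and the logits are made-up numbers.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cfg_logits(cond, uncond, scale):
    """Generic classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def bc_loss(logits, expert_action):
    """Behavioral cloning loss: negative log-probability of the human's action."""
    return -math.log(softmax(logits)[expert_action])


# Hypothetical logits over three actions: [forward, jump, open_menu].
with_instruction = [1.0, 0.5, 2.0]     # policy given frame + "open the menu"
without_instruction = [1.5, 0.5, 0.5]  # same frame, instruction dropped

guided = cfg_logits(with_instruction, without_instruction, scale=2.0)
```

With `scale=1.0` the guided logits equal the instruction-conditioned ones; larger values push the policy further toward whatever the instruction changed, here the `open_menu` action.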

In the phase that evaluates SIMA's results, a challenge set was created to measure performance on various tasks. A task has three elements: the first is the "initial state" from which SIMA starts acting, the second is the "goal/instruction" that SIMA must follow, and the third is the "success criteria" that determine whether the task has been accomplished.

It was also explained that results are judged from three perspectives: "ground truth," which programmatically determines whether a task was completed; "optical character recognition (OCR)," which detects the outcome of actions from changes in on-screen text; and "human evaluation," in which a person watches the video and confirms whether the task was completed.
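
The three-part task definition and the three kinds of checks can be sketched as follows. Everything here is hypothetical: the `Task` class, the shape of the episode log (a plain dictionary), and the helper names are invented for illustration, and no real OCR is performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    initial_state: str               # e.g. a saved-game identifier
    instruction: str                 # goal given to the agent
    success: Callable[[Dict], bool]  # success criterion over an episode log


def ground_truth_check(key, expected):
    """Programmatic check against state exposed by the environment."""
    return lambda log: log.get("state", {}).get(key) == expected


def ocr_check(phrase):
    """Check that a phrase appears in text already read off the screen."""
    return lambda log: any(phrase in line for line in log.get("screen_text", []))


def human_check(log):
    """Use a reviewer's verdict recorded after watching the video."""
    return bool(log.get("human_verdict", False))
```

Keeping the success criterion as a plain function lets one challenge set mix all three kinds of checks, depending on what each game exposes.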

SIMA early research results and limitations of this approach

Early research results of the project revealed that SIMA can complete tasks commonly performed in a variety of games, such as "moving forward" and "opening a menu."

It was also able to complete tasks whose meaning differs from game to game, such as taking off in a spaceship in "No Man's Sky" or piloting a boat in "Teardown."

Whether the agent could complete tasks specific to each game, on the other hand, was evaluated with three separately prepared agents.
The first is "Specialist," which is trained on data from a single game and evaluated in that same environment; its performance is treated as the 100% baseline.
The second is "SIMA," which is trained on data from 10 games and then tested and evaluated in the environment of one of them.
The third is "Zero-Shot," which is trained on data from 9 of the 10 titles and tested and evaluated in the environment of the remaining title.
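
The three setups differ only in which games feed the training set, and scores are read relative to the Specialist. A minimal sketch of that bookkeeping follows; the function names and the game identifiers in the usage note are placeholders, not part of the original evaluation code.

```python
def training_sets(games, target):
    """Return the training games for each setup when evaluating on `target`."""
    return {
        "specialist": [target],                            # one game only
        "sima": list(games),                               # all titles
        "zero_shot": [g for g in games if g != target],    # target held out
    }


def relative_performance(success_rates):
    """Express each setup's success rate as a percentage of the Specialist's."""
    base = success_rates["specialist"]
    return {name: 100.0 * rate / base for name, rate in success_rates.items()}
```

For example, calling `training_sets` with ten placeholder game names and one of them as the target gives a one-game list, a ten-game list, and a nine-game list that excludes the target. `relative_performance` then maps the Specialist to exactly 100, so a value above 100 means the multi-game agent beat the single-game one.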

As a result, SIMA outperformed the Specialist when trained on all 10 titles, and came close to the Specialist even in the Zero-Shot setting.
Mufarek was very satisfied, since this confirmed that "an AI agent can learn something in one environment and use that skill to do something in another environment."

However, the goal of this project is to "develop an AI agent that is conditioned by language." When training and testing were performed without natural-language annotations, SIMA's performance deteriorated significantly, which shows how much the agent depends on language.
With these results, the hypothesis that "training a single agent in many large-scale environments results in transfer of learning and generalization" was reportedly confirmed for the first time.

SIMA's performance in each title was also shown. According to Mufarek, the difference in generalization between titles comes from the difference in the amount of game-specific knowledge required to execute a task.

Using CFG to strengthen the effect of the instructions gives SIMA higher performance than it achieves without CFG. However, once the guidance exceeds a certain threshold, performance appears to drop.

Based on these results, Mufarek says that "SIMA has been a truly wonderful success," but that it is "far from perfect." The task completion rate depends heavily on the environment and is still nowhere near that of human players.
However, he said that this is exactly what motivates him to continue the SIMA research.

Future developments

Finally, Mufarek outlined the future of the SIMA project, describing it as next-generation research on simulation-based AI agents. It builds on the many years of AI research using games, and there is reportedly still a great deal of work to do.

Research so far has focused on training to raise the performance of AI agents, but AlphaStar's performance, for example, deteriorated after updates to "StarCraft II."
Mufarek said, "It's not realistic to retrain the AI agent every time the game is updated," and expressed the belief that making SIMA more general-purpose will let the agent perform well even when new features are added to a game.

SIMA is also good at tasks that can be completed in a short time, such as "gathering firewood" and "setting the firewood on fire," but not necessarily at tasks that require planning, multiple steps, and reasoning, such as "building a house."
However, Gemini now looks like it can provide powerful support for SIMA. For example, Gemini could act as a director, dividing a long task like "building a house" into short tasks and handing them to SIMA.
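
The director idea amounts to a simple loop: a planner turns one long goal into short instructions, and a low-level agent carries them out one at a time. In the sketch below, `toy_plan` stands in for the language model and `toy_execute` for the agent; both are hypothetical stubs with hard-coded behavior, and no real Gemini or SIMA API is called.

```python
def run_long_task(goal, plan, execute):
    """Split `goal` into short instructions and run them in order.

    `plan` maps a goal to a list of short instructions; `execute` runs one
    instruction and returns True on success. The loop stops at the first
    failed step and returns the steps that were completed.
    """
    completed = []
    for step in plan(goal):
        if not execute(step):
            break
        completed.append(step)
    return completed


# Hypothetical stand-ins so the sketch runs without any model.
def toy_plan(goal):
    plans = {
        "build a house": ["gather wood", "craft planks", "place walls", "place roof"],
    }
    return plans.get(goal, [goal])  # unknown goals are passed through as one step


def toy_execute(instruction):
    return instruction != "place roof"  # pretend the last step fails
```

Stopping at the first failure is only one possible design choice; a director could equally re-plan from the point of failure, which is where the reasoning ability of a larger model would matter most.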

Mufarek reiterated that while the SIMA project is very exciting and promises great versatility, it has not yet become a fully general-purpose AI agent, and that once it does, further developments will become possible.
