The robot dog walks steadily on a yoga ball, keeping its balance remarkably well:
It handles a variety of scenes, from flat sidewalks to challenging lawns:
Even when a researcher kicks the yoga ball, the robot dog does not tip over:
The robot dog stays balanced even as the yoga ball deflates:
All of the above demonstrations run at 1x speed, with no acceleration.
- Paper: https://eureka-research.github.io/dr-eureka/assets/dreureka-paper.pdf
- Project homepage: https://github.com/eureka-research/DrEureka
- Paper title: DrEureka: Language Model Guided Sim-To-Real Transfer
This research is a collaboration between researchers at the University of Pennsylvania, NVIDIA, and the University of Texas at Austin, and is fully open source. The team proposes DrEureka (Domain Randomized Eureka), a new algorithm that uses an LLM for both reward design and domain randomization parameter configuration, enabling simulation-to-reality (sim-to-real) transfer in a single pipeline. The study shows that DrEureka can solve novel robotic tasks, such as a quadruped robot balancing and walking on a yoga ball, without iterative manual design.

In the paper's abstract, the researchers note that transferring policies learned in simulation to the real world is a promising route to acquiring robot skills at scale. However, sim-to-real approaches typically rely on the manual design and tuning of task reward functions and simulation physics parameters, which makes the process slow and labor-intensive. This paper investigates the use of large language models (LLMs) to automate and accelerate sim-to-real design.

Jim Fan, one of the paper's authors and a senior scientist at NVIDIA, took part in this research. NVIDIA had previously established an AI laboratory, led by Jim Fan, specializing in embodied intelligence. Jim Fan said: "We trained a robot dog to balance and walk on a yoga ball. This was done entirely in simulation and then transferred zero-shot to the real world, running directly without fine-tuning. The yoga ball walking task is particularly difficult for the robot dog because we cannot accurately simulate the bouncy ball surface. DrEureka can easily search a large number of sim-to-real configurations and let the robot dog control the ball on various terrains, even walking sideways!
Generally speaking, sim-to-real transfer is achieved through domain randomization, a tedious process that requires roboticists to eyeball every parameter and adjust it by hand. Cutting-edge LLMs such as GPT-4 have a great deal of built-in physical intuition, covering friction, damping, stiffness, gravity, and so on. With GPT-4, DrEureka can skillfully adjust these parameters and explain its reasoning well."

The DrEureka pipeline works as follows. It accepts task and safety instructions together with the environment source code, and runs Eureka to generate a regularized reward function and a policy. It then tests that policy under different simulation conditions to construct a reward-aware physics prior, which is fed to an LLM to generate a set of domain randomization (DR) parameters. Finally, a policy is trained with the synthesized reward and DR parameters for real-world deployment.

Eureka reward design. The reward design component builds on Eureka for its simplicity and expressiveness, but the paper introduces several improvements to strengthen its applicability to sim-to-real settings. The pseudocode is as follows:

Reward-aware physics prior (RAPP). Safety-regularized reward functions can shape policy behavior for a fixed environment configuration, but on their own they are not sufficient for sim-to-real transfer. The paper therefore introduces a simple RAPP mechanism to restrict the base parameter ranges supplied to the LLM.

LLM for domain randomization. Given the RAPP range for each DR parameter, the final step of DrEureka instructs the LLM to generate domain randomization configurations within the limits of those ranges; see Figure 3 for the specific procedure.

The experiments use the Unitree Go1, a small quadruped robot with 12 degrees of freedom across its four legs.
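The RAPP idea described above can be sketched in a few lines. Note that every name below is illustrative, not taken from the DrEureka code: the sketch sweeps each physics parameter around its default, keeps the interval where a fixed policy still succeeds, and lets the DR step sample only inside that interval.

```python
import random

def rollout_succeeds(policy, param, scale):
    """Stand-in for a simulation rollout that returns True if the policy
    still balances when `param` is multiplied by `scale` (toy region)."""
    return 0.25 <= scale <= 4.0

def rapp_range(policy, param, scales):
    """Feasible (low, high) multiplier interval for one parameter."""
    ok = [s for s in scales if rollout_succeeds(policy, param, s)]
    return (min(ok), max(ok)) if ok else None

scales = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
params = ["friction", "restitution", "gravity"]
ranges = {p: rapp_range("policy", p, scales) for p in params}

# The LLM's DR proposals are then constrained to these ranges; sampling
# uniformly here is just a placeholder for the actual LLM call.
dr_config = {p: random.uniform(lo, hi) for p, (lo, hi) in ranges.items()}
print(ranges)  # each parameter keeps the interval (0.25, 4.0)
```

The point of the prior is that the LLM never proposes a randomization range the policy demonstrably cannot survive, which is what makes the bouncy, hard-to-model ball surface tractable.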
For the quadrupedal locomotion task, the paper also systematically evaluates DrEureka policies on several real-world terrains and finds that they remain robust and outperform policies trained with human-designed reward and DR configurations. For more details, see the original paper.
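Putting the stages together, the overall flow the paper describes (LLM reward design, policy training, RAPP construction, LLM-generated DR, retraining) can be sketched with placeholder functions. Every name and body below is a hypothetical stub, not the authors' API or real LLM/RL code.

```python
def llm_generate_reward(task, safety, env_source):
    # Eureka step: ask the LLM for a regularized reward function.
    return lambda state: -abs(state["tilt"])

def train_policy(reward_fn, dr_config=None):
    # Placeholder for RL training in simulation.
    return {"reward_fn": reward_fn, "dr": dr_config}

def build_rapp(policy, params):
    # Placeholder feasible range per parameter (see the RAPP discussion).
    return {p: (0.5, 2.0) for p in params}

def llm_generate_dr(rapp_ranges):
    # Placeholder for the LLM's DR proposal, kept inside the RAPP ranges.
    return {p: (lo + hi) / 2 for p, (lo, hi) in rapp_ranges.items()}

reward_fn = llm_generate_reward("walk on a yoga ball", "stay upright", "env.py")
policy = train_policy(reward_fn)                 # initial policy
rapp = build_rapp(policy, ["friction", "damping", "stiffness"])
dr = llm_generate_dr(rapp)                       # DR within RAPP limits
deployed = train_policy(reward_fn, dr)           # trained for zero-shot deployment
print(deployed["dr"])  # {'friction': 1.25, 'damping': 1.25, 'stiffness': 1.25}
```

The design choice worth noting is that the LLM appears twice, once for the reward and once for the randomization ranges, with the RAPP step in between grounding the second call in what the simulator can actually support.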