June 24, 2024

Introducing Alter3: The Latest Humanoid Robot Powered by GPT-4

Researchers have made rapid progress in connecting large language models (LLMs) such as GPT-4 to robotic systems. Pairing a model’s broad knowledge of language and human behavior with a physical body could make robots far easier to instruct and more capable in open-ended settings. One recent example is Alter3, a humanoid robot developed by researchers at the University of Tokyo and Alternative Machine.

Alter3 uses GPT-4 as its backend model, which receives natural-language instructions and maps them to robot actions. The system relies on an “agentic framework” that works in two stages. First, a planning stage breaks the instruction into the sequence of steps the robot must perform to reach the goal. Then a coding agent generates the commands the robot needs to execute each step.
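The plan-then-code pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers’ code: `plan_actions`, `generate_code`, the prompt wording, and the `fake_llm` stand-in for GPT-4 are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns the model's text.
LLM = Callable[[str], str]


@dataclass
class ActionPlan:
    instruction: str
    steps: list[str]


def plan_actions(llm: LLM, instruction: str) -> ActionPlan:
    """Stage 1 (planning): ask the model to split an instruction into steps."""
    prompt = (
        "Break the following instruction into short, numbered physical steps "
        f"for a humanoid robot, one per line:\n{instruction}"
    )
    reply = llm(prompt)
    # Drop blank lines and strip a leading "1." or "1)" from each step.
    steps = [re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
             for line in reply.splitlines() if line.strip()]
    return ActionPlan(instruction, steps)


def generate_code(llm: LLM, plan: ActionPlan) -> list[str]:
    """Stage 2 (coding agent): turn each planned step into robot API calls."""
    return [llm(f"Write robot API calls for this step: {step}") for step in plan.steps]


# Hypothetical stand-in for GPT-4 so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Break"):
        return "1. Raise the right arm\n2. Turn the head toward the phone"
    return "# commands for: " + prompt.split(": ", 1)[1]


plan = plan_actions(fake_llm, "Take a selfie")
commands = generate_code(fake_llm, plan)
```

Keeping planning and code generation as separate calls means each prompt stays focused: the first reasons about the task, the second only about translating one step into commands.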

One interesting aspect of Alter3’s implementation is how it adapts to the robot’s API. Because GPT-4 was not trained on Alter3’s programming commands, the researchers rely on the model’s in-context learning: the prompt includes a list of available commands along with examples of how each one is used, which lets the model map each step of the action plan to the appropriate API calls.
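In practice, in-context learning of an API amounts to packing a command reference and worked examples into the prompt. The sketch below assumes a hypothetical command set (`set_axis`, `wait`) and prompt layout; Alter3’s actual commands and prompts are not reproduced here.

```python
# Hypothetical command reference; Alter3's real API is not shown here.
COMMANDS = {
    "set_axis(axis_id, value)": "Move one actuator axis to a target position.",
    "wait(seconds)": "Pause before issuing the next command.",
}

# Hypothetical worked example pairing a plan step with the code that performs it.
EXAMPLES = [
    ("Nod the head once", "set_axis(1, 200)\nwait(0.5)\nset_axis(1, 100)"),
]


def build_prompt(step: str) -> str:
    """Assemble a few-shot prompt: API reference, worked examples, then the new step."""
    lines = ["You control a humanoid robot. Available commands:"]
    lines += [f"- {sig}: {doc}" for sig, doc in COMMANDS.items()]
    lines.append("")
    lines.append("Examples:")
    for example_step, code in EXAMPLES:
        lines.append(f"Step: {example_step}\nCode:\n{code}\n")
    # The model is expected to continue after the final "Code:" line.
    lines.append(f"Step: {step}\nCode:")
    return "\n".join(lines)


print(build_prompt("Wave the right hand"))
```

Because the robot’s interface lives entirely in the prompt, swapping in a different command list is enough to target a different robot without retraining the model.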

However, language alone is not always precise enough to describe physical movements. To address this, the researchers added a mechanism for human feedback. Given a correction such as “Raise your arm a bit higher,” the model adjusts the action sequence accordingly. This feedback loop between human and model improves Alter3’s motions and brings them closer to the intended behavior.
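A human-in-the-loop correction cycle of this kind might look like the following sketch. The `refine_with_feedback` function, the approval words, and the `fake_llm` stand-in (which simply raises a hypothetical arm axis when told to go “higher”) are illustrative assumptions, not Alter3’s implementation.

```python
import re
from typing import Callable, Iterable

# Hypothetical interface: any function that takes a prompt and returns revised code.
LLM = Callable[[str], str]
APPROVALS = {"ok", "good"}  # hypothetical words that end the loop


def refine_with_feedback(llm: LLM, code: str,
                         feedback: Iterable[str]) -> tuple[str, list[str]]:
    """Revise motion code once per human comment until the human approves."""
    applied = []
    for comment in feedback:
        if comment.strip().lower() in APPROVALS:
            break
        prompt = (
            f"Current motion code:\n{code}\n"
            f"Human feedback: {comment}\n"
            "Return the revised code only."
        )
        code = llm(prompt)
        applied.append(comment)
    return code, applied


# Hypothetical stand-in for GPT-4: on "higher", raise axis 5 by 30 units.
def fake_llm(prompt: str) -> str:
    code = prompt.split("Current motion code:\n", 1)[1].split("\nHuman feedback:", 1)[0]
    if "higher" in prompt:
        return re.sub(r"set_axis\(5, (\d+)\)",
                      lambda m: f"set_axis(5, {int(m.group(1)) + 30})", code)
    return code


final, applied = refine_with_feedback(
    fake_llm, "set_axis(5, 150)", ["Raise your arm a bit higher", "ok"])
```

Each round feeds the current code back to the model together with the comment, so corrections accumulate rather than starting over from the original plan.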

Alter3 has been tested on various tasks, including everyday actions like taking a selfie and drinking tea, as well as mimicry motions like pretending to be a ghost or a snake. The researchers have found that GPT-4’s extensive knowledge about human behaviors and actions enables more realistic behavior plans for humanoid robots. Notably, the model can infer emotions from text and reflect them in Alter3’s physical responses, even when emotional expressions are not explicitly stated.

Integrating foundation models like GPT-4 into robot control systems is an increasingly popular approach. Companies such as Figure, valued at $2.6 billion, use OpenAI models to understand human instructions and carry out actions in the real world. As foundation models become increasingly multimodal, robotic systems are better equipped to reason about their environment and make informed decisions.

Alter3 belongs to a category of projects that use off-the-shelf foundation models as the reasoning and planning modules of a robot control system. Because the robot’s API is supplied in the prompt rather than learned during training, the same approach can be carried over to other humanoid robots. Other projects, such as RT-2-X and OpenVLA, instead use specialized foundation models trained to produce robot commands directly. These models tend to give more stable results and generalize better across tasks and environments, but they require specialized expertise and are costly to develop.
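Specialized models in the RT-2 family and OpenVLA emit actions as tokens: each continuous action dimension is discretized into one of 256 bins, so the model can output robot commands the same way it outputs text. The sketch below shows uniform binning and decoding in plain Python. The function names and the example action ranges are hypothetical, and this is not code from either project (OpenVLA, for instance, sets each bin range from quantiles of its training data rather than from fixed limits).

```python
N_BINS = 256  # bins per action dimension, as in RT-2 and OpenVLA


def to_tokens(action, low, high, n_bins=N_BINS):
    """Map each continuous action dimension to a bin index in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)                  # clip to the valid range
        idx = int((a - lo) / (hi - lo) * n_bins)
        tokens.append(min(idx, n_bins - 1))      # a == hi lands in the top bin
    return tokens


def from_tokens(tokens, low, high, n_bins=N_BINS):
    """Decode bin indices back to the center value of each bin."""
    return [lo + (t + 0.5) * (hi - lo) / n_bins
            for t, lo, hi in zip(tokens, low, high)]


# Hypothetical 3-D action with each dimension limited to [-1, 1].
low, high = [-1.0] * 3, [1.0] * 3
tokens = to_tokens([-1.0, 0.0, 1.0], low, high)
decoded = from_tokens(tokens, low, high)
```

The round trip loses at most one bin width of precision per dimension, which is the price of letting a language-model decoder produce motor commands directly instead of generating API code as Alter3 does.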

Note that these projects mostly address the higher-level problems of understanding language instructions and planning actions. Basic skills such as grasping objects, maintaining balance, and navigating the surroundings remain significant challenges, in part because the data needed to learn them is limited or does not yet exist.

In conclusion, combining large language models with robotic systems holds great promise. Alter3 shows how GPT-4 can serve as the reasoning and planning layer of a robot control system: by learning the robot’s API from in-context examples and refining its motions through human feedback, it can turn natural-language instructions into physical actions. As more advanced foundation models and specialized robotics models emerge, robots are expected to become more capable still. Fundamental skills such as grasping and balance, however, remain open problems that will require further research and development.
