
Google’s Chatbot-Powered Robot: A Key Player in an Emerging Revolution

Advances in artificial intelligence (AI) and robotics have opened up new possibilities for what robots can do. One such development is the use of large language models, which enable robots to understand and navigate their environment more effectively. Google DeepMind recently announced an upgrade to its wheeled robot, incorporating the latest version of its Gemini large language model.

This Google helper robot, which works in a cluttered open-plan office in Mountain View, California, can act as a tour guide or an informal office helper. Given a command like “Find me somewhere to write,” the robot uses Gemini both to interpret the request and to find a suitable spot, such as a pristine whiteboard elsewhere in the building. Because Gemini handles video as well as text, and can analyze previously recorded video tours of the office, the robot is able to navigate its surroundings using commonsense reasoning.
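To make that idea concrete, here is a minimal, hypothetical sketch of such a navigation step in Python: a multimodal model receives the user’s command together with frames from a recorded office tour and names the best destination. Every class, function, and file path below is a placeholder invented for illustration; the project’s paper does not publish DeepMind’s actual interface.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    image_path: str  # a still captured during the recorded office tour
    location: str    # human-readable label for where the frame was taken

def query_vlm(command: str, frames: list[Frame]) -> str:
    """Placeholder for a call to a vision language model such as Gemini.

    A real system would send the command and the tour imagery to the
    model and parse its answer; here a keyword match stands in so the
    sketch runs end to end.
    """
    if "write" in command.lower():
        return "whiteboard near the east meeting room"
    return frames[0].location

tour = [
    Frame("frames/lobby.jpg", "main lobby"),
    Frame("frames/whiteboard.jpg", "whiteboard near the east meeting room"),
]

destination = query_vlm("Find me somewhere to write", tour)
print(f"Navigating to: {destination}")
```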

Pairing Gemini with an algorithm that generates specific actions for the robot extends its capabilities further: the robot can turn or take other actions in response to commands and the visual input it receives. According to a paper outlining the project, the robot proved up to 90 percent reliable at navigating, even when given tricky commands such as “Where did I leave my coaster?” The result is a marked improvement in how naturally people can interact with the robot, and in how usable it is.
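The paper’s action-generation algorithm is not reproduced here, but a toy version helps illustrate the division of labor: the language model picks a destination, and a separate planner converts it into motion commands. The action set and the little office map below are invented for this sketch.

```python
from collections import deque
from enum import Enum

class Action(Enum):
    FORWARD = "forward"
    TURN_LEFT = "turn_left"
    TURN_RIGHT = "turn_right"
    STOP = "stop"

def plan_actions(start: str, goal: str,
                 topo_map: dict[str, list[tuple[Action, str]]]) -> list[Action]:
    """Breadth-first search over a toy topological map of the office."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path + [Action.STOP]
        for action, nxt in topo_map.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return [Action.STOP]  # goal unreachable in this toy map

office = {
    "main lobby": [(Action.FORWARD, "hallway")],
    "hallway": [(Action.TURN_LEFT, "whiteboard area")],
}

print(plan_actions("main lobby", "whiteboard area", office))
# [Action.FORWARD, Action.TURN_LEFT, Action.STOP]
```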

This development highlights the potential for large language models to extend beyond the confines of web browsers and apps and enter the physical world to perform useful tasks. While chatbots and other AI systems have predominantly operated in the virtual realm, recent advancements have allowed them to handle visual and auditory input as well. Google DeepMind’s upgraded Gemini model was able to make sense of an office layout through a smartphone camera, showcasing the potential for these models to enhance robots’ capabilities in various domains.

The interest in leveraging language models to enhance robotic abilities is not limited to Google DeepMind. Academic and industry research labs are actively exploring vision language models as a way to improve robots’ perception and understanding of their surroundings. The International Conference on Robotics and Automation, a prominent event for robotics researchers, features numerous papers that use vision language models, a sign of growing recognition of their potential to advance robotics.

Investment in AI robotics startups is also on the rise as entrepreneurs and investors see the potential for these technologies to transform various industries. Physical Intelligence, a startup founded by former Google researchers involved in the Gemini project, secured $70 million in funding to combine large language models with real-world training, aiming to give robots general problem-solving abilities. Skild AI, founded by roboticists at Carnegie Mellon University, raised $300 million in pursuit of a similar goal.

The integration of language models has significantly transformed what robots can do. In the past, robots needed pre-programmed maps and carefully designed commands to navigate successfully. Large language models, by contrast, absorb useful information about the physical world from their training data, so less of it has to be hand-built into the robot. Newer versions, known as vision language models, are trained on images and videos in addition to text, which lets them answer questions that require perception. The Gemini model used by Google’s robot, for example, can parse visual instructions, following a sketch on a whiteboard to find a new destination.
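A perception-based query of that kind might look like the following sketch, where `MultimodalModel` is a stand-in class rather than any real SDK: an image and a question go in, and a grounded answer comes out.

```python
class MultimodalModel:
    """Stand-in for a vision language model client, not a real SDK class."""

    def answer(self, image_bytes: bytes, question: str) -> str:
        # A real call would upload the image alongside the text prompt;
        # a canned reply keeps this sketch self-contained and runnable.
        return "The sketched route ends at the round table by the window."

sketch = b"<png bytes of the whiteboard sketch>"  # placeholder image payload

model = MultimodalModel()
print(model.answer(sketch, "Where does this sketched route lead?"))
```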

Moving forward, the researchers plan to test the system on different types of robots and further enhance Gemini’s ability to understand complex questions. For instance, a user with empty Coke cans on their desk might ask, “Do they have my favorite drink today?” Gemini should be able to make sense of such queries and provide an appropriate response. The integration of language models with robotics holds immense potential for revolutionizing a wide range of industries and environments, from office assistance to healthcare and manufacturing.

In conclusion, Google DeepMind’s upgrade to its wheeled robot, built around the latest version of the Gemini large language model, shows the progress being made in enhancing robots’ capabilities. Language models let robots understand and navigate their environment more effectively, making human-robot interaction far more natural. The development opens up new possibilities for robots to perform useful tasks in many domains, and shows how large language models can extend beyond web browsers and apps into the physical world. With ongoing research and rising investment in AI robotics startups, the integration of language models and robotics looks set to usher in a new era of intelligent, capable robots.


