Google PaLM-E AI model enables robots to understand natural language and execute tasks

Google’s Robotics team has revealed a new artificial intelligence (AI) model that could enable robots to understand and execute instructions given in natural language. The AI model is based on Google’s existing large language model (LLM) called “PaLM.”

The model, dubbed PaLM-E, combines a vision model with a ChatGPT-style large language model for natural language processing (NLP). A mobile robot can observe its environment through a camera and act on what it sees without needing a preprocessed scene representation. In simpler terms, PaLM-E can “understand” its environment simply by looking at it.

More technically, Google notes in its research paper that “The main architectural idea of PaLM-E is to inject continuous, embodied observations such as images, state estimates, or other sensor modalities into the language embedding space of a pre-trained language model. This is realized by encoding the continuous observations into a sequence of vectors with the same dimension as the embedding space of the language tokens.”

This allows it to understand visual information in the same way it processes language.
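
The idea in the quoted passage can be illustrated with a toy sketch. This is not Google's implementation: the embedding size, the `<img>` placeholder, the random stand-in weights, and the helper names (`project`, `build_input_sequence`) are all assumptions made for illustration, and a simple linear projection stands in for whatever encoder the real model uses. The sketch only shows how image features might be mapped to vectors of the same dimension as the text token embeddings and placed in the same input sequence.

```python
import random

EMBED_DIM = 8  # assumed size of the language model's token embedding space


def make_matrix(rows, cols, rng):
    """Random matrix standing in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(features, weights):
    """Linearly map each feature vector (length F) to EMBED_DIM.

    `weights` has shape F x EMBED_DIM, so zip(*weights) iterates its columns.
    """
    return [
        [sum(f * w for f, w in zip(vec, col)) for col in zip(*weights)]
        for vec in features
    ]


def build_input_sequence(prompt, token_embeddings, image_vectors, placeholder="<img>"):
    """Replace each placeholder word with the projected image vectors."""
    sequence = []
    for word in prompt.split():
        if word == placeholder:
            sequence.extend(image_vectors)  # image vectors sit alongside text
        else:
            sequence.append(token_embeddings[word])
    return sequence


rng = random.Random(0)

# Toy vocabulary with random vectors standing in for learned token embeddings.
vocab = ["bring", "me", "the", "sponge", "in"]
token_embeddings = {w: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for w in vocab}

# Pretend a vision encoder produced 3 patch features of size 4 for one image.
image_features = [[rng.random() for _ in range(4)] for _ in range(3)]
projection = make_matrix(4, EMBED_DIM, rng)
image_vectors = project(image_features, projection)

sequence = build_input_sequence(
    "bring me the sponge in <img>", token_embeddings, image_vectors
)
print(len(sequence), len(sequence[0]))
```

Every element of `sequence` has the same length, whether it came from a word or from the image, so a language model that consumes embedding vectors could in principle process the mixed sequence without a separate input path for vision.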

What makes PaLM-E remarkable is that it can react to environmental changes and complete complex multi-step tasks requiring both navigation and manipulation. For example, it could be given the instruction “I spilled my drink, can you bring me something to clean it up?” and would then plan a sequence of actions including “1. Find a sponge, 2. Pick up the sponge, 3. Bring it to the user, 4. Put down the sponge” to complete the task.

The researchers also noted that PaLM-E exhibits “positive transfer,” meaning it can take knowledge and skills acquired from prior tasks and apply them to new ones, leading to higher performance than single-task robot models. They also found that it can analyze a sequence of inputs that mixes language and visual information, and that it can perform “multi-image inference,” where multiple images are used together to make a prediction.

Additionally, they noticed that the larger the language model, the better it retains its language capabilities when trained on visual-language and robotics tasks.

“Our evaluations show that PaLM-E, a single large embodied multimodal model, can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments, and exhibits positive transfer,” concludes Google.

That said, Google Robotics isn’t the only organization exploring neural networks for robotic control. Microsoft recently released a paper called “ChatGPT for Robotics,” which covers similar ground to Google’s research.
