Has AGI Just Taken Another Step? Google Updates Gemini Robotics Models to Help Robots Perceive Environments and Complete Complex Tasks

Key Points

  • Release (Sept 25, 2025): DeepMind unveiled Gemini Robotics 1.5, a vision-language-action (VLA) model, and Gemini Robotics‑ER 1.5, a high-performance vision-language model (VLM), in a coordinated update aimed at robot decision-and-execution stacks.
  • Complementary roles: Gemini Robotics‑ER 1.5 is used to perceive and plan (and can call external tools like Google Search), while Gemini Robotics 1.5 converts those plans into low-level motor actions, exposes internal reasoning, and enables cross-robot transfer of action policies.
  • Platform implications: The updates reinforce a model-first strategy that could reduce per-platform retraining, accelerate deployment across bodies, and advance the pursuit of “physical AGI.”
  • Ecosystem & signals to watch: Edge compute like Jetson Thor (NVIDIA) and analyst perspectives from Zhongjin Gongsi 中金公司 (CICC) and Huatai Zhengquan 华泰证券 matter; major players include Google (DeepMind), OpenAI, Meta, 英伟达, and Chinese firms such as 华为, Baidu 百度, and 科大讯飞.

Gemini Robotics is at the center of this update and deserves attention from investors and builders alike.

DeepMind released two upgraded models—Gemini Robotics 1.5 and Gemini Robotics‑ER 1.5—that work together as a robot’s decision-and-execution system, enabling richer perception, multi-step planning, and cross-robot learning.

What Google released

On September 25, 2025, DeepMind introduced two coordinated model releases designed specifically for robots:

  • Gemini Robotics 1.5 — a vision-language-action (VLA) model that converts visual input and instructions into motion and action commands.
      • It reasons before acting and exposes that reasoning process, which helps robots evaluate and execute complex multi-step tasks more reliably.
      • It can also transfer learned actions from one robot to another, letting different robot platforms “learn” from each other without creating a separate model per hardware type.
  • Gemini Robotics‑ER 1.5 — a high-performance vision-language model (VLM) focused on planning and logical decision-making in physical environments.
      • It has advanced spatial understanding, interacts in natural language, estimates task success probability and progress, and can natively call web tools (for example, Google Search) to gather information and craft detailed multi-step plans (see the planning sketch after this list).
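
To make the planner's role concrete, here is a minimal Python sketch of a perceive-plan loop with a tool call. The PlanStep shape, the web_search stub, and the success estimates are illustrative assumptions for this article, not DeepMind's published interface.

```python
# A minimal, illustrative sketch of "planning with tools in the loop."
# Function names, data shapes, and the stubbed search tool are assumptions;
# this is not the Gemini Robotics-ER API.

from dataclasses import dataclass


@dataclass
class PlanStep:
    instruction: str         # natural-language sub-task for the robot
    success_estimate: float  # planner's estimated probability of success


def web_search(query: str) -> str:
    """Stand-in for an external tool (e.g. Google Search) the planner can call."""
    return f"(top search results for: {query})"


def plan_task(goal: str, scene: str) -> list[PlanStep]:
    """Toy planner in the ER-model role: read the scene, consult a tool
    when the goal needs outside knowledge, and emit an ordered plan."""
    guidance = web_search(f"how to {goal}")  # tools-in-the-loop lookup
    return [
        PlanStep(f"survey the {scene} and locate items relevant to '{goal}'", 0.95),
        PlanStep(f"carry out '{goal}' following {guidance}", 0.80),
    ]


if __name__ == "__main__":
    for step in plan_task("sort recycling from trash", "kitchen counter"):
        print(f"{step.success_estimate:.0%}  {step.instruction}")
```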

How the two models work together

DeepMind describes the two models as complementary components of a robot’s “execution and decision” system.

In practice, a robot first uses Gemini Robotics‑ER 1.5 to perceive and interpret its environment, form a plan, and—if needed—look up external information via search.

Those high-level, language-formatted plans are then passed to Gemini Robotics 1.5, which integrates visual inputs with language instructions to generate low-level motor actions and carry out each step.
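
A rough sketch of that handoff follows, under stated assumptions: the ERPlanner and VLAExecutor classes, their method signatures, and the motor-command format are hypothetical stand-ins, since DeepMind has not published this interface.

```python
# Hypothetical sketch of the planner -> executor handoff described above.
# Class names, signatures, and the command format are illustrative only.

class ERPlanner:
    """Stand-in for the ER-style VLM: turns a goal plus perception
    into a language-formatted, multi-step plan."""

    def plan(self, goal: str, scene: str) -> list[str]:
        return [
            f"scan the {scene} for items relevant to '{goal}'",
            "pick up the next relevant item",
            f"place the item where '{goal}' requires",
        ]


class VLAExecutor:
    """Stand-in for the VLA model: fuses one language step with the
    current camera frame and emits low-level motor commands."""

    def act(self, step: str, frame: bytes) -> list[tuple[str, float]]:
        # Real VLA models output joint or end-effector commands;
        # here we return labeled placeholders.
        return [("move_arm_toward_target", 0.3), ("close_gripper", 1.0)]


def run_task(goal: str, scene: str) -> None:
    planner, executor = ERPlanner(), VLAExecutor()
    for step in planner.plan(goal, scene):   # high-level plan, in language
        frame = b""                          # placeholder camera frame
        print(step, "->", executor.act(step, frame))


if __name__ == "__main__":
    run_task("clear the table", "dining room")
```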

Because Gemini Robotics 1.5 exposes its internal reasoning and supports transferring action policies between different robots, developers and manufacturers won’t need to train a separate motion model from scratch for each robot body.

That capability could significantly increase robotic generality and accelerate real-world deployment across different hardware platforms.
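
One way to picture cross-robot transfer is a single learned policy sitting behind thin per-platform adapters, so new hardware needs only a mapping layer rather than a retrained model. The sketch below is a conceptual analogy, not DeepMind's actual mechanism.

```python
# Conceptual illustration of cross-robot policy transfer: one shared policy,
# thin per-robot adapters. An analogy only, not DeepMind's method.

from abc import ABC, abstractmethod


class SharedPolicy:
    """One policy reused across robot bodies (the 'transferable' part)."""

    def decide(self, instruction: str) -> dict:
        # Abstract, embodiment-agnostic action intent.
        return {"intent": "reach", "target": instruction, "speed": 0.5}


class RobotAdapter(ABC):
    """Maps an abstract intent onto one platform's actuators."""

    @abstractmethod
    def execute(self, intent: dict) -> str: ...


class TwoArmRobot(RobotAdapter):
    def execute(self, intent: dict) -> str:
        return f"bimanual reach toward {intent['target']} at {intent['speed']} m/s"


class HumanoidRobot(RobotAdapter):
    def execute(self, intent: dict) -> str:
        return f"whole-body reach toward {intent['target']} at {intent['speed']} m/s"


if __name__ == "__main__":
    policy = SharedPolicy()
    intent = policy.decide("the red mug")
    for robot in (TwoArmRobot(), HumanoidRobot()):
        print(type(robot).__name__, "->", robot.execute(intent))
```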


Why this matters for “physical AGI”

DeepMind framed the releases as progress toward an “era of physical agents”—systems that can perceive, plan, use tools, and act in the physical world to solve multi-step tasks.

If robots can reliably think ahead, call on external knowledge, and then translate plans into actions on diverse hardware, they move closer to the long-term goal some call physical AGI (artificial general intelligence capable of operating in the real world).

Industry observers note this strategic shift: rather than building custom robot bodies and control stacks, Google is increasingly focused on providing a “robot OS”-style stack—powerful, general models that many robot manufacturers could adopt, analogous to how Android and other platforms scaled across phones.


Context in the broader robotics ecosystem

The Gemini Robotics updates arrive as other companies invest in robot-focused compute and models.

On August 25, 2025, NVIDIA introduced Jetson Thor, a next-generation compute platform aimed at physical-AI and robotics developers; NVIDIA positioned it as the “brain” for research and industrial robot systems.

Startups and incumbents are also racing to develop end-to-end robot brains.

  • Figure has developed an end-to-end robot AI model called Helix that maps rich semantic knowledge from vision-language models directly to actions.
  • Dyna Robotics is working on models that help robots learn and improve in real-world environments; its CEO Lindon Gao has said the company focuses on data-driven learning rather than hand-coded task instructions, with the goal of unlocking “physical AGI.”

Analysts at China International Capital Corporation (CICC, 中金公司) have argued that only a few robot companies with full-stack capabilities are likely to break through to what they call “embodied intelligence.”

CICC sees large models trained for robotics as the key to overcoming traditional control bottlenecks and achieving more general, embodied intelligence.

Huatai Securities (华泰证券) similarly assesses that the recent wave of interest in embodied intelligence is driven by breakthroughs in large models.

These models set the ceiling for humanoid-robot generalization, and the firms that control both model development and integration into robot bodies may emerge as the core commercial winners.

Major players in this space include Google (DeepMind), OpenAI, Meta, and NVIDIA, plus, on the Chinese side, established tech firms such as Huawei (华为), Baidu (百度), and iFlytek (科大讯飞), and a growing set of startups targeting a “general robot brain.”


Takeaways and implications

  • Model-first strategies: Google’s releases reinforce a platform approach—provide a general, powerful “brain” so multiple robot makers can build different bodies and capabilities on top of the same intelligence stack.
  • Cross-platform transfer: The ability to transfer actions between robot bodies reduces the need for per-platform retraining and could speed real-world adoption.
  • Tools-in-the-loop planning: Native access to search and other external tools during planning lets robots incorporate up-to-date knowledge and produce longer, more reliable task plans.
  • Competition and consolidation: As model capability becomes the bottleneck for robot generalization, companies that combine model R&D, systems integration, compute, and long-term resource commitment may define the market’s winners.

Whether these model updates mark a definitive step toward AGI is still open to debate.

But Gemini Robotics 1.5 and Gemini Robotics‑ER 1.5 represent a clear, practical advance in giving robots richer perception, planning, and action capabilities in the physical world.


What investors, founders, and builders should watch next

  • Adoption vs. integration: Watch whether robot manufacturers integrate these models directly or wait for third-party wrappers and toolchains to simplify adoption.
  • Cross-robot transfer in practice: Track early demos and papers showing successful policy transfer between different hardware platforms without retraining.
  • Compute and edge constraints: Monitor how Jetson Thor and other edge compute platforms are paired with VLA/VLM stacks in real deployments.
  • Tooling and safety: Look for patterns in tool use (search, APIs) during planning and the safety guardrails companies apply as models gain action authority.
  • Commercialization paths: Follow which business models win—open platform, cloud-hosted robot brains, or vertically integrated product plays.

Quick checklist for founders and product leads

  • Evaluate whether your robot needs a model-level integration or a lightweight adapter layer for Gemini Robotics‑style APIs.
  • Plan for observation and imitation data collection to enable cross-platform transfer and continuous improvement.
  • Consider partnerships with compute providers (for example, NVIDIA) and model vendors to accelerate go-to-market.
  • Design user experiences that expose model reasoning when helpful, improving debugging and operator trust.

Linking and research opportunities

For deeper technical reads and product signals, link directly to the DeepMind announcement and NVIDIA Jetson Thor materials.

Reference these pages when building investor decks, technical whitepapers, and competitive maps.

Use those primary sources as authoritative anchors for your research and linking strategy.

Quick semantic keywords to include on related pages: robot OS, embodied intelligence, vision-language-action model, VLM, cross-robot policy transfer, physical AGI, robot brain, edge compute for robotics.

For investors and builders tracking robot brains, Gemini Robotics is a major development to follow closely.

