Google's SIMA 2 makes a leap towards AGI
Plus: Anthropic foils an agent-powered Chinese hacking campaign, a course on the leading agent orchestration framework, and more
Edition 137 | November 17, 2025
LLM agents require significantly less prompting than some humans I know.
Welcome back to Building AI Agents, your biweekly guide to everything new in the field of agentic AI!
In today’s issue…
Google advances towards general embodied intelligence
Anthropic foils an agent-powered Chinese hacking campaign
A course on the leading agent orchestration framework
How to build an AI stock trading agent
Microsoft cuts HR and IT tickets by 40% with a single agent
…and more
🔍 SPOTLIGHT

Source: Google DeepMind
On Thursday, Google DeepMind announced Scalable Instructable Multiworld Agent (SIMA) 2, a general AI system designed to navigate virtual environments like those found in video games. Unlike most agents that have come before it, SIMA 2 can understand natural language instructions, reason about abstract goals, describe its plans step by step, and learn new tasks across a wide range of 3D worlds without task-specific retraining, making it a major step towards the holy grail of AI, artificial general intelligence (AGI).
Until now, companies pursuing AGI have mainly taken one of two paths: reinforcement learning (RL) agents and large language models (LLMs).
RL agents are trained from scratch to play virtual games like chess, Go, and StarCraft, learning the rules in the process, in the hope that a similar approach could eventually be applied to agents in the real world. DeepMind, which became Google DeepMind after Google acquired it in 2014, was a pioneer of this strategy, making breakthroughs with agents like AlphaGo, AlphaZero, and MuZero. RL's major drawback, though, is that its agents learn a kind of intuitive, animal-like understanding of their environments rather than the explicit verbal reasoning humans use. An RL agent can learn to be very good at Minecraft, for example, but it can't explain its thought process.
LLMs took the world by storm in the 2020s by being able to communicate with humans, perform verbal reasoning, and encode massive amounts of knowledge about the real world—all the things RL agents can't do. Simply by training to predict the next word in a text, LLMs implicitly pick up all kinds of human-like skills, including the ability to solve a problem by "thinking" step by step. Unlike RL agents, however, they are trained to work only with text, not interactive environments, so they can't "see" 3D worlds or move around in them.
Naturally, it was only a matter of time before someone combined the two approaches. In March of last year, DeepMind released the first SIMA, which applied an LLM-style training process, learning to imitate human actions recorded in nine different video games, to accomplish what had previously only been done with RL agents: navigating virtual environments. The result was an agent that had learned over 600 skills across those games, showing a glimmer of general intelligence.
With SIMA 2, DeepMind has put the last piece of the puzzle in place by integrating its Gemini LLM into the agent. SIMA 2 can explicitly reason its way through the environments it encounters, explain what it is doing in natural language, and take instructions from humans, making it feel more like a virtual person than an animal intelligence running on crude instinct.
Coming on the heels of DeepMind's recent breakthrough in humanoid robotics, it's clear that Google is pulling ahead of the pack in turning AI agents from purely text-based automations into systems that are learning to operate in the real world. AGI isn't here yet, but it suddenly feels a lot closer.
Always keep learning and building!
—Michael
