OpenAI's computer use agent arrives
Plus: a free course on building with Anthropic, how agents can help businesses now, and more

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!
“ai is coming for your jobs” i’d like to see ai have a few too many drinks at tonight’s Christmas party leading to an incredibly awkward Monday Morning stand-up meeting
— Trung Phan (@TrungTPhan)
8:28 PM • Dec 21, 2024
Some enterprising research team needs to come up with a new eval for this skill. LawsuitBench?
In today’s issue…
OpenAI’s agent is finally here
Perplexity releases an agentic assistant
Automation across 7,000 apps with Zapier’s agent
Humanity’s last exam
…and more
🔍 SPOTLIGHT

Source: OpenAI
Remember those “Ph.D.-level super-agents” OpenAI leaks teased last week? They’re here.
On Thursday, OpenAI released its much-anticipated Operator agent, which uses computer vision capabilities to perform tasks for a user by directly taking over their computer. Operator, whose existence and approaching January launch were hinted at by a leak back in November and another a week ago, will first be available to users of OpenAI’s $200/month “Pro” tier, though the company says it will eventually make it accessible to users on the Plus, Team, and Enterprise tiers as well.
The model that powers Operator, dubbed Computer-Using Agent (CUA), is a version of GPT-4o fine-tuned through reinforcement learning to operate graphical user interfaces (GUIs). This lets it interact with most computer software as humans can, rather than being restricted to text-based interfaces like many other AI agents. According to OpenAI, CUA achieves state-of-the-art results on OSWorld, a benchmark of agents’ ability to complete a wide range of tasks on a computer, as well as WebArena and WebVoyager, which focus specifically on web browsing.
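For builders curious what "operating on a GUI" looks like in practice, here is a minimal sketch of the observe-then-act loop that computer use agents are generally built around. Everything in it is hypothetical and for illustration only: `FakeScreen`, `Action`, `scripted_policy`, and `run_agent` are made-up names, the "screenshot" is a plain dict rather than pixels, and the policy is a hard-coded stand-in for a vision model. It is not OpenAI's implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List

FIELD_POS = (10, 10)   # hypothetical coordinates of a text box
BUTTON_POS = (10, 50)  # hypothetical coordinates of a submit button


@dataclass
class Action:
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeScreen:
    """Toy stand-in for a GUI with one text box and one submit button."""

    def __init__(self) -> None:
        self.field = ""
        self.focused = False
        self.submitted = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this toy returns the visible state.
        return {"field": self.field, "focused": self.focused,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Translate a low-level mouse/keyboard action into a state change.
        if action.kind == "click":
            if (action.x, action.y) == FIELD_POS:
                self.focused = True
            elif (action.x, action.y) == BUTTON_POS:
                self.submitted = True
        elif action.kind == "type" and self.focused:
            self.field += action.text


def scripted_policy(goal: str, obs: dict) -> Action:
    """Hard-coded stand-in for the model: picks the next action from what it sees."""
    if obs["submitted"]:
        return Action("done")
    if not obs["focused"]:
        return Action("click", *FIELD_POS)
    if obs["field"] != goal:
        return Action("type", text=goal)
    return Action("click", *BUTTON_POS)


def run_agent(screen: FakeScreen,
              policy: Callable[[str, dict], Action],
              goal: str,
              max_steps: int = 10) -> List[Action]:
    """Loop: look at the screen, choose an action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, screen.screenshot())
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)
    return history
```

The step cap (`max_steps`) is a design choice worth keeping in any real version, since an agent that misreads the screen can otherwise click in circles indefinitely.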
While Operator’s apparent abilities are impressive, OpenAI is a latecomer to the already-booming field of computer agents. Anthropic’s challenger, Claude Computer Use, was released all the way back in October, with Google following two months later with Mariner. The field’s origins can be traced even further back to the startup Adept, which launched in April 2022 (and was the subject of yours truly’s first piece of writing on AI agents), though it has become largely a non-entity after being gutted by Amazon.
Operator’s release gives OpenAI a needed reputational boost just days after Chinese reasoning model DeepSeek R1 shocked the AI community by surpassing OpenAI’s o1 on a variety of benchmarks and dramatically undercutting its price. But at the end of the day, OpenAI is still playing catch-up, and Anthropic is not resting on its laurels—its CEO, Dario Amodei, recently stated that Computer Use was only the first step towards a more powerful agentic AI worker called a “virtual collaborator” the company plans to release. Smaller startups such as Browserbase and Browser Use are also getting in on the action, open-sourcing browser-using agents powered by their technology.
With these competitors hot on OpenAI’s heels, Operator is unlikely to remain the leading computer use agent for long—if it ever was.
📰 NEWS

Source: Perplexity
AI-powered search startup Perplexity released an Android app that lets users perform actions such as booking a dinner, finding a song, or calling a ride with agentic assistance.
🛠️ USEFUL STUFF

Source: DeepLearning.ai
This course on DeepLearning.ai, taught by Anthropic’s Head of Curriculum Colt Steele, introduces developers to different aspects of building with the company’s models, and how they come together to power Computer Use.
Zapier, whose integrations with thousands of web apps are widely used for workflow automation, rolled out its own Zapier Agents, which run on top of that technology.
Many websites are making life more difficult for agent builders by restricting access to their APIs and blocking scrapers. Hyperbrowser allows agents to bypass many of these restrictions with effective headless web browsing.
Already a significant player in LLM applications with its Workflows framework, LlamaIndex released a further extension called AgentWorkflow that lets users build multi-agent systems.
💡 ANALYSIS

Source: Bloomberg
The author of this piece by Bloomberg argues that broadly generalist agents are—at least for the time being—overhyped, but businesses can derive immediate value by automating individual tasks with agents.
NVIDIA describes a new type of enterprise system called an AI query engine, in which a company’s data is centralized in one location that agents can draw on to intelligently answer any query from its workers.
This piece lays out five capability levels for AI agents, ranging from chat interfaces to entire enterprises built around high-level agentic management, and places us currently at only the third.
The networking equipment company foresees a future in which a heterogeneous ecosystem of agents from different companies is interconnected via common communication protocols; in other words, an agentic version of the current internet paradigm.
🧪 RESEARCH

Source: Created by the author using Dall-E 3
With LLMs blowing through one supposedly difficult benchmark after another, a group of researchers at the Center for AI Safety and Scale AI assembled Humanity’s Last Exam, a multi-modal eval with over 3,000 extremely challenging expert knowledge and reasoning questions. The benchmark succeeds at stumping today’s state-of-the-art models on most of its questions—at least for now.
Unlike many computer use agents, which are simply existing multimodal LLMs repurposed to reason over computer screenshots, UI-TARS is an entirely new model trained specifically to operate on computer UIs, and it outperforms Claude and GPT-4o on GUI agent tasks.
MedAgentBench is a new evaluation designed to assess LLMs’ capabilities in question-answering based on electronic medical records.
Thanks for reading! Until next time, keep learning and building!