OpenAI's Operator loses to an obscure competitor
Plus: build web-browsing agents with just a description, a free course on evaluating AI agents, and more

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!
Today I was sent the following cool demo:
Two AI agents on a phone call realize they’re both AI and switch to a superior audio signal ggwave
— Georgi Gerganov (@ggerganov)
4:11 PM • Feb 24, 2025
A very cool development for those of us who are nostalgic for the sound of dial-up internet. Also possibly for those who are nostalgic for Terminator movies.
In today’s issue…
A universal toolkit for building agents with Llama
Agents have a clear mission—but a hazy business model?
An obscure startup’s browser agent beats OpenAI
Can we trust software engineering agents?
…and more
🔥 IN CASE YOU MISSED IT
Readers’ favorite items from last week
📰 NEWS

Source: Anthropic
Claude 3.7 Sonnet is Anthropic’s latest frontier model. It offers both extended reasoning and instant response modes, and it outperforms all other available models on a benchmark of agentic tool-calling tasks.
Anthropic also released a research preview of Claude Code, an advanced coding agent which it claims can complete over 45 minutes of human work in moments.
YC-backed startup CopyCat launched last week, providing a platform that allows businesses to describe repetitive tasks they want automated in natural language and have an AI agent automatically build a working prototype.
🛠️ USEFUL STUFF

Source: DeepLearning.AI
This course from Andrew Ng’s DeepLearning.AI education platform shows how to systematically monitor agent performance using Arize AI’s tech.
Llama Stack is a framework that offers standard building blocks for inference with Meta’s Llama models, agents, tools, retrieval augmented generation (RAG), and more, allowing complex agentic systems to be easily built on top of Llama.
💡 ANALYSIS

Source: VentureBeat
An excellent overview of the browser automation agent space, focusing particularly on a comparison between OpenAI’s Operator and Convergence’s Proxy that comes out badly for OpenAI.
Agentic AI is all the rage in the business world, but its performance is still uneven, this piece argues—and even where it succeeds, it can have unintended consequences for users’ business models.
This article focuses on some of the major hurdles that must be overcome to realize AI agents’ potential, making the case that blockchain technology could address all of them.
A new survey finds significant optimism among car owners about the ability of AI agents to help alleviate the annoyances of car buying and maintenance.
🧪 RESEARCH

Image created by the author using DALL-E 3
Trust, not capabilities, is the fundamental limitation on adoption of software engineering agents, the authors of this paper find.
They offer recommendations on how such SWE agents can increase trust, and predict the emergence of a single, unified software engineering agent that performs all the tasks of a human software engineer.
With AI agents capable of acting out specific roles being increasingly used for social science research, this paper provides guidelines, drawn from nearly 1,700 publications on the subject, for designing effective evaluation methods for roleplaying agents.
Thanks for reading! Until next time, keep learning and building!