OpenAI's Operator loses to an obscure competitor

Plus: build web-browsing agents with just a description, a free course on evaluating AI agents, and more

Michael Cunningham
February 27, 2025

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!

Today I was sent the following cool demo:
Two AI agents on a phone call realize they’re both AI and switch to a superior audio signal ggwave
— Georgi Gerganov (@ggerganov)
4:11 PM • Feb 24, 2025

A very cool development for those of us who are nostalgic for the sound of dial-up internet. Also possibly for those who are nostalgic for Terminator movies.

In today’s issue…

A universal toolkit for building agents with Llama
Agents have a clear mission—but a hazy business model?
An obscure startup’s browser agent beats OpenAI
Can we trust software engineering agents?

…and more

📰 NEWS

Source: Anthropic

Anthropic releases advanced new model and coding agent

Claude 3.7 Sonnet is the company’s latest frontier model. It offers both reasoning and instant-response modes and outperforms all other available models on a benchmark of agentic tool-calling tasks.

Anthropic also released a research preview of Claude Code, an advanced coding agent which it claims can complete over 45 minutes of human work in moments.

CopyCat launches browser agents for businesses

YC-backed startup CopyCat launched last week with a platform that lets businesses describe, in natural language, the repetitive tasks they want automated and have an AI agent build a working prototype automatically.

If you find Building AI Agents valuable, forward this email to a friend or colleague!

🛠️ USEFUL STUFF

Source: DeepLearning.AI

A free course on evaluating AI agents

This course from Andrew Ng’s DeepLearning.AI education platform shows how to build a system that systematically monitors agent performance using Arize AI’s tooling.

A universal toolkit for building Llama agents

Llama Stack is a framework that offers standard building blocks for inference, agents, tools, retrieval-augmented generation (RAG), and more with Meta’s Llama models, making it easier to build complex agentic systems on top of Llama.

💡 ANALYSIS

Source: VentureBeat

Why OpenAI’s Operator loses to an obscure competitor

An excellent overview of the browser automation agent space, focusing particularly on a comparison between OpenAI’s Operator and Convergence’s Proxy that comes out badly for OpenAI.

Agents have a clear mission, but a hazy business model

Agentic AI is all the rage in the business world, but its performance is still uneven, this piece argues—and even where it succeeds, it can have unintended consequences for users’ business models.

3 things we need to fix before AI agents go mainstream

This article focuses on some of the major hurdles that must be overcome to realize AI agents’ potential, making the case that blockchain technology could address all of them.

Car owners are bullish on AI agents

A new survey finds significant optimism among car owners about the ability of AI agents to help alleviate the annoyances of car buying and maintenance.

🧪 RESEARCH

Created by the author using Dall-E 3

Will AI agents dominate software engineering—and can we trust them?

Trust, not capabilities, is the fundamental limitation on adoption of software engineering agents, the authors of this paper find.

They offer recommendations on how such SWE agents can increase trust, and predict the emergence of a single, unified software engineering agent that performs all the tasks of a human software engineer.

Evaluating LLM-based role-playing agents

With AI agents capable of acting out specific roles increasingly being used for social science research, this paper provides guidelines, drawn from nearly 1,700 publications on the subject, for designing effective evaluation methods for role-playing agents.

Thanks for reading! Until next time, keep learning and building!

If you have any specific feedback, just reply to this email. We’d love to hear from you.