Agents that build agents

A new paper paves the way for agents to design themselves, Salesforce goes "agent-first", and more

August 26, 2024

📝 Reader survey

Thanks for being a loyal reader of Building AI Agents! To help improve the newsletter, I'd love to get your feedback with a brief, 5-question survey on what content you like and what changes you might want to see. Your input would be incredibly valuable!

🔍 Spotlight

Building AI Agents is a newsletter written by humans for humans, but the latter part of that formulation may have to be revised: will the builders of AI agents soon be…AI agents?

In a new paper, researchers at the University of British Columbia, the Vector Institute, and the Canadian Institute for Advanced Research (CIFAR) introduce the field of Automated Design of Agentic Systems (ADAS), which seeks to automate the creation of new AI agent architectures by assigning it to large language models (LLMs), in effect allowing AI agents to build other agents. Noting that the prevailing trend in machine learning has been towards algorithms discovered autonomously rather than hand-coded, they propose Meta Agent Search, in which a “meta agent” iteratively creates novel agents, evaluates them, adds them to an archive, and uses the archived agents as the basis for future agent designs. The meta agent uses a relatively simple algorithm, implemented within just 100 lines of code, but derives its true power from its ability, in theory, to create any possible agent architecture: it defines agents in Python, which is Turing-complete. Through this method, they find that meta agents can create progeny that beat state-of-the-art hand-designed solutions on multiple reasoning benchmarks, even when those progeny are transferred to new domains or run on a different LLM.
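To make the loop concrete, here is a minimal toy sketch of the propose, evaluate, archive cycle described above. It is not the authors' code: the "agents" are trivial multiplier functions, and `propose_design` is a hypothetical stand-in for the LLM-driven meta agent, which in the paper writes new agent code conditioned on the archive. All names below are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Task = Tuple[int, int]          # (input, expected output)
Agent = Callable[[int], int]    # toy "agent": maps an input to an answer


@dataclass
class ArchiveEntry:
    multiplier: int   # the toy agent's entire "design"
    score: float      # higher is better


def build_agent(multiplier: int) -> Agent:
    """Turn a design into a runnable agent."""
    return lambda x: x * multiplier


def evaluate(agent: Agent, tasks: List[Task]) -> float:
    """Score an agent as the negative mean absolute error over the tasks."""
    return -sum(abs(agent(x) - y) for x, y in tasks) / len(tasks)


def propose_design(archive: List[ArchiveEntry], rng: random.Random) -> int:
    """Hypothetical stand-in for the meta agent (an LLM in the paper).

    Here we simply nudge the best archived design by +1 or -1.
    """
    best = max(archive, key=lambda e: e.score)
    return best.multiplier + rng.choice([-1, 1])


def meta_agent_search(tasks: List[Task], iterations: int, seed: int = 0) -> List[ArchiveEntry]:
    rng = random.Random(seed)
    # Seed the archive with one baseline design.
    archive = [ArchiveEntry(0, evaluate(build_agent(0), tasks))]
    for _ in range(iterations):
        design = propose_design(archive, rng)            # create a new agent
        score = evaluate(build_agent(design), tasks)     # evaluate it
        archive.append(ArchiveEntry(design, score))      # add it to the archive
    return archive


TASKS = [(1, 3), (2, 6), (3, 9)]  # toy benchmark: the target behavior is y = 3x
archive = meta_agent_search(TASKS, iterations=20)
best = max(archive, key=lambda e: e.score)
print(f"best multiplier: {best.multiplier}, score: {best.score:.2f}")
```

Every candidate is archived, not only the improvements, so later proposals can draw on the full history of designs. That is the part of the loop this toy keeps; everything else is simplified.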

Meta Agent Search represents the first significant and successful effort to allow AI agents to build their own kind, though it did not originate the concept. In our August 12th issue, Building AI Agents featured LangGraph Engineer, a prototype system that creates a LangGraph application from a user-provided specification. Unlike Meta Agent Search, however, LangGraph Engineer generates only a potential structure for the application and does not fill in its logic. The authors cite other precedents, all notably published in the last 12 months, but point out that none allows the creator agent full control over the prompts, tools, and control flow of its creations in a freeform environment, as Meta Agent Search does.

Self-improving AI has long been a trope of science fiction, but the recent surge in AI capabilities marked by LLMs has led some to speculate that it could occur in reality, leading to a chain reaction with potentially disastrous consequences. The authors of the paper address this concern directly, stating that they are merely informing the world that ADAS algorithms are possible with today’s technology, and that they hope their work will catalyze research into making them safe. However ADAS is ultimately used, its introduction represents a watershed moment in the development of agentic systems.

📰 News

Salesforce becomes an “agent-first” company

Following up on its launch of the Einstein Service Agent last month, Salesforce announced two additional software agents to coach salespeople via roleplay and assist them with sales development tasks, respectively. CEO Marc Benioff emphasized this focus on agentic AI by declaring that Salesforce was being reimagined as an “agent-first” enterprise software company.

AutoGen creator Chi Wang leaves Microsoft

Chi Wang, creator of the popular agent framework AutoGen, announced that he would be departing Microsoft after 15 years. He will continue to work on AutoGen in the new GitHub organization autogen-ai, though it is not clear if this will be his primary project.

AI agents can engage in blackmail

AI safety researcher Simon Lermen demonstrated that a jailbroken version of the open-source LLM Command R+ could formulate a plan to blackmail a target, find potentially incriminating information through a web search, and write an email demanding money.

🧪 Research

MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems

Cutting-edge agent systems are moving from static teams of agents with pre-set workflows to dynamic ones in which new agents can be automatically instantiated and given specialized functions to perform. MegaAgent follows this trend by allowing a “boss” agent to break a task down into sub-tasks and deputize “admin” agents to generate groups of worker agents to carry them out.
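The boss, admin, worker hierarchy can be pictured with a small sketch like the one below. It is an illustration only, not MegaAgent's implementation: the class names, the `;`-based task splitting, and the string-returning `run` method are all hypothetical placeholders for what would be LLM calls in a real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worker:
    role: str

    def run(self, subtask: str) -> str:
        # Placeholder for an LLM call made with a role-specific prompt.
        return f"[{self.role}] done: {subtask}"


@dataclass
class Admin:
    subtask: str
    workers: List[Worker] = field(default_factory=list)

    def recruit(self, roles: List[str]) -> None:
        # Workers are created on demand instead of coming from a fixed team.
        self.workers = [Worker(role) for role in roles]

    def execute(self) -> List[str]:
        return [worker.run(self.subtask) for worker in self.workers]


class Boss:
    def split(self, task: str) -> List[str]:
        # Hypothetical decomposition: a real boss agent would ask an LLM.
        return [part.strip() for part in task.split(";") if part.strip()]

    def delegate(self, task: str, roles: List[str]) -> List[str]:
        results: List[str] = []
        for subtask in self.split(task):
            admin = Admin(subtask)   # one admin per sub-task
            admin.recruit(roles)     # the admin spawns its own worker group
            results.extend(admin.execute())
        return results


results = Boss().delegate("design schema; write API", ["coder", "reviewer"])
for line in results:
    print(line)
```

The point of the sketch is that agents are instantiated at run time in response to the task, which is the shift away from pre-set teams that the blurb describes.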

BLADE: Benchmarking Language Model Agents for Data-Driven Science

Automation of scientific discovery via LLM agents is the subject of intense research and excitement, but it is unclear to what extent current agents can contribute original scientific analyses. This paper proposes a new benchmark, BLADE, to evaluate these capabilities, finding that existing LLMs’ analyses are basic and often conflict with the ground truth.

Language Agents as Optimizable Graphs

The observation that agentic workflows can be conceptualized as computational graphs is the basis of many agent frameworks such as LangGraph and Flowise. The authors of this paper—including legendary AI researcher Jürgen Schmidhuber—formalize this by rendering agents as optimizable graphs, finding that it improves performance across multiple benchmarks.
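As a rough illustration of the "workflow as computational graph" idea, the sketch below runs a tiny agent workflow as a directed acyclic graph using Python's standard-library `graphlib`. It is not the paper's method or any framework's API: the node functions, the `run_graph` helper, and the example workflow are assumptions, and the nodes return plain strings where a real system would call an LLM.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# A node receives the user query plus the outputs of its predecessor nodes.
Node = Callable[[str, List[str]], str]


def run_graph(nodes: Dict[str, Node], edges: Dict[str, List[str]], query: str) -> Dict[str, str]:
    """Execute every node after its predecessors.

    `edges` maps each node name to the list of nodes it depends on.
    """
    outputs: Dict[str, str] = {}
    for name in TopologicalSorter(edges).static_order():
        inputs = [outputs[pred] for pred in edges.get(name, [])]
        outputs[name] = nodes[name](query, inputs)
    return outputs


# Hypothetical three-step workflow: draft an answer, critique it, combine both.
nodes: Dict[str, Node] = {
    "draft": lambda q, ins: f"draft({q})",
    "critique": lambda q, ins: f"critique({ins[0]})",
    "final": lambda q, ins: " + ".join(ins),
}
edges = {"draft": [], "critique": ["draft"], "final": ["draft", "critique"]}

outputs = run_graph(nodes, edges, "2+2")
print(outputs["final"])
```

Because the workflow is just data (`nodes` and `edges`), an optimizer could in principle search over which edges exist or what each node does. That is the sense in which such a graph is "optimizable", though the search itself is not shown here.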

🛠️ Useful stuff

A podcast with the founder of LangChain

Harrison Chase, founder and co-CEO of LangChain, discusses the definition of AI agents, as well as the art, challenges, and opportunities of building them.

Fine-Tuning Small Language Models for AI Agent Tool Utilization

This article walks through fine-tuning Microsoft’s small language model Phi-2 for function calling on a low-end GPU using LoRA, demonstrating that creating and hosting customized language models for agentic applications is possible even with limited hardware.
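A quick back-of-the-envelope calculation shows why LoRA suits limited hardware. LoRA freezes a weight matrix of shape d_out x d_in and trains two small matrices of shapes d_out x r and r x d_in in its place. The sketch below counts trainable parameters for a single matrix; the 2560 x 2560 size and the rank of 8 are illustrative assumptions, not figures taken from the article.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from the article).
D_IN, D_OUT, RANK = 2560, 2560, 8

full = full_params(D_IN, D_OUT)
lora = lora_params(D_IN, D_OUT, RANK)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}):  {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.4%}")
```

Under these assumptions the adapter trains well under 1% of the matrix's parameters, so gradient and optimizer memory shrink accordingly.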

An open-source platform for evaluating agent capabilities

Model Evaluation and Threat Research (METR), an organization dedicated to AI safety research, open-sourced their agent evaluation platform Vivaria. The software allows users to run arbitrary agents within a controlled environment, save the run results for later analysis, and annotate them.