Claude gets function calling

Building agents with Anthropic's models just got significantly easier with the general availability of tool use for the Claude API

🔍 Spotlight

On Thursday, large language model (LLM) provider Anthropic announced the general availability of function calling, also known as tool use, for its Claude family of models. A beta version of function calling has been in testing with a select pool of customers since April, but its full rollout brings Claude’s capabilities into line with rival OpenAI’s GPT family, which has supported it since June of last year. Tool use enables models to pass commands and data to specialized functions which carry out specific actions, such as retrieving information from databases, performing calculations, and sending instructions to other software programs.

The ability to interact with the outside world via function calls is a core component of LLM agents. Unlike traditional LLM usage, in which the model simply acts as a text generator, agentic workflows exchange commands and data with other systems, letting language models serve as reasoning engines for the intelligent automation of useful tasks. In their blog post announcing the upgrade, Anthropic provided examples of agentic use cases that function calling had enabled for customers, including AI tutors, browser automation, and document parsing. Anthropic’s announcement is a significant leap forward for those building agents with Claude.
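For readers who haven’t tried it yet, the basic round trip looks roughly like the sketch below, which assumes the Anthropic Python SDK. The tool name, its JSON schema, and the stand-in get_stock_price function are illustrative only, not part of the API: Claude returns a tool_use block when it decides a tool is needed, the calling code runs the function, and the result goes back to the model as a tool_result block.

```python
# Minimal sketch of a Claude tool-use round trip with the Anthropic Python SDK.
# The get_stock_price tool, its schema, and the lookup function are
# illustrative stand-ins, not part of Anthropic's API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_stock_price",
    "description": "Look up the latest closing price for a ticker symbol.",
    "input_schema": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}]

def get_stock_price(ticker: str) -> str:
    return "178.02"  # placeholder; a real tool would query a data source

messages = [{"role": "user", "content": "What did AAPL close at today?"}]
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# If Claude decides to call the tool, execute it and pass the result back.
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    result = get_stock_price(**tool_call.input)
    messages += [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": result,
        }]},
    ]
    final = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    print(final.content[0].text)
```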

📰 News

McKinsey & Company, one of the Big Three consulting firms, announced a new collaboration with Microsoft to build AI agents using Microsoft’s newly announced Copilot Studio platform. McKinsey estimates that agents and other AI technologies could automate up to 70% of the work hours in today’s global economy.

Maven AGI launched out of stealth mode with $28 million in funding to support the development of its generative AI-based customer support agents. Maven’s technology has reportedly already achieved impressive results for several major enterprise customers.

GaiaNet announced a $10 million seed funding round. The company aims to build agents running on a decentralized network of edge nodes for use cases such as education, including a partnership with the University of California, Berkeley inked earlier this year.

🧪 Research

Obtaining sufficient examples of multi-step task completion for fine-tuning remains a challenge for web browsing agents. In this paper, the authors find that tuning LLMs on synthetic training data generated by the models themselves can improve performance on the WebArena benchmark.

This paper introduces CodeAct, a framework for enhancing LLM agents by allowing them to generate and execute Python code. The authors find that this improves model performance on multiple benchmarks over existing text- and JSON-based agents. They also train CodeActAgent, a fine-tuned LLM tailored to CodeAct and integrated with a Python interpreter.
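The core mechanism, treating executable code as the agent’s action space rather than a fixed set of JSON tool calls, can be illustrated with a toy loop like the one below. This is a simplified sketch of the general idea, not the paper’s CodeActAgent implementation; run_llm is a hypothetical stand-in for a call to the underlying model, and a production agent would sandbox the execution.

```python
# Toy illustration of the code-as-action idea: the agent's "action" is a
# Python snippet, which is executed and whose output is fed back to the
# model as the observation. Simplified sketch, not the paper's implementation.
import contextlib
import io

def execute_action(code: str) -> str:
    """Run model-generated Python and capture whatever it prints."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # a real agent would sandbox this call
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue()

def agent_step(history: list[str], run_llm) -> list[str]:
    """One think-act-observe turn: the model writes code, we execute it."""
    code = run_llm(history)             # model proposes a Python action
    observation = execute_action(code)  # execute it and capture the result
    return history + [code, observation]
```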

A variety of methods have been proposed and tested for LLM agents to plan ahead when carrying out tasks, with most focusing on generating a series of actions to take and reflecting afterwards on any mistakes made. This paper adds anticipatory reflection, or predicting possible failure modes (the “devil’s advocate”), finding that it significantly improves results on WebArena over prior methods.

The authors of this paper propose Captain Agent, a multi-agent strategy implemented using AutoGen. Unlike many existing multi-agent methods which employ a fixed team of agents, Captain Agent has the ability to dynamically scale the team as needed by delegating subtasks to different groups of agents.

Many visual LLMs still struggle to answer queries about complex visual scenes. This paper introduces MMCTAgent, a framework that allows multimodal LLMs to employ an agentic workflow featuring problem decomposition and multi-step reasoning to solve visual queries.

🛠️ Useful stuff

A short intro course on the AutoGen multi-agent framework taught by its creators Chi Wang and Qingyun Wu, and hosted by Andrew Ng’s DeepLearning.AI platform.

Agentic Security is an open-source library for stress testing LLM-based applications via adversarial prompts.

💡 Analysis

Voice agents capable of having rich conversations with users in real time are on the rise, particularly after OpenAI’s and Google’s recent announcements of GPT-4o and Project Astra, respectively. This piece by venture capital firm Andreessen Horowitz explores the promise and challenges of agents in customer service applications, including business-to-business (B2B) and business-to-consumer (B2C) interactions.

Another discussion of the use of AI agents for customer service—in this case, a transcript of an interview with a pair of McKinsey partners.

A pessimistic take on the current state of agent capabilities, in which the author argues that existing LLMs are not up to the job of carrying out complex processes autonomously, and that use cases should be restricted to automating simple tasks.