Claude gets function calling

Building agents with Anthropic's models just got significantly easier with the general availability of tool use for the Claude API

🔍 Spotlight

On Thursday, large language model (LLM) provider Anthropic announced the general availability of function calling, also known as tool use, for its Claude family of models. A beta version of function calling has been in testing with a select pool of customers since April, but its full rollout brings Claude’s capabilities into line with rival OpenAI’s GPT family, which has supported it since June of last year. Tool use enables models to pass commands and data to specialized functions which carry out specific actions, such as retrieving information from databases, performing calculations, and sending instructions to other software programs.

The ability to interact with the outside world via function calls is a core component of LLM agents. Unlike traditional LLM usage, in which the model simply acts as a text generator, agentic workflows exchange commands and data with other systems, letting language models serve as reasoning engines for the intelligent automation of useful tasks. In their blog post announcing the upgrade, Anthropic provided examples of agentic use cases that function calling had enabled for customers, including AI tutors, browser automation, and document parsing. Anthropic’s announcement is a significant leap forward for those building agents with Claude.
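For readers who haven’t tried it yet, the basic round trip looks roughly like the sketch below, which assumes the Anthropic Python SDK. The tool name, its JSON schema, and the stand-in get_stock_price function are illustrative only, not part of the API: Claude returns a tool_use block when it decides a tool is needed, the calling code runs the function, and the result goes back to the model as a tool_result block.

```python
# Minimal sketch of a Claude tool-use round trip with the Anthropic Python SDK.
# The get_stock_price tool, its schema, and the lookup function are
# illustrative stand-ins, not part of Anthropic's API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_stock_price",
    "description": "Look up the latest closing price for a ticker symbol.",
    "input_schema": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}]

def get_stock_price(ticker: str) -> str:
    return "178.02"  # placeholder; a real tool would query a data source

messages = [{"role": "user", "content": "What did AAPL close at today?"}]
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# If Claude decides to call the tool, execute it and pass the result back.
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    result = get_stock_price(**tool_call.input)
    messages += [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": result,
        }]},
    ]
    final = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    print(final.content[0].text)
```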

📰 News

McKinsey & Company, one of the Big Three consulting firms, announced a new collaboration with Microsoft to build AI agents using Microsoft’s newly announced Copilot Studio platform. McKinsey estimates that agents and other AI technologies could automate up to 70% of the work hours in today’s global economy.

Maven AGI launched out of stealth mode with $28 million in funding to support the development of its generative AI-based customer support agents. Maven’s technology has reportedly already achieved impressive results for several major enterprise customers.

GaiaNet announced a $10 million seed funding round. The company aims to build agents running on a decentralized network of edge nodes for use cases such as education, including a partnership with the University of California, Berkeley inked earlier this year.

🧪 Research

Obtaining sufficient examples of multi-step task completion for fine-tuning remains a challenge for web browsing agents. In this paper, the authors find that tuning LLMs on synthetic training data generated by the models themselves can improve performance on the WebArena benchmark.

This paper introduces CodeAct, a framework for enhancing LLM agents by allowing them to generate and execute Python code. The authors find that this improves model performance on multiple benchmarks over existing text- and JSON-based agents. They also train CodeActAgent, a fine-tuned LLM tailored to CodeAct and integrated with a Python interpreter.
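The core mechanism, treating executable code as the agent’s action space rather than a fixed set of JSON tool calls, can be illustrated with a toy loop like the one below. This is a simplified sketch of the general idea, not the paper’s CodeActAgent implementation; run_llm is a hypothetical stand-in for a call to the underlying model, and a production agent would sandbox the execution.

```python
# Toy illustration of the code-as-action idea: the agent's "action" is a
# Python snippet, which is executed and whose output is fed back to the
# model as the observation. Simplified sketch, not the paper's implementation.
import contextlib
import io

def execute_action(code: str) -> str:
    """Run model-generated Python and capture whatever it prints."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # a real agent would sandbox this call
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue()

def agent_step(history: list[str], run_llm) -> list[str]:
    """One think-act-observe turn: the model writes code, we execute it."""
    code = run_llm(history)             # model proposes a Python action
    observation = execute_action(code)  # execute it and capture the result
    return history + [code, observation]
```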

A variety of methods have been proposed and tested for LLM agents to plan ahead when carrying out tasks, with most focusing on generating a series of actions to take and reflecting afterwards on any mistakes made. This paper adds anticipatory reflection, or predicting possible failure modes (the “devil’s advocate”), finding that it significantly improves results on WebArena over prior methods.

The authors of this paper propose Captain Agent, a multi-agent strategy implemented using AutoGen. Unlike many existing multi-agent methods which employ a fixed team of agents, Captain Agent has the ability to dynamically scale the team as needed by delegating subtasks to different groups of agents.

Many visual LLMs still struggle to answer queries about complex visual scenes. This paper introduces MMCTAgent, a framework that allows multimodal LLMs to employ an agentic workflow featuring problem decomposition and multi-step reasoning to solve visual queries.

🛠️ Useful stuff

A short intro course on the AutoGen multi-agent framework taught by its creators Chi Wang and Qingyun Wu, and hosted by Andrew Ng’s DeepLearning.AI platform.

Agentic Security is an open-source library for stress testing LLM-based applications via adversarial prompts.

💡 Analysis

Voice agents capable of having rich conversations with users in real time are on the rise, particularly after OpenAI’s and Google’s recent announcements of GPT-4o and Project Astra, respectively. This piece by venture capital firm Andreessen Horowitz explores the promise and challenges of agents in customer service applications, including business-to-business (B2B) and business-to-consumer (B2C) interactions.

Another discussion of the use of AI agents for customer service—in this case, a transcript of an interview with a pair of McKinsey partners.

A pessimistic take on the current state of agent capabilities, in which the author argues that existing LLMs are not up to the job of carrying out complex processes autonomously, and that use cases should be restricted to automating simple tasks.