The first agent IDE

LangChain releases LangGraph Studio, a company seeks to build universal AI employees, and more

🔍 Spotlight

Code frameworks for building AI agents are a dime a dozen these days, but this week saw the launch of something new: the first integrated development environment (IDE) purpose-built for creating agent systems.

Since LLMs broke into mainstream popularity at the beginning of 2023, LangChain has been one of the most widely used tools for incorporating them into complex applications, and LangGraph, its standalone agent orchestration framework, has been a leader in the agent subfield. LangGraph lets users define agent workflows as connected graphs of components such as LLM calls, databases, human input, and more.

Now, LangChain has released an open beta of LangGraph Studio, an IDE built specifically for AI agents. Using it, developers construct agents in code and then visualize them in graph form through a GUI, which lets them run specific components one at a time and debug each individually without rerunning the entire workflow.
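To make the graph-of-components idea concrete, here is a minimal plain-Python sketch of that style of workflow. This is deliberately not the LangGraph API; the node names and state fields are invented for illustration, and each node is a stand-in for what would be an LLM call, database lookup, or human-input step in a real agent.

```python
# Conceptual sketch of a graph-style agent workflow (NOT the real
# LangGraph API): nodes are functions transforming a shared state
# dict, and edges decide which node runs next.

def draft(state):
    # stand-in for an LLM-call node
    state["draft"] = f"Answer to: {state['question']}"
    return state

def review(state):
    # stand-in for a human-review node
    state["approved"] = len(state["draft"]) > 0
    return state

NODES = {"draft": draft, "review": review}
EDGES = {"draft": "review", "review": None}  # None marks the end of the graph

def run(entry, state):
    node = entry
    while node is not None:       # walk the graph edge by edge
        state = NODES[node](state)
        node = EDGES[node]
    return state

result = run("draft", {"question": "What is an agent IDE?"})
print(result["approved"])  # prints True
```

Because each node is an independent function over the shared state, an IDE can rerun any single node in isolation, which is the debugging workflow LangGraph Studio's GUI exposes.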

LangGraph Studio is currently free to use, requiring only a free LangSmith account. While the wording of LangChain’s announcement hints that the program may be put behind a paywall later in its development cycle, the rapid proliferation of agent frameworks makes it likely that open-source alternatives will emerge relatively soon. The launch of the first agent IDE is a significant marker of the field’s maturation, as well as of the demand for specialized tools to support the growing body of agent developers.

📰 News

Ema, a startup founded by former Google employees that recently emerged from stealth, has raised $36 million in a Series A round, bringing its total funding to $61 million. Unlike many agent startups, which seek to automate a narrow industry or role, Ema has the more ambitious goal of providing AI employees capable of handling a wide range of tasks, from customer support to sales to legal compliance.

In this X thread, the co-founder of AgentOps recaps the impressive range of features launched by agent companies just this week.

Semantic Kernel is a framework for connecting LLMs to various APIs in order to build AI-powered applications. Now, the SDK is releasing autonomous agents as an experimental feature, allowing multi-agent systems to be built and integrated with its workflows.

Lagent is the latest addition to the family of agent frameworks, providing a more minimalistic interface relative to established packages like AutoGen and CrewAI.

Most AI agents are currently powered by frontier LLMs running in large, centralized datacenters operated by major providers such as OpenAI. Code Metal is a new startup that has raised over $16 million to enable agent workflows to run on edge computing devices instead.

🧪 Research

Fine-tuning LLMs on successful agent trajectories has been shown to improve performance, but collecting such data manually is labor-intensive. This paper introduces AgentGen, a method for automatically generating novel agent trajectories. Agents using a Llama-3 8B model fine-tuned with AgentGen can surpass GPT-3.5, and sometimes even GPT-4, in performance.

Massive Multitask Language Understanding (MMLU) is a popular benchmark for ranking large language models’ capabilities. The authors of this paper propose an analogue for agents—Massive Multitask Agent Understanding (MMAU)—which evaluates agents on several common application domains, such as tool use, coding, and mathematics.

AppWorld is a new framework for evaluating LLM agents’ ability to perform everyday digital tasks in nine commonly-used apps such as Amazon and Venmo. It provides an execution environment for the agents, as well as a benchmark suite of 750 tasks requiring complex, multi-step reasoning and interactive code generation.

Persona agents, which are assigned to take on the personality of a given person or profession in order to play a role, have been the subject of increasing interest in a variety of fields from therapy to video games. PersonaGym provides a new benchmark for testing how well persona agents’ responses to a range of questions conform to their assigned personalities.

🛠️ Useful stuff

The author of this piece implements a simple agent in both AutoGen and LangGraph and compares the pros and cons of each of the two popular agent frameworks.

After unsuccessfully experimenting with Directed Acyclic Graphs (DAGs) for agent systems, LlamaIndex has rolled out a new method of building agents using event-driven workflows. The linked blog post provides an interesting and thoughtful analysis of the drawbacks of rendering agents as DAGs.
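The core distinction the post draws can be sketched in plain Python (this is not LlamaIndex's API; the event names and retry logic below are invented for illustration): in an event-driven workflow, steps emit events and a dispatcher routes them to handlers, so loops and retries fall out naturally, whereas a static DAG cannot express a cycle at all.

```python
# Conceptual sketch of an event-driven workflow (NOT LlamaIndex's
# API): handlers subscribe to event types, and each handler returns
# the next (event, payload) pair. A failed check can loop back to an
# earlier step -- something a directed *acyclic* graph forbids.

from collections import deque

HANDLERS = {}

def on(event_type):
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@on("start")
def generate(payload):
    # stand-in for a generation step; count how often we ran
    payload["attempts"] = payload.get("attempts", 0) + 1
    return ("check", payload)

@on("check")
def check(payload):
    if payload["attempts"] < 2:   # simulate one failed check -> retry
        return ("start", payload)
    return ("stop", payload)

def run(payload):
    queue = deque([("start", payload)])
    while queue:
        event, payload = queue.popleft()
        if event == "stop":
            return payload
        queue.append(HANDLERS[event](payload))

print(run({})["attempts"])  # prints 2: one retry before stopping
```

The retry loop here ("check" re-emitting "start") is exactly the kind of control flow the linked post argues is awkward to render as a DAG.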

SWE-Bench is a leading benchmark for evaluating coding agents such as Devin. This tutorial shows how to test an agent on SWE-Bench using LangSmith.

Microsoft and Stanford University recently launched Trace, a tool for optimizing AI agents similarly to how the underlying neural networks can be optimized using AutoDiff. Now, Trace has been released as an open-source Python library.

💡 Analysis

There has been a recent boom in AI devices and wearables like the Humane Ai Pin and Rabbit R1, while big tech companies have simultaneously demoed assistant agents such as Google’s Project Astra. The author of this piece believes the two trends are likely to converge, with the imminent launch of agent-integrated wearable devices by companies such as Meta.

A new report by the consulting firm Capgemini expects autonomous AI agents in the workplace to arrive in 2025, with 82% of companies surveyed for the report planning to integrate AI agents into their operations within 1-3 years.