The return of AutoGPT

The agent framework that started it all gets an update, Salesforce releases agent LLMs, and more

🔍 Spotlight

If AI agents were aircraft, then AutoGPT would be the Wright Flyer—the brilliant, harebrained contraption that provided the first tantalizing glimpse of great things to come.

Released on March 30, 2023, just two weeks after GPT-4, the first large language model (LLM) marginally capable of handling complex agentic tasks, AutoGPT was the original LLM agent: users gave it a job to perform, and it used carefully architected recursive LLM calls to (attempt to) carry it out. It was an overnight sensation, becoming the top trending repository on GitHub and garnering significant attention in the tech community for its creator, Toran Bruce Richards, and his company Significant Gravitas.

AutoGPT even raised eyebrows in the wider world for being used to create ChaosGPT, a malicious agent tasked by a cheeky creator with manipulating and destroying humanity. ChaosGPT, however, never got far, limiting its destructive activities to insulting humanity on Twitter. In that respect it resembled many of the other early uses AutoGPT was put to—it was a fun toy, but not particularly good at accomplishing genuinely useful tasks. Nevertheless, it provided the first critical proof-of-concept that LLMs could act as agents if wrapped in the proper architecture, leading to a host of new agent frameworks, such as AutoGen and CrewAI, which have since largely eclipsed it.

Now, Richards and the AutoGPT team are back with a major update, adding a frontend and a backend, respectively called AutoGPT Builder and AutoGPT Server. Server, the core of the new system, allows users to structure agents as “blocks”: modular components that declare inputs and outputs and wrap a function that transforms the former into the latter. To make this process easier, the new AutoGPT offers a low-code interface in which blocks are rendered as visual objects that can be connected to form workflows (a rough sketch of the idea follows below).
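To make the block abstraction concrete, here is a minimal sketch in Python of how such a component might be modeled. The Block class, the run_workflow helper, and the example blocks are illustrative assumptions for this newsletter, not AutoGPT Server's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical sketch of a block-based agent component (names are illustrative,
# not AutoGPT Server's real interface): each block declares the inputs it reads,
# the outputs it produces, and a function that transforms one into the other.
@dataclass
class Block:
    name: str
    inputs: List[str]                                 # keys this block expects
    outputs: List[str]                                # keys this block produces
    run: Callable[[Dict[str, Any]], Dict[str, Any]]   # the transform itself

def run_workflow(blocks: List[Block], state: Dict[str, Any]) -> Dict[str, Any]:
    """Execute blocks in order, feeding each block's outputs into shared state."""
    for block in blocks:
        missing = [k for k in block.inputs if k not in state]
        if missing:
            raise KeyError(f"{block.name} is missing inputs: {missing}")
        produced = block.run({k: state[k] for k in block.inputs})
        state.update({k: produced[k] for k in block.outputs})
    return state

# Example: a two-block pipeline that fetches a page and summarizes it.
fetch = Block(
    name="fetch_page",
    inputs=["url"],
    outputs=["text"],
    run=lambda d: {"text": f"(contents of {d['url']})"},  # stand-in for an HTTP call
)
summarize = Block(
    name="summarize",
    inputs=["text"],
    outputs=["summary"],
    run=lambda d: {"summary": d["text"][:80]},            # stand-in for an LLM call
)

print(run_workflow([fetch, summarize], {"url": "https://example.com"}))
```

The point of declaring each block's interface up front is that it gives a visual builder something to work with: a drag-and-drop frontend can render blocks as objects, wire one block's outputs to another's inputs, and check that the connections are valid before anything runs.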

This scheme reflects AutoGPT converging on an interface that has become ubiquitous among the wave of low-code agent frameworks that have become the rage in the last several months, in which agent components are broken into blocks that can be assembled into larger systems via a drag-and-drop interface. Popular platforms Flowise and Langflow both use a nearly identical setup, Microsoft’s AutoGen is reportedly working on one as well, and more appear regularly—see Laminar and TribeAI in the Useful stuff section.

Although AutoGPT is no longer leading the pack but instead playing catch-up, the new update demonstrates that the venerable framework that launched the agent field still has some life in it.

📰 News

Google has released Project Oscar, a platform which provides assistant agents to help open-source software projects track issues and bugs. The developers say Oscar is aimed at automating menial tasks while leaving the more enjoyable coding to humans.

Thoughtful AI, an Austin-based startup building AI agents specializing in healthcare revenue management, raised $20 million in a Series A funding round. The new capital will support development of the company’s agent platform, which performs claims processing, patient eligibility verification, and payment posting for healthcare providers.

Salesforce added to its Einstein AI platform with a new Service Agent dedicated to helping enterprises with customer service tasks.

🧪 Research

Automating data science tasks, such as interacting with data platform GUIs, is a promising application of multimodal LLM agents. However, this paper casts a critical eye on the state of the field, proposing a benchmark of visual tasks that current state-of-the-art agents struggle to solve.

The authors of this paper propose AIOpsLab, a standardized, modular framework for using AI agents as engineers capable of diagnosing and fixing faults in cloud compute platforms.

As the use-cases for LLM agents have exploded, so too have concerns about their vulnerability to adversarial attacks. This paper introduces a new method of slipping malicious instructions into agents’ long-term memory, which can later be triggered by an attacker using a specific instruction, and finds that it is highly effective against common agents in critical domains such as autonomous driving and healthcare.

🛠️ Useful stuff

Salesforce AI Research has open-sourced a suite of small LLMs, with parameter counts ranging from 1 billion to 7 billion, optimized for function calling and explicitly intended for use in agent systems.

YC-funded startup Laminar has launched, offering customers the ability to render LLM agents as graphs, export them for production, and perform validation and orchestration.

The popular low-code agent platform Langflow has been updated to add a host of new features, most notably integration of agent framework CrewAI.

Tools which allow users to build agents with little or no code writing abound, such as Flowise, AutoGen Studio, and the aforementioned Langflow and newcomer Laminar. Open-source project TribeAI represents the latest entrant to this increasingly crowded field.

💡 Analysis

The influential 2023 article The Rise of the AI Engineer introduced the concept of the AI Engineer, who builds intelligent workflows on top of large language models and other generative AI tools. Meet the AI Agent Engineer proposes the existence of a subtype—AI Agent Engineers—who specialize in creating agentic systems.

Web agents that navigate browser UIs to accomplish internet tasks are a major use-case in the agent field, but their performance on benchmarks such as WebArena is shaky at best. The authors of this post perform a deep analysis of their failure modes and find that many are readily correctable.

Since the first LLM-based agents were developed in the spring of 2023, innumerable startups have been founded to commercialize them. This article features interviews with the founders of five of them, with each giving their perspectives on the present and future of the agent business.

This piece addresses eight of the most difficult challenges and failure modes holding back agent development, from high cost to infinite loops to poor reliability, and proposes easy-to-implement solutions.