Building AI Agents
Amazon raids agent startup Adept for tech and talent

Amazon strips the embattled company, Metaculus hosts an agentic forecasting tournament, an AI agent runs for office, and more

July 01, 2024

🔍 Spotlight

Amazon has hired away a substantial portion of the agent-building startup Adept’s talent and licensed its technology, in a deal that may give the troubled company a fresh start or may leave it moribund.

Adept was founded in 2022 by David Luan, former VP of engineering at OpenAI, and Niki Parmar and Ashish Vaswani, two Google alumni who contributed to the legendary “Attention Is All You Need” paper that launched the transformer revolution. The company has sought to build computer control agents based on multimodal LLMs, which interact with desktops and other graphical interfaces to accomplish tasks as humans would.

Adept released several open-source multimodal LLMs as demonstrations of its technology and achieved a valuation of at least $1 billion. However, it has also been dogged by rumors of infighting, culminating in the departure of Parmar and Vaswani. It reportedly faced difficulty raising enough funding to train its expensive bespoke LLMs, forcing it to explore a possible sale to Meta or Microsoft.

The new deal with Amazon entails Luan and a host of other Adept employees jumping ship, reportedly leaving only around 20 of the startup’s 100-odd employees in the rump company. Some of its technology, such as models, agentic data, web interaction software, and infrastructure, will also be licensed to Amazon. The new, leaner Adept will reportedly focus on enabling agentic AI, though possibly not by training large, costly foundation models as previously planned. Because AI startup Inflection was effectively strip-mined and left for dead by a similar deal with Microsoft earlier this year, the long-term future of this mission is questionable.

Meanwhile, the employees acquired by Amazon will be joining the company’s AGI organization, where they will form the core of a new AGI Autonomy team, potentially pursuing a similar mission to that of Adept, but now with the substantially greater financial firepower and stability that the tech giant affords.

📰 News

An AI-powered forecasting tournament

The collaborative forecasting site Metaculus is hosting a series of quarterly tournaments in which users can build LLM-based agents to try to predict the likelihood of various global events. The first of the bot-only competitions, beginning July 8, will feature $30,000 in prizes.

Amazon prepares an agent rival to ChatGPT

Amazon is reportedly working on a new AI system called Metis to draw users away from ChatGPT. Unlike its OpenAI rival, however, Metis is expected to incorporate agentic capabilities, acting as a virtual assistant rather than just a chatbot.

An AI agent runs for office in Wyoming

An LLM agent named VIC—short for “Virtually Integrated Citizen”—has been shut down by OpenAI for violating its policy against political campaigning after attempting to run for mayor of Cheyenne, Wyoming. VIC was created by a local resident as a protest against a denied public records request.

🧪 Research

Octo-planner: On-device Language Model for Planner-Action Agents

Small language models (SLMs) hold the promise of reduced computational cost relative to flagship LLMs, and can even run on edge devices such as mobile phones, but they are often unable to accomplish difficult agentic tasks. Octo-planner is a fine-tuned version of Microsoft’s Phi-3 Mini SLM designed specifically for planning, which the authors report performs well while remaining small enough to run on edge devices.

Embodied Question Answering via Multi-LLM Systems

Using multi-agent systems to improve question answering relative to single agents is an active area of research. This paper uses a single fine-tuned central answer model (CAM) to aggregate the best answer from an ensemble of question-answering agents, finding that it improves performance on a benchmark of embodied household vision tasks without requiring expensive communication between agents.

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

A variety of companies are developing AI agents to plan users’ trips for them, but evaluating their performance is non-trivial. The authors of this paper introduce TravelPlanner, a benchmark for successful agent-based travel planning, finding that existing agents perform quite poorly.

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Tree-of-lens (ToL) is a new method to allow multimodal LLMs to read graphical user interfaces (GUIs) by generating descriptions of regions of different sizes around the point of interest, leveraging a hierarchical tree to provide context and layout information.

💡 Analysis

What is an agent, according to LangChain’s CEO?

Inspired by Andrew Ng’s writing on the subject, and following up on his recent podcast interview, LangChain CEO Harrison Chase tackles the question of how to define “agents”.