AI agent teams can find and exploit software vulnerabilities
New research highlights both the power and risks of LLM-based agents

🔍 Spotlight
A new study by researchers at the University of Illinois at Urbana-Champaign demonstrated for the first time that AI agents based on large language models (LLMs) can identify and exploit previously undiscovered vulnerabilities in targeted software. This finding offers the disquieting possibility that agents could be used at scale to autonomously attack vulnerable systems, but could also hold the promise of making software more secure via automated penetration testing.
Prior research by the same group had shown that simple agents could exploit known security flaws in websites when given descriptions of the vulnerabilities (“one-day” exploits). However, when asked to find attacks that had not previously been discovered (“zero-day” exploits), they fell flat.
Now, with a more sophisticated multi-agent workflow using a hierarchical planner and manager capable of spawning task-specific expert agents, the researchers discovered that zero-day exploits could be identified with nearly the same effectiveness as a simpler agent with prior knowledge of the vulnerability. Existing open-source vulnerability scanners, by contrast, failed to discover any of the exploits.
Back-of-the-envelope cost calculations by the authors revealed that their method was already cheaper than hiring a human cybersecurity expert to find the vulnerabilities, with the gap expected to widen rapidly as LLM costs fall further. Naturally, this raises the specter of armies of automated hackers continuously probing every piece of public-facing software for weaknesses. Conversely, the proliferation of these vulnerability-finding bots could serve to make software more secure by enabling cheap, automated security audits.
In both cases, however, this new research highlights the significant changes approaching as LLM-based agents continue to improve.
📰 News
Seattle-based startup Tektonic has emerged from stealth with $10 million in seed funding, aiming to use LLM agents to automate business processes.
A new application by behavioral data analytics company Snowplow will allow enterprises to collect analytics on their customer-facing AI agents, tracking their impact on subsequent user behavior.
🧪 Research
Tens of thousands of web APIs exist, allowing the retrieval of a tremendous quantity and variety of information. This article introduces AnyTool, a self-reflecting agent based on GPT-4 capable of intelligently selecting from a pool of over 16,000 APIs in order to answer user questions.
There is a growing interest in “computer control agents” which can accomplish tasks using desktop or mobile computer GUIs as a human does, but the necessity of combining sequential image and text input data presents a challenge for single-agent systems. Using a novel multi-agent setup, the authors demonstrate significant improvements in task completion on mobile device control.
The authors of this paper introduce a multi-agent system to translate natural language instructions into actionable tasks for computer numeric control (CNC) machines.
Self-consistency (SC) is a method for LLMs to select from possible actions or text responses based on majority voting, but it performs poorly on multi-step tasks. This paper proposes a continuous scoring method called “soft self-consistency” (Soft-SC), finding that it leads to improvements in success rates on various agent benchmarks.
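The contrast between the two methods can be illustrated with a minimal sketch. This is an assumed simplification, not the paper's implementation: hard SC picks the action with the most votes, while the soft variant aggregates a continuous confidence score (here, mean log-probability) for each candidate action.

```python
from collections import Counter

def self_consistency(samples):
    """Hard SC: majority vote over sampled actions (ties broken arbitrarily)."""
    return Counter(samples).most_common(1)[0][0]

def soft_self_consistency(scored_samples):
    """Soft-SC sketch: rank candidate actions by their mean continuous
    score (e.g., model log-probability) instead of by discrete vote count."""
    scores = {}
    for action, logprob in scored_samples:
        scores.setdefault(action, []).append(logprob)
    return max(scores, key=lambda a: sum(scores[a]) / len(scores[a]))

# Majority voting picks "click_submit" (2 votes to 1).
print(self_consistency(["click_submit", "click_submit", "type_text"]))

# Soft scoring picks "type_text": fewer samples, but much higher confidence.
print(soft_self_consistency([
    ("click_submit", -2.0), ("click_submit", -2.2), ("type_text", -0.1),
]))
```

The advantage on multi-step tasks comes from this second case: a continuous score can surface a confident minority answer that a discrete vote would discard.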
The authors demonstrate that the usual method of building effective computer control agents—fine-tuning on curated human demonstration data—can be made more effective simply by scaling up the datasets used for fine-tuning. However, this performance boost does not seem to generalize well to out-of-domain tasks.
🛠️ Useful stuff
Another course in an ongoing series by Andrew Ng’s DeepLearning.AI covering many of the major agent frameworks—see also AutoGen and crewAI.
A collection of useful insights on the failure modes and best practices of building agent systems, gleaned from a year of experience by the co-founder of an agent startup.
The traditional software development lifecycle (SDLC), intended for fast and deterministic software, does not work well for the slow and stochastic nature of AI agents, the authors argue. Their new Sierra Agent SDK is intended to facilitate the building and deployment of agent systems in this new paradigm.
💡 Analysis
A 10-step agent workflow with a 10% error rate per step will fail 65% of the time, the authors point out in this piece on the challenges of building agent systems. Nevertheless, they give some pointers on how to overcome these challenges, starting with breaking down tasks into defined steps.
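The headline figure follows from treating each step's success as independent, so the whole workflow succeeds only if every step does:

```python
def workflow_success_rate(steps, per_step_error):
    """Probability that a sequential workflow completes, assuming each step
    fails independently with the same error rate."""
    return (1 - per_step_error) ** steps

# 10 steps at a 10% per-step error rate: 0.9^10 ≈ 0.35 success,
# i.e. roughly a 65% failure rate.
failure_rate = 1 - workflow_success_rate(10, 0.10)
print(f"{failure_rate:.0%}")  # → 65%
```

This compounding is also why the advice to break tasks into defined steps helps: shorter, independently verified sub-workflows give errors fewer chances to accumulate before being caught.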
As agent usage expands rapidly, an infrastructure ecosystem is growing up around them. This article gives an overview of the companies and tools being used by agent builders.
Simple LLM conversations were GenAI 1.0, and RAG with vector databases was 1.5; agent systems will be 2.0, the authors argue in this piece on the progression of LLM use cases.