Our predictions for 2025

Plus: how to build web-browsing agents, how to upgrade enterprise tech stacks for the agentic era, and more

In partnership with Writer

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!

I hope you’ve been enjoying the holidays and finding plenty of H100 GPUs in your stocking. Sometimes, though, it’s the simple things that matter most, and that’s as true of agent frameworks as it is of gifts.

In today’s issue…

  • Building AI Agents’ predictions for the new year

  • Alibaba’s new reasoning model

  • Build AI agents that can browse the web

  • Existing enterprise systems aren’t ready for agents

…and more

🔍 SPOTLIGHT

Source: Created by the author using DALL-E 3

2024 was the year AI agents went mainstream.

Agents began the year as the buggy, clunky, expensive passion projects of hackers; since then, they have been embraced by tech giants, SaaS firms, academic researchers, and more. At the same time, their forms have proliferated: they have moved beyond text to encompass multiple modalities and have become more capable of taking actions in the real world.

Here are my predictions for the trajectory of the agent field in 2025, as it begins to make its impact felt outside the AI community.

1) Interest in agents continues to surge

This year has seen an explosion of interest in AI agents, with Google searches for the term rising by an order of magnitude. Popular agent frameworks such as CrewAI have rapidly grown their user bases, and agents account for a growing fraction of LLM-based applications. More anecdotally, I’ve observed a significant increase in AI agents’ name recognition: a year ago, few people even within the AI community recognized the term, whereas many professionals, including those not directly involved in AI, now do.

I expect this trend to continue. Improving agent capabilities and pressure to reduce costs will drive business leaders to integrate agents into their operations, creating a virtuous cycle in which growing investment raises capabilities, which in turn intensifies adoption.

2) The enterprise agent race heats up and SaaS firms pivot to agents

Large tech companies such as Microsoft, Google, and Salesforce are locked in fierce competition to dominate the lucrative enterprise agent market, and Meta has hinted at its own ambitions in the space. Leaders of these corporations have joined the chorus of voices prophesying that agents, being more versatile, will displace traditional software-as-a-service (SaaS). Unsurprisingly, SaaS and robotic process automation (RPA) firms are quickly pivoting to agents to avoid being left behind. They are also shifting to outcome-based pricing, a tacit admission that agents will likely reduce headcounts and thus lower revenue for firms that stick to seat-based pricing.

In 2025, I predict the enterprise agent market will converge toward an ecosystem similar to the cloud services market: several dominant firms, analogous to Amazon AWS, Microsoft Azure, and Google Cloud Platform, controlling much of the market share, alongside many smaller providers offering a range of more specialized services.

3) Agents become economic actors

Another significant trend this year was the dawn of agents as autonomous economic actors capable of transacting with humans, rather than simply completing backend tasks. Major companies such as Stripe, as well as a host of startups, are developing integrations that let AI agents send and receive payments. In the short run, this technology will likely be used mainly for simple use cases such as over-the-phone purchases. In the coming year, however, I expect agents to begin receiving more financial autonomy, such as the ability to identify human contractors for necessary jobs, hire them, and release payment upon completion, though likely under careful human oversight for the time being.
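To make that hire-then-pay workflow concrete, here is a minimal Python sketch of how an agent's payment step might be gated by human oversight. Everything in it (the `ContractJob` class, its states, and the `auto_limit_cents` threshold) is a hypothetical illustration, not Stripe's or any other provider's API. A real integration would call a payment provider at the point where this sketch only writes to a log.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class JobState(Enum):
    PROPOSED = auto()   # agent has identified a contractor
    HIRED = auto()      # contractor accepted the job
    COMPLETED = auto()  # work reported as done
    PAID = auto()       # payment released


@dataclass
class ContractJob:
    """Hypothetical record of a job an agent arranges with a human contractor."""
    contractor: str
    amount_cents: int
    state: JobState = JobState.PROPOSED
    log: list = field(default_factory=list)

    def _advance(self, expected: JobState, new: JobState, note: str) -> None:
        # Refuse out-of-order transitions (e.g. paying before hiring).
        if self.state is not expected:
            raise RuntimeError(f"cannot move from {self.state.name} to {new.name}")
        self.state = new
        self.log.append(note)

    def hire(self) -> None:
        self._advance(JobState.PROPOSED, JobState.HIRED, f"hired {self.contractor}")

    def mark_complete(self) -> None:
        self._advance(JobState.HIRED, JobState.COMPLETED, "work reported complete")

    def release_payment(self, human_approved: bool, auto_limit_cents: int = 10_000) -> bool:
        """Pay only for completed work; amounts above the assumed limit need a human."""
        if self.state is not JobState.COMPLETED:
            raise RuntimeError("cannot pay for work that is not complete")
        if self.amount_cents > auto_limit_cents and not human_approved:
            self.log.append("payment held for human review")
            return False
        # A real system would call its payment provider here; the sketch only logs.
        self._advance(JobState.COMPLETED, JobState.PAID, f"paid {self.amount_cents} cents")
        return True


# Usage: a $250 job exceeds the hypothetical $100 auto-release limit.
job = ContractJob("contractor@example.com", amount_cents=25_000)
job.hire()
job.mark_complete()
held = job.release_payment(human_approved=False)  # False: held for review
paid = job.release_payment(human_approved=True)   # True: released after sign-off
```

Keeping the state transitions explicit means the agent cannot skip straight to payment, and the threshold gives a single, auditable place to tune how much financial autonomy it is granted.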

4) Visual and browser agents improve

While the first AI agents, such as AutoGPT, were text-only, the rise of multimodal LLMs has led to the emergence of a new class: so-called computer control agents, which operate graphical user interfaces (GUIs) to complete tasks much as humans do, dramatically expanding their range of uses. Some of these agents are limited to web browsers, while others can, in theory, perform any action available to a PC user. Anthropic, OpenAI, and Google have all released or promised multimodal agent capabilities. In 2025, these agents will become increasingly prominent and begin to take on the character of “digital workers”, the ultimate goal of the agent field.

5) Agents’ mistakes bring them into the public eye

While AI agents have tremendous promise as a labor force multiplier, they remain prone to error, due both to the limitations of LLMs themselves and to those of the agentic systems built around them. Some will inevitably make mistakes, and the growing responsibility organizations entrust to them will correspondingly magnify the fallout. I predict that, in the coming year, at least one agent error will have serious enough ramifications to become a notable news story, bringing AI agents to public attention in a negative light. This twofold risk, of direct damage from agents’ failures and of further intensifying already heavy regulatory scrutiny, underscores the critical need for careful architecture and safeguards.

Tomorrow, the AI agent field caps off a year of spectacular growth in capabilities, investment, and mainstream interest. Though the next 365 days promise to continue this frenetic pace, they will do so with an additional facet: the agentic systems planned and promised in 2024 will begin to enter production and make contact with the real world. Although I am confident that they will deliver considerable benefits—otherwise I would not have a newsletter called Building AI Agents—risks also abound, and AI agent builders can accept nothing less than excellence.

2025 will be a wild ride.

If you find Building AI Agents valuable, forward this email to a friend or colleague!

🤝 WITH WRITER

Writer RAG tool: build production-ready RAG apps in minutes

RAG in just a few lines of code? We’ve launched a predefined RAG tool on our developer platform that makes it easy to bring your data into a Knowledge Graph and interact with it using AI. With a single API call, Writer LLMs will intelligently call the RAG tool to chat with your data.

Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.

📰 NEWS

Source: Alibaba Cloud

The team behind Qwen, Alibaba’s family of LLMs, has continued China’s catch-up in reasoning models with QVQ, a highly performant multimodal reasoning LLM built on Alibaba’s existing Qwen2-VL-72B model. China’s other premier LLM shop, DeepSeek, launched its own DeepSeek-V3 model the next day, though without native multistep reasoning capabilities.

A speech synthesis startup launched its Flash model, advertising speech generation at an ultra-low latency of 75 ms, intended for applications such as real-time phone agents.

🛠️ USEFUL STUFF

Source: GitHub

A library which allows users to build multimodal agents that browse the web and take actions by recognizing GUI elements.

A set of guides and examples for using Google’s new Gemini 2.0 Flash model to build LLM-based systems.

rtrvr.ai is a Chrome extension intended to perform complex tasks in-browser in response to natural language queries.

💡 ANALYSIS

The yellow brick road to enterprise AI | Source: SiliconANGLE

A deep-dive guide to the challenges companies will face in integrating AI agents into their operations, which argues that the existing enterprise software stack will have to be radically transformed to make data accessible to agents in a way that allows them to fulfill their potential.

Wayne Hamadi, one of the original engineers of the first modern AI agent, AutoGPT, tells how one pernicious bug nearly crippled the project, and the lesson it should teach today’s agent builders.

With AI agents showing increasing capabilities as autonomous hackers, the author of this piece discusses the risks they pose to the web and one startup’s efforts to build systems which resist agentic attacks.

An overview of the inroads AI agents have made into the cryptocurrency ecosystem, predicting that 2025 will see rapid growth at the intersection of the two technologies.

🧪 RESEARCH

Training paradigm for Aguvis | Source: arXiv

The authors of this paper introduce Aguvis, a unified agentic system built on a custom visual LLM with planning and reasoning, designed to act as an autonomous GUI-navigating agent.

This paper introduces a framework for using LLMs to autonomously make and test improvements to agentic systems.

Thanks for reading! Until next time, keep learning and building!


If you have any specific feedback, just reply to this email. We’d love to hear from you!

Follow us on X (Twitter), LinkedIn, and Instagram