China catches up

Plus: how to choose the right agent framework, a deep dive on visual agents, and more

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!

Building AI agents is actually the easy part. Getting them to interface with the thousand pieces of legacy software at your company that don’t talk to each other? Not so easy.

In today’s issue…

  • Chinese LLMs spark excitement—and concern

  • How to choose the right agent framework

  • Will vertical-specific agents consume SaaS?

  • A deep dive on the new wave of visual agents

…and more

🔍 SPOTLIGHT

Created by the author using Dall-E 3

The dragon has long been a symbol of China, and in the Chinese calendar, 2024 is the Year of the Dragon. In the world of AI, however, the Year of the Dragon may have just begun.

In September, OpenAI released a preview of its new o1 large language model (LLM), which significantly outperformed existing models on complex tasks by working through a multi-step reasoning process before producing its output. Unlike the company’s prior launches, o1 was not quickly imitated by its large US rivals, though there are reports that Google may be working on its own version.

Chinese companies, on the other hand, have been hot on OpenAI’s heels. On November 20, Hangzhou-based lab DeepSeek announced its own reasoning model, DeepSeek-R1, with a technical report claiming strong results, sometimes eclipsing o1, on multiple benchmarks that require complex reasoning, such as MATH and Codeforces. The next day, Chinese tech giant Alibaba open-sourced Marco-o1, which combines Monte Carlo Tree Search (MCTS) with chain-of-thought reasoning to solve sophisticated problems. Last Thursday, Alibaba followed up with QwQ-32B-Preview, the latest addition to its flagship Qwen model series, which is widely considered the best family of Chinese LLMs. Of particular interest to AI agent builders, QwQ is open-source under a license that permits commercial use.

Up to this point, most agentic applications, particularly in the West, have been powered by proprietary models from OpenAI, Anthropic, or Google, or by Meta’s open-source Llama series, and the commercial agent market has been dominated by US tech firms such as Microsoft and Salesforce. Now, however, many builders are turning to new Chinese LLMs for their lower cost and superior coding performance. Seeking to capitalize on this surge in agentic use of its models, Alibaba released Qwen-Agent, a framework for building multi-agent systems on the Qwen series. Although relatively simple, Qwen-Agent supports tool calling, code writing, retrieval-augmented generation (RAG), and web browsing via Chrome, making it suitable for a wide range of applications.

The rising use of Chinese LLMs in important agentic systems raises concerns that these models may themselves be agents of a different sort. Research has shown that language models can act as “sleeper agents”, behaving normally under most conditions but switching to deceptive or malicious behavior in response to a pre-set trigger such as a keyword. QwQ’s answers to sensitive questions illustrate the tight control the Chinese government exerts over domestic LLMs: it refuses to address the 1989 Tiananmen Square massacre and declares that Taiwan is part of China. Given that level of control, it is reasonable to speculate that more sinister exploits may be buried in its weights.

Western AI builders would be wise to stay on their toes, or every year may soon be the Year of the Dragon.

If you find Building AI Agents valuable, forward this email to a friend or colleague!

Refer one person and we’ll send you a set of weekly bonus links to additional AI agent news for a month! See here for a sample.

🤝 WITH SYNTHFLOW

Build Smarter, Faster: AI Voice Agents for Every Industry

Dream of a calling assistant that works tirelessly, taking calls 24/7 and managing tasks like real-time booking and lead qualification? With Synthflow’s collection of AI Agent templates, tailored to industries such as real estate and healthcare, you can launch your assistant fast. Plus, you can customize and publish your own templates, opening the door to earning commissions while helping others get started!

📰 NEWS

Source: Wikipedia

NeurIPS, the prestigious conference on AI and machine learning, will hold a panel and a networking poster session on AI agents, agentic systems, and more at its annual meeting in Vancouver on December 12.

The two companies will collaborate in using Google Cloud’s Vertex agent platform to develop an industrial Internet-of-Things (IoT) powered by AI agents.

🛠️ USEFUL STUFF

This piece gives a detailed breakdown of the pros and cons of three major agent frameworks (CrewAI, LangGraph, and Swarm) and the best use cases for each.

A quick demo by AgentOps co-founder Alex Reibman on using the company’s AgentStack and AgentOps software to build and monitor a simple agent.

Agentic Interfaces and Agent Graph System are two new technologies from agent startup xpander.ai that, according to the company, dramatically improve LLM performance when calling APIs and tools compared with vanilla agents.

f1 is a compound AI system from LLM inference startup Fireworks AI that is built entirely on open LLMs and, according to the company, outperforms flagship models from OpenAI, Anthropic, and Meta. It can be accessed as a free preview via Fireworks’ platform.

Open Canvas is a web app that gives users agentic assistance in writing and editing documents, intended as an improved, open-source rival to OpenAI’s Canvas.

💡 ANALYSIS

Created by the author using Dall-E 3

The author argues that vertical agents—ones built to tackle specific sectors—will turn the existing software-as-a-service (SaaS) industry on its head, potentially spawning innumerable billion-dollar startups.

The TED conference hosted a panel on AI agents featuring speakers from major industry players such as OpenAI, MultiOn, and AgentOps; a 5-minute highlight reel is available.

An exposition on Anthropic’s new Model Context Protocol (MCP) and its significance for the AI agent ecosystem, drawing parallels to the service-oriented architecture (SOA) protocols that revolutionized communication between enterprise software systems.

Executives from a wide range of companies offer their perspectives on the risks and challenges posed by agentic AI.

🧪 RESEARCH

Source: arXiv

This paper provides a comprehensive roundup of the new wave of visual agents, including their history, present capabilities, emerging applications, obstacles, and more.

The authors of this study introduce a unified benchmark for evaluating LLMs on embodied decision-making tasks and a software framework for testing models’ performance on it.

Thanks for reading! Until next time, keep learning and building!

What did you think of today's issue?


If you have any specific feedback, just reply to this email. We’d love to hear from you!

Follow us on X (Twitter), LinkedIn, and Instagram