Big Tech is now all-in on agents

Plus: a $1 million prize for agentic software engineers, how DeepMind is building the next generation of agents, and more

Michael Cunningham
December 16, 2024

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!

Asking an LLM to sanity-check its own output before sending it to the user is always a good idea. Raw outputs, even from the best models, can be… quirky.

Don't try to joke with ChatGPT; sometimes the responses can be wild.
Source : r/ChatGPT
— AshutoshShrivastava (@ai_for_success)
5:46 AM • Dec 14, 2024
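
For builders who want to try the self-check idea, here is a minimal sketch of one way it could look. Everything in it is an assumption for illustration: `llm` is a hypothetical callable that takes a prompt and returns text (swap in whatever client you use), and the PASS/FAIL review protocol, the prompts, and the fallback message are made up for this example rather than taken from any particular library.

```python
from typing import Callable


def answer_with_self_check(
    llm: Callable[[str], str],  # hypothetical: any function mapping prompt -> text
    question: str,
    max_attempts: int = 3,
) -> str:
    """Draft an answer, ask the model to review it, and retry if the review fails."""
    for _ in range(max_attempts):
        # Step 1: produce a draft answer.
        draft = llm(f"Answer the question.\nQuestion: {question}")

        # Step 2: ask the model to judge its own draft (assumed PASS/FAIL protocol).
        verdict = llm(
            "Reply PASS if the draft is a sensible answer to the question, "
            f"otherwise reply FAIL.\nQuestion: {question}\nDraft: {draft}"
        )

        # Step 3: only release drafts that passed the review.
        if verdict.strip().upper().startswith("PASS"):
            return draft

    # Every attempt failed review: fall back instead of sending unchecked output.
    return "Sorry, I could not produce a reliable answer."
```

Each attempt costs two model calls, so `max_attempts` is a trade-off between latency and the chance of catching a bad draft. A self-review by the same model will not catch everything, so treat it as a cheap first filter rather than a guarantee.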

In today’s issue…

A roundup of Big Tech’s 2024 agent race
$1 million for the best software engineering agent
DeepMind’s Oriol Vinyals on the next generation of agents
Building generalist embodied agents with multimodal LLMs

…and more

🔍 SPOTLIGHT

Source: PickPik

If the AI agent race were a poker game, the latter half of 2024 would consist of one tech giant after another looking at the table and saying “all in.”

With Google’s launch of Gemini 2.0 last week leaning heavily on agentic AI and accompanied by a plethora of new agents, nearly every Big Tech company, along with many of the prominent large language model (LLM) startups, has now made significant investments in AI agents.

As labor costs represent the majority of most companies’ spending, the most immediately valuable use that many see for agentic technology is reducing the number of human hours worked. Consequently, most of the leading technology companies have launched their own enterprise agent platforms, which provide agents as a service to save workers time—or replace them entirely. Microsoft Copilot, released in 2023 to take advantage of OpenAI’s cutting-edge models, has now evolved into Copilot Agents, which integrate into the Microsoft 365 enterprise suite. The company is currently locked in a fierce competition with Salesforce, which has arguably gone further than any of its rivals in centering its entire business around agents with its Agentforce platform. Meanwhile, after testing the waters with Vertex AI Agent Builder, Google launched its own contender for the enterprise AI crown with Agentspace just last Friday—more on that in News. Amazon and IBM, though less assertive than their competitors so far, provide Bedrock Agents and watsonx.ai, respectively, allowing customers to build agentic applications and integrate them with their systems.

Most of these platforms are low-code/no-code, providing high-level, abstracted interfaces, and all are proprietary, preventing agent builders from freely using them outside of their providers’ walled gardens, or remixing them into new software with modified capabilities. Alone among the tech giants, Microsoft has also made an open-source play with its AutoGen multi-agent framework, recently forked into AG2 by several members of its founding team who departed the company. OpenAI has released its own open-source option, Swarm, but it is intended as an experimental educational tool rather than for full production.

Though the agent contest within Big Tech has been in full swing for the better part of a year now, the past several months have seen the rapid rise of a particularly powerful subtype: multimodal computer use agents. While most AI agents interact with the world purely through text, there is growing interest in endowing them with the ability to directly control computers via graphical user interfaces as humans do, bringing them closer to the Holy Grail of fully autonomous workers. Anthropic kicked off the recent sprint for preeminence in visual agents with its Computer Use announcement in October, followed quickly by a (possibly intentional) leak from OpenAI that it would release a similar system in January called Operator. Not to be outdone, Google responded last week with Project Mariner, which leverages the company’s ubiquitous Chrome browser.

The only large tech companies not to have made significant public announcements of agentic capabilities are Meta and Apple, though with the former launching a business-to-business AI team and the latter pushing into AI with Apple Intelligence, this may soon change. Elon Musk’s xAI, which has rapidly grown to prominence with its increasingly capable Grok models, remains a significant wild card as well.

2024 will be remembered as the year in which the tech world bet big on AI agents, with trillions of dollars in enterprise value potentially on the line. In 2025, we may find out who will go home with the pot, and the stakes could not be higher.

If you find Building AI Agents valuable, forward this email to a friend or colleague!

🤝 WITH SYNTHFLOW

Create, Publish & Earn with Synthflow AI Voice Agents Marketplace

Discover templates for routine/repetitive tasks like lead qualification and managing appointments.
Publish your own Voice AI solutions to help businesses thrive—and earn commissions.
Access custom actions that automate CRM updates, appointment scheduling, and more.

Build Your AI Agent Now

📰 NEWS

Source: Google Cloud

Google launches Agentspace for enterprise AI

Following major competitors such as Microsoft and Salesforce into the enterprise agent game, Google released its own platform for building agents that automate business tasks and integrate with a wide range of third-party software.

CrewAI and Cloudera team up

Data warehouse provider Cloudera is integrating agent framework CrewAI into its platform, enabling its customers to build sophisticated automations on top of their data.

Lyzr releases Agent Studio

Enterprise agent platform Lyzr launched its Agent Studio, providing another option for creating, testing, and deploying agents for business processes.

🛠️ USEFUL STUFF

Source: Kaggle

A $1 million prize for software engineering agents

Databricks and Perplexity co-founder Andy Konwinski is sponsoring a Kaggle competition with a $1 million prize for the first team to exceed 90% on a new version of SWE-bench, the popular benchmark for software engineering agents.

How to build a GraphRAG agent with Neo4j

A tutorial by leading graph database provider Neo4j demonstrating how to build an agent capable of answering questions using data obtained from both knowledge graphs and vector databases, doubling the power of traditional RAG.

Build real-time conversational agents with TEN Agent

TEN Agent is a framework for quickly building agents that take in multimodal data such as voice, video, and images, and respond in real time.

A virtual workstation for agents

AgentStation provides an API that gives agents access to many of the same tools that human knowledge workers use, such as web browsers, code execution, and Zoom calls.

Bring humans into the loop with interrupt

Interrupt is a new feature in LangGraph that lets users provide asynchronous inputs to running agents at any time, rather than relying on Python’s built-in input() function, which comes with a host of problems.

💡 ANALYSIS

Oriol Vinyals with interviewer Hannah Fry | Source: YouTube

DeepMind’s Oriol Vinyals on LLM agents

A long-form interview with legendary AI researcher and Gemini co-Tech Lead Oriol Vinyals, in which he describes the work DeepMind is doing to move past the narrow agents of yesteryear towards more generally capable ones powered by LLMs.

(Another) 2024 State of AI Agents report

Following in the footsteps of LangChain and Menlo Ventures, Langbase surveyed 3,400 agent builders and stakeholders—half of them in C-suite roles—for its report, identifying their agentic needs, challenges, and concerns.

Andrew Ng explores the rise of AI agents

A talk by AI researcher and educator Andrew Ng on agents: what they are, how they work, and why they are the most important emerging trend in AI.

What needs to happen for agents to take off

Ece Kamar of Microsoft’s AI Frontiers Lab addresses the challenges that must be overcome to make AI agents trustworthy, productive components of organizations.

🧪 RESEARCH

Some of the tasks accessible by embodied agents | Source: arXiv

From multimodal LLMs to Generalist Embodied Agents

This paper provides a process for turning multimodal LLMs into Generalist Embodied Agents (GEAs) capable of successfully solving problems within a variety of visual and physical environments.

DroidSpeak: enhancing cross-LLM communication

DroidSpeak is a newly proposed framework for allowing fine-tuned LLMs to share context directly with each other using intermediate outputs of their layers.

Thanks for reading! Until next time, keep learning and building!

If you have any specific feedback, just reply to this email. We’d love to hear from you.