Big Tech is now all-in on agents
Plus: a $1 million prize for agentic software engineers, how DeepMind is building the next generation of agents, and more

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!
Asking an LLM to sanity-check its own output before sending it to the user is always a good idea. Raw outputs, even from the best models, can be…quirky.
Don't try to joke with ChatGPT; sometimes the responses can be wild.
Source: r/ChatGPT
AshutoshShrivastava (@ai_for_success), 5:46 AM • Dec 14, 2024
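The sanity-check advice above can be made concrete with a minimal generate-then-review loop. This sketch assumes a hypothetical `llm` callable that maps a prompt string to a completion string (swap in whichever model client you use); the prompts, the `'OK'` approval convention, and the `max_revisions` cap are illustrative choices, not any provider's API.

```python
from typing import Callable

# Hypothetical model interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def answer_with_self_check(llm: LLM, question: str, max_revisions: int = 2) -> str:
    """Draft an answer, then ask the model to review it before it reaches the user."""
    draft = llm(f"Answer the user's question:\n{question}")
    for _ in range(max_revisions):
        verdict = llm(
            "Review the draft answer below for factual errors or odd tone. "
            "Reply 'OK' if it is fine; otherwise reply with a corrected answer.\n"
            f"Question: {question}\nDraft: {draft}"
        )
        if verdict.strip() == "OK":
            return draft  # the reviewer approved this draft
        draft = verdict  # treat the reviewer's correction as the new draft
    # Review budget exhausted: return the latest draft, which was not re-approved.
    return draft


# A scripted stand-in for a real model, so the loop can run offline.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Answer"):
        return "2 + 2 = 5"
    if "Draft: 2 + 2 = 5" in prompt:
        return "2 + 2 = 4"
    return "OK"


print(answer_with_self_check(fake_llm, "What is 2 + 2?"))  # prints "2 + 2 = 4"
```

Using the same model as its own reviewer is the cheapest option; a different model or a rule-based check may catch mistakes the original model is blind to.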
In today’s issue…
A roundup of Big Tech’s 2024 agent race
$1 million for the best software engineering agent
DeepMind’s Oriol Vinyals on the next generation of agents
Building generalist embodied agents with multimodal LLMs
…and more
🔍 SPOTLIGHT

Source: PickPik
If the AI agent race were a poker game, the latter half of 2024 would consist of one tech giant after another looking at the table and saying “all in”.
With Google’s launch of Gemini 2.0 last week, which leaned heavily on agentic AI and arrived alongside a plethora of new agents, nearly every Big Tech company, along with many of the prominent large language model (LLM) startups, has now made significant investments in AI agents.
Because labor is one of the largest expenses for most companies, many see the most immediately valuable use of agentic technology as reducing the number of hours humans work. Consequently, most of the leading technology companies have launched their own enterprise agent platforms, which provide agents as a service to save workers time or to replace them entirely. Microsoft Copilot, released in 2023 to take advantage of OpenAI’s cutting-edge models, has now evolved into Copilot Agents, which integrate into the Microsoft 365 enterprise suite. The company is currently locked in fierce competition with Salesforce, which has arguably gone further than any of its rivals in centering its entire business on agents with its Agentforce platform. Meanwhile, after testing the waters with Vertex AI Agent Builder, Google launched its own contender for the enterprise AI crown, Agentspace, just last Friday (more on that in News). Amazon and IBM, though less assertive than their competitors so far, offer Bedrock Agents and watsonx.ai, respectively, which let customers build agentic applications and integrate them with their systems.
Most of these platforms are low-code/no-code, offering high-level, abstracted interfaces, and all are proprietary, which prevents agent builders from freely using them outside their providers’ walled gardens or remixing them into new software with modified capabilities. Alone among the tech giants, Microsoft has also made an open-source play with its AutoGen multi-agent framework, which was recently forked into AG2 by several members of its founding team who have left the company. OpenAI has released its own open-source option, Swarm, but it is intended as an experimental, educational tool rather than for production use.
Though the agent contest within Big Tech has been in full swing for the better part of a year now, the past several months have seen the rapid rise of a particularly powerful subtype: multimodal computer-use agents. While most AI agents interact with the world purely through text, there is growing interest in giving them the ability to control computers directly through graphical user interfaces, as humans do, bringing them closer to the Holy Grail of fully autonomous workers. Anthropic kicked off the recent sprint for preeminence in visual agents with its Computer Use announcement in October, followed quickly by a (possibly intentional) leak that OpenAI would release a similar system called Operator in January. Not to be outdone, Google responded last week with Project Mariner, which leverages the company’s ubiquitous Chrome browser.
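At their core, computer-use agents like these typically run an observe-act loop: capture the screen, ask a multimodal model for the next GUI action, execute it, and repeat until the model signals it is finished. The sketch below assumes a hypothetical three-action vocabulary (`click`, `type`, `done`) and replaces both the real screen and the model with scripted stand-ins (`FakeScreen`, `scripted_policy`); it illustrates the loop only and is not how Computer Use, Operator, or Project Mariner are actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str  # hypothetical action set: "click", "type", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakeScreen:
    """Stand-in for a real GUI: tracks the focused field and what was typed."""
    focused: str = ""
    fields: dict[str, str] = field(default_factory=dict)

    def screenshot(self) -> str:
        # A real agent would send pixels to a multimodal model; we send a text summary.
        return f"focused={self.focused!r} fields={self.fields!r}"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = action.target
        elif action.kind == "type":
            self.fields[self.focused] = self.fields.get(self.focused, "") + action.text


def scripted_policy(observation: str) -> Action:
    """Hypothetical stand-in for the model: fill in a search box, then stop."""
    if "focused='search'" not in observation:
        return Action("click", target="search")
    if "'search': 'agents'" not in observation:
        return Action("type", text="agents")
    return Action("done")


def run_computer_use_loop(screen: FakeScreen, policy, max_steps: int = 10) -> int:
    """Observe, decide, act; return how many actions ran before the policy said done."""
    for step in range(max_steps):
        action = policy(screen.screenshot())
        if action.kind == "done":
            return step
        screen.apply(action)
    raise RuntimeError("step budget exhausted before the task finished")


screen = FakeScreen()
print(run_computer_use_loop(screen, scripted_policy), screen.fields)  # prints: 2 {'search': 'agents'}
```

The `max_steps` budget matters in practice: a model that misreads the screen can otherwise click forever.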
The only large tech companies not to have made significant public announcements of agentic capabilities are Meta and Apple, though with the former launching a business-to-business AI team and the latter’s new push into AI with Apple Intelligence, this may soon change. Elon Musk’s xAI, which has rapidly grown to prominence with its increasingly capable Grok models, remains a significant wild card as well.
2024 will be remembered as the year in which the tech world bet big on AI agents, with trillions of dollars in enterprise value potentially on the line. In 2025, we may find out who will go home with the pot, and the stakes could not be higher.
If you find Building AI Agents valuable, forward this email to a friend or colleague!
🤝 WITH SYNTHFLOW
Create, Publish & Earn with Synthflow AI Voice Agents Marketplace
📰 NEWS

Source: Google Cloud
Following major competitors such as Microsoft and Salesforce into the enterprise agent game, Google released Agentspace, its own platform for building agents that automate business tasks and integrate with a wide range of third-party software.
Data warehouse provider Cloudera is integrating agent framework CrewAI into its platform, enabling its customers to build sophisticated automations on top of their data.
Enterprise agent platform Lyzr launched its Agent Studio, providing another option for creating, testing, and deploying agents for business processes.
🛠️ USEFUL STUFF

Source: Kaggle
Databricks and Perplexity co-founder Andy Konwinski is sponsoring a Kaggle competition with a $1 million prize for the first team to exceed 90% on a new version of SWE-bench, the popular software engineering agent benchmark.
A tutorial by leading graph database provider Neo4j demonstrating how to build an agent that answers questions using data retrieved from both knowledge graphs and vector databases, rather than relying on vector search alone as traditional RAG does.
TEN Agent is a framework for quickly building agents that take in multimodal data such as voice, video, and images and respond in real time.
AgentStation provides an API that gives agents access to many of the same tools that human knowledge workers use, such as web browsers, code execution, Zoom calls, and more.
Interrupt is a new feature in LangGraph that lets users provide asynchronous input to running agents at any point, instead of relying on Python’s built-in input function, which blocks execution and comes with a host of problems.
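The Neo4j tutorial’s core idea, combining vector search with knowledge-graph lookups, can be illustrated without any database. In this minimal sketch the documents, their three-dimensional embeddings, the triples, and all function names are hypothetical; a real system would compute embeddings with a model and query the graph (for example with Cypher in Neo4j) rather than scan Python lists, and this is not the tutorial’s code.

```python
import math

# Hypothetical "vector store": document text -> precomputed toy embedding.
DOCS = {
    "Neo4j is a graph database.": [0.9, 0.1, 0.0],
    "Vector indexes store embeddings.": [0.1, 0.9, 0.0],
    "Bananas are yellow.": [0.0, 0.0, 1.0],
}

# Hypothetical knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Neo4j", "IS_A", "graph database"),
    ("Neo4j", "SUPPORTS", "vector indexes"),
    ("Banana", "HAS_COLOR", "yellow"),
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    return sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)[:k]


def graph_facts(entity: str) -> list[str]:
    """Return every triple that mentions the entity, rendered as a short sentence."""
    return [f"{s} {r} {o}" for s, r, o in TRIPLES if entity in (s, o)]


def hybrid_context(query_vec: list[float], entity: str, k: int = 1) -> list[str]:
    """Merge unstructured passages with structured facts into one LLM context."""
    return vector_search(query_vec, k) + graph_facts(entity)


# The query embedding and the extracted entity would normally come from models.
for line in hybrid_context([1.0, 0.2, 0.0], "Neo4j"):
    print(line)
```

The vector half finds passages that are semantically close to the question, while the graph half adds explicit relationships that similarity search alone might miss.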
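The problem LangGraph’s Interrupt addresses, pausing an agent mid-run to wait for a human without blocking on `input()`, can be illustrated with a plain Python generator. This is a library-free sketch of the concept, not LangGraph’s API: `refund_agent`, `resume`, and the hard-coded refund amount are hypothetical. Unlike LangGraph, which saves graph state to a checkpointer so a paused run can be resumed later, a generator lives only in memory for the lifetime of the process.

```python
from collections.abc import Generator


def refund_agent(order_id: str) -> Generator[str, str, str]:
    """Hypothetical agent: yields a question for a human, returns its final result."""
    amount = 42.00  # hypothetical: a real agent would look this up via a tool call
    # Pause here and hand a question to the caller instead of blocking on input().
    reply = yield f"Approve a ${amount:.2f} refund for order {order_id}? (yes/no)"
    # Execution continues only once the caller sends a reply, whenever that happens.
    if reply.strip().lower() == "yes":
        return f"Refunded order {order_id}"
    return f"Refund for order {order_id} declined"


def resume(agent: Generator[str, str, str], reply: str) -> str:
    """Send the human's reply into a paused agent and return its final result."""
    try:
        agent.send(reply)
    except StopIteration as finished:
        return finished.value
    raise RuntimeError("agent paused again and needs another reply")


agent = refund_agent("A17")
print(next(agent))  # runs until the agent pauses; no thread sits waiting on a prompt
# ...later, e.g. when a reviewer clicks a button in a web app...
print(resume(agent, "yes"))
```

Because the caller decides when to call `resume`, the reply can arrive from any interface (a chat UI, a webhook, a queue) rather than from a terminal prompt.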
💡 ANALYSIS

Oriol Vinyals with interviewer Hannah Fry | Source: YouTube
A longform interview with legendary AI researcher and Gemini co-Tech Lead Oriol Vinyals, in which he describes the work DeepMind is doing to move past the narrow agents of yesteryear towards more generally capable ones powered by LLMs.
Following in the footsteps of LangChain and Menlo Ventures, Langbase surveyed 3,400 agent builders and stakeholders—half of them in C-suite roles—for its report, identifying their agentic needs, challenges, and concerns.
A talk by AI researcher and educator Andrew Ng on agents: what they are, how they work, and why they are the most important emerging trend in AI.
Ece Kamar of Microsoft’s AI Frontiers Lab addresses the challenges that must be overcome to make AI agents trustworthy, productive components of organizations.
🧪 RESEARCH

Some of the tasks accessible by embodied agents | Source: arXiv
This paper provides a process for turning multimodal LLMs into Generalist Embodied Agents (GEAs) capable of successfully solving problems within a variety of visual and physical environments.
DroidSpeak is a newly proposed framework that lets fine-tuned LLMs share context directly with each other using the intermediate outputs of their layers.
Thanks for reading! Until next time, keep learning and building!
If you have any specific feedback, just reply to this email; we’d love to hear from you.
Follow us on X (Twitter), LinkedIn, and Instagram