OpenAI and Google Go All-In on Agents
The dueling announcements of OpenAI's GPT-4o and Google's Astra show the two AI giants' growing focus on agent capabilities

🔍 Spotlight
On May 13, OpenAI announced the release of GPT-4o (Omni), an improved version of GPT-4 with native multimodality, enabling it to receive any combination of text, audio, image, and video inputs, and respond by generating any combination of those formats except video, as well as carry on real-time voice conversations with users. In a live demonstration, GPT-4o interacted with OpenAI researchers, solving math problems, translating between languages, interpreting data charts, and more.
At its annual I/O developer conference the next day, Google unveiled Project Astra, a similar multimodal LLM assistant capable of processing live voice and video inputs and answering complex questions about the scenes it was being shown, such as interpreting computer code and remembering where it had seen an object. Additionally, the company demoed a host of other agentic products integrating with email, web search, and the Chrome browser.
In a subsequent interview with CNBC, Google CEO Sundar Pichai confirmed that Astra was part of a broader push by Google into the agent space, with Google’s Gemini language models and search both set to increasingly incorporate agentic capabilities such as multi-step reasoning. The two events were widely recognized as heralding a newfound focus on agents by both AI providers.
📰 News
The International Conference on Learning Representations, a major machine learning conference, hosted a workshop on LLM agents at its annual meeting. The Best Paper Award was claimed by AutoGen, a Python library developed by Microsoft to facilitate multi-agent workflows.
Artisan AI, a UK-founded and San Francisco-based software company, raised $7.3 million to develop its AI agents, called Artisans. Artisan, already backed by Y Combinator, is seeking to upend the enterprise SaaS market by replacing top go-to-market software with its custom agent solutions.
The US robotic process automation (RPA) company UiPath invested $35.2 million in Holistic, a French startup building multi-agent AI systems to automate commercial processes. Holistic, founded this year, is already valued at $370 million.
A video game AI startup called Altera raised $9 million in seed funding, including from former Google CEO Eric Schmidt’s First Spark Ventures fund. Altera aims to create game-playing agents that behave in a more human-like manner than the scripted NPCs of today.
🧪 Research
A team of researchers at UC Berkeley, UIUC, and NYU, including Meta’s Yann LeCun, demonstrated that reinforcement learning (RL) can be an effective method for fine-tuning visual language models (VLMs) to complete multi-step visual tasks. By iteratively training the VLMs to select the best action to take in a visual game using chain-of-thought (CoT) reasoning, the researchers found that even small 7-billion-parameter models could outperform flagship commercial models such as GPT-4V and Gemini.
Hallucination and sycophancy (a tendency to agree with whatever perspective the user offers) are both common failure modes for large language models. This paper demonstrates that LLM agents assigned to act as people of different nationalities in discussions have difficulty maintaining a consistent persona, and tend to abandon their original views and conform to group consensus.
Researchers developed a sophisticated hospital simulation system, in which LLMs act as patients with respiratory diseases and the doctors assigned to treat them. Using an evolutionary strategy the researchers termed MedAgent-Zero, in which the doctor agents’ prompts were continuously updated based on their experience without changing the model weights, the researchers found that the “doctors” were capable of achieving state-of-the-art performance on the respiratory diseases portion of the MedQA dataset.
🛠️ Useful stuff
A brief introductory course on multi-agent systems by crewAI. Endorsed by Andrew Ng, AI pioneer and proponent of LLM agents.
A Python library which allows users to build LLM-based agents and connect them with tools such as function calling, memory, and databases. It represents a potential alternative to existing frameworks such as AutoGen and LangChain.
A Python SDK to facilitate agent development and production, analogous to MLOps tools such as Weights and Biases and MLflow. According to the developers, it integrates with the major LLM providers’ APIs and agent frameworks including CrewAI, LangChain, and AutoGen.
An LLM-based system which uses an agentic workflow to execute a task given by a user by interfacing with the Win32api on any device running a compatible version of Microsoft Windows.
💡 Analysis
The legal system is not ready for the arrival of AI agents, the author argues in this first installment of a two-part series. Unreliable systems that act autonomously raise questions about where liability lies when they go astray.
Personal AI assistants that learn every detail of our lives and personalities will soon be ubiquitous. This creates a substantial risk, the author worries, that these assistants will facilitate highly persuasive AI-enabled marketing, propaganda, and disinformation.