- Building AI Agents
- Posts
- Vision agents take off
Vision agents take off
Plus: the crypto world embraces AI agents and Microsoft and Salesforce battle for dominance in agent tech

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!
With Google working on an AI assistant called Jarvis—see below—and Microsoft already having used Cortana, it’s only a matter of time before we run out of AI agent names to borrow from fiction, forcing someone to use HAL.
In today’s issue…
Computer-controlling agents are here
Agents are the new big thing in crypto
Microsoft—Salesforce agent race heats up
Entrapping agentic hackers
…and more
🔍 SPOTLIGHT

Source: Anthropic via YouTube
Building AI Agents aims to bring the future to your computer screen. Now, the AI agents of the future watch that screen with you.
The past week has seen a surge of interest in what are known as computer control agents—programs which accomplish tasks by taking a live feed of a computer screen as an input and outputting actions such as scrolling, clicking, and keyboarding to accomplish tasks. Such systems promise to go beyond the constraints of many existing LLM-based agents, which can only interact with the world through text interfaces, limiting their use to applications which expose an API.
Agent startup Adept, founded in 2022 by OpenAI and Google alumni, made one of the first significant efforts to build computer control agents, releasing several vision language models capable of screen interaction. However, the company’s fortunes took a turn for the worse and it seems to be dormant after being raided by Amazon, highlighting the difficulties associated with building agents capable of full computer control. Other efforts in the space were slow to gain traction, with Microsoft’s set-of-mark prompting generating some interest but no commercial products.
However, nearly a year later, Microsoft has returned to the space with OmniParser, a more sophisticated method of processing UI screenshots into structured elements on which vision language models can perform actions. Then, weeks later, Anthropic amazed the AI community with “computer use”, a new, publicly-available feature allowing its Claude models to directly control a user’s machine. On Saturday, The Information reported based on inside sources that Google is working on a similar visual agent designed to use the Chrome browser—given the timing, it is tempting to speculate that the leak may have been deliberate and intended to steal some of Anthropic’s thunder. Similar leaks from within OpenAI indicate that the company is working on similar capabilities. Despite Adept’s struggles, other startups are following in its footsteps, with YC-backed Autotab launching a beta for computer control agents as well.
The past weeks and months have seen a wave of innovation in the AI agent field, with industry heavyweights such as Microsoft, Salesforce, Google, and IBM making major announcements. The simultaneous re-emergence of its computer control subfield potentially points to the ultimate long-term destiny of agentic technology—and even of labor itself—in which work is increasingly offloaded to AI agents which can see and act in the digital world just as humans do.
If you find Building AI Agents valuable, forward this email to a friend or colleague!
🤝 PRESENTED BY 1440
Receive Honest News Today
Join over 4 million Americans who start their day with 1440 – your daily digest for unbiased, fact-centric news. From politics to sports, we cover it all by analyzing over 100 sources. Our concise, 5-minute read lands in your inbox each morning at no cost. Experience news without the noise; let 1440 help you make up your own mind. Sign up now and invite your friends and family to be part of the informed.
📰 NEWS

Source: Wikipedia
Brian Armstrong, CEO of cryptocurrency exchange Coinbase, has become a major booster of AI agents, with the company releasing a new tool dubbed Based Agent, allowing users to build an agentic crypto trader. The company’s Coinbase Ventures VC arm has recently shifted significant investment toward AI agent crypto startups, while a bizarre saga involving an agent promoting a cryptocurrency on social media has led to an agentic AI craze in the crypto community.
CrewAI Inc., maker of the eponymous leading AI agent framework, raised $18 million in Series A funding, fresh off its recent partnership with IBM.
🛠️ USEFUL STUFF

Source: CopilotKit
CopilotKit has released its CoAgents software into public beta, acting as a middle layer between AI agents built in LangGraph and user-facing frontends.
The company open-sourced Bee Agent Framework, a new platform for building AI agents written in TypeScript.
💡 ANALYSIS

Source: Flickr
A detailed recap of the intense—and sometimes acrimonious—race between Microsoft and Salesforce to dominate the AI agent field.
An interview with Yohei Nakajima, creator of BabyAGI, one of the first AI agents, on his career and creations, particularly BabyAGI 2o, a recent effort to create a self-building autonomous agent.
This blog post by the LangChain team argues that effectively communicating an LLM’s task to it is the most crucial failure point for agentic applications, and makes the case that the company’s tech allows users to effectively overcome this hurdle.
The Salesforce CEO discusses why he is so bullish on the company’s push into the agent space, gives examples of industries it could potentially transform—and takes shots at his Microsoft rival’s Copilot offerings.
🧪 RESEARCH

Source: arXiv
This paper argues that agent-based information retrieval techniques utilizing LLMs such as agentic RAG represent a paradigm shift over traditional methods used by search engines and other common software.
The authors of this paper created a honeypot to attract hackers and included prompt injections to identify attempts perpetrated by LLMs, finding 6 attacks potentially attempted by AI agents.
Thanks for reading! Until next time, keep learning and building!
If you have any specific feedback, just reply to this email—we’d love to hear from you
Follow us on X (Twitter), LinkedIn, and Instagram