Automated hackers could destroy the web—or save it
PLUS: Microsoft’s vision agent model tops the charts on Hugging Face, and dueling opinions on the value of AI agents for businesses

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!
Unlike certain questionably built AI agents, this newsletter is never going to give you up, never going to let you down.
In today’s issue…
Google’s hacker agent finds hole in billions of systems
The first agent-related model to reach #1 on Hugging Face
Is now really the right time to invest in agentic AI?
Towards “self-driving labs” for biomedical research
…and more
🔍 SPOTLIGHT

Source: Wikimedia Commons
Armies of AI hackers rampaging through the web sounds like science fiction, but, like a lot of sci-fi about artificial intelligence, it seems much less fictional these days.
On Friday, Google’s Big Sleep project, a collaboration between its Project Zero cybersecurity research team and Google DeepMind, announced that Big Sleep’s AI agent had discovered a previously unknown vulnerability in SQLite, the world’s most widely used database engine. Given that SQLite is found in billions of electronic devices around the world, including virtually all smartphones, Google’s research raises the sobering possibility that agentic hackers could identify and exploit flaws in innumerable critical software platforms. While the Big Sleep team immediately notified SQLite’s developers, who quietly fixed the bug, other builders of hacker agents may not be so scrupulous.
The threat of automated snooping for software vulnerabilities has been a growing concern as large language models have begun to automate a wide range of tasks previously possible only for humans. In April, Meta researchers introduced CyberSecEval 2, a benchmark that included an evaluation of AI models’ ability to exploit vulnerabilities in target code, but found that the tested LLMs seldom succeeded.
However, in June, the Project Zero team tested a more powerful suite of agents, dubbed Project Naptime, on CyberSecEval 2 and achieved considerable improvements, raising the success rate from less than 30% to as much as 100% for some models. That same month, researchers at the University of Illinois at Urbana-Champaign found that multi-agent hacker teams could identify and exploit real-world zero-day vulnerabilities.
Now, with Big Sleep’s proof of concept that AI agents can find fresh weaknesses in software used by billions of people around the world, the prospect of a wave of attacks enabled by agentic hackers seems very real. Luckily, there is a corollary to this threat: the same systems bad actors use to wreak havoc on digital infrastructure can also be used by cybersecurity engineers to check it for vulnerabilities, potentially fixing countless bugs before they can be exploited. At least one YC-backed startup has launched with the specific mission of building AI agents that red-team codebases for security gaps.
The likely result will be an arms race between cybersecurity personnel and malicious hackers to build and deploy increasingly sophisticated agents to probe systems for attack vectors. Organizations whose cybersecurity teams adapt to this new world could find their software more secure than ever, tested constantly for the slightest weakness by armies of AI agents. What will happen to those who do not adapt is left to the reader’s imagination.
If you find Building AI Agents valuable, forward this email to a friend or colleague!
🤝 WITH CONVERGENCE AI
Meet your own personal AI agent for everything: Proxy
Imagine if you had a digital clone to do your tasks for you. Well, meet Proxy…
Last week, Convergence, the London-based AI start-up, revealed Proxy to the world, billing it as the first general AI agent.
Users are asking things like “Book my trip to Paris and find a restaurant suitable for an interview” or “Order a grocery delivery for me with a custom weekly meal plan”.
You can train it however you choose, so every Proxy is different and personalised to how you teach it. The more you teach it, the more it learns about your personal workflows and begins to automate them.
📰 NEWS

Source: All Hands
OpenHands (formerly OpenDevin) has set a new record of 53% on SWE-Bench, a benchmark for automated software engineering. Devin, its former namesake, posted a then state-of-the-art score of 13.86% just seven months ago.
After several major funding rounds, startup SpotAI announced the launch of Video AI Agents, which turns enterprises’ video surveillance cameras into virtual team members. The company now processes twice as much video daily as is uploaded to YouTube.
With pre-LLM virtual assistants like Alexa and Siri widely seen as underwhelming, Amazon CEO Andy Jassy hinted that the company will release an agentic version of Alexa capable of taking actions on its users’ behalf.
🛠️ USEFUL STUFF

Source: Microsoft
Microsoft’s new model, which enables AI agents to “see” a computer screen, is now the #1 trending model on Hugging Face, the first agent-related model to reach that spot. Microsoft, Anthropic, OpenAI, and Google are all racing to find an edge in the vision agent arena.
A group of heavy hitters in the LLM and agent space is hosting a hackathon in San Francisco this weekend focused on computer vision agents.
This article gives an in-depth tutorial on building an agentic system that summarizes scientific research and generates an entire slide deck from it.
Browserbase is a startup which provides a headless browser for agents to navigate the web as humans do, a crucial ability as websites increasingly block traditional web scrapers.
💡 ANALYSIS

Source: Financial Times
A report by a Big Four firm’s AI Institute charting the present and future of agents, which argues that business leaders must move now to stay ahead of the curve as they increasingly automate enterprise processes.
A contrary take which makes the case that companies should not be too hasty to throw money at agentic solutions when simpler options are available.
Existing user interfaces and user experience (UI/UX) are unsuited to the peculiarities of generative AI, according to this piece’s author, who offers pointers on building more user-friendly systems.
Only 30% of family businesses survive long enough to be passed on to the next generation, but this piece argues that number can be improved by employing new automations such as phone agents.
🧪 RESEARCH

Source: Cell
An extensive overview of how “AI scientists” could be applied in biomedical research, up to and eventually including entire self-driving labs.
Thanks for reading! Until next time, keep learning and building!
If you have any specific feedback, just reply to this email. We’d love to hear from you!
Follow us on X (Twitter), LinkedIn, and Instagram