Browser agents are going mainstream
Plus: Salesforce’s Agentforce surge, OpenAI’s security reality check, why fewer tools beat smarter prompts, and more...
Edition 147 | December 29, 2025
We thought it was English class. It was Prompt Engineering 101.
Welcome back to Building AI Agents, your biweekly guide to everything new in the field of agentic AI!
In today’s issue…
Browser agents quietly go mainstream
Salesforce hits 6,000 Agentforce customers
OpenAI admits prompt injection may never be fully solved
Vercel proves fewer tools can make AI agents faster and more reliable
Firecrawl launches an agent for scraping the modern web
…and more
🔍 SPOTLIGHT

Source: Building AI Agents - Nano Banana
Browser agents are quietly going mainstream.
Chrome is integrating Gemini natively. OpenAI released Atlas, a full browser agent that can see your screen, navigate pages, and take actions on your behalf. Even Firefox now lets you connect Gemini, ChatGPT, or other models directly into the browser.
To push myself to use these, I've had OpenAI's Atlas set as my default browser for the past month, and I've started using it in ways I didn't expect.
I basically treat it like a personal assistant watching my screen at all times (even now as I write this). If I'm comparing two online courses, I say "remember this one," navigate to the second course, and ask "what are the differences between these two?" or "compare review sentiment." If I'm reading a dense article, I highlight a paragraph and say "can you clarify this?" or "give me an example that simplifies this." No copying and pasting, and no opening a separate chat window. The browser agent sees what I see.
This has fundamentally changed how I work. I use it to extract key points from long PDFs before deciding whether to read the whole thing. I use it to summarize recipes so I don't have to scroll through walls of ads. I ask it for ingredient substitutions while I'm still on the page. I've even used it to analyze marketing material and ad copy, asking it to pull out the persuasion techniques or compare messaging across competitors.
What's different about browser agents isn't the intelligence. It's the integration. Traditional chatbot windows require you to context-switch: copy the text, open a new tab, paste it in, explain what you're looking at. Browser agents remove that friction entirely. The context is already there. You just talk.
This is the same shift we've seen with coding agents. Tools like Claude Code, Cursor, and Codex work because they're embedded in the environment where developers already spend their time. Browser agents do the same thing for everyone else. Your browser is where you read, research, shop, and learn. Now there's an agent that can see all of it.
We're still early. Atlas isn't perfect: it sometimes loses context on complex pages, and the "always watching" model raises real questions about privacy and security. OpenAI acknowledged this week that prompt injection attacks on browser agents are "unlikely to ever be fully solved." The tradeoff for this utility is an expanded attack surface. That's worth understanding before you give an agent access to your browsing.
But I can see where this is heading, and it's heading there quickly. Agents are moving from tools you visit to assistants that are just there. Browser today. Desktop and mobile next. The interface is disappearing.
If you haven't tried a browser agent yet, I recommend it. Use Gemini in Chrome, connect a model in Firefox, or download Atlas from OpenAI. Start with something simple: summarize this page, compare these two tabs, explain this paragraph. You’ll see how it changes the way you work.
Keep learning and building!
—AP

