The rise of talking AI
Plus: Agent evals are a scam, 7 AI terms you need to know, and more
Edition 117 | September 8, 2025
still can't believe they did this without any agentic workflows
— Turner Novak 🍌🧢 (@TurnerNovak)
9:17 PM • Sep 4, 2025
Fun fact: if GPT-5 really is 635 billion parameters, as some have estimated, you would need about 160,000,000 Apollo Guidance Computers just to hold it in RAM. Moore’s law is pretty cool.
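For anyone who wants to sanity-check that figure, here is the back-of-the-envelope version. It assumes 8-bit weights (one byte per parameter) and roughly 4 KB of erasable memory per Apollo Guidance Computer; both are assumptions, not numbers from the announcement.

```python
# Back-of-the-envelope check.
# Assumptions: 1 byte per parameter (8-bit weights), and ~4 KB of erasable
# memory per Apollo Guidance Computer (2,048 sixteen-bit words).
params = 635e9                 # rumored GPT-5 parameter count
bytes_needed = params * 1      # 8-bit weights -> ~635 GB
agc_ram_bytes = 2048 * 2       # ~4 KB per AGC
print(bytes_needed / agc_ram_bytes)  # ~1.55e8, i.e. on the order of 160 million
```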
Welcome back to Building AI Agents, your biweekly guide to everything new in the field of agentic AI!
If you haven’t already, we’d love to hear your feedback on whether you prefer this Building AI Agents format to our old one. Let us know in the poll at the very bottom, below the Quote of the Day. Thanks!
In today’s issue…
A new generation of AI agents learns to speak
OpenAI’s tips for prompting its new speech model
China is prepping an agent to rival OpenAI
Agent evals are a scam
7 AI terms you need to know
…and more
🔍 SPOTLIGHT

It’s time to talk about talking agents.
As covered in last Thursday’s issue, OpenAI just released gpt-realtime, its new model for conversational agents. gpt-realtime is a speech-to-speech model, meaning that it takes human speech as an input, reasons through an answer, and responds with a realistic, computer-generated voice of its own. Right off the bat, its capabilities are pretty impressive: improved intelligence and instruction following compared with prior models, plus the ability to accept images as input and use MCP servers.
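If you want to poke at it yourself, below is a minimal sketch of a Realtime session in Python. The WebSocket endpoint, headers, and event names (session.update, conversation.item.create, response.create) follow OpenAI’s earlier beta Realtime API documentation and may have shifted for the GA release, so treat this as an outline to check against the current docs rather than a drop-in client.

```python
# Minimal sketch: open a Realtime session with gpt-realtime over WebSockets
# and ask for a spoken reply. Endpoint, headers, and event names follow the
# beta Realtime API and may differ in the GA release.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",  # beta header; may no longer be required
}

async def main():
    # additional_headers is the websockets >= 14 name (extra_headers on older versions)
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Configure the session: audio plus text output, with a short instruction.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": "You are a concise voice assistant.",
                "voice": "alloy",
            },
        }))
        # Add a user turn and ask the model to respond.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello in one sentence."}],
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))

        # Print server events until the response finishes; audio arrives as
        # base64 chunks in audio delta events along the way.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```

A real voice agent would also stream microphone audio in and play back the audio chunks it receives, but the session setup above is the core of the loop.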
The most interesting thing about the announcement, though, is what it says about AI agents’ readiness for the human world. Up to this point, most of the agents that have delivered real value are business automations performing tasks like summarizing documents, debugging code, and analyzing data. For most of these, the agent chugs away silently in the background, interfacing only with a trained human user, if anyone at all.
gpt-realtime, though, is the latest example of a new class of models that interact with humans the way we interact with each other: through speech. So far, the overwhelmingly dominant application of this type of agent has been customer service calls—injecting intelligence into the otherwise dumb decision trees that have taught everyone to just say “I WANT TO SPEAK TO A HUMAN” over and over again.
But the possibilities are so much greater. Think of all the desk work you perform on a daily basis that involves clicking mindlessly at a screen. Here are a few pulled randomly from my to-do list: categorize my transactions from last month, move funds from my checking account to investments, make a list of essential agent tools covered in previous issues for members of our community (shameless plug)—and buy a new bathmat. It would be really nice to have a full-time personal assistant to do these for me, but unless Jeff Bezos suddenly feels the need to drop a couple million to buy an AI agent newsletter for the Washington Post, that’s a bit out of my budget.
The promise of voice agents is an interface through which we can interact with a new automated economy, one where all of these tasks are handled by intelligent agents. Where you can just tell your computer “download my transactions from all of my accounts, add them to my transaction tracker, and put them in the categories shown. If any of them are ambiguous, ask me.” Multiply that kind of efficiency across every task that sucks up our time as human beings, and you have a whole new world.
We don’t have J.A.R.V.I.S., HAL 9000, or Cortana just yet. But voice agents bring us one step closer.
Always keep learning and building!
—Michael
🤖 LEARN TO BUILD AGENTS
You’ll learn to build your first AI agent in just 30 minutes, no coding needed. Get step-by-step guides, full agent courses, and plug-and-play templates to help you every step of the way. Ready to go beyond basic ChatGPT?
Lock in your lifetime price before it increases again at 85 members!
Not sure? Join The Building AI Agents Community on a 7-day free trial.
Cancel anytime.