The Turing test has fallen. Does it matter?
Plus: AutoGen becomes the #1 AI agent framework, how PwC is rolling out agents to 75k employees, and more

Welcome back to Building AI Agents, your biweekly guide to everything new in the AI agent field!

Thank you to the >100 agent builders, business executives, students, and more who showed up to Building AI Agents’ collab event with CrewAI in Chicago last Thursday—and a huge shout-out to Microsoft for lending us the space!
Attendees heard from a lineup of incredible speakers building agent tech and a panel discussion featuring leaders from some of the largest companies in the world…
…and got a sneak preview of our new enterprise agent consultancy—coming soon!
In today’s issue…
- LLMs pass the Turing test
- AutoGen becomes the #1 agent framework
- Learn to use LLMs with OpenAI Academy
- Build payment agents with PayPal’s MCP server
- How PwC is rolling out AI agents to 75,000 employees
- …and more
🔍 SPOTLIGHT

LLMs just passed the canonical test for general intelligence, and the world just kept tur(n)ing.
The Turing test, named for its inventor, the famous computer scientist Alan Turing, was proposed in 1950 as a simple but elegant way of determining whether a computer system could “think” the way humans can. In it, a human evaluator carries on parallel text chats with two “people”: a real person and the computer. If the evaluator cannot determine which is the human, the system can be said to be intelligent.
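For the curious, the mechanics are easy to sketch in code. Below is a minimal, illustrative harness for the test’s three-party structure in Python; `machine_witness` is a hypothetical stand-in for an LLM API call (the real study used a more elaborate setup), while the evaluator and the human witness both type at the keyboard.

```python
import random

def machine_witness(question: str) -> str:
    # Hypothetical stand-in: a real harness would forward the question
    # (plus chat history) to an LLM such as GPT-4.5.
    return "Good question. Honestly, it depends."

def human_witness(question: str) -> str:
    # A real person answers at the keyboard.
    return input(f"[human witness] {question}\n> ")

def run_session(rounds: int = 3) -> bool:
    """One short session. Returns True if the evaluator mistakes
    the machine for the human (i.e., the machine 'passes')."""
    witnesses = {"A": machine_witness, "B": human_witness}
    if random.random() < 0.5:
        # Randomize which label hides the machine.
        witnesses = {"A": human_witness, "B": machine_witness}

    for _ in range(rounds):
        question = input("[evaluator] Ask both witnesses a question:\n> ")
        for label, witness in witnesses.items():
            print(f"  Witness {label}: {witness(question)}")

    guess = input("[evaluator] Which witness is human, A or B?\n> ").strip().upper()
    return witnesses.get(guess) is machine_witness  # fooled if they picked the machine

if __name__ == "__main__":
    print("Machine passed!" if run_session() else "Evaluator guessed correctly.")
```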
Since then, passing the Turing test has come to be widely regarded as synonymous with artificial general intelligence (AGI), the Holy Grail of the AI field, and the likely start of a wholesale transformation of society as intelligent systems replace human labor. While many claims of beating weaker versions of the test have been made over the years, no one had yet convincingly shown that an AI system could beat Turing’s original formulation.
That is, until last week, when researchers at UC San Diego published a paper demonstrating that OpenAI’s top-of-the-line GPT-4.5 model, released in February, could fool the evaluator into believing it was the human 73% of the time. Meta’s Llama-3.1 won 56% of the time, though this result was not statistically distinguishable from chance. GPT-4o, an older model, and ELIZA, a chatbot from the 1960s that was one of the first systems to be mistaken for a human, performed substantially worse.
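To get an intuition for why 56% can fail to reach significance while 73% clears it comfortably, here is a quick back-of-the-envelope binomial test. The session counts below are hypothetical (the paper reports its own sample sizes), so treat the printed p-values as illustrative only.

```python
from scipy.stats import binomtest

N_SESSIONS = 100  # assumed number of sessions per model; illustrative only

for model, win_rate in [("GPT-4.5", 0.73), ("Llama-3.1", 0.56)]:
    wins = round(win_rate * N_SESSIONS)
    # One-sided test against the 50% "coin flip" null hypothesis.
    result = binomtest(wins, N_SESSIONS, p=0.5, alternative="greater")
    print(f"{model}: {wins}/{N_SESSIONS} wins, p = {result.pvalue:.4f}")
```

At this assumed sample size, 73 wins out of 100 is wildly unlikely under a coin-flip null, while 56 out of 100 is entirely consistent with chance.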
So we’ve just entered a new epoch of world history, right? With AGI achieved, aren’t every worker’s days numbered as human-level AI agents take all of our jobs? Naturally, this was the breathless reaction of parts of the online influencer crowd, but the reality is more complex.
The Turing test measures a system’s ability to pretend to be human for a short period (5 minutes, in this case), but not its ability to form and use long-term memories, which is critical for any application that could truly act as a full-blown humanlike employee. Text, too, is just one of the many interfaces through which humans interact with the world. Turing’s formulation of the test does not require the ability to see, to speak and listen, or, critically, to take actions in the real world.
Agentic AI is just starting to bring these capabilities within reach, as LLMs become augmented with memory, speech, tools, and more. Companies are saving hundreds of millions of dollars by streamlining their operations with agents, and use cases which were out of reach even a few short years ago are now ripe for automation. But today’s systems are still far off from full autonomy across the breadth and depth of tasks that humans are capable of.
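To make “augmented with memory and tools” concrete, here is a deliberately toy sketch of the basic agent loop; the `llm` function is a hypothetical stand-in for any chat-model API, and the memory is nothing more than a growing transcript.

```python
from datetime import datetime

def llm(prompt: str) -> str:
    # Hypothetical stand-in: a real agent would call a model API here.
    return "TOOL:clock" if "time" in prompt.lower() else "Done."

TOOLS = {"clock": lambda: datetime.now().isoformat(timespec="seconds")}
memory: list[str] = []  # naive long-term memory: a growing transcript

def agent_step(user_msg: str) -> str:
    memory.append(f"user: {user_msg}")
    reply = llm("\n".join(memory))        # the model sees the full memory
    if reply.startswith("TOOL:"):         # crude tool-call convention
        tool_result = TOOLS[reply[5:]]()  # dispatch to the named tool
        memory.append(f"tool: {tool_result}")
        reply = tool_result
    memory.append(f"agent: {reply}")
    return reply

print(agent_step("What time is it?"))  # -> current timestamp via the clock tool
```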
The fall of the Turing test is certainly an important moment in the history of artificial intelligence—an indication of how far the field has advanced, and the incredible capabilities that are available even now. But it is not the Holy Grail, not yet.