A new study says advanced chatbots are no longer just sounding human. In some controlled tests, they were judged more human than the people sitting across from them.
The old Turing test has always had a simple attraction: put a machine and a person behind a screen, let someone question them, then see who can pass as human. For decades, that made the test feel more like a philosophical dare than a business problem. That is changing now.
According to new research from UC San Diego published in the Proceedings of the National Academy of Sciences on May 19, 2026, large language models can pass a standard three-party Turing test when they are prompted to behave like ordinary people. The strongest result came from GPT-4.5, which was judged to be the human participant 73% of the time after five-minute conversations. That was not just better than chance. It was better than the real humans it was being compared against.
The researchers, Cameron Jones and Benjamin Bergen, tested four systems: ELIZA, GPT-4o, LLaMa-3.1-405B and GPT-4.5. Each participant chatted simultaneously with another human and with one model, then picked which of the two they believed was the real person. LLaMa-3.1 reached 56% with the same humanlike persona prompt, which put it roughly in line with the humans. ELIZA and GPT-4o, by contrast, came in at 23% and 21% in the baseline results.
The persona prompt matters. The study does not say every model automatically sounds human in every setting. Without the persona prompts, GPT-4.5 and LLaMa-3.1 performed much worse, at 36% and 38%. In other words, the breakthrough was not only raw intelligence. It was instruction, framing and social style. The models needed to be told how to act like people before people reliably treated them that way.
For founders, this is the point worth sitting with. The Turing test is not a clean measure of consciousness, understanding or wisdom. It is a measure of whether a person can tell what they are interacting with. That makes it deeply practical in a market now filling with AI sales agents, customer support bots, recruiting assistants, tutors and autonomous workflow tools.
If an AI system can maintain a casual, convincing persona for five minutes, it can probably handle a surprising amount of front-line customer interaction. It can apologize, ask follow-up questions, mirror tone, make small talk and avoid the stiff phrasing that used to give bots away. For enterprise software vendors, this study will become useful sales material, especially when pitching AI agents that sit directly in front of customers rather than behind internal dashboards.
There is a catch. Passing as human is commercially useful only until it becomes a trust problem. The same traits that make an agent feel natural in a support chat also make it harder for a consumer to know whether there is a real employee, a scripted automation or a persuasive synthetic persona on the other side. That matters in banking, healthcare, insurance, education and any market where decisions carry personal consequences.
The study also replicated part of the result in longer 15-minute games, where two persona-prompted models achieved pass rates of 56% and 59%. That weakens the easy objection that the result was just a short-chat trick. It also tells businesses that longer interactions are not automatically enough to expose the machine. People are judging humanness through social cues, typing style and emotional texture, not by running a formal intelligence exam.
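For readers who want to see what "better than chance" means in a two-way choice, here is a minimal sketch in Python. Because the interrogator picks between one human and one model, random guessing would label the model as human about 50% of the time. The code computes how likely a given win count would be under pure guessing. The sample size of 100 conversations is a hypothetical figure chosen for illustration; it is not taken from the study, and the function name is our own.

```python
from math import comb


def binomial_tail(successes: int, trials: int) -> float:
    """Probability of at least `successes` wins in `trials` fair coin flips."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# HYPOTHETICAL sample size, for illustration only; not from the study.
TRIALS = 100

# Five-minute pass rates quoted in the article (persona-prompted models).
pass_rates = {"GPT-4.5 (persona)": 0.73, "LLaMa-3.1 (persona)": 0.56}

for name, rate in pass_rates.items():
    wins = round(rate * TRIALS)
    tail = binomial_tail(wins, TRIALS)
    print(f"{name}: {wins}/{TRIALS} wins, chance of doing this well by guessing = {tail:.4g}")
```

Under that assumed sample size, 73 wins out of 100 would be extremely unlikely to arise from guessing, while 56 out of 100 would be quite compatible with a coin flip. The real significance depends on the study's actual number of conversations, which this article does not report.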
Trust infrastructure becomes the next market
This is where the opportunity moves beyond the model companies. If customers cannot reliably spot AI, verification becomes a product category. Companies will need clearer disclosure, stronger identity checks, better agent labeling and reliable logs showing when a human was involved. That is less glamorous than another benchmark leaderboard, but it may become more valuable.
Regulators are already moving in that direction. The European Commission opened feedback in May 2026 on AI Act transparency guidance. The underlying obligations, which apply from August 2, 2026, require that people in the EU be informed when they are interacting with AI systems in many direct-use cases. A study like this gives those rules a sharper commercial edge. Disclosure is no longer a theoretical consumer protection issue. It is a response to systems that may be socially convincing enough to erase the obvious difference.
Investors should read the findings with some discipline. This is not a clean declaration that GPT-4.5 thinks like a person, and it does not settle the debate over machine cognition. What it does show is that conversational fluency has become good enough to reshape customer experience, fraud risk and product design at the same time. That is a large enough market signal.
The next phase will not be about whether AI can sound human in a lab. It will be about whether companies can deploy humanlike systems without confusing the people they serve. The winners will be the teams that treat disclosure, consent and verification as part of the product, not as legal text pasted on after launch.