Thinking Machines is betting AI will win by listening better

Mira Murati's Thinking Machines Lab has given its clearest product signal yet: AI that stays present while people talk, move, interrupt, and change their minds.

Thinking Machines Lab is not trying to make the next chatbot feel a little faster. It is trying to make the whole idea of taking turns with software feel outdated. On May 11, the company introduced a research preview of interaction models, a class of AI systems designed to process audio, video, and text continuously while thinking, responding, and acting in real time.

That sounds technical, but the point is simple. Most AI products still behave like email with a smarter recipient. You write or speak, the model waits, then it answers. If you interrupt, point at something, change direction, or need the model to watch what is happening while it talks, the interface begins to show its limits. Thinking Machines wants to remove that bottleneck.

According to Thinking Machines' own research preview, the model handles interaction natively rather than through external scaffolding. That is the important part. Existing real-time systems often rely on separate components to detect when someone has stopped speaking or when the model should jump in. Thinking Machines is arguing that collaboration should be part of the model itself, not a layer glued around it after the fact.

The preview names TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model with 12 billion active parameters. Its architecture breaks live interaction into time-aligned 200-millisecond micro-turns, so input and output can move in the same stream. The model can listen while speaking, watch while reasoning, and keep the conversational thread open while another system works in the background.
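To make the micro-turn idea concrete, here is a minimal Python sketch of slicing two audio streams into shared 200-millisecond slots. It is an illustration under stated assumptions, not Thinking Machines' implementation: the `MicroTurn` type, the `align` helper, and the 16 kHz sample rate are all hypothetical. Only the 200 ms slot length comes from the preview.

```python
from dataclasses import dataclass

MICRO_TURN_MS = 200  # slot length stated in the research preview


@dataclass
class MicroTurn:
    """One time-aligned slot holding input and output for the same 200 ms."""
    index: int
    start_ms: int
    user_audio: list   # input samples that arrived during this slot
    model_audio: list  # output samples emitted during the same slot


def samples_per_turn(sample_rate_hz: int) -> int:
    """Number of audio samples that fit in one micro-turn."""
    return sample_rate_hz * MICRO_TURN_MS // 1000


def align(user: list, model: list, sample_rate_hz: int = 16000) -> list:
    """Slice both streams on the same clock (16 kHz is an assumed rate)."""
    n = samples_per_turn(sample_rate_hz)
    length = max(len(user), len(model))
    turns = []
    for i, start in enumerate(range(0, length, n)):
        turns.append(
            MicroTurn(
                index=i,
                start_ms=i * MICRO_TURN_MS,
                user_audio=user[start:start + n],
                model_audio=model[start:start + n],
            )
        )
    return turns
```

Because both streams share one clock, a slot can hold user speech and model speech at once, which is what lets an interruption land in the middle of an answer rather than after it.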

That second system matters. Thinking Machines describes a fast interaction model paired with an asynchronous background model that can handle deeper reasoning, tool use, browsing, and longer-horizon work. In practice, this means the user does not have to wait in silence while the AI goes off to do something difficult. The interaction model remains present, takes new input, and folds the background result into the conversation when it is useful.
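The two-model pattern can be sketched with Python's standard `asyncio` library. This is a toy under loud assumptions: `background_model`, `interaction_loop`, the chart request, and the timings are invented for illustration, and nothing here reflects how Thinking Machines actually connects its models. It only shows the shape of the idea: a fast loop keeps acknowledging new input while slower work finishes, then folds the result in.

```python
import asyncio


async def background_model(request: str) -> str:
    """Hypothetical stand-in for the slower background model."""
    await asyncio.sleep(0.05)  # simulate deep reasoning or tool use
    return f"result for {request!r}"


async def interaction_loop(user_inputs: list, tick: float = 0.02) -> list:
    """Handle one input per tick while a background task runs."""
    transcript = []
    task = asyncio.create_task(background_model("build chart"))
    pending = list(user_inputs)
    delivered = False
    while pending or not delivered:
        if pending:
            # The fast loop never blocks on the background task.
            transcript.append(f"ack: {pending.pop(0)}")
        if task.done() and not delivered:
            # Fold the background result into the conversation once.
            transcript.append(f"background: {task.result()}")
            delivered = True
        await asyncio.sleep(tick)
    return transcript


def run_demo() -> list:
    inputs = ["hello", "actually, use blue", "and add a title"]
    return asyncio.run(interaction_loop(inputs))
```

Calling `run_demo()` returns a transcript in which every user input is acknowledged in order and the background result appears exactly once, whenever it happens to be ready.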

For founders, this is not a small interface tweak. A useful AI collaborator inside a product design review, a sales call, a coding session, or a customer interview has to keep up with messy human work. People do not always know what they want upfront. They think out loud. They point. They correct themselves. They notice a problem halfway through. If an AI system can stay with that flow, it becomes closer to an operating partner than a prompt box.

The examples Thinking Machines shared are deliberately everyday: tracking animal mentions in a story, translating speech in real time, warning someone when posture changes, generating a chart while a conversation continues. None of those examples alone is a company. Together, they show where Murati's team thinks the market is going. The next AI race may not be only about which model scores highest on intelligence tests. It may be about how much human context can reach the model before the user gives up.

The benchmarks need scrutiny

The company's numbers are striking. On FD-bench v1, Thinking Machines reports 0.40 seconds of turn-taking latency for TML-Interaction-Small, compared with 1.18 seconds for GPT-realtime-2.0 in minimal mode and 0.57 seconds for Gemini-3.1-flash-live-preview in minimal mode. On FD-bench v1.5, Thinking Machines reports an average interaction quality score of 77.8, ahead of 46.8 for GPT-realtime-2.0 and 54.3 for Gemini's live preview baseline.
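For a sense of scale, the reported figures can be turned into ratios and gaps with a few lines of Python. The numbers are the company-reported values quoted above; the dictionary keys and helper names are made up for this sketch, and the arithmetic says nothing about whether the benchmarks hold up outside the lab.

```python
# Company-reported figures as quoted in this article (not independently verified).
REPORTED_LATENCY_S = {
    "TML-Interaction-Small": 0.40,
    "GPT-realtime-2.0 (minimal)": 1.18,
    "Gemini-3.1-flash-live-preview (minimal)": 0.57,
}
REPORTED_QUALITY = {
    "TML-Interaction-Small": 77.8,
    "GPT-realtime-2.0": 46.8,
    "Gemini live preview": 54.3,
}

BASELINE = "TML-Interaction-Small"


def latency_ratios() -> dict:
    """How many times slower each rival is than the baseline on FD-bench v1."""
    base = REPORTED_LATENCY_S[BASELINE]
    return {
        name: value / base
        for name, value in REPORTED_LATENCY_S.items()
        if name != BASELINE
    }


def quality_gaps() -> dict:
    """Point lead of the baseline over each rival on FD-bench v1.5."""
    base = REPORTED_QUALITY[BASELINE]
    return {
        name: base - value
        for name, value in REPORTED_QUALITY.items()
        if name != BASELINE
    }
```

On these figures, GPT-realtime-2.0's reported latency is roughly three times that of TML-Interaction-Small and Gemini's is roughly 1.4 times, while the reported quality leads are about 31 and 23.5 points.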

Those figures are useful, but they should not be treated like a public verdict yet. The model is not broadly available. The benchmark framing comes from Thinking Machines. Some of the newer tests around time awareness, verbal cues, and visual proactivity are internal or adapted by the company. That does not make the work weak. It means the claims need the same thing any frontier AI claim needs: outside access, repeatable testing, and real users trying to break the smooth demo.

The Verge also noted that people cannot try the interaction models yet, with a limited research preview planned in the coming months and a wider release expected later this year. That gap matters because real-time AI is brutal in the wild. Latency depends on networks, hardware, context length, safety filters, and the simple reality that users behave less neatly than test clips.

Still, the direction is credible because it fits the broader weakness in current AI tools. Autonomy gets the headlines, but many valuable workflows still need a human in the loop. A founder reviewing a prototype does not always want an agent to disappear for twenty minutes and return with a finished answer. Sometimes the advantage is staying in motion together, making small corrections before they become expensive mistakes.

Thinking Machines is also making this move from a position of unusual attention. Murati, the former OpenAI CTO, founded the company in 2025 and has already turned it into one of the most watched AI labs. Its first public product, Tinker, focused on model training and fine-tuning. Interaction models point more directly toward a product surface ordinary teams can understand.

The next thing to watch is whether Thinking Machines can turn a research preview into a dependable workflow. If it can, the competitive question changes. The best AI product may not be the one that gives the longest answer or performs the most autonomous task. It may be the one that listens well enough to keep the human in the room.
