Mira Murati is making AI collaboration the product

Thinking Machines Lab is betting that the next big AI fight will be about how humans work with models, not only how large the models become.

Mira Murati has finally given Thinking Machines Lab a sharper identity, and it is not the usual promise of a bigger chatbot. The company's new push centers on interaction models, a class of AI systems designed to listen, watch, speak and keep working at the same time.

That sounds simple until you compare it with how most AI products still behave. You type or talk. The system waits. Then it answers. Thinking Machines wants to break that rhythm by making the model stay present while the user is still speaking, correcting, showing a screen, changing direction or asking for help in the middle of a task.

In its May 11 research post, the company introduced TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model with 12 billion active parameters. The model processes audio, video and text in 200-millisecond micro-turns, which lets it treat conversation as a continuous stream rather than a neat sequence of turns. Thinking Machines says the system reaches a turn-taking latency of 0.40 seconds, fast enough to feel closer to a live exchange than to a voice assistant waiting for permission to talk.
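To make "micro-turn" concrete, here is a minimal sketch of slicing an incoming audio stream into fixed 200-millisecond windows. Only the 200 ms figure comes from the research post. The 16 kHz mono sample rate and the helper names `samples_per_micro_turn` and `micro_turns` are hypothetical, chosen for illustration, and this is not how Thinking Machines has said its model ingests audio or video.

```python
from typing import Iterator, List

# Illustrative values only. MICRO_TURN_MS matches the research post;
# SAMPLE_RATE_HZ is a hypothetical 16 kHz mono stream.
SAMPLE_RATE_HZ = 16_000
MICRO_TURN_MS = 200


def samples_per_micro_turn(sample_rate_hz: int = SAMPLE_RATE_HZ,
                           micro_turn_ms: int = MICRO_TURN_MS) -> int:
    """Number of audio samples that fit in one micro-turn window."""
    return sample_rate_hz * micro_turn_ms // 1000


def micro_turns(samples: List[float], chunk: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size windows; the last one may be shorter."""
    for start in range(0, len(samples), chunk):
        yield samples[start:start + chunk]


if __name__ == "__main__":
    chunk = samples_per_micro_turn()
    one_second = [0.0] * SAMPLE_RATE_HZ          # one second of silence
    windows = list(micro_turns(one_second, chunk))
    print(chunk, len(windows))                    # 3200 5
```

At that granularity, one second of input becomes five windows, and the reported 0.40-second turn-taking latency spans two micro-turns. Nothing in the arithmetic says the model actually answers on a fixed two-window schedule.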

As TechCrunch reported this week, the company is not opening the model to the public yet. A limited research preview is expected in the next few months, with a wider release planned later in 2026. That matters: the demos and benchmarks may be impressive, but the real test will be whether the experience holds up when developers, workers and companies use it outside controlled conditions.

Murati's move is a startup strategy as much as a technical one. Thinking Machines is not trying to win attention only by saying it has a model with more parameters or better scores on a familiar leaderboard. It is trying to make the way people collaborate with AI the main product advantage.

This is a smart place to compete. The market already has strong general-purpose models from OpenAI, Google, Anthropic and Meta. For a new frontier lab, "our model is smarter" is a hard message to sell unless users can feel the difference immediately. A model that can interrupt at the right moment, follow a meeting while someone shares a screen, translate while someone is still speaking or warn a developer when it sees a bug on screen creates a different kind of proof.

That is why the full-duplex design matters. In a normal turn-based system, the model often needs outside scaffolding to decide when the user is done. Voice-activity detection, routing layers and other tools make the experience feel real-time, but they are not the model itself. Thinking Machines is arguing that interaction should be native to the model, so that interruptions, silence, overlap and visual cues become part of the context.
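The "outside scaffolding" described above often includes an endpointer: a rule that declares the user finished after enough silence. The sketch below is a deliberately simple, hypothetical energy-threshold version of that idea. The threshold, the three-frame silence window and the function names are invented for illustration. It is not Thinking Machines' design, and production voice-activity detectors are usually more sophisticated.

```python
from typing import List, Optional


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def end_of_turn_index(frames: List[List[float]],
                      threshold: float = 0.01,
                      silence_frames: int = 3) -> Optional[int]:
    """Return the index of the frame where the turn is judged over, or None.

    Hypothetical rule: once any speech has been heard, the turn ends after
    `silence_frames` consecutive frames fall below `threshold`.
    """
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            heard_speech = True
            quiet_run = 0          # speech resets the silence counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None                    # user never finished, or never spoke


if __name__ == "__main__":
    speech = [0.5] * 4             # energy 0.25, above threshold
    silence = [0.0] * 4            # energy 0.0, below threshold
    frames = [speech, speech, silence, silence, silence, silence]
    print(end_of_turn_index(frames))   # 4
```

The fixed silence window is the weak point. A slow speaker who pauses mid-thought gets cut off, and the rule has no notion of overlap, interruption or what is on screen. Those are the cues the full-duplex approach says belong inside the model's context instead.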

If that works, it could create a new category of work tools. Not a chatbot in a sidebar. Not an agent that disappears for five minutes and returns with a result. Something closer to a colleague who stays in the loop while the work is happening, then hands off deeper reasoning to background systems when the task calls for it.

The Agent Hype Is the Foil

The timing is important. Much of the AI market has spent the past year talking about autonomous agents that can plan, browse, code, buy, schedule and execute with less human involvement. That idea is powerful, but it also creates anxiety for companies that do not want software making fragile decisions without supervision.

Thinking Machines is leaning into a different promise. Keep the human close. Let the model move faster, but make it easier for the user to steer, interrupt and correct. For business software, that may be more useful than full autonomy. Most real work is messy. People change their minds, add context halfway through and notice things that were not written into the original prompt.

There is also a practical reason this could matter. In legal review, sales calls, design critique, customer support and software development, latency changes behavior. If an AI assistant responds after the moment has passed, people stop using it as a collaborator and start treating it like a search box. If it can respond while the work is still moving, the product has a chance to become part of the workflow itself.

Still, there is room for skepticism. Thinking Machines has published its own benchmark results and says TML-Interaction-Small performs well on FD-bench v1.5 and Audio MultiChallenge, but outside researchers have not had broad access to test the model. The company also acknowledges limits around connectivity, long real-time sessions, safety and scaling larger models into low-latency settings.

That is not a small caveat. Full-duplex AI is only useful if it is fast, reliable and socially tolerable. An assistant that interrupts too often will feel rude. One that waits too long will feel ordinary. One that misses visual or audio context in a workplace setting could create more friction than it removes.

The bigger question is whether Thinking Machines can turn a research preview into a durable product wedge before larger labs copy the interface. Murati has the credibility, the team and, based on recent compute commitments, the ambition to compete at frontier scale. But in AI, a clever interaction model becomes a company only when users build habits around it.

That is what to watch next. If Thinking Machines can make human-in-the-loop AI feel less like prompt engineering and more like working with someone who is paying attention, the startup will have found a real opening. If not, interaction models may become just another feature absorbed into the same platforms the startup is trying to outmaneuver.
