AI researchers are testing life after the Transformer

The Transformer still runs modern AI, but the people who helped create it are now openly debating what comes next.

The most interesting argument in AI right now is not whether large language models work. They clearly do. It is whether the architecture underneath them has become so successful that it is now slowing the search for something better.

That question moved from research papers into public view after Pathway staged a San Francisco debate this month around Transformers versus post-Transformer systems. The cast made it hard to dismiss as theater. Lukasz Kaiser, one of the authors of the 2017 paper "Attention Is All You Need," defended the Transformer side. Llion Jones, another co-author of the same paper and now CTO of Sakana AI, argued from the post-Transformer side. Mathias Lechner of Liquid AI and Adrian Kosowski of Pathway joined the discussion around alternative approaches, including Pathway's Dragon Hatchling architecture.

As The Neuron reported in its May 19 account of the event, the crowd ultimately gave the night to Team Transformer. The verdict matters less than the argument itself. When the people who helped build the foundation of the current AI boom are publicly debating how to move past it, entrepreneurs and investors should pay attention.

The Transformer became the default architecture because it solved a practical problem. Before it, leading sequence models relied on recurrence, which processes tokens one step at a time and is hard to parallelize, or on convolution, which needs many stacked layers to relate distant tokens. The Transformer used attention so every token could compare itself with every other token directly, and the resulting computation, mostly large matrix multiplications, fit modern accelerators unusually well.
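For readers who want to see what "tokens comparing themselves with other tokens" means in practice, here is a minimal sketch of scaled dot-product attention, the core operation described in "Attention Is All You Need." It is written in plain Python for readability, not speed. The function names and the tiny example vectors are illustrative assumptions, and a real model adds learned projections, multiple heads and batching on top of this.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Each query is scored against every key, the scores become weights,
    and the output is the weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # One score per key: dot product, scaled by sqrt of the key size.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Blend the value vectors using the attention weights.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs


# Illustrative toy input: three tokens, two dimensions each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Every query is scored against every key with no step depending on the previous one, which is why the work parallelizes well. It is also why the number of comparisons grows quickly as the input gets longer.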

That fit mattered. AI progress is not only a story about clever ideas. It is also a story about chips, memory, software libraries, training pipelines, cloud contracts and teams that know how to make the whole thing run. Once Transformers proved they could scale, the rest of the industry arranged itself around them.

This is why replacing them is hard. A new architecture does not merely need to look better in a paper. It has to be obviously better once the costs of tooling, talent, hardware support and deployment risk are counted. That is the burden facing every post-Transformer startup.

Still, the complaints are real. Long context remains expensive, because standard attention compares every token with every other token, so its cost grows roughly with the square of the context length. Inference cost is a constant pressure. Models do not naturally update themselves from new experience without retraining or fine-tuning. Reasoning often has to be spelled out one token at a time, which works well enough to sell products but does not look like the final form of machine intelligence.
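The long-context complaint is easiest to see with back-of-the-envelope arithmetic. The sketch below counts only the token-to-token scores in one head of one layer under full, unoptimized attention. That is a simplifying assumption: production systems use caching, sparsity and other tricks that this ignores, and the function name is hypothetical.

```python
def pairwise_scores(context_length):
    """Token-to-token scores for one attention head in one layer.

    Assumes full attention, where each token is compared with every
    token in the context, including itself.
    """
    return context_length * context_length


def growth_factor(old_length, new_length):
    """How many times more scores the longer context needs."""
    return pairwise_scores(new_length) / pairwise_scores(old_length)


for length in (1_000, 10_000, 100_000):
    print(f"{length:>7} tokens: {pairwise_scores(length):,} scores")
```

Under these assumptions, a context ten times longer needs one hundred times as many scores, which is the pressure that long-context alternatives are trying to relieve.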

What Pathway is trying to prove

Pathway is using this moment to push Dragon Hatchling, or BDH, a post-Transformer architecture described in an arXiv paper first posted in September 2025. The company presents it as a biologically inspired model built around locally interacting neuron particles, scale-free network structure and Hebbian working memory. In plain English, it is an attempt to make AI systems reason and adapt through internal dynamics rather than depending only on the familiar Transformer stack.

The claim is not that BDH has already replaced GPT-class systems in the market. It has not. The more practical claim is that current architectures may be leaving too much efficiency and adaptability on the table. Pathway says BDH can show Transformer-like scaling in experiments from 10 million to 1 billion parameters, while offering sparse positive activations and stronger interpretability.

That is exactly the kind of pitch founders like to hear and engineers instinctively challenge. The startup world is full of architectures that sound elegant at small scale and then run into the wall of production reality. But the better question is not whether BDH wins tomorrow. It is whether architectures like BDH, Liquid AI's work on liquid neural networks, state space models and hybrid systems are starting to chip away at the assumption that attention-heavy Transformers are the only serious path forward.

For startups, this debate has a direct commercial edge. If Transformers remain dominant, the winners are likely to be companies that squeeze more efficiency out of inference, memory, retrieval, routing and specialized chips. If a post-Transformer approach starts beating the cost curve, the map changes. Infrastructure bets, developer tools, model serving platforms and enterprise AI roadmaps would all have to adjust.

This is where entrepreneurs should be careful. The market does not reward novelty by itself. Customers buy reliability, latency, price and measurable capability. A founder building around a new architecture has to show why it makes the product better now, not only why it is philosophically more satisfying than the incumbent.

The next race is about the curve

The Transformer is not weak. It has scale, talent, benchmarks and hardware on its side. Every major lab knows how to improve it, compress it, serve it and wrap it in products people will pay for. That installed base is a serious advantage.

But dominant architectures often look permanent until they do not. The first serious challenger may not arrive as a dramatic replacement. It may show up as a cheaper model for long-context workloads, a more adaptive system for agents, or a hybrid design that keeps attention where it helps and removes it where it is wasteful.

The practical takeaway is simple. Watch the evidence, not the branding. If post-Transformer systems can bend the cost and capability curve faster than the existing stack, capital will move quickly. If they cannot, the Transformer will keep its title for the same reason it won it in the first place: it works.
