Sakana AI launches Fugu Ultra and its orchestration model matches Fable and Mythos without training a single frontier model

Sakana AI's Fugu Ultra is a useful reminder that the next AI race may not be won only by training bigger models. But you should read the launch as a routing claim first, not as proof that a small Tokyo startup has magically beaten every frontier lab.

Sakana AI says it released Fugu and Fugu Ultra today, June 22, with a direct pitch to anyone tired of building on a single model provider. The company says Fugu Ultra scored 54.2 on SWE-Pro, 95.1 on GPQA-Diamond, and 93.2 on LiveCodeBench v6, putting it around the same range as Claude Opus 4.6, Gemini 3.1, and GPT-5.4 on those tests. Those are big numbers. They also need to be treated carefully, because benchmark leadership is only as useful as the conditions behind it.

The company behind the claim is not some anonymous wrapper shop. Sakana AI was founded in Tokyo in 2023 by David Ha, Llion Jones, and Ren Ito. Jones was one of the authors of the 2017 paper 'Attention Is All You Need,' the transformer paper that sits under much of the current AI industry. Reuters reported in 2024 that Japanese megabanks and Nvidia were among the investors in Sakana's roughly 30 billion yen funding round. That matters here because Sakana has always sold a different idea of progress: collective intelligence over one giant model.

Fugu follows that idea closely. Instead of training a new frontier model from scratch, Sakana says it trained a 7-billion-parameter orchestrator whose job is to decide which external model should handle each part of a problem. It can call a pool of third-party systems, delegate sub-tasks, check outputs, and synthesize one answer through an OpenAI-compatible endpoint. Fugu Ultra is the heavier version, aimed at longer, harder work such as Kaggle-style analysis, paper reproduction, patent search, and cybersecurity.

That is the part you should pay attention to. A small router beating a frontier model outright would be a wild claim. A small router getting strong benchmark results by choosing the right large model at the right moment is more believable, and more interesting. It suggests the scarce skill may not only be model training. It may be knowing when not to use the same model for everything.

Sakana says the technical work is tied to two ICLR 2026 papers, Trinity and the Conductor, which describe learned systems for coordinating multiple model experts. That academic framing helps, because orchestration is an easy word to abuse. Plenty of products already call a few APIs in sequence and present the result as agentic intelligence. Fugu's stronger claim is that the routing itself is learned, tested, and improved as a system. If that holds up in customer use, it is a real product difference, not just a nicer control panel.

The timing is doing Sakana a favor. The Verge and other outlets reported this month that Anthropic's Mythos and Fable models became entangled in US export-control fights, with access disrupted for foreign users and clients. That is not an abstract policy problem if your bank, ministry, telecom company, or manufacturer has built a workflow around one model family. Access can change faster than procurement teams can rewrite their stack.

Fugu does not remove that risk. It routes around it. If one provider cuts access, the orchestrator can point work somewhere else. If a new model enters the pool and performs better on one kind of task, Sakana says it can be folded in without forcing customers to rebuild the application around a new API. Frankly, that is a more concrete version of sovereign AI than most of the slogans attached to the term.

The Japan angle is also real. A Tokyo company selling model orchestration to Japanese enterprises has a cleaner local story than another US cloud vendor asking companies to trust a foreign frontier stack. Sakana's own history, its Japanese investor base, and the country's focus on domestic AI capability all support that positioning. You do not have to pretend the technology is nationalistic to see why the buyer preference matters.

Pricing is where the hard questions begin. Fugu Ultra is listed at $5 per million input tokens and $30 per million output tokens, with rates doubling for contexts above 272,000 tokens. Subscription plans run from $20 to $200 a month, and Sakana says subscribers before the end of July 2026 get a free second month at their tier. The API is not available in the EU or EEA while the company works on GDPR compliance, which immediately limits the rollout.

The missing number is the full cost of all the calls underneath the answer. If Fugu Ultra reaches its best scores by dispatching work across several expensive frontier models, enterprise buyers will ask what the real bill looks like after the orchestration layer, the underlying inference, and long-context surcharges are counted together. They should. A clever router can save money by avoiding the wrong model, but it can also hide a very expensive chain of calls behind one neat response.

Sakana has made a serious argument, not just a loud launch claim. The old assumption was that the next leap comes from the next bigger training run. Fugu Ultra points to another path: small learned systems coordinating larger specialists. You do not have to train Mythos to compete with it. You have to know when to ask something else.

Also read: Bain is vibecoding replicas of software acquisition targets and the results are rewriting M&A • Tencent's WeChat AI Agent Is a Bet That the Super App Swallows the AI App • AWS just made enterprise AI agent infrastructure a solved problem and startups should be worried