MiniMax M3 gives Chinese AI labs a new frontier coding test

MiniMax M3 is not just another model release. It is a test of whether Chinese AI labs can compete on coding agents, long context and multimodal work at the same time.

MiniMax has officially released M3, a new frontier model aimed squarely at developers building coding agents, long-running automation and multimodal workflows. The Shanghai-based AI lab is pitching it as a rare combination: frontier coding performance, a one-million-token context window and native support for image and video input.

That matters because the model race has moved beyond chatbot fluency. The most valuable frontier models are now being judged by whether they can work inside a codebase, use tools, keep context over long sessions and recover when a task goes wrong. In that world, MiniMax M3 is less about another leaderboard screenshot and more about whether a non-US lab can become part of the daily developer stack.

According to MiniMax's release materials, M3 is available now through MiniMax Code, token plans and API services, while the company says it will publish a technical report and open-source the corresponding model weights over the next 10 days. That timing is important. Developers can test the hosted model immediately, but the real open-weight claim will not be fully proven until the weights and report are actually public.

MiniMax says M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard and 74.2% on MCP Atlas. It also says M3 beats GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro while approaching Claude Opus 4.7, and that it reaches the top score on Claw-Eval, an end-to-end autonomous agent benchmark.

Those numbers should be treated with caution. MiniMax discloses that several results were run on its own infrastructure, often using agent scaffolding such as Claude Code, Mini-SWE-Agent or Terminus. That does not make the results useless, but scores can shift with the scaffold and the test environment, so buyers should wait for independent runs before making procurement decisions.

The harder point is that coding benchmarks are changing. DeepSWE, for example, has been pushing attention toward original long-horizon software engineering tasks rather than short fixes or public patches. Its public summary now shows GPT-5.5 leading at 70%, Claude Opus 4.8 at 58%, GPT-5.4 at 56% and Claude Opus 4.7 at 54%. M3 is not yet on that official DeepSWE board, so the clean comparison with the newest Claude release still has to wait.

For founders and engineering teams, that distinction is practical. A model that looks strong on SWE-Bench Pro may still behave differently when asked to make a multi-file change across a messy internal repository. The only useful question is whether it can hold the request, inspect the right files, make the change, verify it and stop before creating extra work for the human reviewer.

Long context becomes a cost problem

M3's one-million-token context window is useful, but the more interesting claim is how MiniMax gets there. The company says its MiniMax Sparse Attention (MSA) architecture cuts per-token compute at one-million-token context to one-twentieth of the prior generation's, with prefilling more than 9 times faster and decoding more than 15 times faster.
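To make those ratios concrete, here is a minimal Python sketch of what they would imply for a single long-context request. The three ratios are MiniMax's own claims and have not been independently verified. The baseline timings (180 seconds of prefill, 60 seconds of decode) are purely hypothetical and do not come from MiniMax, and the function name is invented for illustration.

```python
# Ratios as claimed in MiniMax's release materials (not independently verified).
COMPUTE_RATIO = 1 / 20   # per-token compute vs. the prior generation at 1M context
PREFILL_SPEEDUP = 9.0    # "more than 9 times faster", used here as a lower bound
DECODE_SPEEDUP = 15.0    # "more than 15 times faster", used here as a lower bound


def projected_latency(baseline_prefill_s, baseline_decode_s):
    """Apply the claimed speedups to baseline timings.

    Returns (prefill_seconds, decode_seconds, total_seconds).
    """
    prefill = baseline_prefill_s / PREFILL_SPEEDUP
    decode = baseline_decode_s / DECODE_SPEEDUP
    return prefill, decode, prefill + decode


# Hypothetical prior-generation request: these timings are illustrative only.
BASELINE_PREFILL_S = 180.0
BASELINE_DECODE_S = 60.0

prefill_s, decode_s, total_s = projected_latency(BASELINE_PREFILL_S, BASELINE_DECODE_S)
overall_speedup = (BASELINE_PREFILL_S + BASELINE_DECODE_S) / total_s
compute_saved = 1 - COMPUTE_RATIO

print(f"prefill: {prefill_s:.1f} s, decode: {decode_s:.1f} s, total: {total_s:.1f} s")
print(f"overall speedup: {overall_speedup:.1f}x")
print(f"per-token compute saved: {compute_saved:.0%}")
```

The overall speedup depends on how a workload splits between prefill and decode, which is why a single headline multiplier says little about a given team's bill.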

That is the real business angle. Long context sounds impressive, but most companies do not need to dump a whole repository, a year of tickets and every design document into a prompt. They need a model that can keep enough state to work over time without making latency and inference costs painful. If MSA performs as described outside MiniMax's own tests, it could make long-context agents more economical for ordinary developer workflows.

MiniMax is also tying M3 to MiniMax Code, its agent product. The company says the tool can break complex work into multi-stage workflows, use a producer and verifier loop, and support computer use through the model's native multimodal capabilities. That puts MiniMax in the same strategic lane as Anthropic's Claude Code, OpenAI's coding agents and Google's Gemini developer tools. The product is no longer just the model. It is the model plus the harness around it.
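MiniMax has not published how its producer and verifier loop works, so the following Python sketch shows only the general pattern under stated assumptions: one component proposes a result, another checks it, and feedback flows back until the check passes or a retry budget runs out. Every name here is hypothetical, and the toy producer and verifier stand in for model calls and test runs.

```python
def producer_verifier_loop(task, produce, verify, max_rounds=3):
    """Generic producer/verifier loop (illustrative, not MiniMax's implementation).

    produce(task, feedback) returns a candidate result.
    verify(task, candidate) returns (ok, feedback).
    Returns (candidate, rounds_used), or (None, max_rounds) if nothing passed.
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = produce(task, feedback)
        ok, feedback = verify(task, candidate)
        if ok:
            return candidate, round_no
    return None, max_rounds


def toy_produce(task, feedback):
    # Stand-in for a model call: the first attempt is wrong on purpose,
    # and the retry "uses" the verifier's feedback to correct it.
    if feedback is None:
        return lambda a, b: a - b
    return lambda a, b: a + b


def toy_verify(task, candidate):
    # Stand-in for running a test suite against the candidate.
    if candidate(2, 3) == 5:
        return True, None
    return False, "add(2, 3) should return 5"


result, rounds = producer_verifier_loop("write add(a, b)", toy_produce, toy_verify)
print(f"accepted after {rounds} round(s); add(2, 3) = {result(2, 3)}")
```

The retry cap matters as much as the loop itself: an agent that keeps producing unverified attempts creates exactly the extra review work described above.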

The open-weight promise also changes the commercial story. MiniMax listed on the Hong Kong Stock Exchange on January 9, 2026, according to a Chinasoft International filing, and it is now trying to prove that a listed Chinese AI company can sell model access, grow developer adoption and still feed the open-source ecosystem. That is a difficult balance. Open weights can build trust and distribution, but hosted APIs and agent subscriptions are where recurring revenue usually lives.

The next few weeks will show how much of M3's launch survives contact with outside testing. Developers will look for the weights, the technical report, reproducible benchmarks and real pricing at scale. If those pieces arrive cleanly, MiniMax will have done more than ship another capable model. It will have made the frontier race harder for every lab selling coding agents as the next big enterprise workflow.
