Google DeepMind shows AI can now solve real research math

Google DeepMind’s latest math agent has moved AI from polished benchmark solving into genuine research territory, resolving nine open Erdős problems at a cost of only a few hundred dollars per problem.

The important part is not that an AI system did well on a hard math test. We have seen that story before. The important part is that Google DeepMind’s AlphaProof Nexus has produced machine-checkable proofs for problems that were still open in the Erdős catalog, a living collection of unsolved questions tied to one of the most prolific mathematicians of the twentieth century.

That changes the discussion. Competition math is difficult, but it is designed to have a clean answer. Open research problems are different. They may have resisted specialists for decades, they may require a new construction, and they do not come with the comfort of knowing that a solution is waiting at the end of the page. This is why solving nine of 353 open Erdős problems matters more than another benchmark score.

According to the arXiv preprint published on May 21, 2026, the system also proved 44 of 492 open conjectures from the Online Encyclopedia of Integer Sequences and is being deployed in areas including combinatorics, optimization, graph theory, algebraic geometry, and quantum optics. That breadth is the signal investors and research leaders will notice. This is not just a clever tool for one corner of mathematics. It is a model for how AI systems might start contributing to knowledge work where verification is hard but possible.

AlphaProof Nexus is interesting because it does not simply ask a language model to write a convincing proof in natural language. That approach can sound impressive and still be wrong. Instead, the system combines model-generated ideas with Lean, a formal proof assistant that checks whether each logical step holds.
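For readers who have not seen Lean, here is a toy Lean 4 snippet that shows what "checks whether each logical step holds" means in practice. It is illustrative only. It is not taken from the preprint and is far simpler than any Erdős problem. The theorem names are made up for this example. `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- A true statement: Lean accepts it because the proof term type-checks.
theorem two_add_two : (2 : Nat) + 2 = 4 := rfl

-- A general statement, proved by citing an existing lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- A false statement is rejected when the file is checked. Uncommenting the
-- line below produces an error instead of a "proof":
-- theorem two_add_two_bad : (2 : Nat) + 2 = 5 := rfl
```

The point is that acceptance is mechanical. A persuasive-sounding argument with a gap simply does not compile.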

That distinction matters. A normal LLM reasoning chain can wander into subtle errors, especially in mathematics, where one missing condition can ruin an entire argument. A symbolic solver, on the other hand, is usually powerful inside a narrow formal system but brittle when the problem requires higher-level mathematical invention. DeepMind’s approach sits between those worlds. The language model proposes directions, the formal system rejects what fails, and the agent keeps searching until a proof passes.
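The propose, reject, and keep-searching loop can be sketched in a few lines of Python. This is a hypothetical toy under stated assumptions, not DeepMind's code. `noisy_proposer` stands in for the language model, and `verifier` stands in for Lean. The "problem" is simply finding a Pythagorean triple. A real system checks proofs rather than candidate answers and searches far more cleverly, but the control flow is the same: cheap, unreliable generation gated by a strict check.

```python
import random
from typing import Callable, Iterable, Iterator, Optional, Tuple

Candidate = Tuple[int, int, int]


def noisy_proposer(seed: int) -> Iterator[Candidate]:
    """Hypothetical stand-in for a language model: yields plausible-looking
    guesses (a <= b <= c), most of which are wrong."""
    rng = random.Random(seed)
    while True:
        a = rng.randint(1, 30)
        b = rng.randint(a, 30)
        c = rng.randint(b, 45)
        yield (a, b, c)


def verifier(candidate: Candidate) -> bool:
    """Hypothetical stand-in for a proof checker: accepts only exact
    solutions of a^2 + b^2 = c^2, with no partial credit."""
    a, b, c = candidate
    return a * a + b * b == c * c


def propose_and_verify(
    proposer: Iterable[Candidate],
    verify: Callable[[Candidate], bool],
    budget: int,
) -> Optional[Candidate]:
    """Keep drawing candidates until one passes the verifier or the attempt
    budget runs out. Returns the verified candidate, or None on failure."""
    for attempt, candidate in enumerate(proposer, start=1):
        if attempt > budget:
            return None  # budget exhausted: report failure rather than guess
        if verify(candidate):
            return candidate  # only verified output ever leaves the loop
    return None  # proposer ran dry before anything passed


if __name__ == "__main__":
    print(propose_and_verify(noisy_proposer(seed=0), verifier, budget=50_000))
```

The design choice worth noticing is that nothing the proposer says is trusted. Whatever the loop returns has already passed the check, and a failed search returns `None` instead of a confident wrong answer.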

For businesses, this is the lesson hiding inside the math. The best near-term AI systems may not be the ones that talk most fluently. They may be the ones attached to a hard verifier. In software, that verifier could be a test suite or compiler. In chip design, it could be simulation. In drug discovery, it could be a lab workflow or a validated screening pipeline. Mathematics is simply the cleanest example because proof checking can be made brutally precise.

The cost figure makes the result sharper. A few hundred dollars per problem is not cheap in consumer terms, but it is tiny compared with the cost of expert research time. Even if most attempts fail, the economics start to look very different when an agent can explore formal proof space continuously and cheaply.
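The claim that failed attempts do not wreck the economics comes down to simple division. The preprint's own accounting may differ. As a deliberately pessimistic illustration, suppose a fixed cost $C$ were spent on every open Erdős problem attempted, and only solved problems counted as output. Using the 9-of-353 figure above:

```latex
\text{cost per solved problem}
  = \frac{C \times N_{\text{attempted}}}{N_{\text{solved}}}
  = \frac{C \times 353}{9}
  \approx 39.2\,C
```

Under that reading, a hypothetical $C = \$300$ would put each solution at roughly \$11,800. That is still small next to the expert research time the article compares it with.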

Why this matters beyond mathematics

The immediate market implication is clear. AI-for-science companies have been promising that frontier models can accelerate discovery, but many of those claims are still difficult to measure. A formal proof is different. It either checks or it does not. That gives DeepMind a cleaner demonstration than most research-agent pitches can offer.

This will put pressure on startups building tools for scientists, engineers, and quantitative teams. It is no longer enough to sell a chatbot for researchers. The stronger product is an agent that can generate a hypothesis, test it against a reliable verification layer, and produce an output that a domain expert can inspect without starting from zero.

Academic research will feel this too. Mathematicians are not about to be replaced by a proof engine, and the nine solved problems represent only a small slice of the Erdős catalog. But the role of the human researcher may start shifting. Instead of spending all their time pushing through technical proof details, researchers may increasingly frame problems, judge significance, clean up results, and decide which machine-generated paths are worth turning into publishable work.

There is also a credibility issue to watch. AI math announcements have already become crowded, and some earlier claims in the field have mixed genuine progress with overstatement. Formal verification helps, but it does not answer every question. A proof can be correct and still be mathematically uninteresting. It can solve a narrow variant rather than the version people care about most. That is why community review will remain important.

Still, this result raises the floor. DeepMind has shown that a frontier AI system can do more than retrieve known patterns or solve contest problems whose answers are already known to the organizers. It can autonomously contribute to open mathematical research when paired with the right verification machinery.

The next phase will be less about whether AI can solve isolated problems and more about whether these agents can become useful research infrastructure. If the answer is yes, the market for AI research tools will move quickly from assistants that summarize papers to systems that help create new ones.
