Google DeepMind Lets AI Rewrite Its Own Algorithms, Beating Human Experts

Google DeepMind's AlphaEvolve uses large language models to automatically discover new game theory algorithms that match or outperform decades of human-designed approaches.

For years, the algorithms powering competitive AI in imperfect-information games like poker have been built the old-fashioned way: researchers relying on intuition, trial and error, and painstaking manual refinement. Google DeepMind has now shown that a large language model can take on part of that job, rewriting the algorithms' source code through evolutionary search and producing variants that match or beat the hand-designed ones.

The system, called AlphaEvolve, takes an entirely different approach to algorithm design. Rather than tweaking numeric parameters, it mutates actual source code. A population of candidate algorithms starts from a standard implementation, and at each generation the LLM, specifically Gemini 2.5 Pro, proposes modifications to a parent algorithm's code. Promising variants survive and reproduce. Weak ones are discarded. The fitness metric is brutally simple: how exploitable is the resulting strategy after a fixed number of iterations?
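The loop described above can be sketched in a few lines of Python. This is a toy illustration, not AlphaEvolve itself: the `propose_edit` function is a hypothetical stand-in for the LLM that picks from a fixed menu of textual rewrites, and the fitness function scores a candidate by how far a simple optimizer ends up from its target after a fixed number of iterations, standing in for exploitability. All names here (`SEED`, `REWRITES`, `propose_edit`, `fitness`, `evolve`) are assumptions made for the example.

```python
import random

# Seed program: plain gradient descent with a deliberately poor step size.
SEED = "def update(x, grad):\n    return x - 0.01 * grad\n"

# Hypothetical stand-in for the LLM: a fixed menu of textual rewrites.
# A real system would ask a model to propose an edit to the parent's code.
REWRITES = [("0.01", "0.1"), ("0.1", "0.3"), ("0.3", "0.45"), ("0.45", "0.3")]


def propose_edit(source, rng):
    """Return a modified copy of the parent's source code."""
    old, new = rng.choice(REWRITES)
    return source.replace(old, new, 1)


def fitness(source, iterations=20):
    """Lower is better: distance from the optimum of (x - 3)^2 after a fixed budget."""
    namespace = {}
    try:
        exec(source, namespace)  # run the candidate's code (toy setting only)
        x = 0.0
        for _ in range(iterations):
            x = namespace["update"](x, 2.0 * (x - 3.0))
        return abs(x - 3.0)
    except Exception:
        return float("inf")  # broken candidates are treated as maximally unfit


def evolve(generations=30, population_size=4, seed=0):
    rng = random.Random(seed)
    population = [(fitness(SEED), SEED)]
    for _ in range(generations):
        _, parent = rng.choice(population)        # pick a parent
        child = propose_edit(parent, rng)         # mutate its source code
        population.append((fitness(child), child))
        population.sort(key=lambda item: item[0])
        del population[population_size:]          # discard the weakest
    return population[0]


best_score, best_source = evolve()
print(best_score)
print(best_source)
```

The structure is what matters here: the thing being mutated is program text, and the only feedback the search receives is a single score from running that text for a fixed budget.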

What makes this result compelling is not just that the system works, but what it actually discovered. In the domain of Counterfactual Regret Minimization (CFR), a foundational technique for solving imperfect-information games, AlphaEvolve produced an entirely new variant called Volatility-Adaptive Discounted CFR, or VAD-CFR. Instead of relying on the static discounting rules and linear averaging that human researchers have refined for years, VAD-CFR dynamically adjusts its discounting based on volatility measured as the solver runs. The system found a mechanism that human designers had not explored, and it performs competitively against established baselines like DCFR and PCFR+ across multiple test games.
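To make the contrast between static and adaptive discounting concrete, here is a minimal sketch. The first function follows the commonly cited DCFR schedule, where the weights depend only on the iteration number. The second half is purely illustrative and is not VAD-CFR's actual mechanism, which the article does not spell out: `VolatilityTracker` and `adaptive_alpha` are hypothetical names, and the rule they implement (shrink the discount exponent when recent regrets swing more) is an assumption chosen only to show what "volatility-adaptive" could mean.

```python
def dcfr_discounts(t, alpha=1.5, beta=0.0, gamma=2.0):
    """Static DCFR-style weights at iteration t (t >= 1).

    Returns the multipliers applied to accumulated positive regrets,
    accumulated negative regrets, and the running average strategy.
    """
    positive = t**alpha / (t**alpha + 1.0)
    negative = t**beta / (t**beta + 1.0)
    average = (t / (t + 1.0)) ** gamma
    return positive, negative, average


class VolatilityTracker:
    """Hypothetical: exponentially weighted average of regret magnitude."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.volatility = 0.0

    def update(self, instantaneous_regrets):
        magnitude = max(abs(r) for r in instantaneous_regrets)
        self.volatility = self.decay * self.volatility + (1.0 - self.decay) * magnitude
        return self.volatility


def adaptive_alpha(volatility, base_alpha=1.5, sensitivity=1.0):
    """Hypothetical rule: higher volatility gives a smaller exponent,
    which discounts old positive regrets more aggressively."""
    return base_alpha / (1.0 + sensitivity * volatility)


tracker = VolatilityTracker()
for t, regrets in enumerate([[2.0, -1.0, 0.5], [0.1, -0.1, 0.0]], start=1):
    alpha = adaptive_alpha(tracker.update(regrets))
    print(t, dcfr_discounts(t), dcfr_discounts(t, alpha=alpha))
```

In the static schedule, two runs at the same iteration always get the same weights. In the illustrative adaptive version, the weights also depend on what the solver has recently observed.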

Game theory algorithms might sound like an academic niche, but the implications stretch well beyond card tables. Multi-agent reinforcement learning, the broader field encompassing these techniques, underpins real-world systems where multiple actors compete or cooperate with incomplete information. Think automated trading, negotiation systems, cybersecurity defense, and autonomous vehicle coordination. Any domain where you need an AI to reason about what another intelligent agent might do, without full visibility into their intentions, relies on precisely these kinds of equilibrium-finding algorithms.

The two algorithmic families DeepMind targeted with AlphaEvolve, Counterfactual Regret Minimization and Policy Space Response Oracles (PSRO), represent the backbone of modern computational game theory. CFR iteratively minimizes regret at each decision point, and in two-player zero-sum games its average strategy converges toward a Nash equilibrium. PSRO takes a population-based approach, maintaining a set of diverse strategies and computing meta-distributions over them. Both have seen years of human engineering to reach their current performance levels. As MarkTechPost reported in its coverage of the research, the evolved variants matched or exceeded these hand-crafted baselines on standard benchmarks using the OpenSpiel framework, with final evaluation conducted on larger, unseen games to check that the results were not mere overfitting.
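For readers who have not seen regret minimization in action, the sketch below runs regret matching, the update rule at the core of CFR, in self-play on rock-paper-scissors and then measures the exploitability of the average strategies. It is a deliberately small stand-in: full CFR applies this idea at every decision point of a sequential game, and the research used OpenSpiel rather than hand-written code like this. The function names and the slightly biased starting regrets are assumptions made for the example.

```python
# Row player's payoffs for rock-paper-scissors (rows and columns: R, P, S).
ROW_PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
# Zero-sum game: the column player's payoffs are the negated transpose.
COL_PAYOFF = [[-ROW_PAYOFF[i][j] for i in range(3)] for j in range(3)]


def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values(payoff, opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent."""
    return [sum(u * q for u, q in zip(row, opponent_strategy)) for row in payoff]


def exploitability(row_strategy, col_strategy):
    """Average gain available to a best-responding opponent (0 at equilibrium)."""
    row_gain = max(action_values(ROW_PAYOFF, col_strategy))
    col_gain = max(action_values(COL_PAYOFF, row_strategy))
    return (row_gain + col_gain) / 2.0


def self_play(iterations=10000):
    payoffs = [ROW_PAYOFF, COL_PAYOFF]
    # A small initial bias toward rock so the dynamics are not trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            values = action_values(payoffs[player], strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for action in range(3):
                regrets[player][action] += values[action] - expected
                strategy_sums[player][action] += strategies[player][action]
    # The average strategy, not the final one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


row_avg, col_avg = self_play()
print(row_avg, col_avg, exploitability(row_avg, col_avg))
```

The exploitability number printed at the end is the same kind of score the evolutionary search uses as its fitness signal: run the solver for a fixed budget, then ask how much a perfect opponent could still gain.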

This is not the first time DeepMind has used automated search to surpass human expertise. AlphaZero famously taught itself to master chess, Go, and shogi without human examples. But AlphaZero searched for strategies within a fixed game environment. AlphaEvolve searches the space of algorithms themselves, which is a harder and more open-ended problem. The system is essentially conducting automated research, proposing and testing hypotheses in code rather than prose.

The Engineering Tradeoff

None of this is free. The evolutionary process requires substantial compute. Every generation involves evaluating candidate algorithms across multiple proxy games, computing exploitability metrics, and maintaining a distributed population. The researchers used an exact best response oracle computed via value iteration to remove Monte Carlo sampling noise, which is a luxury available in research settings but expensive at scale. Whether this approach generalizes efficiently to problems where exact solutions are impractical remains an open question.

There is also the matter of interpretability. VAD-CFR's volatility-adaptive mechanism was discovered through search, not designed through theory. The research community will need to work backward from the discovered code to understand why it works, which is a different kind of scientific process than traditional algorithm design. That tension between performance and understanding is becoming a defining feature of AI-generated science.

For startups and enterprises building multi-agent AI systems, the practical takeaway is straightforward: the bottleneck in algorithmic innovation is shifting. Where teams once needed specialized game theorists to hand-craft solvers, LLM-powered search can now augment or replace parts of that pipeline. The expertise moves from designing algorithms to designing the evaluation environments that guide the search. If you can define the right fitness function, the search can do much of the work of finding the algorithm. That is a meaningful change in how computational research gets done.