Jun 6, 2026 · 2:08 AM

DeepSeek V4 arrives with benchmark scores that put American AI labs on notice

DeepSeek released its V4 flagship model today without warning, posting benchmark scores that claim parity with GPT-5 and Claude 4 Opus at a reported training cost of just $5.6 million. The 540-billion parameter Mixture-of-Experts system ships with open weights, applying direct pressure on proprietary API providers and raising pointed questions about the effectiveness of US chip export controls.

Walter Schulze

DeepSeek's surprise release of its 540-billion parameter V4 model today posts benchmark numbers that match or beat GPT-5 and Claude 4 Opus, at a reported training cost of $5.6 million.

China's Hangzhou DeepSeek AI dropped its V4 flagship model today with no warning and no staged rollout, pushing a technical report and API access simultaneously while most of the Western AI industry was still at its desks. The move is classic DeepSeek: skip the hype cycle, ship the model, let the numbers do the talking. And the numbers, if they hold up to independent scrutiny, are the kind that will make proprietary AI API businesses very uncomfortable before the week is out.

The model scores 88.4% on MMLU and 92.1% on the newly introduced Humanities-X reasoning benchmark, figures the technical report says match or narrowly beat what GPT-5 and Anthropic's Claude 4 Opus posted earlier this quarter. That claim will face rigorous third-party testing in the coming days, but even the act of publishing those specific comparisons is a statement of intent. DeepSeek is not positioning V4 as a credible alternative to frontier American models. It is positioning it as a peer.

Buried beneath the headline benchmark numbers is a detail that carries more long-term weight: DeepSeek trained V4 on a cluster of 16,000 Hopper-era GPUs at a reported total compute cost of $5.6 million, doubling the training efficiency of its own V3 iteration. For context, estimates for comparable US frontier model runs have routinely landed in the hundreds of millions of dollars. Whether or not V4 truly matches GPT-5 on every task, demonstrating that top-tier performance is achievable at this cost undercuts the capital-intensity narrative that has justified enormous fundraising rounds across the sector.

The architecture doing this work is Mixture-of-Experts (MoE), a design that activates only a relevant subset of parameters for any given input rather than running the full 540 billion. The approach has been gaining ground precisely because it delivers competitive performance while reducing inference compute requirements. V4's release is likely to accelerate serious investment in MoE research at labs that have been slower to commit.
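The routing idea described above can be sketched in a few lines of Python. This is a toy illustration of generic top-k expert routing, not DeepSeek's implementation: the article does not describe V4's router, so the function names (`route_top_k`, `moe_forward`), the eight scalar "experts," and the choice of k=2 are all assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in gates.items())
    return output, sorted(gates)


# Hypothetical toy setup: 8 experts, each just a scalar multiplier.
NUM_EXPERTS = 8
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Assumed router scores; a real router computes these from the input token.
rng = random.Random(0)
router_scores = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

y, active = moe_forward(3.0, experts, router_scores, k=2)
active_fraction = len(active) / NUM_EXPERTS  # share of experts that actually ran
```

In this sketch only two of the eight experts run for the input, so a quarter of the expert parameters are touched. The same principle, at vastly larger scale, is what lets an MoE model carry a large total parameter count while spending far less compute per token.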

Geopolitical timing that nobody is going to ignore

The release lands weeks after the United States tightened export controls on advanced AI chips, a policy explicitly intended to slow Chinese labs' access to cutting-edge compute. DeepSeek V4 is the clearest empirical counterargument to that strategy so far. If a Chinese lab can train a model that benchmarks against the best American offerings using restricted hardware, the effectiveness of hardware-based containment becomes a genuinely open question, and one that US policymakers will need to answer in something other than broad strokes.

Founder Liang Wenfeng's backing through his hedge fund High-Flyer has allowed DeepSeek to absorb compute costs that would be prohibitive for a standalone research organization, but the comparison to Meta's Llama strategy is the more instructive one here. Open weights, an active developer ecosystem, and a stated commitment to accessibility are how DeepSeek builds influence at a pace that proprietary API pricing cannot match. Every developer who builds a product on V4 is a distribution node that a closed model cannot easily reclaim.

For investors, the immediate pressure falls on companies whose valuations rest on the assumption that frontier model performance requires frontier model spend. That assumption is eroding. Operators and enterprises evaluating AI infrastructure contracts now have a credible, open-weight alternative with benchmark parity claims against the most expensive models on the market. Pricing power across the proprietary API layer will be tested. What to watch next is independent replication of the benchmark figures. If third-party evaluators confirm the Humanities-X and MMLU scores within a reasonable margin, the conversation about US AI dominance changes register entirely.


Walter Schulze covers breaking news in the tech and startup world, helping Startup Fortune offer timely reporting on industry trends. He now works part-time for Startup Fortune, specializing in tech and startup news, and also sheds light on investment opportunities and trends.