Depthfirst says its lean AI caught Mythos misses at a fraction of the cost

Depthfirst says a task‑tuned model found critical FFmpeg vulnerabilities that Mythos missed, at roughly one tenth the compute cost. If the claim holds, it could change how enterprises buy AI security.

Depthfirst, a startup that has been quietly building AI-first tooling for vulnerability discovery, announced this month that its system autonomously found 12 memory corruption flaws in FFmpeg that Anthropic's Mythos had not reported. The work cost roughly $1,000 in compute versus $10,000 for the Mythos scans, according to the company's Open Defense announcement and a contemporaneous Forbes report.

The core of Depthfirst's argument is simple, and it matters: security research is a narrow, high‑signal task, and a model and harness built only for that job can be far more efficient than a large generalist model. On its Open Defense page, the company describes a specialized harness, post‑training on exploitability, and end‑to‑end validation that reduces false positives.

This is not just marketing. Multiple outlets picked up Depthfirst's claim that it found additional high‑severity issues in FFmpeg after Anthropic ran several hundred Mythos scans, and the startup says those results underpin a new $5 million Open Defense credits program to give maintainers access to its tooling.

What this means for startup economics

If Depthfirst's numbers hold up under independent scrutiny, the implication is practical: defenders can get frontier‑class outcomes without frontier‑class budgets, by buying purpose‑built systems instead of generalist compute by the hour. Depthfirst frames the comparison as $1,000 versus $10,000 for comparable discovery work, a claim repeated in industry coverage and the company's materials.

That matters for procurement cycles. Security teams already trade off coverage, speed, and noise. A specialist offering that demonstrably lowers cost per verified exploit could bend vendor selection toward focused players, especially for auditing widely used open source components where marginal dollars protect millions of users.

Investors have already taken note of the thesis. Depthfirst raised large rounds earlier this year, and its public messaging cites internal tests in which smaller models ran at an order of magnitude lower cost than frontier models while outperforming them on narrow benchmarks.

Where the incumbent advantage still counts

That is not the end of the story for labs like Anthropic. Independent evaluations, including those from industry groups, found Mythos produced breakthrough results on a range of red‑team and capture‑the‑flag problems, and it remains the baseline for autonomous multi‑stage exploit research rather than single‑target audits.

Large models retain advantages in transfer learning, wide context, and unexpected generalization, which can surface novel classes of issues that single‑task tuning might miss. Anthropic's Mythos, for example, was credited with uncovering long‑standing vulnerabilities across multiple codebases when it was first previewed, a capability that signals genuine frontier progress.

What to watch next

Three validations will decide whether Depthfirst's announcement changes the market. First, independent third‑party verification or coordinated disclosure from affected projects, such as FFmpeg, will confirm the technical claims; Depthfirst says disclosures are pending and that maintainers are being asked to coordinate.

Second, reproducibility across different codebases will show whether the company's harness is broadly applicable or tailored to a handful of targets. Third, pricing transparency from Mythos and other frontier offerings will let buyers compare cost per verified finding, the metric that really drives budget decisions, rather than headline model size.
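To make the cost-per-verified-finding metric concrete, here is a minimal sketch of the arithmetic. It uses only the figures Depthfirst has claimed (12 FFmpeg flaws, roughly $1,000 in compute, against roughly $10,000 for the Mythos scans). Those figures have not been independently verified, and the function name is illustrative rather than anything either company has published.

```python
def cost_per_verified_finding(compute_cost_usd: float, verified_findings: int) -> float:
    """Divide compute spend by the number of verified findings."""
    if verified_findings <= 0:
        raise ValueError("need at least one verified finding")
    return compute_cost_usd / verified_findings


# Company-claimed figures from the announcement; treat them as unverified.
DEPTHFIRST_COST_USD = 1_000
DEPTHFIRST_FINDINGS = 12
MYTHOS_COST_USD = 10_000

depthfirst_per_finding = cost_per_verified_finding(DEPTHFIRST_COST_USD, DEPTHFIRST_FINDINGS)
cost_ratio = DEPTHFIRST_COST_USD / MYTHOS_COST_USD

print(f"Depthfirst, claimed cost per finding: ${depthfirst_per_finding:.2f}")  # $83.33
print(f"Claimed compute cost ratio: {cost_ratio:.1f}")  # 0.1
```

No per-finding figure is computed for Mythos because the article does not give a count of verified findings for those scans. That missing number is what pricing transparency would supply.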

The broader lesson is practical. Security buyers should stop equating bigger with better on every task. For high‑value, repeatable workflows like fuzzing and exploit verification, a compact pipeline that combines a lean model, smart harnessing, and verification steps can be cheaper and more actionable than running a generalist frontier model at scale.

Depthfirst's push is a direct challenge to the dominant narrative that only the biggest models can deliver frontier capabilities. That challenge will be resolved not by press releases but by coordinated disclosures, independent audits, and the procurement choices of large cloud and enterprise defenders over the coming months.
