Google says it stopped a planned zero-day campaign built with AI, giving founders a clear warning: attackers are adopting the same coding acceleration tools as everyone else.
The AI security debate just moved from theory to incident response. Google Threat Intelligence Group says prominent cybercriminal actors were preparing a mass exploitation campaign against a popular open-source, web-based system administration tool, using a zero-day that could bypass two-factor authentication.
That matters because this was not a flashy lab demo or a conference-stage proof of concept. The exploit was implemented in Python, required valid credentials, and targeted a logic flaw in the way the software trusted parts of its own 2FA flow. In other words, this was the kind of practical weakness that can sit quietly inside real infrastructure until someone finds it, tests it, and turns it into a repeatable attack.
Google has not named the affected product, which is standard when disclosure and mitigation are still sensitive. The company says it worked with the vendor and disrupted the campaign before it became a broad exploitation event. That detail should not be lost. The story is not that AI produced unstoppable malware. The story is that AI may have helped attackers find a subtle authentication flaw fast enough to make mass exploitation worth planning.
As The Guardian reported, Google researchers now believe AI-powered hacking has shifted in a matter of months from an emerging concern to an industrial-scale threat. The same report said criminal groups and state-linked actors from China, North Korea and Russia appear to be using commercial models, including Gemini, Claude and OpenAI tools, to refine attacks, build malware, test operations and move faster across the attack chain.
Google was careful about what it could prove. It said the exploit was developed with AI, not that a model independently chose a target, found the bug, wrote the code and launched the operation without human direction. That distinction is important. Most serious cyberattacks are still human-led. But the economics change when an attacker can ask a model to reason through unfamiliar code, produce an exploit scaffold, improve error handling and clean up payloads, work that might once have taken much longer to prepare.
To researchers, the clues were not subtle. The Python script reportedly contained signs associated with large language model output, including a hallucinated CVSS score, unusually clean educational formatting and textbook-style programming elements that would be odd in a lean criminal exploit. That does not make the exploit harmless. It makes it more revealing. AI-generated code can carry fingerprints, but those fingerprints may fade as attackers become more skilled at editing model output.
The flaw itself also points to where the next wave of risk may sit. Traditional scanners are good at finding known bad patterns: unsafe input, exposed secrets, vulnerable dependencies and common memory bugs. Logic flaws are harder. If a developer hardcodes a trust assumption inside an authentication path, the application may appear to function correctly while still being fundamentally broken. A model that can read intent across files and compare that intent with exception paths becomes useful to defenders. It becomes useful to attackers for exactly the same reason.
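To make that class of flaw concrete, here is a minimal hypothetical sketch in Python. It is not the vulnerability Google described, whose details have not been published. Every name in it (Session, the remembered_device flag, both check functions) is invented for illustration. The flawed check trusts a field the client controls, while the stricter check trusts only state the server itself recorded.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical server-side session state after a valid password login."""
    user: str
    password_ok: bool = False
    second_factor_ok: bool = False
    # Device IDs the server itself marked as trusted after a full 2FA login.
    trusted_devices: set = field(default_factory=set)


def needs_second_factor_flawed(session: Session, request: dict) -> bool:
    """Flawed (illustrative): trusts a flag supplied by the client."""
    if request.get("remembered_device") == "true":
        return False  # hardcoded trust assumption: any client can send this
    return not session.second_factor_ok


def needs_second_factor_strict(session: Session, request: dict) -> bool:
    """Stricter (illustrative): only server-recorded state waives the second factor."""
    if session.second_factor_ok:
        return False
    return request.get("device_id") not in session.trusted_devices


# An attacker holding valid credentials but no second factor.
session = Session(user="admin", password_ok=True)
forged = {"remembered_device": "true", "device_id": "attacker-laptop"}

print(needs_second_factor_flawed(session, forged))  # False: 2FA is skipped
print(needs_second_factor_strict(session, forged))  # True: 2FA still required
```

Both functions return sensible answers for honest clients, which is why a pattern-matching scanner has nothing to flag. The difference only shows up when someone asks who is allowed to set the value being trusted.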
Mythos changed the mood
This incident lands after weeks of anxiety around Anthropic's Mythos model, which the company has described as unusually capable at finding zero-day vulnerabilities. Anthropic limited access to Mythos through a defensive security program, arguing that broader release could create unacceptable risk. That decision was controversial, but it framed the current moment clearly: frontier coding models are no longer just helping developers ship features. They are beginning to compress vulnerability discovery itself.
For startups, the lesson is practical rather than philosophical. If AI helps attackers move from bug discovery to working exploit faster, then patch cycles measured in weeks become harder to justify. The old pattern of waiting for a quarterly maintenance window, checking whether a vulnerability is being exploited, and then deciding whether to prioritize it starts to look too slow for internet-facing systems.
Security teams should treat open-source administration tools, identity panels, CI systems, dashboards and remote management software as higher-risk assets, even when those tools are not part of the customer-facing product. Many startups run lean infrastructure stacks with a mix of open-source software, cloud services and outsourced vendors. Attackers do not care which system is considered core. They care which system gets them a session, a token or an administrative path into something more valuable.
The immediate changes are not exotic. Maintain a real inventory of externally reachable tools. Patch authentication and admin software first. Monitor successful logins that behave strangely, not just failed attempts. Review 2FA bypass paths, backup codes, remembered-device logic and API exceptions. Ask vendors how quickly they handle private vulnerability disclosure and whether they test business logic with AI-assisted review, not only dependency scanners.
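As a rough illustration of watching successful logins rather than only failed ones, the sketch below flags successes that lack a recorded second factor or come from an unfamiliar device or country. The LoginEvent fields, the function name and the sample data are all assumptions made for this example, not the log format of any real product, and the rules are deliberately simple.

```python
from dataclasses import dataclass


@dataclass
class LoginEvent:
    """Hypothetical normalized login record; field names are assumptions."""
    user: str
    success: bool
    second_factor_used: bool
    device_id: str
    country: str


def flag_suspicious_successes(events, known_devices, usual_countries):
    """Return (event, reasons) pairs for successful logins worth a second look."""
    flagged = []
    for event in events:
        if not event.success:
            continue  # failed attempts are handled by ordinary alerting
        reasons = []
        if not event.second_factor_used:
            reasons.append("no second factor recorded")
        if event.device_id not in known_devices.get(event.user, set()):
            reasons.append("unrecognized device")
        if event.country not in usual_countries.get(event.user, set()):
            reasons.append("unusual country")
        if reasons:
            flagged.append((event, reasons))
    return flagged


# Made-up sample data for the sketch.
events = [
    LoginEvent("alice", True, True, "laptop-1", "CA"),
    LoginEvent("alice", True, False, "unknown-7", "CA"),
    LoginEvent("bob", False, False, "unknown-9", "US"),
]
known_devices = {"alice": {"laptop-1"}}
usual_countries = {"alice": {"CA"}}

for event, reasons in flag_suspicious_successes(events, known_devices, usual_countries):
    print(event.user, event.device_id, reasons)
```

In practice the baselines would come from historical login data, and the thresholds would need tuning to keep false positives manageable. The point of the sketch is only that a login which succeeds without its second factor is a signal in its own right.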
Founders also need to update their assumptions about AI coding inside their own companies. The same systems helping engineers generate pull requests can also introduce fragile authorization logic if review becomes too casual. AI-written code should move faster through drafting, but not faster through security review where trust boundaries, account recovery and administrative privileges are involved.
The market implication is straightforward. AI security will not be a separate category bolted onto the stack. It will become part of normal engineering hygiene, from code review to vendor selection to incident detection. The companies that adjust early will not eliminate zero-days. They will shorten the gap between discovery, defense and recovery. In this next phase, that gap may be the whole game.