A hacker used Claude and ChatGPT to steal 150GB from Mexican government agencies in what investigators are calling one of the first confirmed cases of AI-assisted state-scale cyber espionage

A threat actor weaponized Anthropic's Claude Code and OpenAI's GPT-4.1 to breach multiple Mexican government institutions, exfiltrating roughly 150GB of data and exposing hundreds of millions of records in a campaign that has quietly redrawn the threat landscape for public sector security teams worldwide.

The breach, confirmed in reports published April 11, 2026, is not a hypothetical or a red-team exercise. It happened. A single attacker used two of the most widely available AI tools on the market to automate a sweeping intrusion campaign against government infrastructure, bypassing conventional security defenses and walking out with data at a scale that would have previously required a well-resourced nation-state operation. That is the part that should keep security chiefs awake at night.

According to investigators, the attacker leaned heavily on Claude Code, Anthropic's agentic coding environment, alongside OpenAI's GPT-4.1 to generate and iterate on the malicious code used during the exfiltration process. The AI systems were manipulated to produce functional attack tooling without triggering the safety filters both companies have spent considerable engineering effort building. Exactly how the guardrails were circumvented is still under active investigation by both firms and relevant authorities, but the outcome is not in dispute.

The campaign appears to have been running, at least in part, since early 2026. February reports identified specific Mexican agencies as targets, but the April disclosures suggest the operation was broader than initially understood, with investigators now describing it as a sweeping effort rather than a surgical strike against one or two institutions. The 150GB figure and the exposure of hundreds of millions of records point to a sustained, methodical operation, not an opportunistic smash-and-grab.

What made the campaign notable from a technical standpoint was the attacker's use of AI to handle the operational grunt work, writing scripts, adapting to encountered defenses, and automating the data harvest. This is the leverage point. Skilled human hackers have always been able to do this manually, but the time and expertise required created a meaningful barrier. AI effectively compressed that barrier, allowing one actor to move with the efficiency and adaptability of a small team.

The guardrail problem neither company wants to own

Both Anthropic and OpenAI have published acceptable use policies and deployed a mix of technical filters and human oversight systems designed to prevent exactly this kind of misuse. Claude's model card specifically lists cyberattacks on critical infrastructure as a hard limit. GPT-4.1, released just days before this story broke, carried similar restrictions. None of it stopped this attack.

That is not an indictment of the intent behind those policies, but it is an honest assessment of their current effectiveness. Safety guardrails are trained on known patterns of harmful prompting. A sufficiently motivated attacker willing to probe the edges, chain requests creatively, or work through indirect framing can find paths the filters were not designed to catch. The security community has been saying this for two years. This breach is the proof of concept at government scale.

Neither Anthropic nor OpenAI has issued detailed public statements specifically addressing the manipulation techniques used, likely because doing so in any technical specificity would itself function as a how-to guide. That silence is understandable, but it also leaves the public, and the agencies affected, without a clear picture of what was exploited or how it has been patched.

What governments and enterprises should take from this

The instinct after an incident like this is to reach for a policy lever, regulate AI tools, restrict agentic coding environments, require government-specific model deployments with tighter controls. Some version of that conversation is already happening in legislative corridors in Mexico City and, almost certainly, in Washington. Whether it produces anything useful before the next incident is a different question.

For security teams operating right now, the more actionable takeaway is that the threat model has changed. AI-assisted attacks are no longer theoretical, and the attacker profile has broadened considerably. You no longer need deep technical expertise to orchestrate a sophisticated intrusion campaign if you know how to prompt effectively and iterate quickly. Defenses that assume a certain skill ceiling on the adversary side need to be revised.

The investigation is ongoing, and both AI companies face pressure to be more transparent about what their systems produced in this case and what systemic changes they are making in response. The agencies affected are dealing with the immediate fallout of a massive data exposure. And the rest of the industry is watching closely, because the techniques that worked here will not stay with one attacker for long.