Claude Code used in large-scale cyberattacks
According to Anthropic:
In mid-September 2025, we detected suspicious activity that later investigation determined to be a highly sophisticated espionage campaign. The attackers used AI’s “agentic” capabilities to an unprecedented degree—using AI not just as an advisor, but to execute the cyberattacks themselves.
Who could have seen this coming.
This campaign has substantial implications for cybersecurity in the age of AI “agents”—systems that can be run autonomously for long periods of time and that complete complex tasks largely independent of human intervention. Agents are valuable for everyday work and productivity—but in the wrong hands, they can substantially increase the viability of large-scale cyberattacks.
Our products are just so powerful they will bring on the end times. But they are also available to nice people!
At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails.
Guardrails that we all know are pointless and ineffective, but are hyped up at every opportunity as “important safeguards” and “alignment with human values”.
The sheer amount of work performed by the AI would have taken vast amounts of time for a human team. At the peak of its attack, the AI made thousands of requests, often multiple per second—an attack speed that would have been, for human hackers, simply impossible to match.
And yet they seemingly have no worthwhile monitoring systems in place to detect this kind of thing. Just a thought: why not have Claude analyse these things for you, Anthropic?
If you’re interested, here is the full run-down, in which Anthropic try to spin state actors hijacking their service to attack multinationals as proof that they are an advanced security research organisation.