Anthropic's Claude Mythos Shatters Security Auditing: 27-Year-Old OpenBSD Bug Found Autonomously for $50
A critical vulnerability that evaded 27 years of human security review, fuzzing, and audits within the hardened OpenBSD TCP stack was autonomously discovered by an AI agent for under $50. The flaw, exploitable with just two packets to crash any server, was found by Anthropic's Claude Mythos Preview in a single discovery campaign costing approximately $20,000. This represents a fundamental leap in capability, not an incremental improvement, signaling a paradigm shift in how software vulnerabilities will be found—and exploited.
The specific model run that surfaced the flaw cost less than $50, operating with no human guidance after the initial prompt. Benchmark results underscore the dramatic jump: on Firefox 147 exploit writing, Mythos succeeded 181 times versus just 2 for its predecessor, Claude Opus 4.6—a 90x improvement in a single generation. It also achieved 77.8% on SWE-bench Pro (vs. 53.4%) and 83.1% on CyberGym vulnerability reproduction (vs. 66.6%). The model saturated Anthropic's internal Cybench CTF at 100%, forcing red teams to pivot to real-world zero-day discovery as the only meaningful evaluation left.
This autonomous capability, which has already surfaced thousands of zero-day vulnerabilities across multiple systems, fundamentally rewrites the threat landscape. Security teams now face an adversary that can operate at machine speed and scale, uncovering flaws that have persisted for decades through conventional review. The discovery of the OpenBSD bug is a stark proof-of-concept: the playbook for vulnerability detection and defense needs an urgent, comprehensive overhaul. The economics of offensive security are being inverted, placing immense pressure on defenders to adapt to an AI-driven era of exploitation.