Anthropic’s Claude Finds 22 Firefox Vulnerabilities: A Turning Point for AI in Application Security

CyberSecureFox 🦊

A joint experiment by Anthropic and Mozilla has shown that large language models (LLMs) are already capable of identifying security flaws at the scale of a production browser codebase. In an automated security review of Firefox, the model Claude Opus 4.6 discovered 22 previously unknown vulnerabilities, many of them high severity. All issues have since been fixed in the Firefox 148 release, published at the end of last month.

AI vulnerability discovery in Firefox: key results and severity

According to Anthropic’s report, Claude Opus 4.6 identified 22 security vulnerabilities in Firefox’s codebase over roughly two weeks of analysis. Of these, 14 were classified as high severity, 7 as medium, and 1 as low. Mozilla estimates that this accounts for almost one-fifth of all high-priority bugs fixed in Firefox in 2025, underscoring the impact of AI-assisted security testing on a mature, complex browser.

The first tangible result appeared just 20 minutes after the model started its analysis, when Claude flagged a use-after-free vulnerability in the JavaScript engine. A use-after-free bug occurs when software continues to use memory after it has been freed, which can allow an attacker to execute arbitrary code by carefully controlling what is placed in that memory region. Each suspected issue reported by the model was subsequently validated manually in an isolated virtual environment before being accepted as a real vulnerability.

Technical details: scale of analysis and types of flaws found

During the experiment, Claude Opus 4.6 analyzed approximately 6,000 C++ source files related to Firefox. From this review, the model generated 112 unique reports of potential security problems. Human security engineers then triaged these reports, confirmed which ones were genuine, and developed patches.

Mozilla’s own write-up corroborates Anthropic’s findings and notes that the AI-driven review also led to the discovery of around 90 additional quality and robustness defects. These included assertion failures and subtle logical errors that traditional tools such as fuzzers and static analyzers tend to miss. This result supports a growing consensus in application security: a hybrid approach that combines fuzzing, static analysis, and AI-assisted reasoning delivers better coverage than any single technique alone.

Industry incident reports, such as the Verizon Data Breach Investigations Report and analyses from major incident response firms, consistently highlight software vulnerabilities in browsers and web applications as key entry points for attackers. The Firefox experiment illustrates how LLMs can help address this persistent problem by scaling expert-level code review across large, legacy codebases.

Can AI generate working browser exploits?

Beyond finding bugs, Anthropic and Mozilla also evaluated whether Claude could automatically generate functional exploits for confirmed vulnerabilities. The model was provided with details of validated issues and instructed to produce proof-of-concept attack code. Several hundred exploit-generation attempts were made, with an estimated API cost of around 4,000 USD.

Claude successfully produced working exploits for only two vulnerabilities. One of them targeted CVE-2026-2796, a 9.8 CVSS-rated flaw in the WebAssembly JIT (just-in-time) compilation component of the JavaScript engine. JIT bugs are particularly dangerous because they affect the low-level code that transforms high-level JavaScript into native machine instructions. However, researchers emphasize that the generated exploit was reliable only in a controlled test environment, where important defenses such as the browser sandbox had been deliberately disabled.

Current limits: no automated exploit chains or sandbox escapes

Anthropic reports that, at this stage, Claude is not capable of autonomously building complex exploit chains that combine multiple vulnerabilities to escape the browser sandbox and gain persistent code execution on the underlying operating system. In the real world, such multi-stage exploit chains are what drive sophisticated zero-day attacks against modern browsers.

Nonetheless, the experiment suggests that the gap between finding vulnerabilities and exploiting them with the aid of AI is unlikely to remain wide for long. As models improve and are exposed to more exploit examples, the quality and reliability of AI-generated attack code can be expected to increase, raising the stakes for defenders.

Implications for browser security and application security programs

The Anthropic–Mozilla experiment demonstrates that AI has moved from a helpful assistant to a core capability in application security (AppSec). For large, long-lived projects like Firefox, LLM-based analysis can:

Accelerate discovery of complex logical bugs, race conditions, and corner cases that are hard to reach with traditional fuzzing alone;
Complement static analysis and manual code review, improving coverage across millions of lines of code;
Support regression analysis after major refactors or feature deployments by quickly re-examining security-sensitive components.

At the same time, the dual-use nature of this technology is clear. The same AI techniques that empower defenders could enable attackers to scale automated vulnerability discovery and exploit development. This amplifies the importance of timely patch management, layered browser defenses (sandboxing, site isolation, exploit mitigations), and secure coding practices embedded early in the software development lifecycle.

Taken together, the Firefox and Claude Opus 4.6 results indicate that ignoring AI in cybersecurity workflows is no longer viable. Organizations building complex software should invest in proactive bug bounty programs, integrate AI-powered code analysis into CI/CD pipelines, and regularly update their threat models to account for AI-accelerated attackers. Those who adapt fastest to a landscape where powerful AI tools are available to both defenders and adversaries will be best positioned to maintain the resilience of their digital infrastructure.
