The landscape of cybersecurity has reached a critical inflection point as commercial and open-source artificial intelligence models demonstrate a newfound capacity to autonomously identify and exploit software vulnerabilities. According to the latest findings from Forescout’s Verde Labs, the rapid evolution of large language models (LLMs) has transformed them from inconsistent research assistants into potent tools capable of discovering zero-day vulnerabilities in widely deployed software. This shift, documented throughout 2025 and into 2026, suggests that the barrier to entry for sophisticated cyberattacks has been significantly lowered, as AI systems now possess the reasoning capabilities to perform tasks that were once the exclusive domain of highly skilled human researchers.

The Rapid Evolution of AI Capability in Cybersecurity

Just one year ago, the integration of AI into vulnerability research was met with skepticism. In 2025, Forescout’s testing indicated that 55% of the industry’s leading AI models failed even basic vulnerability research tasks, while a staggering 93% were unable to develop functional exploits. These limitations were largely attributed to "hallucinations," a lack of deep context regarding memory management in low-level programming languages, and the inability to chain multiple logical steps together to create a working exploit.

However, the 2026 data reveals a dramatic reversal. Every model tested by Forescout—spanning 50 different iterations across commercial, open-source, and "underground" categories—now successfully completes vulnerability research tasks. Perhaps more significantly, 50% of these models can now generate working exploits autonomously. This progress highlights a fundamental shift in the "agentic" capabilities of AI, where models no longer just predict the next word in a sentence but can actively interact with codebases, test hypotheses, and refine their outputs until a vulnerability is successfully triggered.

The most capable models identified in the study, Anthropic’s Claude Opus 4.6 and Moonshot AI’s Kimi K2.5, have demonstrated the ability to find and exploit vulnerabilities without the need for complex, highly specific prompting. This "zero-shot" or "low-effort" capability means that even individuals with limited technical expertise in exploit development can leverage these tools to target complex systems.

Case Study: The OpenNDS Zero-Day Discoveries

To validate the real-world implications of these advancements, Forescout used a combination of single prompts, the RAPTOR agentic framework, and proprietary extensions to test the models against OpenNDS. OpenNDS is widely used open-source captive portal software often found in public Wi-Fi networks and enterprise guest access systems.

The AI-driven research successfully uncovered four previously unknown zero-day vulnerabilities within the OpenNDS codebase. This discovery was particularly notable because the specific section of code where the vulnerabilities were found had already undergone manual analysis by human experts at Verde Labs. The human researchers had initially cleared the code, failing to identify the flaws that the AI subsequently surfaced.

Rik Ferguson, Vice President of Security Intelligence at Forescout, emphasized the gravity of this development. "These are widely available AI models exceeding human capability," Ferguson stated. He noted that while these commercial models are highly effective, they may still operate at a different tier than specialized, non-public frontier models like Anthropic’s Claude Mythos, which is part of the secretive Project Glasswing. Nevertheless, the fact that commercially accessible models are outperforming human experts in specific code auditing tasks represents a paradigm shift in software assurance.

The Role of Agentic Frameworks: RAPTOR

The success of these models is largely tied to the implementation of agentic AI frameworks like RAPTOR. Unlike standard chatbots, agentic frameworks allow an AI to function as an autonomous "agent" that can execute a series of steps to achieve a goal. RAPTOR is an open-source framework designed specifically for cybersecurity research, supporting both offensive and defensive operations.

When an AI is placed within an agentic framework, it gains the ability to:

  1. Decompile and Analyze: Break down binary code or source code to understand its logic.
  2. Test Iteratively: Run the code in a sandbox, observe crashes, and analyze memory dumps.
  3. Self-Correct: Identify why a particular exploit attempt failed and rewrite the code to bypass security mitigations like Address Space Layout Randomization (ASLR) or Data Execution Prevention (DEP).
  4. Document: Generate reports on how the vulnerability was found and how it can be remediated.
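
The four steps above can be sketched as a simple propose, test, refine loop. The following is a minimal and deliberately harmless illustration, not RAPTOR's actual interface: `toy_parser`, `propose_input`, `run_in_sandbox`, `agent_loop`, and `write_report` are hypothetical names invented for this sketch, the "model" is a stub that simply doubles its input, and the "sandbox" is a plain function call against a toy parser with a planted length bug.

```python
from dataclasses import dataclass, field


def toy_parser(data: bytes) -> int:
    """Hypothetical target with a planted bug: it assumes input fits in 16 bytes."""
    buffer = bytearray(16)
    for i, byte in enumerate(data):
        buffer[i] = byte  # raises IndexError once len(data) exceeds 16
    return len(data)


@dataclass
class Finding:
    crashing_input: bytes
    error: str
    attempts: int
    history: list = field(default_factory=list)


def propose_input(previous: bytes, feedback: str) -> bytes:
    """Stub standing in for the model. It ignores the feedback text and just
    doubles the previous input; a real model would reason over the feedback."""
    return previous * 2 if previous else b"A"


def run_in_sandbox(target, data: bytes) -> str:
    """Stub sandbox: call the target and describe the outcome as text."""
    try:
        target(data)
        return "ok"
    except Exception as exc:
        return f"crash: {type(exc).__name__}"


def agent_loop(target, max_attempts: int = 10):
    """Propose an input, test it, record the result, and stop at the first crash."""
    candidate, feedback, history = b"", "start", []
    for attempt in range(1, max_attempts + 1):
        candidate = propose_input(candidate, feedback)  # analyze and hypothesize
        feedback = run_in_sandbox(target, candidate)    # test iteratively
        history.append((len(candidate), feedback))      # record for refinement
        if feedback.startswith("crash"):
            return Finding(candidate, feedback, attempt, history)
    return None


def write_report(finding: Finding) -> str:
    """Documentation step: summarize the finding and a possible remediation."""
    return (
        f"Input of {len(finding.crashing_input)} bytes triggered "
        f"{finding.error} after {finding.attempts} attempts. "
        "Suggested fix: validate input length before copying."
    )


finding = agent_loop(toy_parser)
print(write_report(finding))
```

The loop is the part that distinguishes an agent from a single prompt: each result feeds the next attempt, and the run ends with a written report rather than a raw crash.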

The integration of RAPTOR with models like Claude Opus 4.6 has turned general-purpose AI into a specialized security auditor capable of working 24/7 without the fatigue or lapses in attention typical of human researchers.

The Economics of AI-Driven Exploitation

One of the most significant findings in the Forescout report is the diverging cost-to-capability ratio among different AI tiers. While the most powerful models remain expensive, the emergence of high-performance, low-cost alternatives is democratizing access to exploit development.

Claude Opus 4.6, while highly capable, carries a significant price tag of approximately $25 per million output tokens. For a complex project requiring millions of tokens to analyze a large codebase, the costs can scale quickly. Anthropic’s "frontier" model, Claude Mythos, which is reportedly capable of identifying thousands of zero-days across major operating systems, is priced at or above that level: between $25 and $125 per million tokens depending on the input/output ratio.

However, the research highlighted the disruptive potential of open-source models like DeepSeek 3.2. Forescout found that DeepSeek 3.2 could handle basic vulnerability research and exploitation tasks at a fraction of the cost. In many test cases, the total cost for a successful task execution was less than $0.70.

This creates a tiered ecosystem for both attackers and defenders:

  • High-End Research: Utilizing models like Claude Mythos or Opus 4.6 against high-value, hardened targets such as operating system kernels or encrypted communication protocols.
  • Mass-Market Exploitation: Utilizing budget-friendly models like DeepSeek to scan and exploit less secure IoT devices, web applications, and smaller open-source projects at scale.
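
The cost gap described in this section can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses only the two figures quoted above (about $25 per million output tokens for Claude Opus 4.6 and under $0.70 per successful DeepSeek 3.2 task). The 4 million token workload is a hypothetical number chosen for illustration, not a figure from the Forescout report, and input-token charges are ignored.

```python
# Figures quoted in the text above.
OPUS_USD_PER_MILLION_OUTPUT_TOKENS = 25.00
DEEPSEEK_USD_PER_TASK_UPPER_BOUND = 0.70  # "less than $0.70" per successful task


def output_token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of generating `tokens` output tokens at a per-million rate."""
    return tokens / 1_000_000 * usd_per_million


# Assumption for illustration only: a large audit producing 4M output tokens.
hypothetical_output_tokens = 4_000_000

opus_cost = output_token_cost(
    hypothetical_output_tokens, OPUS_USD_PER_MILLION_OUTPUT_TOKENS
)

# Each budget-tier task costs less than the upper bound, so the same spend
# covers at least this many of them.
budget_tasks = int(opus_cost // DEEPSEEK_USD_PER_TASK_UPPER_BOUND)

print(f"High-end audit (output tokens only): ${opus_cost:.2f}")
print(f"Budget-tier tasks for the same spend: at least {budget_tasks}")
```

Under these assumptions, the spend on one large high-end audit would fund well over a hundred budget-tier task runs, which is the asymmetry behind the two tiers listed above.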

Chronology of AI Advancement in Cybersecurity (2023–2026)

The journey to the current state of AI-driven vulnerability discovery has been characterized by rapid gains in model reasoning:

  • Late 2023 – Early 2024: GPT-4 and Claude 2, both released in 2023, are the leading models. They show a basic understanding of code but struggle with complex logic and frequently provide "broken" exploit code. Security filters often block attempts to generate malicious scripts.
  • Late 2024: Emergence of "jailbreaking" techniques and specialized "underground" models trained on leaked exploit databases. The first agentic frameworks begin to appear in academic circles.
  • Mid-2025: Forescout’s initial testing shows a 55% failure rate. Models are better at identifying bugs but remain poor at creating functional exploits that bypass modern OS defenses.
  • Late 2025: Release of Claude 4 series and Kimi K2. Models demonstrate "System 2" thinking—the ability to pause and reason through complex problems before responding.
  • Early 2026: Forescout reports 100% success in vulnerability research across 50 tested models. The discovery of four zero-days in OpenNDS confirms that AI has surpassed human-level performance in specific auditing scenarios.

Broader Implications and Defensive Strategies

The revelation that AI can now autonomously find zero-day vulnerabilities has profound implications for global cybersecurity. The traditional "window of vulnerability"—the time between a bug being introduced and it being discovered—is shrinking, but not necessarily in favor of the defenders.

If AI can find vulnerabilities faster than humans can patch them, the current model of reactive security is likely to fail. Forescout’s research suggests that organizations must adopt a "presumptive breach" posture. If large-scale initiatives like Project Glasswing can surface thousands of zero-days in critical software, it is highly likely that most enterprise environments currently contain unknown vulnerabilities that AI will eventually find.

The Rise of AI-Powered Defense
The same tools being used to find vulnerabilities are also being deployed for defense. Agentic frameworks like RAPTOR are being used by "Blue Teams" to auto-patch code before it is even deployed. The future of cybersecurity appears to be an automated arms race—an "AI vs. AI" dynamic where the speed of the model and the efficiency of the framework determine the security of the system.

Geopolitical Considerations
The high performance of Kimi K2.5 (developed by China-based Moonshot AI) alongside US-based models like Claude indicates that the capability for AI-driven cyber operations is globally distributed. This limits the effectiveness of unilateral regulations or export controls on AI technology, as high-performance open-source models like DeepSeek continue to close the gap with proprietary Western models.

Regulatory and Ethical Challenges
The accessibility of these models raises urgent questions for AI providers. While Anthropic and others have implemented safety guardrails, the Forescout research demonstrates that these can often be bypassed or that the models are "capable enough" to provide the necessary components of an exploit without violating specific safety triggers.

As we move further into 2026, the cybersecurity industry must grapple with the reality that the "unknown" is becoming "known" at an unprecedented rate. The democratization of zero-day discovery means that the quality of software code must improve at the source, potentially through the same AI-driven auditing processes that currently threaten it. The era of manual code review is effectively over, replaced by an automated landscape where speed, scale, and algorithmic reasoning are the new currencies of digital security.
