Machine-Speed Defense Needs More Than a Foundation Model

Written by: Surag Patel
Published on: Apr 23, 2026

Anthropic and 60+ CISOs defined what to defend. Here's the architecture that makes it operationally real: foundation models, enterprise context, and a deterministic harness.

tl;dr

Anthropic and 60+ CISOs defined the defensive roadmap for AI-accelerated threats. Every recommendation assumes you can triage and remediate at machine speed. For enterprises, that takes three layers working together: foundation models for intelligence, your proprietary enterprise context for accuracy and prioritization, and a deterministic harness for reliability and cost efficiency at scale. Skip any layer and you get the same decade-old result: growing backlogs, months-long remediation, faster alerts that nobody trusts.

The 10-Hour Window

In 2018, the average time between a vulnerability's disclosure and its first confirmed exploitation was 2.3 years. Security teams had quarters to evaluate, prioritize, and patch. By 2024, that window had compressed to 56 days. In 2026, it collapsed to 10 hours.

We are in a new paradigm.

Zero Day Clock: time-to-exploit has collapsed from 2.3 years in 2018 to 10 hours in 2026.

And that paradigm has accelerated further in the last week, starting with the leak of Anthropic's Mythos model. Mythos discovered thousands of zero-days across every major operating system and browser, including a 27-year-old OpenBSD bug that had survived decades of manual code review.

The finding confirmed what security leaders had watched build for eighteen months: frontier AI models find exploitable vulnerabilities and chain them together faster than any human team can patch them. In internal testing, Mythos generated 181 working Firefox exploits where Anthropic's previous model had succeeded twice.

The response was equally unprecedented. Anthropic launched Project Glasswing, a $100M defensive coalition with AWS, Apple, Google, Microsoft, and eight other founding partners. The Cloud Security Alliance assembled more than 60 CISOs from Google, Netflix, Cloudflare, GitLab, Wells Fargo, and others to produce a risk register and action plan.

A coherent plan, at both the industry level and within each company, is imperative. 72% of exploited CVEs in 2026 are zero-days, up from 16% in 2018. Daniel Kang at UIUC demonstrated automated exploit generation at $8.80 per working exploit. The economics of offense have never been more favorable. An attacker needs one working exploit. A defender needs to triage, prioritize, and remediate across an entire software factory, hundreds of times per day, continuously.

The CSA paper, Anthropic's own advisory, and every analysis published in the past week converge on the same conclusion: you are now exposed for 99.9% of the vulnerability lifecycle.

The question is what it takes to turn those recommendations into reality within enterprises.

Model capability is necessary but not sufficient. What will define how organizations navigate the Mythos era is the architecture around the model: enterprise context for accuracy and prioritization, and a deterministic harness for reliability, efficiency, and scale. The rest of this post explains why.

Foundation model × Enterprise context × Deterministic harness = Machine-speed defense

The Right Roadmap and How to Operationalize It

Anthropic published seven defensive recommendations alongside the Mythos announcement. Close your patch gap. Prepare for 10x finding volume. Find bugs before you ship them. Find vulnerabilities already in your code. Design for breach. Reduce your attack surface. Shorten incident response time. All substantive. All correct.

The CSA's Mythos-ready paper goes further. It introduces a 13-item risk register mapped to OWASP and NIST frameworks, an 11-item priority action list with aggressive timetables, and the concept of "VulnOps" as a permanent organizational function. It includes an honest acknowledgment of the human cost: "Burnout and attrition in security functions represent a direct operational risk," the paper states. "We cannot outwork machine-speed threats."

Both documents are required reading. Both represent serious, actionable thinking from people who understand the problem deeply. Neither says so explicitly, but most readers may assume the same thing: that AI capability alone, meaning a foundation model applied to your code, is sufficient to execute these recommendations at enterprise scale.

It is not. But not because the models are insufficient. Foundation models are, in fact, the essential starting point.

Foundation models are the general intelligence layer

Mythos itself proves the point. The leap from two working Firefox exploits to 181 was not the result of a purpose-built vulnerability scanner. It was the result of a more capable general-purpose model applied to the same task. Foundation models, both commercial frontiers like Mythos and the rapidly advancing open-source ecosystem, provide the reasoning, code comprehension, and pattern recognition that make AI-driven security possible in the first place. And they will continue to get better. Every model release improves code understanding, reduces hallucination rates, and expands the complexity of vulnerabilities that can be analyzed. This is the rising tide.

But a foundation model, no matter how capable, does not know your organization. It does not know your deployment topology, your compensating controls, your team's coding conventions, or which of your hundred thousand open findings are actually reachable from the internet. Anthropic's blog notes that "a frontier model can usually propose a patch," shifting a developer's job from writing the fix to verifying one. Correct in theory. In practice, developers reject most generic model-generated fixes because they do not match codebase conventions, import unfamiliar libraries, or fail existing tests. The vulnerability stays open. The backlog stays exactly where it was. Studies consistently show that AI-generated code ships with significantly more security flaws than human-written code, not because the models lack capability, but because they lack the constraints that experienced developers internalize.

The model capability is there. What is missing is the context to direct it and the harness to make it reliable.

Enterprise context is the unlock

The real multiplier on top of foundation model intelligence is proprietary enterprise context: your codebase patterns, your deployment architecture, your scanner configurations, your team's historical merge behavior, your compensating controls. This context is what transforms a generic "critical finding" into an actionable, prioritized decision. It is what transforms a plausible-looking patch into a fix your developers actually trust and merge.

Without context, SAST and SCA tools generate findings with false positive rates routinely estimated between 70% and 90%. Engineers absorb the cost — industry surveys put the triage burden at multiple hours per developer per week, the majority of it wasted chasing noise. Context is what separates a marginal false positive reduction from a transformative one — the difference between triaging 600 findings a month and triaging 50.

The harness makes it deterministic, efficient, and scalable

Foundation models are probabilistic by design. They hallucinate. They produce inconsistent outputs across runs. And they are expensive to operate at the volume that enterprise defense demands. In security, a missed vulnerability can mean a breach, a false fix can break production, and the work needs to happen hundreds of times per day across an entire software factory. Expensive, inconsistent, and generic is not acceptable.

This is where the harness matters. A purpose-built orchestration layer adds three things that a foundation model alone cannot provide.

First, it makes outputs predictable and consistent. It constrains the model's outputs, validates them against your codebase's actual test suites and dependency graph, enforces style and convention matching, and ensures that what reaches your developers is as close to deterministic as possible. It is the layer that eliminates the unpredictability that keeps security leaders from trusting AI-generated changes in production.

Second, it makes the economics viable. Offense and defense have asymmetric unit economics. Kang's $8.80 per working exploit is the price of one successful attack against one target. Defense does not work in units of one. A mid-sized enterprise produces thousands of new findings per day across thousands of repositories and must make correct triage and remediation decisions on all of them, continuously. The same frontier-model capability that makes the attacker's economics so attractive makes the naïve defender's economics untenable. Project Glasswing's $100M funds a concentrated research sprint, not a model for continuous enterprise defense. A harness closes that gap by routing only the right context to the right model for each decision. The difference between running a frontier model on every finding and running it surgically on the findings that need it is the difference between a line item your CFO approves and one they reject.

Third, it abstracts against any individual model. The foundation model landscape is evolving rapidly — commercial frontiers, open-source alternatives, and specialized fine-tuned models all have different strengths, speeds, and cost profiles. A harness allows selective routing: use a smaller, faster model for straightforward triage classifications, a frontier model for complex exploitability analysis, and a code-specialized model for patch generation. This is not a theoretical optimization. It is how you run AI-driven defense economically at the scale of a real enterprise software factory without being locked to any single provider's pricing or capability curve.
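To make the routing idea concrete, here is a minimal sketch in Python. The task categories, model identifiers, and the `route` function are illustrative assumptions for this post, not any particular provider's API or pricing.

```python
# Minimal sketch of cost-aware model routing. The model names and task
# categories below are illustrative placeholders, not real provider SKUs.
from enum import Enum


class Task(Enum):
    TRIAGE = "triage"                  # high volume, simple classification
    EXPLOITABILITY = "exploitability"  # complex reasoning, run selectively
    PATCH = "patch"                    # code generation


# Route each decision to the cheapest model that can handle it well.
ROUTES = {
    Task.TRIAGE: "small-fast-model",
    Task.EXPLOITABILITY: "frontier-model",
    Task.PATCH: "code-specialized-model",
}


def route(task: Task) -> str:
    return ROUTES[task]


# Thousands of findings hit triage daily; only the handful that survive
# triage ever reach the expensive frontier model.
assert route(Task.TRIAGE) == "small-fast-model"
```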

This is what makes the difference between "AI that sometimes produces useful suggestions in a terminal window" and "a key operational facet of your scaled security program."

Every capability the roadmaps describe already exists in production somewhere. Triage automation, exploitability analysis, exposure-based prioritization, context-aware patch generation. None of these are theoretical. The question is whether they are operationalized with the context and harness architecture that enterprise security demands.

Where Context Matters Most and Where It Doesn't

This is not an abstract architectural argument. Where context sits in the stack determines which security outcomes are actually achievable at enterprise scale.

Diagram: where context matters most across the detection, triage, and remediation layers.

Detection: foundation models already excel

For traditional vulnerability detection (finding known CVE patterns, identifying common injection flaws, flagging insecure configurations), foundation models already perform remarkably well without deep organizational context. Mythos proved this: it discovered thousands of zero-days using general code comprehension, not proprietary knowledge of any specific enterprise. Detection is the area where raw model capability matters most and where continuous model improvements, across both commercial frontiers and open-source alternatives, will compound fastest.

Where context does begin to matter for detection is at the frontier: business logic vulnerabilities, authorization flaws, and application-specific attack surfaces that no generic scanner can identify. These are the classes of vulnerabilities that have historically been invisible to automated tools because they require understanding what the application is supposed to do, not just what it does. As foundation models improve, the combination of model reasoning and enterprise context will unlock detection of vulnerability classes that were previously audit-only.

Triage: context is the difference between signal and noise

Triage is where context becomes essential. When the mean time-to-exploit is 10 hours and the industry's mean time-to-remediate still stretches into months, the bottleneck is not finding vulnerabilities. It is deciding which ones matter.

A foundation model without your organizational context cannot make that decision. It does not know whether the vulnerable component processes untrusted input, whether a WAF sits in front of it, whether compensating controls mitigate the risk, or whether the service is internet-facing or buried behind three network segments. Enterprise triage automation needs to reduce false positives by 95% or more through exploitability verification, deployment context analysis, compensating control awareness, and business criticality weighting. That number is not achievable without deep organizational context. It is achievable with a harness that systematically feeds that context into every triage decision.
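As a sketch of what "feeding context into every triage decision" can look like, consider the toy scoring function below. Every field, weight, and threshold is an assumption chosen for illustration, not a published scoring scheme; real systems verify exploitability rather than merely scoring it.

```python
# Toy context-aware triage. All fields and weights are illustrative
# assumptions, not a real scoring scheme.
from dataclasses import dataclass


@dataclass
class DeploymentContext:
    internet_facing: bool
    processes_untrusted_input: bool
    waf_in_front: bool
    compensating_controls: bool
    business_critical: bool


def triage(base_severity: float, ctx: DeploymentContext) -> str:
    score = base_severity  # e.g. a CVSS base score on a 0-10 scale
    if not ctx.internet_facing:
        score *= 0.2  # unreachable from outside drops priority sharply
    if not ctx.processes_untrusted_input:
        score *= 0.3
    if ctx.waf_in_front or ctx.compensating_controls:
        score *= 0.5  # mitigations reduce, but do not eliminate, risk
    if ctx.business_critical:
        score *= 1.5
    if score >= 7:
        return "critical"
    return "high" if score >= 4 else "low"
```

The Company A / Company B example later in this post is exactly this kind of function evaluated against two different deployment contexts.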

Remediation: context is what gets fixes merged

Remediation is where the full architecture — foundation model, enterprise context, and deterministic harness — must work in concert. The measure of remediation is not "fix proposed." It is "fixed in production." Early evidence from context-aware remediation systems shows merge rates above 70%, a categorically different result than generic AI-generated patches that developers routinely reject. The difference is not model capability. It is whether the fix matches the codebase's conventions, passes its test suites, and respects its dependency constraints. Developers trust fixes that understand their codebase, not fixes that generate plausible-looking code.

The harness is what makes this reliable and economical. It constrains the foundation model's output against your actual test suite, your dependency graph, your linting rules. It enforces determinism where the model alone would produce variance. And it routes each remediation task to the right model at the right cost — not every fix requires a frontier model. Without this sophisticated orchestration, you get one of two outcomes: proposed patches that look plausible but break tests and sit unmerged, or an AI compute bill that dwarfs your security budget.
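Here is a minimal sketch of that constraint layer, assuming a candidate patch has already been applied to a working copy, and using pytest and ruff as stand-ins for whatever test and lint tooling the repository actually uses.

```python
# Sketch of a harness validation gate. Assumes the candidate patch is
# already applied at repo_path; pytest and ruff are stand-ins for the
# project's own test and lint tooling.
import subprocess


def tests_pass(repo_path: str) -> bool:
    # The project's own test suite is the arbiter; any failure rejects.
    return subprocess.run(["pytest", "--quiet"], cwd=repo_path).returncode == 0


def lint_clean(repo_path: str) -> bool:
    # Enforce the codebase's existing style rules, not the model's habits.
    return subprocess.run(["ruff", "check", "."], cwd=repo_path).returncode == 0


def accept_patch(repo_path: str) -> bool:
    # A model-generated change reaches a developer only after passing
    # every deterministic check.
    return tests_pass(repo_path) and lint_clean(repo_path)
```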

The backlog: why this compounds

66% of organizations carry 100,000+ open vulnerabilities, and the average codebase's known vulnerability count continues to grow year over year. AI-accelerated offense does not just create new vulnerabilities faster. It makes existing unfixed vulnerabilities more exploitable, because the same models that find new bugs can weaponize your known-but-unpatched ones overnight.

This is why the architecture matters. Foundation models provide the intelligence. Context provides the accuracy and alignment to match enterprise expectations. The harness provides the reliability, the economic efficiency, and the model flexibility to operate at scale. Without all three, VulnOps — the CSA's term for vulnerability operations as a permanent function — is either too unreliable to trust or too expensive to sustain.

How the process changes

The shift is from human-as-executor to human-as-verifier, and, hopefully soon, to human-as-watcher.

In the current model, a scanner finds a vulnerability. A security engineer investigates whether it matters. A developer writes a fix. Another developer reviews the fix. Weeks pass between each step.

In the new model, automated triage determines exploitability and business impact using your organizational context. Automated remediation generates a fix matched to your codebase through a deterministic harness. A developer reviews and approves. The entire cycle completes in the time the old model spent on step one.
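Put together, the new loop reads roughly like the stub below. Every function is a placeholder for the layer it names, not any specific product's API.

```python
# Stub of the human-as-verifier loop across the three layers. Each
# function is a placeholder standing in for the layer it names.

def triage_with_context(finding: dict, ctx: dict) -> str:
    # Context layer: exploitability, exposure, compensating controls.
    return "critical" if ctx.get("internet_facing") else "low"


def generate_fix(finding: dict, ctx: dict) -> str:
    # Foundation-model layer: propose a patch in the codebase's idiom.
    return "diff --git a/... b/..."


def harness_validates(patch: str) -> bool:
    # Harness layer: tests, lint, dependency and convention checks.
    return bool(patch)


def handle_finding(finding: dict, ctx: dict) -> str:
    priority = triage_with_context(finding, ctx)
    if priority == "low":
        return "defer to next maintenance window"
    patch = generate_fix(finding, ctx)
    if not harness_validates(patch):
        return "escalate to a human investigator"
    return "open pull request"  # the developer verifies and merges
```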

What This Looks Like in Practice

Consider the same critical CVE disclosed at two organizations.

At Company A, the vulnerable library processes untrusted input from an internet-facing API with no web application firewall in front of it. An attacker can reach the vulnerable code path in three hops from the public internet. This is a stop-everything situation.

At Company B, the same library exists in an internal batch processing service behind two network segments, with rate limiting and input validation at every boundary. The vulnerable code path is technically present but practically unreachable without pre-existing internal access.

A foundation model alone sends both companies the same critical alert. A system with enterprise context sends Company A a critical finding with an auto-generated fix and sends Company B a low-priority informational with a recommendation to address it during the next maintenance window. Both security teams spend their time on what actually matters. The model intelligence was identical. The context made the difference.

The same dynamic applies to remediation. A foundation model generates a fix for a SQL injection vulnerability. Without context and a harness, it applies a parameterized query pattern the codebase does not use, imports a library the project has never included, and formats the code in a style that does not match any existing file. The developer looks at the pull request, sees code that clearly does not belong, and rejects it.

The same fix, generated through a harness that enforces awareness of the codebase's existing data access patterns, validates against the test suite, and matches style conventions, produces a pull request that looks like something the team wrote themselves. The developer reviews it in minutes, confirms the logic, and merges. The foundation model provided the intelligence to construct the fix. The context told it what patterns to follow. The harness ensured the output was deterministic, trustworthy, and generated at a cost that makes sense when you need to do this hundreds of times a day.

What Comes Next

The industry's threat models have caught up to reality. Anthropic, the CSA, and more than 60 CISOs agree that AI-accelerated offense demands a different defensive posture. The foundation models are here, and they will keep getting better, both commercial and open-source, compounding the value of whatever context and harness layers are built on top of them.

This is not a technology capability conversation. Every frontier AI lab can build a model that finds vulnerabilities and proposes patches. What separates organizations that handle the Mythos era from those that do not is whether they have built the context and harness layers that turn model capability into enterprise-grade outcomes. Model intelligence without organizational context and deterministic orchestration produces the same result the industry has been stuck with for a decade: more findings, same backlog, same months-long remediation cycles. Just faster alerts that nobody trusts.

The roadmap is clear. What remains is the architecture that makes it real.

Machine-speed threats need machine-speed defense. And machine-speed defense needs more than a model. It needs your proprietary context and the harness that makes it operational.
