The frontier labs continue to push into security in ways both promising and noise-generating. This week it was OpenAI announcing the launch of daybreak to bring security more natively to "change the way software is built and defended." Not sure from reading their press release exactly what that means but I can tell you that we are hearing constantly from our clients and prospects about whether they can just point frontier models at the problem and call it a day (TL/DR: you can't, our CEO has written extensively about this as has our CTO in terms of the context engineering, harness, and enterprise deployment realities required to make these workflows cost effective and efficient).
In other news more compromised npm packages exposed the supply chain side yet again as a prime attack vector, Google confirmed AI was used to find and develop a working zero-day exploit, and MCP insecurity continues to be a problem.
More below.
Google's Threat Intelligence Group confirmed that a cybercrime organization used AI to develop a working zero-day against an open-source web administration tool. The exploit bypassed two-factor authentication by targeting a semantic logic flaw: a developer had hardcoded a trust assumption that contradicted the application's authentication enforcement. Pattern-matching scanners cannot find this vulnerability class. The AI did.
State-backed actors from China and North Korea have used AI for vulnerability discovery for some time. What changed this week is the criminal side. Google's report marks the first confirmed case where a non-state threat actor shipped an AI-assisted exploit to production. The flaw class matters: semantic logic errors, authentication bypasses, and hardcoded trust assumptions are what static analysis tools miss because they require understanding application intent, not just code syntax.
The detection gap this creates is specific. AI identifies vulnerabilities through logic analysis that produces zero scanner alerts before exploitation. If your team's signal problem comes from scanners flagging too many low-confidence findings, the inverse problem is now confirmed: the findings that matter most may not be in the queue at all.
The Shai-Hulud campaign produced the first npm supply chain attack with valid SLSA Build Level 3 attestations, compromising 169 packages across TanStack, Mistral, Squawk, and UiPath. The attack chained a GitHub Actions "Pwn Request," cache poisoning, and OIDC token extraction from runner memory. The resulting packages were cryptographically signed by legitimate build infrastructure. Downstream consumers and dependency scanners had no basis to reject them.
Aikido's forensics team traced the credential harvest to CI/CD secrets, npm tokens, and GitHub API keys from any machine that ran a postinstall script. The payload executed automatically on npm install. The same week, TeamPCP published a backdoored Checkmarx Jenkins AST plugin to the Jenkins Marketplace, and a malicious Hugging Face repository impersonating OpenAI's Privacy Filter accumulated 244,000 downloads before removal.
Three attack vectors, three supply chain layers, one week. Package repositories, CI/CD tooling, and AI model repositories all fell in the same news cycle. The Shai-Hulud case invalidates a core SLSA assumption which is that trustworthy build infrastructure means trustworthy attestations.
A researcher chained three bugs in LiteLLM v1.82.3-stable from internal_user API key to root RCE in three HTTP requests at Pwn2Own Berlin 2026, winning a $40K bounty. The chain matters more than the bounty. LiteLLM is the proxy layer in front of frontier models for many of the enterprises currently shipping AI features. If you wrapped GPT-5 or Claude behind LiteLLM to centralize keys, rate-limit, or route across providers, the path from "intern with an API key" to "root on the proxy host" is now three HTTP requests.
Two of the three bug classes are SSTI in template-rendered fields that an internal-user-tier token can write to. The third is path-traversal in a logging endpoint. CWE-94, CWE-22, CWE-78: pattern-matchable, indexable, the kind of bug a competent SAST run should have caught before merge. The fact that they shipped in a release tagged "stable" tells you AI infrastructure projects are getting velocity-driven the same way early web frameworks were in 2009: feature shipping has outrun the security review baseline that mature web frameworks now enforce by default.
No confirmed in-wild exploitation yet. The Pwn2Own writeup is public and the exploit path is documented. Treat the gap between "demonstrated at a research venue" and "scripted into a botnet" as days, not weeks.
Capsule Security's internet scan found 402,599 AI agent hosts directly reachable from the public internet across 36 services, with no authentication requirement. A concurrent Knostic analysis found 1,800+ Model Context Protocol servers exposed without authentication. Both findings document the same pattern: AI agent deployment velocity is outrunning basic security configuration.
AI agent framework designers did not build in the security-by-default assumptions that shaped web application deployment over the last decade. MCP launched without authentication in the base spec, and organizations that deployed quickly inherited that default. The result is an attack surface that is public-facing, broadly scoped, and largely undocumented in most security inventories.
Linux kernel maintainers proposed a parallel response to a different flavor of the same problem: an emergency runtime "killswitch" to disable vulnerable kernel functions while patches are built and distributed. The prompt was Dirty Frag (CVE-2026-43284, CVE-2026-43500), a privilege escalation pair under active exploitation across Ubuntu, RHEL, CentOS, and OpenShift. Both the MCP exposure data and the killswitch proposal reflect the same operational gap: systems deploy faster than security configuration follows, and exploitation concentrates in that window.
CVE-2026-43284 + CVE-2026-43500 — Dirty Frag — Linux Kernel — Privilege Escalation
Local → root via chained xfrm page-cache write and RxRPC memory fragment flaws. Microsoft confirmed in-the-wild exploitation. Exploits entry points include SSH, web-shell, container escape, and low-priv processes. Affected: Ubuntu, RHEL, CentOS, AlmaLinux, OpenShift. Patch and reboot. Linux kernel maintainers are evaluating a runtime killswitch as an interim option.
Shai-Hulud npm worm — 169 packages compromised — TanStack, Mistral, Squawk, UiPath
GitHub Actions OIDC token extraction produced valid SLSA Build Level 3 attestations for malicious packages. Payload auto-executed on npm install. If your environment ran affected packages May 11-12, rotate all CI/CD secrets, npm tokens, and GitHub API keys. Aikido forensics report.
Checkmarx Jenkins AST Plugin backdoor — supply chain compromise via credential theft
TeamPCP published a backdoored version to the Jenkins Marketplace after stealing credentials from Checkmarx's GitHub repository. Revert to the safe plugin version (see Checkmarx advisory). Rotate all secrets from Jenkins environments where the malicious version ran.
LiteLLM 3-bug RCE chain (CVE-2026-42203) covered in Deep Dive above.
SAP Commerce Cloud + S/4HANA — Critical RCE and Data Access — May 2026 Security Update
15 CVEs in May patch bundle, including two critical flaws: one enabling unauthenticated code execution in Commerce Cloud, one enabling data theft in S/4HANA. No active exploits confirmed. Apply within your standard SAP patch cadence. Exploit-risk does not currently warrant out-of-cycle deployment.
dnsmasq — 6 new CVEs — Heap overflow, heap corruption, code execution, DNS cache poisoning
Memory safety cluster across DNS response parsing and DHCP handling. Enables cache poisoning, security control bypass, and local privilege escalation. High exposure due to dnsmasq prevalence in routers, IoT, and container networking. CVE numbers pending full disclosure. Update when patches are available.
Five links that add to this week's story — not covered in the Deep Dives above.
Primary Source
Google GTIG: AI-Assisted Vulnerability Exploitation and Initial Access — Google Cloud Blog The actual report behind Deep Dive 1. Hallucinated CVSS scores and educational docstrings were the AI fingerprints. Google's Big Sleep agent (DeepMind + Project Zero) independently discovered the same flaw before mass exploitation. Read the primary source, not the headlines.
Technical Deep Dives
Dirty Frag: Full Exploit Writeup — Hyunwoo Kim (@v4bel) The complete PoC walkthrough for CVE-2026-43284 + CVE-2026-43500. Chains two page-cache corruption bugs (xfrm/ESP + RxRPC) into a deterministic local-to-root with no race condition. Affects kernels back to 2017. If you run Linux in production, read this before your next patch window.
When the Worm Forged Its Own Security Certificate — Lucie Cardiet, Vectra AI Deep forensics on how Shai-Hulud extracted OIDC tokens from GitHub Actions runner process memory, obtained signing certificates from Sigstore's Fulcio CA, and produced valid SLSA Build Level 3 attestations for malicious packages. The best technical breakdown of why SLSA failed.
Thought-Provoking
Defender's Guide to Frontier AI Impact on Cybersecurity: May 2026 Update — Palo Alto Networks, Unit 42 Unit 42 estimates organizations have 3-5 months before attackers broadly access frontier AI cyber capabilities. They scanned 130+ products, found 75 legitimate vulnerabilities. The clock on the AI offense-defense gap has a number on it now.
AppSec Didn't Need a Faster Way to Find Bugs — Securely Built (Substack) Contrarian take on Daybreak: the problem was never finding more bugs. Scanning tools already produce hundreds of thousands. What defenders actually need is exploitable proof and self-healing environments. The strongest pushback on the "AI finds vulns" narrative this week.
Pixee Research & Long-Form (For When You Have 30 Minutes Instead of 8)
• What We Learned Reading 30 Security Reports So You Don't Have To — A 2026 industry meta-analysis. We extracted 155+ quantitative data points from 30 reports across DBIR-class threat intelligence, AppSec/DevSecOps surveys, AI code-security benchmarks, and supply-chain studies. Useful when the cross-vendor disagreement on a number (say, false positive rates or time-to-exploit) is itself the story.
• The AI-Generated Zero-Day Is Here. Your Scanner Missed It. — Our deeper analysis of this week's Google GTIG disclosure: why semantic logic flaws fall outside SAST's detection grammar, the four code-pattern categories where this failure mode concentrates, and what an "AI-augmented review" pipeline looks like for the paths your scanner can't reach.
• AI-Accelerated Offense: The Defense Gap CISOs Must Close — Why generic AI defense fails against the class of logic-based vulnerabilities this week's zero-day exploited. Background reading for the staff conversation about whether your existing tooling can close this gap or whether it's an architectural one.
The briefing security leaders actually read. CVEs, tooling shifts, and remediation trends — distilled into 5 minutes every week.
Join security leaders who start their week with AppSec Weekly. Free, 5 minutes, no fluff.
First briefing drops this week. Check your inbox.
Weekly only. No spam. Unsubscribe anytime.