Copy Fail (CVE-2026-31431): The Linux Kernel Flaw Threatening Millions of Cloud Workloads
Key Finding
A nine-year-old optimization in the Linux kernel’s cryptographic subsystem has, this week, become the single largest privilege escalation risk facing cloud infrastructure since Dirty Pipe. CVE-2026-31431 — nicknamed “Copy Fail” — affects every mainstream Linux distribution built since 2017, including RHEL, Ubuntu, SUSE, Amazon Linux, and the host OS underpinning the vast majority of Kubernetes nodes in production today. A working proof-of-concept exploit is 732 bytes long, requires no user interaction, and works unmodified across distros. CISA added the CVE to the Known Exploited Vulnerabilities (KEV) catalog within hours of public disclosure.
If you operate workloads in AWS, Azure, GCP, or a self-hosted Kubernetes cluster, this is the threat you should be triaging today — not next sprint.
Technical Analysis: Why Copy Fail Is Different
The Root Cause: A 2017 In-Place Optimization
The flaw lives in the algif_aead module, part of the AF_ALG userspace crypto API that exposes kernel cryptographic primitives to user-mode applications. In 2017, kernel commit 72548b093ee3 introduced an in-place optimization for AEAD (Authenticated Encryption with Associated Data) operations — a sensible performance improvement that, in theory, allowed the kernel to encrypt or decrypt buffers without an intermediate copy.
In practice, the authencesn algorithm reuses the caller’s destination buffer as a scratch pad mid-operation. When the destination buffer is a chained scatterlist (a common pattern in Linux memory management), the algorithm writes four controlled bytes past the legitimate output region — across a scatterlist boundary — and then fails to restore them.
Those four bytes land in adjacent kernel memory pages. With careful heap grooming and page cache manipulation, a local attacker can place a target page (for example, the in-memory copy of /usr/bin/sudo or /usr/bin/passwd) at that boundary and overwrite specific bytes of executable code — without touching the file on disk.
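For readers who have not used AF_ALG, the interface described above is reachable from ordinary user code through the standard socket API. The following is a minimal sketch, not an exploit: it only checks whether an unprivileged process can bind an AF_ALG socket of type aead, and performs no cryptographic operation. The function name and the default algorithm name gcm(aes) are illustrative choices, not taken from any advisory.

```python
import socket


def aead_interface_reachable(alg_name: str = "gcm(aes)") -> bool:
    """Return True if this process can bind an AF_ALG socket of type 'aead'.

    The check only creates and binds the socket. It never sets a key and
    never sends data, so no AEAD operation is performed.
    """
    af_alg = getattr(socket, "AF_ALG", None)
    if af_alg is None:
        # Non-Linux platform, or a Python build without AF_ALG support.
        return False
    try:
        with socket.socket(af_alg, socket.SOCK_SEQPACKET, 0) as sock:
            # For AF_ALG, the address is a (type, name) tuple.
            sock.bind(("aead", alg_name))
        return True
    except OSError:
        # Raised when AF_ALG, the aead type, or the algorithm is unavailable,
        # or when a sandbox policy blocks the call.
        return False


if __name__ == "__main__":
    print("AF_ALG aead reachable:", aead_interface_reachable())
```

A result of False on a host suggests the AEAD interface is not available to that process, which is also a quick way to confirm that a mitigation blocking the module has taken effect.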
The Exploit Primitive
The exploitation pattern is elegant and brutally simple:
- The attacker reads a privileged setuid binary into the page cache (trivial for any unprivileged user, since simply reading the file is enough).
- The attacker triggers the algif_aead operation with a crafted scatterlist whose destination boundary aligns with a chosen offset inside the cached binary page.
- Four controlled bytes are written into the cached page. The kernel marks the page dirty for cache purposes but the on-disk file is unchanged.
- The attacker invokes the modified binary. The kernel executes the corrupted instructions from the page cache, granting root.
- The binary on disk is never modified, and the tampered copy in memory disappears after a reboot or once the page is evicted. Forensic teams looking for modified binaries find nothing.
This is a CVSS 7.8 vulnerability with a CVSS 10 operational impact, because the on-disk integrity check that most file integrity monitors (Tripwire, AIDE, OSSEC, Falco’s filesystem rules) rely on will not detect the compromise. The page cache is the attack surface; the disk is untouched.
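The 7.8 base score can be reproduced from the CVSS v3.1 base-score formula. The sketch below assumes the vector AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. Only AV:L, PR:L, and UI:N are stated in this article; AC:L, S:U, and the High impact values are assumptions chosen because they are consistent with 7.8, and they should be checked against the published vector. The function covers the Scope: Unchanged case only.

```python
import math

# CVSS v3.1 metric weights for the Scope: Unchanged case.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute a CVSS v3.1 base score for a Scope: Unchanged vector."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # Assumed vector: AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
    print(base_score_scope_unchanged("L", "L", "L", "N", "H", "H", "H"))  # 7.8
```

The local attack vector and the low-privilege requirement are what hold the base score below 9.0; the base score does not account for how easily that local foothold is obtained in shared environments.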
Attack Vector and Privilege Requirements
The vulnerability is local (AV:L), requires low privileges (PR:L), and needs no user interaction (UI:N). On its own, this means an attacker needs an unprivileged shell on the target. In a cloud or Kubernetes context, that unprivileged foothold is often easy to obtain via:
- A compromised CI/CD job (any pipeline runner with kubectl exec or a non-root container)
- A leaked SSH key on a bastion host
- A container breakout chained from a separate runtime vulnerability
- A misconfigured pod with hostPID: true or shared namespaces
- A long-running developer shell in a multi-tenant Jupyter or Notebook environment
Once root is achieved on the node, the attacker has full access to every container scheduled on it, every secret mounted into those containers, and the kubelet’s credentials — which usually means lateral movement to the rest of the cluster.
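The pod-level misconfigurations in the list above (hostPID and other shared namespaces) can be spotted mechanically in pod manifests. The following is a rough sketch that inspects a pod object already loaded as a dictionary, for example from kubectl get pod -o json. The function name and the wording of the findings are hypothetical; the field names are standard Kubernetes PodSpec fields. It is not a substitute for an admission controller.

```python
from typing import Any, Dict, List

HOST_NAMESPACE_FIELDS = ("hostPID", "hostNetwork", "hostIPC")


def risky_pod_settings(pod: Dict[str, Any]) -> List[str]:
    """List settings in a pod manifest that widen a local attacker's reach."""
    spec = pod.get("spec", {})
    findings = [
        f"{field}: true"
        for field in HOST_NAMESPACE_FIELDS
        if spec.get(field) is True
    ]
    pod_sc = spec.get("securityContext") or {}
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        if sc.get("privileged") is True:
            findings.append(f"container {name}: privileged")
        # Anything other than an explicit False leaves escalation allowed.
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"container {name}: allowPrivilegeEscalation not false")
        # A container-level value overrides the pod-level default.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        if run_as_non_root is not True:
            findings.append(f"container {name}: runAsNonRoot not true")
    return findings


if __name__ == "__main__":
    example = {
        "spec": {
            "hostPID": True,
            "containers": [{"name": "app", "securityContext": {}}],
        }
    }
    for finding in risky_pod_settings(example):
        print(finding)
```

An empty result means none of these specific settings were found; it says nothing about the kernel running on the node.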
Impact Assessment
Affected Platforms
Any Linux kernel from 4.10 up to, but not including, the patched May 2026 releases is vulnerable. Vendor advisories specifically confirm:
- Red Hat Enterprise Linux 7, 8, 9 (RHSB-2026-02)
- Ubuntu 18.04 LTS, 20.04 LTS, 22.04 LTS, 24.04 LTS
- SUSE Linux Enterprise Server 12, 15
- Amazon Linux 2, Amazon Linux 2023 (used as the default host OS for EKS managed node groups)
- AlmaLinux, Rocky Linux, Oracle Linux (RHEL-based)
- Container-Optimized OS (COS) for GKE — patched node images rolling out
- Azure Linux (CBL-Mariner) — patched releases available
The kernels powering Kubernetes — whether managed (EKS, AKS, GKE) or self-hosted — are vulnerable until the patched kernel package is installed and the node is rebooted or live-patched.
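As a first triage pass, kernel versions can be pulled from the node objects and compared with the 4.10 lower bound given above. The sketch below assumes its input is the JSON printed by kubectl get nodes -o json; the function names and sample node names are hypothetical. Distribution kernels backport both features and fixes, so the version string alone neither proves nor rules out exposure. The flag only indicates which nodes fall in the upstream range, and every node should still be checked against the vendor advisory.

```python
import json
import re
from typing import Dict, List, Tuple

# Lower bound quoted in this article. The patched upper bound differs per
# vendor, so it is deliberately not encoded here.
FIRST_AFFECTED_UPSTREAM = (4, 10)


def parse_kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-1051-aws'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_inventory(node_list_json: str) -> List[Dict[str, object]]:
    """Summarize kernel versions from `kubectl get nodes -o json` output."""
    report = []
    for node in json.loads(node_list_json).get("items", []):
        info = node["status"]["nodeInfo"]
        kernel = info["kernelVersion"]
        report.append({
            "node": node["metadata"]["name"],
            "kernel": kernel,
            "os_image": info.get("osImage", ""),
            "in_upstream_range": parse_kernel_version(kernel) >= FIRST_AFFECTED_UPSTREAM,
        })
    return report


# Hypothetical sample data for illustration only.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "node-a"},
     "status": {"nodeInfo": {"kernelVersion": "5.15.0-1051-aws",
                             "osImage": "Ubuntu 22.04 LTS"}}},
    {"metadata": {"name": "node-b"},
     "status": {"nodeInfo": {"kernelVersion": "3.10.0-1160.el7.x86_64",
                             "osImage": "Red Hat Enterprise Linux 7"}}},
]})

if __name__ == "__main__":
    for row in kernel_inventory(SAMPLE):
        print(row)
```

The second sample node shows the limit of this approach: its version string is below 4.10, yet RHEL 7 appears in the affected list above, so the advisory rather than the version number has the final word.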
Business Impact
For organizations running production workloads on shared Kubernetes clusters, the impact rating is Critical:
- Lateral movement risk: Any compromised pod on a vulnerable node escalates to node root, then to every co-tenant on that node.
- Compliance exposure: PCI-DSS, HIPAA, and SOC 2 controls assume node-level isolation. Copy Fail breaks that assumption.
- Detection gap: Standard file integrity monitoring does not detect the post-exploit state. EDR products that rely on on-disk binary hashes are blind to this.
- Cloud blast radius: Workload identity tokens (IRSA on EKS, Workload Identity on GKE, Managed Identity on AKS) become attacker-accessible once kubelet credentials are stolen, enabling cloud control-plane abuse.
Risk rating: Critical (9.0+ operational, 7.8 CVSS base).
CloudShieldSecure Perspective
This vulnerability is a textbook example of why on-disk file integrity monitoring is no longer sufficient for cloud workload protection. The attack surface for modern cloud breaches is the kernel page cache, the container runtime, the workload identity layer, and the cluster control plane — not the filesystem.
CloudShieldSecure’s runtime detection model focuses on three signal layers that catch Copy Fail-style attacks even before vendor patches are available:
- Behavioral anomaly detection on setuid binary execution: we baseline normal usage of sudo, passwd, pkexec, and other privileged binaries per workload, and alert when execution patterns deviate. A freshly compromised sudo triggering an unusual syscall sequence is flagged in seconds.
- Kernel syscall telemetry via eBPF: CloudShieldSecure agents trace AF_ALG socket usage and flag the specific algif_aead operation pattern used by the public Copy Fail PoC. This is a high-fidelity signature with effectively zero false positives in a production workload.
- Workload identity abuse detection: even if a node is fully compromised, abnormal use of kubelet credentials or IRSA tokens (calls from unexpected pods, calls to APIs the workload has never previously touched) is surfaced in the CloudShieldSecure control plane within minutes.
This layered approach is what we mean by in-depth weakness and vulnerability findings with exception reporting. Patching is necessary but never sufficient: production environments need a detection fabric that assumes the patch has not yet been applied.
Recommended Actions
A practical, sequenced response checklist for cloud and platform security teams:
- Inventory. Pull a kernel version report across every node in every cluster. For EKS, query the kubernetes.io/os and kubernetes.io/arch labels alongside node.status.nodeInfo.kernelVersion. For self-hosted, use kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.kernelVersion}'.
- Patch the kernel. Apply the patched kernel package from your distro vendor. RHEL/CentOS: dnf update kernel. Ubuntu: apt upgrade linux-image-generic. Amazon Linux 2: yum update kernel. For Kubernetes nodes, drain and cycle the node, or replace it via your managed node group upgrade workflow. Live-patch services (Canonical Livepatch, kpatch, Oracle Ksplice) have shipped fixes; apply if reboot windows are constrained.
- Disable AF_ALG if you don’t use it. Most workloads do not use the userspace crypto API directly. Blocking the algif_aead module through modprobe.d (for example, with an install algif_aead /bin/false entry) removes the attack surface where the feature is built as a module. CONFIG_CRYPTO_USER_API_AEAD is the build-time option behind that module, not a module name, so it cannot be blocklisted itself; kernels with the option built in need a different mitigation. Test before deploying broadly, because TLS userspace acceleration in some environments depends on AF_ALG.
- Pod Security Standards. Enforce the restricted Pod Security Standard across all namespaces. Specifically: deny hostPID, hostNetwork, hostIPC; require runAsNonRoot: true; require allowPrivilegeEscalation: false; require a read-only root filesystem where the workload tolerates it.
- Audit privileged service accounts. Any service account with cluster-admin, system:masters, or unconstrained nodes/proxy access is a post-exploitation accelerator. Constrain or rotate.
- Hunt for the indicator. Search EDR and audit logs for unprivileged processes opening AF_ALG sockets bound to the aead type. This is uncommon in normal workloads and is the syscall fingerprint of the public PoC.
- Update detection rules. If you run Falco, Tetragon, or a similar runtime tool, add a rule for socket(AF_ALG, SOCK_SEQPACKET, 0) followed by a bind() with a salg_type of aead. Block or alert based on your environment’s risk tolerance.
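The hunting step in the checklist above can be approximated offline against auditd logs. The sketch below makes several assumptions: hosts are x86_64 (audit arch c000003e, where socket is syscall 41), an audit rule already records socket() calls, and records use the standard key=value layout with hexadecimal arguments. The function names are hypothetical. It matches only the socket(AF_ALG, SOCK_SEQPACKET, ...) call; it does not decode the bind() address, so it cannot tell an aead socket from a hash or skcipher one and should be treated as a lead rather than a verdict.

```python
import re
from typing import Dict, Iterable, List

AF_ALG = 38              # Linux address family number for AF_ALG (0x26)
SOCK_SEQPACKET = 5
SOCK_TYPE_MASK = 0xF     # strips flags such as SOCK_CLOEXEC and SOCK_NONBLOCK
X86_64_ARCH = "c000003e"         # assumption: x86_64 hosts only
X86_64_SOCKET_SYSCALL = "41"     # socket() on x86_64

FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(line: str) -> Dict[str, str]:
    """Split an audit record into a dict of key/value strings."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def af_alg_socket_events(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return non-root socket(AF_ALG, SOCK_SEQPACKET, ...) calls found in audit logs."""
    hits = []
    for line in lines:
        if not line.startswith("type=SYSCALL"):
            continue
        fields = parse_fields(line)
        if fields.get("arch") != X86_64_ARCH:
            continue
        if fields.get("syscall") != X86_64_SOCKET_SYSCALL:
            continue
        try:
            family = int(fields["a0"], 16)       # audit logs print args in hex
            sock_type = int(fields["a1"], 16) & SOCK_TYPE_MASK
        except (KeyError, ValueError):
            continue
        if family == AF_ALG and sock_type == SOCK_SEQPACKET and fields.get("uid") != "0":
            hits.append({
                "pid": fields.get("pid", ""),
                "uid": fields.get("uid", ""),
                "comm": fields.get("comm", ""),
                "exe": fields.get("exe", ""),
            })
    return hits


if __name__ == "__main__":
    import sys
    for hit in af_alg_socket_events(sys.stdin):
        print(hit)
```

Run it by piping an audit log into the script on standard input. Workloads that legitimately use AF_ALG will also appear, so results need review against a known baseline.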
Sources and References
- Red Hat: RHSB-2026-02 Cryptographic Subsystem Privilege Escalation - Linux Kernel (CVE-2026-31431)
- Microsoft Security Blog: CVE-2026-31431: Copy Fail vulnerability enables Linux root privilege escalation across cloud environments
- Palo Alto Networks Unit 42: Copy Fail: What You Need to Know About the Most Severe Linux Threat in Years
- Tenable: Copy Fail (CVE-2026-31431) FAQ
- CISA: Known Exploited Vulnerabilities Catalog
- AlmaLinux: Copy Fail (CVE-2026-31431) Patches Released