Products

VMware vSphere Kubernetes Service

Issue/Introduction

Control plane and worker nodes in vSphere Kubernetes Service that run Ubuntu 22.04 or 24.04 are confirmed to be affected by CVE-2026-31431 (“Copy Fail”), but only under certain circumstances. Nodes running Photon OS are not affected.

Environment

vSphere Kubernetes Service

Cause

A local privilege escalation (LPE) vulnerability affecting the Linux kernel was publicly disclosed on April 29, 2026. The vulnerability has been assigned CVE-2026-31431 and is referred to as Copy Fail. The affected component is the kernel module algif_aead that provides hardware-accelerated cryptographic functions.

Resolution

Update: This vulnerability has been addressed in VKr 1.34.8 and VKr 1.35.5 by integrating fixes provided by Canonical for Ubuntu. Updating is highly recommended over attempting manual mitigation. To ensure your workloads and applications are not impacted, first review section "1. Possible Impact on Deployed Applications" below.

1. Possible Impact on Deployed Applications

No software package in vSphere Kubernetes Service uses algif_aead, and Broadcom is not currently aware of any container workloads that use this kernel module. The module is most frequently used for wireless connectivity and by strongSwan. Even when Antrea is used with IPsec encryption enabled, AEAD for ESP is handled in userspace via OpenSSL, and the kernel module is not loaded.

Before applying the mitigation below, ensure that deployed applications do not depend on the kernel module algif_aead.

1.1. Finding if an application is using AF_ALG sockets

Note: AF_ALG socket use tends to be transient, and sockets are typically open for only a very short period of time. Nevertheless, an entry may be visible when running the following:

sudo lsof -nP 2>/dev/null | awk 'NR==1 || /protocol: ALG/'

If applications are identified as using the kernel module algif_aead, it may be necessary to contact the vendor for an alternative or to wait for the updated kernel patches.
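
Because AF_ALG sockets are short-lived, a single lsof run can easily miss them. The following is a minimal sketch, not part of the official tooling, that repeats the same lsof check several times and prints each distinct command and PID once. The function names `alg_users` and `sample_alg_users` and the default of 30 samples at 1-second intervals are assumptions chosen for illustration.

```shell
# Reduce lsof output to "COMMAND PID" for rows that show an AF_ALG socket.
alg_users() {
  awk '/protocol: ALG/ { print $1, $2 }'
}

# Take several lsof samples and print every distinct AF_ALG user once.
#   $1: number of samples (default 30)
#   $2: seconds between samples (default 1)
sample_alg_users() {
  local samples="${1:-30}" interval="${2:-1}" i=0
  while [ "$i" -lt "$samples" ]; do
    sudo lsof -nP 2>/dev/null | alg_users
    sleep "$interval"
    i=$((i + 1))
  done | sort -u
}
```

For example, `sample_alg_users 60 1` samples for roughly one minute. An empty result does not prove that no application uses AF_ALG; it only means that none was observed during the sampling window.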

2. Software Releases

The fixed VKr releases include the following Ubuntu package versions:

Ubuntu Release	Linux Kernel Version	kmod Package Version
Ubuntu 22.04	5.15.0.179.163	29-1ubuntu1.1
Ubuntu 24.04 (VKr 1.35.5)	6.17.0-29.29~24.04.1	31+20240202-2ubuntu7.2
Ubuntu 24.04 (VKr 1.34.8)	6.8.0-117.117	31+20240202-2ubuntu7.2

3. Mitigation

NOTE: Broadcom strongly recommends testing this mitigation in a non-production environment first. Carefully review section "1. Possible Impact on Deployed Applications" above.

3.1. Persist mitigation on a workload cluster

To patch an existing workload cluster with the manual mitigation, the pause image currently used by that cluster must be identified.

Copy the script "mitigate-copyfail-ds.sh" from the attachments to a Linux machine that has kubectl installed and is logged in to the Supervisor cluster.
To prevent script execution failures, ensure the file retains LF line endings; avoid saving it with Windows-native editors, which may introduce CRLF line endings.
Make the script executable:
```
chmod +x mitigate-copyfail-ds.sh
```
Generate the DaemonSet YAML definition for the mitigation. The script only prints the definition; it does not apply it:
```
./mitigate-copyfail-ds.sh --namespace <namespace> --cluster <workload-cluster>
```
Copy the output of the previous command and save it to a file named "copyfail-mitigation.yaml".
Apply the generated YAML on the workload cluster to deploy the mitigation:
```
kubectl apply -f copyfail-mitigation.yaml
```
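
After applying the definition, it can be useful to confirm that the DaemonSet has reached every node. The sketch below compares the number of scheduled pods with the number of ready pods. It assumes that the generated DaemonSet is named disable-algif-aead in the kube-system namespace, as used in section 3.2; the function name `daemonset_ready` is hypothetical.

```shell
# Succeed when every scheduled pod of the DaemonSet reports Ready.
#   $1: DaemonSet name (assumed default: disable-algif-aead)
#   $2: namespace      (assumed default: kube-system)
daemonset_ready() {
  local name="${1:-disable-algif-aead}" ns="${2:-kube-system}" counts
  counts=$(kubectl get daemonset "$name" -n "$ns" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') || return 1
  # Split "DESIRED READY" into $1 and $2.
  set -- $counts
  echo "$name: ${2:-0} of ${1:-0} pods ready"
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Run `daemonset_ready` with kubectl pointed at the workload cluster; a non-zero exit status means that at least one node is not yet covered.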

Repeat these steps for every workload cluster that should be mitigated.
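
When several workload clusters in one namespace need the mitigation, the generation step can be scripted. The following is a minimal sketch that first rejects a script saved with CRLF line endings and then writes one YAML file per cluster without applying anything. The helper names `has_crlf` and `generate_mitigation_yaml` and the output file name pattern copyfail-mitigation-<cluster>.yaml are assumptions for illustration; only the --namespace and --cluster options come from the steps above.

```shell
# Succeed if the file contains at least one carriage return (CRLF ending).
has_crlf() {
  local cr
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Write one DaemonSet definition per cluster; nothing is applied.
#   $1: path to mitigate-copyfail-ds.sh
#   $2: Supervisor namespace of the workload clusters
#   $3...: workload cluster names
generate_mitigation_yaml() {
  local script="$1" namespace="$2" cluster
  shift 2
  if has_crlf "$script"; then
    echo "ERROR: $script contains CRLF line endings" >&2
    return 1
  fi
  for cluster in "$@"; do
    "$script" --namespace "$namespace" --cluster "$cluster" \
      > "copyfail-mitigation-${cluster}.yaml" || return 1
    echo "wrote copyfail-mitigation-${cluster}.yaml"
  done
}
```

For example, `generate_mitigation_yaml ./mitigate-copyfail-ds.sh <namespace> <cluster-a> <cluster-b>` produces two files, each of which is then applied with kubectl apply -f on its own workload cluster.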

3.2. Disable mitigation

Deleting the DaemonSet from the workload cluster stops the mitigation from being applied to new nodes of that cluster. However, it does not remove the /etc/modprobe.d/manual-disable-algif_aead.conf file already written to each node. This file persists until it is explicitly deleted or the VKS node is re-provisioned.

kubectl delete daemonset disable-algif-aead -n kube-system

3.3. After mitigation is applied, does the machine need a reboot?

Under normal operation, the VKS nodes will not need a reboot. A reboot is only recommended if the kernel module is loaded and hence was or is actively used by an application.

3.3.1. Finding out if the kernel module is loaded

grep -qE '^algif_aead ' /proc/modules && echo "Affected module is loaded" || echo "Affected module is NOT loaded"
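
The third column of /proc/modules is the module's current use count, which helps to distinguish a module that is merely loaded from one that is actively held open. The following sketch classifies a module accordingly; the function name `module_state` and the wording of its three result strings are assumptions for illustration.

```shell
# Classify a module based on a /proc/modules-style file.
#   $1: module name
#   $2: file to read (default /proc/modules)
# Prints "not loaded", "loaded, idle" or "loaded, in use (refcount N)".
module_state() {
  local module="$1" file="${2:-/proc/modules}"
  awk -v m="$module" '
    $1 == m { found = 1; refs = $3 }
    END {
      if (!found)         print "not loaded"
      else if (refs == 0) print "loaded, idle"
      else                print "loaded, in use (refcount " refs ")"
    }' "$file"
}
```

Running `module_state algif_aead` on a node prints one of the three states. A use count of 0 only reflects the moment of the check, since AF_ALG sockets are usually short-lived.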

3.3.2. What to do if the kernel module is loaded

While it is possible to attempt unloading the relevant kernel module using rmmod, we do not recommend this approach if it can be avoided. Force-unloading kernel modules that are actively in use may negatively impact kernel stability and applications relying on the module.

Broadcom therefore recommends applying the mitigation above and rebooting the node. However, if the kernel module is required by any workload, review section "1. Possible Impact on Deployed Applications" above.

If a reboot is not immediately possible, a live unload with rmmod can be attempted on nodes where the mitigation has been applied:

sudo rmmod algif_aead

Guidance on rebooting

Note: If nodes are rebooted, it is highly recommended to cordon and drain each node before its reboot to reduce disruption to running workloads.

Before rebooting any node, cordon it with kubectl cordon to stop new pods being scheduled, then drain it with kubectl drain --ignore-daemonsets --delete-emptydir-data to evict running workloads gracefully. Draining honors PodDisruptionBudgets and gives stateful workloads (etcd members, databases, message brokers) the chance to fail over or flush state cleanly; skipping this risks data loss, split-brain, and avoidable downtime.

Confirm the drain has completed and the node reports SchedulingDisabled before issuing the reboot. Reboot one node at a time and wait for it to return to Ready before moving on. This is especially important for control plane nodes, where etcd quorum (a majority of floor(n/2) + 1 members) must be preserved throughout; with three control plane nodes, for example, two must remain available at all times. Once the node is healthy, uncordon it with kubectl uncordon so workloads can be scheduled again.
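
The sequence above can be sketched as a small helper that handles one node at a time. It records the node's boot ID before the reboot and uncordons the node only after the boot ID has changed and the node reports Ready again, so a node that never actually rebooted is not mistaken for a healthy one. REBOOT_CMD is a hypothetical hook for whatever mechanism reboots the node (for example an SSH wrapper); the function names, POLL_SECONDS, and the default of 90 polling attempts are likewise assumptions for illustration.

```shell
# Boot ID reported by the kubelet; it changes on every reboot.
node_boot_id() {
  kubectl get node "$1" -o jsonpath='{.status.nodeInfo.bootID}'
}

# Status of the node's Ready condition: True, False or Unknown.
node_ready() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Cordon, drain, reboot and uncordon ONE node.
#   $1: node name   $2: polling attempts (default 90)
# REBOOT_CMD (hypothetical hook) receives the node name as its only argument.
reboot_node_safely() {
  local node="$1" tries="${2:-90}" before now
  [ -n "${REBOOT_CMD:-}" ] || { echo "ERROR: REBOOT_CMD is not set" >&2; return 1; }
  before=$(node_boot_id "$node") || return 1
  kubectl cordon "$node" || return 1
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  "$REBOOT_CMD" "$node" || return 1
  while [ "$tries" -gt 0 ]; do
    now=$(node_boot_id "$node" 2>/dev/null) || now=""
    if [ -n "$now" ] && [ "$now" != "$before" ] &&
       [ "$(node_ready "$node" 2>/dev/null)" = "True" ]; then
      kubectl uncordon "$node"
      return
    fi
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "ERROR: $node not Ready after reboot; left cordoned" >&2
  return 1
}
```

Calling the helper for one node after another, and stopping at the first failure, keeps at most one node out of service at any time. A node that does not come back stays cordoned so that it can be investigated.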

4. Verification/Auditing

4.1. Listing every node in the cluster that needs a reboot

The snippet below iterates over all nodes via kubectl debug node/<name>, reads each node's /proc/modules through the host mount, and prints a single line per node. Nodes reporting REBOOT NEEDED still have algif_aead loaded in the running kernel; the blocklist will prevent future loads but the running instance can only be cleared by rmmod or a reboot as recommended above.

# Check every node for a loaded algif_aead module via a node debug pod.
for node in $(kubectl get nodes -o name); do
  name="${node#node/}"
  # --attach returns the container's output so that it can be captured.
  result=$(kubectl debug "$node" -q --attach --image=busybox -- \
    chroot /host sh -c 'grep -qE "^algif_aead " /proc/modules && echo LOADED || echo OK' \
    2>/dev/null | tail -n1)
  # An empty result means the debug pod could not report back.
  case "${result:-ERROR}" in
    LOADED) printf '%-50s %s\n' "$name" "REBOOT NEEDED (algif_aead loaded)" ;;
    OK)     printf '%-50s %s\n' "$name" "OK" ;;
    *)      printf '%-50s %s\n' "$name" "ERROR: could not determine state" ;;
  esac
done

Any node printed as REBOOT NEEDED should be cordoned, drained, and rebooted following the rebooting guidance above. Debug pods exit as soon as the check finishes; completed node debug pods can be removed with kubectl delete pod if they are not cleaned up automatically. If the environment restricts kubectl debug, an equivalent check can be run via SSH or any internal automation tools.

4.2. Auditing the mitigation across the cluster

Updating the VKS nodes to a VKr release that contains the fix is the preferred remediation. Until the nodes are updated, the snippet below can be used to confirm that every node in a cluster has the blocklist file in place. It uses kubectl debug node/<name> to spawn a privileged debug pod on each node and check for the blocklist configuration file:

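# Check every node for the blocklist file via a node debug pod.
for node in $(kubectl get nodes -o name); do
  name="${node#node/}"
  # --attach returns the container's output so that it can be captured.
  result=$(kubectl debug "$node" -q --attach --image=busybox -- \
    chroot /host sh -c 'test -f /etc/modprobe.d/manual-disable-algif_aead.conf && echo OK || echo MISSING' \
    2>/dev/null | tail -n1)
  # An empty result is printed as ERROR.
  printf '%-50s %s\n' "$name" "${result:-ERROR}"
done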

Nodes reporting MISSING have not had the mitigation applied. Re-run the DaemonSet, or apply the manual blocklist directly. Debug pods exit as soon as the check finishes; completed node debug pods can be removed with kubectl delete pod if they are not cleaned up automatically. If the environment restricts kubectl debug, an equivalent check can be run via SSH or any internal automation tools.
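
The results of the two audits in sections 4.1 and 4.2 can be combined into a single recommended action per node. The sketch below is one possible mapping that follows the guidance in this article; the function names `node_action` and `summarize_audit`, the input format of one "NODE MODULE BLOCKLIST" row per line, and the wording of the actions are assumptions for illustration.

```shell
# Map the two audit results to the action recommended in this article.
#   $1: module check    (LOADED | OK)
#   $2: blocklist check (OK | MISSING)
node_action() {
  case "$1/$2" in
    OK/OK)          echo "mitigated, no action" ;;
    OK/MISSING)     echo "apply mitigation" ;;
    LOADED/OK)      echo "cordon, drain and reboot" ;;
    LOADED/MISSING) echo "apply mitigation, then cordon, drain and reboot" ;;
    *)              echo "re-check node" ;;
  esac
}

# Read "NODE MODULE BLOCKLIST" rows on stdin and print one action per node.
summarize_audit() {
  local name module blocklist
  while read -r name module blocklist; do
    printf '%-30s %s\n' "$name" "$(node_action "$module" "$blocklist")"
  done
}
```

Any row whose values are neither of the expected results, including ERROR, is reported as "re-check node" rather than being silently treated as healthy.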

Additional Information

More information about the security vulnerability and the impacted VCF products is provided in the KB article "Impact Evaluation of CVE-2026-31431 ("Copy Fail") of VMware by Broadcom product portfolio".

Should you require further information or support, contact Broadcom Support.
To be notified of any changes, subscribe to this knowledge base article.

Attachments

mitigate-copyfail-ds.sh

Mitigation of CVE-2026-31431 ("Copy Fail") in vSphere Kubernetes Service for Ubuntu Nodes

Article ID: 439866

Updated On: