Calico-node, kube-proxy, and antrea pods may fail intermittently on Photon 3

Article ID: 323449

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

Symptoms:
  • Intermittently, calico-node pods in Kubernetes clusters on Photon 3 may start failing readiness checks.

  • The pods will report:
     

calico/node is not ready: felix is not ready: readiness probe reporting 503
 

  • With the following events:
     

"Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused" "Liveness probe failed: calico/node is not ready: Felix is not live: Get " [http://localhost:9099/liveness](http://localhost:9099/liveness "http://localhost:9099/liveness")": dial tcp 127.0.0.1:9099: connect: connection refused" "Readiness probe failed: 2023-06-07 08:53:26.150 [INFO][775] confd/health.go 180: Number of node(s) with BGP peering established = 8"
 

  • When looking at the logs of the pod, the following errors will be present:
     

[PANIC][9342] felix/table.go 769: iptables-legacy-save command failed after retries ipVersion=0x4 table="raw" panic: (*logrus.Entry) 0xc000284b90
 

  • Kube-proxy logs on affected nodes will also show errors like:
     

2023-07-07T14:09:28.076615713Z stderr F E0707 14:09:28.076532 1 proxier.go:859] "Failed to ensure chain exists" err="error creating chain \"KUBE-EXTERNAL-SERVICES\": exit status 3: iptables v1.8.2 (legacy): can't initialize iptables table `filter': No child processes\nPerhaps iptables or your kernel needs to be upgraded.\n" table=filter chain=KUBE-EXTERNAL-SERVICES
2023-07-07T14:09:28.076640119Z stderr F I0707 14:09:28.076553 1 proxier.go:851] "Sync failed" retryingTime="30s"

 

  • antrea-agent error logs on affected nodes will also show errors like:

F0326 07:31:50.801001       1 main.go:53] Error running agent: failed to start NPL agent: error when initializing NodePortLocal port table: initialization of NPL iptables rules failed: error checking if chain ANTREA-NODE-PORT-LOCAL exists in table nat: running [/usr/sbin/iptables -t nat -S ANTREA-NODE-PORT-LOCAL 1 --wait]: exit status 3: iptables v1.8.3 (legacy): can't initialize iptables table `nat': No child processes
Perhaps iptables or your kernel needs to be upgraded.
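
To confirm whether a cluster is hitting this issue, the pod status, events, and logs above can be gathered with standard kubectl commands against the affected workload cluster. The commands below are a minimal sketch; the namespace (kube-system) and container name (calico-node) are typical defaults and may differ in your cluster:

# List CNI and kube-proxy pods and look for restarts or failing readiness
kubectl get pods -A -o wide | grep -E 'calico-node|kube-proxy|antrea-agent'

# Show the probe-failure events for a suspect pod (substitute the pod name)
kubectl describe pod <calico-node-pod-name> -n kube-system

# Check the container logs for the felix/iptables panic shown above
kubectl logs <calico-node-pod-name> -n kube-system -c calico-node --previous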


Environment

VMware vSphere 7.0 with Tanzu
VMware vSphere 8.0 with Tanzu

Cause


While the exact conditions that trigger the failure have not been determined, the failure has been identified as an issue with the bpfilter module in Linux kernel versions prior to 5.2-rc2.
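
To check whether a node is running an affected kernel and whether the bpfilter module is present, the following can be run on the node over SSH (a quick check only; bpfilter may also be loaded on demand when iptables runs):

# Kernel version of the node (the issue affects kernels older than 5.2-rc2)
uname -r

# Check whether the bpfilter module is currently loaded
lsmod | grep bpfilter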

Resolution

This is fixed in the following TKRs and later:

v1.26.13---vmware.1-fips.1-tkg.3 for vSphere 8.x 
v1.26.12---vmware.2-fips.1-tkg.2 for vSphere 7.x 
v1.27.6---vmware.1-fips.1-tkg.1 for vSphere 7.x 
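
To see which TKRs are available, the TanzuKubernetesRelease objects can be listed from a Supervisor cluster context. The command below uses the tkr short name; if it is not recognized in your environment, use the full resource name tanzukubernetesreleases:

kubectl get tkr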


Workaround:

1. Save the following script to a file named vsphere_7_disable-bpfilter.sh or vsphere_8_disable-bpfilter.sh (depending on the vSphere version), or download the attached script, and place it on a machine that has a kubeconfig for the Supervisor cluster and L3 access to the nodes to be remediated.

NOTE: The only difference between the attached scripts is that the vSphere 8.x version references vspheremachine objects while the vSphere 7.x version references wcpmachine objects; a vSphere 7.x variant of the affected lines is shown after the script below.
 
#!/bin/bash

# Check if namespace and cluster arguments are provided
if [ $# -ne 2 ]; then
 echo "Usage: $0 <namespace> <cluster>"
 exit 1
fi

# Set the namespace and cluster variables
NAMESPACE=$1
CLUSTER=$2

# Retrieve the SSH private key from the secret and write it to a temporary file
PRIVATE_KEY_FILE=$(mktemp)
kubectl get secret -n "$NAMESPACE" "$CLUSTER-ssh" --template='{{index .data "ssh-privatekey" | base64decode}}' >"$PRIVATE_KEY_FILE"
chmod 600 "$PRIVATE_KEY_FILE"

# Embedded script to run on each node
SCRIPT_TO_RUN=$(
 cat <<'END_SCRIPT'

#!/bin/bash

# Your script commands go here
echo "Running bpfilter remediation script on node: $(hostname)"

echo Unloading bpfilter module
sudo modprobe -r bpfilter

echo Disabling bpfilter module
echo "blacklist bpfilter" | sudo tee /etc/modprobe.d/disable-bpfilter.conf >/dev/null
echo "install bpfilter /bin/true" | sudo tee -a /etc/modprobe.d/disable-bpfilter.conf >/dev/null

sudo systemctl restart systemd-modules-load.service

echo "Testing disablement"
sudo modprobe -n -v bpfilter
sudo lsmod | grep bpfilter || echo "bpfilter is not loaded"

END_SCRIPT
)

# Get the list of node names using kubectl and --template
NODE_NAMES=$(kubectl get vspheremachine -n "$NAMESPACE" -l "cluster.x-k8s.io/cluster-name=$CLUSTER" --template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')

# Iterate over each node
for NODE_NAME in $NODE_NAMES; do
 # Get the node's IP address using kubectl and --template
 NODE_IP=$(kubectl get vspheremachine -n "$NAMESPACE" "$NODE_NAME" --template='{{.status.vmIp}}')

 # SSH into the node using the private key file, ignore host key checking, and run the embedded script
 echo "Running script on node: $NODE_NAME"
 ssh -i "$PRIVATE_KEY_FILE" -o StrictHostKeyChecking=no "vmware-system-user@$NODE_IP" "bash -s" <<<"$SCRIPT_TO_RUN"

 # Add any additional commands you want to run on each node here
 # ...

 echo "Finished running script on node: $NODE_NAME"
done

# Remove the temporary private key file
rm "$PRIVATE_KEY_FILE"


2. Make the script executable (change the filename to reflect the downloaded version):

# chmod +x ./vsphere_7_disable-bpfilter.sh

3. Run the script to disable the bpfilter module, passing the Supervisor namespace and the cluster name as arguments (change the filename to reflect the downloaded version):

# ./vsphere_7_disable-bpfilter.sh <namespace> <cluster>
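
For example, for a cluster named tkc-01 in the Supervisor namespace demo-ns (placeholder names), the invocation would be:

# ./vsphere_7_disable-bpfilter.sh demo-ns tkc-01

The script then SSHes to each node in turn and prints "Running bpfilter remediation script on node: <hostname>" as it applies the change.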


Node reboots can also temporarily resolve the problem.
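
To verify the remediation, the checks below can be run; they assume SSH access to the node as in the script above and a kubeconfig for the affected cluster:

# On the node: confirm bpfilter is no longer loaded and that iptables responds again
sudo lsmod | grep bpfilter || echo "bpfilter is not loaded"
sudo iptables -t filter -S | head -n 5

# From the cluster: confirm the calico-node, kube-proxy, and antrea-agent pods return to Running
kubectl get pods -A | grep -E 'calico-node|kube-proxy|antrea-agent'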


Additional Information

Impact/Risks:

Pod networking will fail on affected nodes.

Attachments

vsphere_8_disable-bpfilter
vsphere_7_disable-bpfilter