Following a Virtual Machine (VM) restart, hard reset, or power-off/power-on cycle, Tanzu Hub components fail to function correctly. The following specific failures are observed:
Antrea-Agent CrashLoopBackOff: The antrea-agent pods on Kubernetes worker nodes fail to start. Logs show the following fatal error: F1217 10:52:40.900641 1 main.go:54] Error running agent: error initializing agent: open /proc/sys/net/ipv4/conf/antrea-gw0/arp_announce: read-only file system
Registry Malfunction: The Registry service starts but fails to function. Investigation reveals that the storage root directory is empty or missing expected mount points.
Tanzu Hub 10.0 through 10.3
The issue is caused by a dependency on BOSH lifecycle scripts that are not triggered during a standard VM-level reboot or power cycle.
Antrea-Agent: By design, the antrea-agent pod is not privileged and cannot modify host-level kernel parameters. It relies on a BOSH pre-start script to initialize the /proc/sys/net/ipv4/conf/antrea-gw0/arp_announce parameter on the host.
Registry: The Registry component relies on a BOSH pre-start script to mount the overlay directories and storage paths.
When a VM is restarted at the OS/vSphere level, BOSH does not re-run these pre-start initialization scripts, leaving the host in an unconfigured state that the Kubernetes pods cannot self-correct.
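One way to confirm that a node is in this unconfigured state is to read the sysctl value directly on the host. The sketch below assumes that a value other than 1 (the value the pre-start script sets) means the script has not run since the last boot. The function name check_arp_announce and the SYSCTL_ROOT override are hypothetical and exist only for illustration and testing.

```shell
#!/usr/bin/env bash
# Sketch: report whether arp_announce for antrea-gw0 holds the value that
# the BOSH pre-start script is expected to set (1).
# SYSCTL_ROOT is a hypothetical override so the check can be tried
# against a scratch directory instead of the real /proc/sys.
check_arp_announce() {
  local root="${SYSCTL_ROOT:-/proc/sys}"
  local file="$root/net/ipv4/conf/antrea-gw0/arp_announce"
  local value

  # The file is absent when the antrea-gw0 interface does not exist yet.
  if [ ! -r "$file" ]; then
    echo "missing"
    return 2
  fi

  value="$(tr -d '[:space:]' < "$file")"
  if [ "$value" = "1" ]; then
    echo "initialized"
    return 0
  fi

  echo "not-initialized (arp_announce=$value)"
  return 1
}
```

Run on the affected worker node (for example through bosh ssh), a result of "not-initialized" would be consistent with the pre-start script having been skipped after the reboot.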
The product team is working on a permanent fix so that these configurations persist across reboots. Until then, use the following manual recovery steps. For the Registry, rerun the pre-start script, wait briefly, and then restart the registry job:
:~$ bosh -d hub-#### ssh registry -c "sudo /var/vcap/jobs/registry/bin/pre-start"
// wait about 10 seconds
:~$ bosh -d hub-#### ssh registry -c "sudo monit restart registry"
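If the Registry recovery has to be repeated after each reboot, the two commands and the pause between them can be wrapped in a small function. This is a minimal sketch, not an official tool: recover_registry, BOSH_BIN, and WAIT_SECONDS are hypothetical names, and the deployment name must be supplied by the operator.

```shell
#!/usr/bin/env bash
# Sketch: rerun the registry pre-start script, pause, then restart the job.
# BOSH_BIN and WAIT_SECONDS are hypothetical overrides for dry runs;
# by default the real bosh CLI and a 10-second pause are used.
recover_registry() {
  local deployment="$1"
  local bosh="${BOSH_BIN:-bosh}"
  local wait_seconds="${WAIT_SECONDS:-10}"

  if [ -z "$deployment" ]; then
    echo "usage: recover_registry <deployment-name>" >&2
    return 64
  fi

  # Step 1: remount the overlay directories and storage paths.
  "$bosh" -d "$deployment" ssh registry \
    -c "sudo /var/vcap/jobs/registry/bin/pre-start" || return 1

  # Step 2: give the mounts a moment to settle.
  sleep "$wait_seconds"

  # Step 3: restart the registry job under monit.
  "$bosh" -d "$deployment" ssh registry -c "sudo monit restart registry"
}
```

Call it with the actual deployment name, for example recover_registry hub-####. The function stops before the restart if the pre-start step fails, so a failed mount is not masked by a successful restart.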
For the antrea-agent failure, first check the pod status and confirm the error message.
:~$ bosh -d hub-#### ssh system
system/####:~$ /var/vcap/packages/kubernetes/bin/kubectl --kubeconfig /var/vcap/jobs/kube-controller-manager/config/admin-kubeconfig -n kube-system get pods -owide
...
antrea-agent-lldjm 2/2 Running 1 (22d ago) 22d 192.168.0.165 192.168.0.165 <none> <none>
antrea-agent-mwrr4 1/2 CrashLoopBackOff 44 (15s ago) 22d 192.168.0.162 192.168.0.162 <none> <none>
antrea-agent-pb98f 2/2 Running 1 (22d ago) 22d 192.168.0.163 192.168.0.163 <none> <none>
...
system/####:~$ /var/vcap/packages/kubernetes/bin/kubectl --kubeconfig /var/vcap/jobs/kube-controller-manager/config/admin-kubeconfig -n kube-system logs antrea-agent-mwrr4
...
E1217 12:05:35.833430 1 sysctl_linux.go:64] "Error when setting sysctl parameter" err="open /proc/sys/net/ipv4/conf/antrea-gw0/arp_announce: read-only file system" path="ipv4/conf/antrea-gw0/arp_announce" value=1
F1217 12:05:35.834116 1 main.go:54] Error running agent: error initializing agent: open /proc/sys/net/ipv4/conf/antrea-gw0/arp_announce: read-only file system
...
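On clusters with many worker nodes, the crashing pods can be filtered out of the pod listing instead of being picked out by eye. The following sketch reads the get pods -owide output on standard input. The helper name crashing_antrea_nodes is hypothetical, and the column positions assume the default -owide layout shown above, where NODE is the third column from the end.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get pods -owide` output on stdin and print
# "<pod> <node>" for every antrea-agent pod in CrashLoopBackOff.
# The RESTARTS column can contain spaces ("44 (15s ago)"), so NODE is
# addressed from the end of the line rather than from the start.
crashing_antrea_nodes() {
  awk '$1 ~ /^antrea-agent-/ && $3 == "CrashLoopBackOff" { print $1, $(NF-2) }'
}
```

Piping the pod listing from the system VM into crashing_antrea_nodes prints one line per affected pod together with the node address needed for the next step.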
Locate the node that hosts the crashing antrea-agent pod and rerun the prepare-antrea-nodes pre-start script on it.
:~$ bosh -d hub-#### is | grep 192.168.0.162
control/#### running az3 192.168.0.162 hub-####
:~$ bosh -d hub-#### ssh control/#### -c "sudo /var/vcap/jobs/prepare-antrea-nodes/bin/pre-start"
control/####: stdout | [Wed Dec 17 12:25:05 PM UTC 2025] Installing systemd-networkd configuration files for Antrea interfaces if needed
control/####: stdout | [Wed Dec 17 12:25:05 PM UTC 2025] Systemd version: 249
control/####: stdout | [Wed Dec 17 12:25:05 PM UTC 2025] Installing files
control/####: stdout | [Wed Dec 17 12:25:05 PM UTC 2025] Restarting systemd-networkd
control/####: stdout | [Wed Dec 17 12:25:06 PM UTC 2025] Setting arp_announce to 1 for antrea-gw0 interface
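The grep lookup used above matches substrings, so searching for 192.168.0.16 would also match 192.168.0.162. A stricter lookup compares whole fields. The sketch below assumes the instance name is the first column of the bosh instances output, as in the sample line above; instance_for_ip is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read `bosh -d <deployment> is` output on stdin and print the
# instance whose row contains exactly the given IP address.
# Whole-field comparison avoids the substring matches that grep allows.
instance_for_ip() {
  awk -v ip="$1" '{
    for (i = 2; i <= NF; i++) {
      if ($i == ip) { print $1; next }
    }
  }'
}
```

The printed instance name can then be passed to bosh ssh to rerun the prepare-antrea-nodes pre-start script on the correct VM.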
The back-off restart interval for the antrea-agent pod is 5 minutes. Wait at least 5 minutes, then check whether the previously crashing antrea-agent pods are running properly.
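The final check can also be expressed as a pass/fail test on the pod listing. This is a sketch under the assumption that "functioning properly" means every antrea-agent pod is in the Running state with all containers ready (for example 2/2); antrea_agents_healthy is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get pods -owide` output on stdin and return 0 only
# if at least one antrea-agent pod is listed and every one of them is
# Running with all containers ready (READY column of the form n/n).
antrea_agents_healthy() {
  awk '
    $1 ~ /^antrea-agent-/ {
      seen++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END {
      if (seen > 0 && bad == 0) exit 0
      exit 1
    }
  '
}
```

After the wait, pipe the same get pods -owide command used in the first step into antrea_agents_healthy. A non-zero exit status means at least one agent is still not ready and the node should be rechecked.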