The floating IP (FIP) doesn't get assigned to any of the Supervisor control planes although etcd is healthy and has a leader elected

Article ID: 313123


Updated On:

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

This article explains how the FIP is assigned, how to investigate the FIP pod logs, and how to resolve issues with the floating IP assignment accordingly.

Symptoms:

- The FIP is not assigned to any of the Supervisor control planes (a quick check is shown after the pod listing below).
- An etcd leader is elected and all instances are healthy:

# etcdctl endpoint status --cluster -w table
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.111.203:2379 | 95881565068b0e42 |  3.4.13 |   35 MB |     false |      false |         2 |    7883287 |            7883287 |        |
| https://192.168.111.201:2379 | c32ecaba34b21b56 |  3.4.13 |   36 MB |      true |      false |         2 |    7883288 |            7883288 |        |
| https://192.168.111.202:2379 | e564593cc412a3d2 |  3.4.13 |   35 MB |     false |      false |         2 |    7883288 |            7883288 |        |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+


# etcdctl endpoint health --cluster -w table
+------------------------------+--------+-------------+-------+
|           ENDPOINT           | HEALTH |    TOOK     | ERROR |
+------------------------------+--------+-------------+-------+
| https://192.168.111.202:2379 |   true | 13.522916ms |       |
| https://192.168.111.201:2379 |   true | 22.339733ms |       |
| https://192.168.111.203:2379 |   true | 31.768786ms |       |
+------------------------------+--------+-------------+-------+


- All the FIP pods are running:

# kubectl get pods -A |grep fip
kube-system                                 wcp-fip-4207357203729ea9f3eb35d0484e76b1                          1/1     Running     0          8d
kube-system                                 wcp-fip-4207a28a1d8b5474fa78ec4e2cbfbc5f                          1/1     Running     0          8d
kube-system                                 wcp-fip-4207b7f3e3b2e83135c7536ec42ef3b4                          1/1     Running     0          8d
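
To confirm the first symptom, check each Supervisor control plane VM for the presence of the FIP on eth0. A minimal check, using the FIP from the examples in this article (192.168.111.200, substitute your own):

# ip addr show dev eth0 | grep 'inet 192.168.111.200/' || echo "FIP not assigned on this node"

If the FIP is assigned nowhere, the echo message is printed on every control plane.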


Environment

VMware vSphere 7.0 with Tanzu
VMware vSphere 8.0 with Tanzu

Cause

Each control plane has a wcp-fip-<node-ID> pod with a single purpose: every 5 seconds it checks whether the etcd instance running on the same node is the cluster leader, then sets or unsets the floating IP on eth0 (the management network interface) based on the outcome. From the FIP pod logs we can see that the script running in it does the following:

- Query the status of the etcd instance running on the same node:

curl -m 10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key -q -s -X POST -d '{}' https://127.0.0.1:2379/v3beta/maintenance/status

- This query returns the etcd status, including the member_id, the revision, and the leader ID, which are the values we are interested in:

out='{"header":{"cluster_id":"14063905004556717415","member_id":"14064401587421453142","revision":"7378603","raft_term":"2"},"version":"3.4.13","dbSize":"35561472","leader":"14064401587421453142","raftIndex":"7904694","raftTerm":"2","raftAppliedIndex":"7904694","dbSizeInUse":"15966208"}'

- It then compares the member_id with the leader ID:

echo -n '{"header":{"cluster_id":"14063905004556717415","member_id":"14064401587421453142","revision":"7378603","raft_term":"2"},"version":"3.4.13","dbSize":"35561472","leader":"14064401587421453142","raftIndex":"7904694","raftTerm":"2","raftAppliedIndex":"7904694","dbSizeInUse":"15966208"}' | jq -r -e 'select(.header.member_id == .leader).header.member_id'
 
1- If they match, it prints the leader ID and then declares that it is the current leader:

14064401587421453142
Is the current leader!


Then it runs the fip_assign function, which checks whether the FIP is already assigned to eth0; if it is, it does nothing:

+ fip_assign
+ is_alias_exist
+ ip addr show eth0
+ grep 'inet 192.168.111.200/'
+ '[' 0 -eq 0 ']'
+ FIP=1


If the FIP is not assigned yet, it adds it to eth0:

+ ip addr add 192.168.111.200 dev eth0
+ '[' 0 -ne 0 ']'
+ echo 'Assigned fip 192.168.111.200 on eth0'
+ FIP=1
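
Based on the traces above, the fip_assign logic can be summarised with the following shell sketch. This is a reconstruction, not the literal script shipped in the pod; FIP_ADDRESS stands for the floating IP (192.168.111.200 in this example) and FIP is the flag seen in the trace:

fip_assign() {
    # is_alias_exist: check whether the FIP is already present on eth0
    if ip addr show eth0 | grep -q "inet ${FIP_ADDRESS}/"; then
        FIP=1                          # already assigned, nothing to do
        return
    fi
    # Not assigned yet: add the FIP to eth0 and only report success if the add worked
    ip addr add "${FIP_ADDRESS}" dev eth0
    if [ $? -ne 0 ]; then
        return
    fi
    echo "Assigned fip ${FIP_ADDRESS} on eth0"
    FIP=1
}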


2- If they don't match, the select filter produces no output and jq exits with status 4 (no valid result):

echo -n '{"header":{"cluster_id":"14063905004556717415","member_id":"16529434649879028690","revision":"7378603","raft_term":"2"},"version":"3.4.13","dbSize":"35561472","leader":"14064401587421453142","raftIndex":"7904694","raftTerm":"2","raftAppliedIndex":"7904694","dbSizeInUse":"15966208"}' | jq -r -e 'select(.header.member_id == .leader).header.member_id'

+ return 4
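
The exit status 4 comes from jq's -e/--exit-status option, which makes jq exit with 4 when the filter never produces a valid result. This can be reproduced with any input where member_id and leader differ, for example:

echo -n '{"header":{"member_id":"A"},"leader":"B"}' | jq -r -e 'select(.header.member_id == .leader).header.member_id'
echo $?     # prints 4: select matched nothing, so no output was produced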


Then it runs the clean_up function, which checks whether the FIP is assigned to eth0; if it is, it removes it:

+ echo 'Removing interface eth0'
+ ip addr delete 192.168.111.200 dev eth0



If the FIP is not assigned, it does nothing:

+ clean_up
+ '[' 0 -eq 0 ']'
+ return
+ continue
+ '[' 1 ']'
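
Putting it together, the clean_up logic and the 5-second loop that drives both functions look roughly like the sketch below, reconstructed from the traces above. The trace suggests clean_up keys off the FIP flag set by fip_assign, and local_member_is_leader wraps the curl/jq check shown earlier; the actual script in the pod may differ in detail:

clean_up() {
    # Nothing to do if the FIP was never assigned on this node (FIP flag is 0)
    if [ "${FIP}" -eq 0 ]; then
        return
    fi
    echo "Removing interface eth0"
    ip addr delete "${FIP_ADDRESS}" dev eth0
    FIP=0                              # assumed: reset the flag after removal
}

local_member_is_leader() {
    # Query the local etcd member (same curl as in the logs above) and check
    # whether its member_id matches the leader ID
    curl -m 10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt \
         --key /etc/kubernetes/pki/etcd/server.key -q -s -X POST -d '{}' \
         https://127.0.0.1:2379/v3beta/maintenance/status \
      | jq -r -e 'select(.header.member_id == .leader).header.member_id' > /dev/null
}

FIP_ADDRESS=192.168.111.200            # the floating IP (example value used throughout this article)
FIP=0                                  # flag: has this script assigned the FIP on this node

# Main loop: re-evaluate leadership every 5 seconds
while [ 1 ]; do
    if local_member_is_leader; then
        fip_assign                     # this node runs the etcd leader: ensure the FIP is on eth0
    else
        clean_up                       # not the leader: ensure the FIP is not on eth0
    fi
    sleep 5
done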

In the case where the FIP doesn't get assigned to eth0 on any of the nodes, the etcd instances are healthy and all FIP pods are running, but the script inside the FIP pod on the node running the etcd leader has stopped. If we check that pod's logs, we will find that they no longer get updated every 5 seconds:

# kubectl logs wcp-fip-4207b7f3e3b2e83135c7536ec42ef3b4 -n kube-system --tail=15 -f

The last log entry for the pod shows a much older etcd revision than the current one, and may not declare the correct leader:

+ echo -n '{"header":{"cluster_id":"14063905004556717415","member_id":"16529434649879028690","revision":"7378603","raft_term":"5"},"version":"3.4.13","dbSize":"35561472","leader":"14064401587421453142","raftIndex":"8644970","raftTerm":"5","raftAppliedIndex":"8644970","dbSizeInUse":"17739776"}'

# etcdctl endpoint status --cluster -w fields  |grep -E "MemberID|Leader|Revision|Endpoint" |sed 'N;N;N;s/\n/\t/g'
"MemberID" : 10774885632129568322       "Revision" : 8087236    "Leader" : 16529434649879028690 "Endpoint" : "https://192.168.111.203:2379"
"MemberID" : 14064401587421453142       "Revision" : 8087236    "Leader" : 16529434649879028690 "Endpoint" : "https://192.168.111.201:2379"
"MemberID" : 16529434649879028690       "Revision" : 8087236    "Leader" : 16529434649879028690 "Endpoint" : "https://192.168.111.202:2379"


Deleting the FIP pod in question with kubectl resets the pod's age, but doesn't actually restart the underlying static pod or change this behaviour.

Resolution

The FIP pod is a static pod that the kubelet deploys from the /etc/kubernetes/manifests/ directory. To actually restart it, follow this procedure on the affected control plane node:

- Make a backup copy of wcp-fip.yaml:
# cp /etc/kubernetes/manifests/wcp-fip.yaml /root/wcp-fip.yaml.backup

- Delete the file:
# rm /etc/kubernetes/manifests/wcp-fip.yaml

- The kubelet, which monitors /etc/kubernetes/manifests/, should then delete the pod:

# kubectl get pods -A |grep fip
kube-system                                 wcp-fip-4207357203729ea9f3eb35d0484e76b1                          1/1     Running     0          9d
kube-system                                 wcp-fip-4207b7f3e3b2e83135c7536ec42ef3b4                          1/1     Running     0          9d



- Copy the file back to /etc/kubernetes/manifests/, which should start the pod again and assign the floating IP to eth0:

# cp /root/wcp-fip.yaml.backup /etc/kubernetes/manifests/wcp-fip.yaml

# kubectl get pods -A |grep fip
kube-system                                 wcp-fip-4207357203729ea9f3eb35d0484e76b1                          1/1     Running     0          9d
kube-system                                 wcp-fip-4207a28a1d8b5474fa78ec4e2cbfbc5f                          1/1     Running     0          2s
kube-system                                 wcp-fip-4207b7f3e3b2e83135c7536ec42ef3b4                          1/1     Running     0          9d

# ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:87:5b:d1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.202/24 brd 192.168.111.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.111.200/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe87:5bd1/64 scope link
       valid_lft forever preferred_lft forever


Additional Information

Impact/Risks:
No access to the supervisor cluster. Services cannot communicate with the API server of the supervisor cluster.