SSPI upgrade fails at workload cluster upgrade with error "TimeoutError: Rollout of kubeadmControlPlane <control-plane name> did not complete within 3600 seconds"

Article ID: 416602

Products

VMware vDefend Firewall

Issue/Introduction

During an SSPI upgrade from v5.0 to v5.1, the workload cluster upgrade stage can create a new node that reuses an IP address from the node IP pool but has a different MAC address. Stale ARP cache entries on the gateway router (or L3 switch) then prevent network connectivity between the new node and the gateway, because the gateway continues to send traffic to the old MAC address. The upgrade fails with the error "TimeoutError: Rollout of kubeadmControlPlane <control-plane name> did not complete within 3600 seconds".

Environment

SSPI 5.0.0

Cause

During the workload cluster upgrade, newly deployed platform nodes with reused IP addresses experience network connectivity failures. The gateway router (or L3 switch) retains ARP cache entries that map the IP address to the old VM's MAC address, so traffic intended for the new node is misdirected to the old MAC address.
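
The stale mapping can sometimes be observed from a Linux host on the same subnet as the node, if one is available. The following is a minimal sketch under that assumption; it reads the local neighbor table with `ip neigh`, and the function name `neigh_mac` is hypothetical. The gateway itself may be a vendor device with its own commands, which this sketch does not cover.

```shell
# Hypothetical helper: print the MAC address that the local neighbor (ARP)
# table currently holds for an IP address.
# Usage: neigh_mac <IP_of_New_Node>
neigh_mac() {
  # "ip neigh show" prints lines such as:
  #   <ip> dev <iface> lladdr <mac> <state>
  # The awk filter prints the field that follows "lladdr".
  ip neigh show "$1" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") print $(i + 1) }'
}
```

If the printed MAC address differs from the MAC address shown for the new VM in vCenter, that host is still holding the old entry.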

Resolution

Network Side: Manually flush the ARP cache on the gateway router or L3 switch.
Node Side: Manually force an ARP update from the new VM using arping (see the procedure below).

Procedure: Forcing ARP Update from the Node

Note: This procedure must be repeated for each newly created node initialized during the upgrade process.

1. Initiate Upgrade: Start the SSPI upgrade process again from the SSPI UI upgrade page.

2. Identify the New Node: Log in to the SSPI management node and identify the newly created platform node.

SSH to the SSPI node as root.

Run the following command to list nodes:
k get nodes -o wide

Action: Copy the NAME of the newly created node from the output above.

Action: Search for this node name in the vCenter UI and note the IP address listed on the VM Summary page.
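
When many nodes are listed, sorting by creation time can make the new node easier to spot. This is a minimal sketch, assuming that `k` on the SSPI node is an alias for `kubectl` and that the current context points at the workload cluster; the function name `newest_node` is hypothetical.

```shell
# Hypothetical helper: print the name of the most recently created node.
# Assumes kubectl is installed and configured for the workload cluster.
newest_node() {
  # --sort-by orders nodes from oldest to newest, so the last line is the newest.
  kubectl get nodes \
    --sort-by=.metadata.creationTimestamp \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
    --no-headers |
    tail -n 1 |
    awk '{ print $1 }'
}
```

Running `newest_node` prints a single node name, which can then be searched for in the vCenter UI as described above.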

3. Access the New Node: From the SSPI CLI (where you are currently logged in), SSH into the newly created node using the capv user and the IP obtained in Step 2.

ssh capv@<IP_of_New_Node>

4. Verify Connectivity Issue: Attempt to reach the Gateway or DNS server to confirm the connectivity failure.

- Option A (Standard):

  ping <Gateway_IP>

- Option B (If ICMP/Ping is blocked in your environment):

  dig @<DNS_Server_IP>

Result: You should observe that connectivity is failing.
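
The two options can be combined into one bounded check so that neither command hangs while connectivity is down. This is a sketch only, assuming the iputils `ping` and the BIND `dig` utilities are present on the node; the function name `check_connectivity` and the timeout values are illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: return 0 if the gateway or the DNS server answers,
# 1 otherwise.
# Usage: check_connectivity <Gateway_IP> <DNS_Server_IP>
check_connectivity() {
  gateway_ip=$1
  dns_ip=$2
  # Option A: send 3 echo requests, waiting at most 2 seconds for each reply.
  if ping -c 3 -W 2 "$gateway_ip" >/dev/null 2>&1; then
    echo "gateway reachable"
    return 0
  fi
  # Option B: a single DNS query with a 2-second timeout, for environments
  # where ICMP is blocked.
  if dig +time=2 +tries=1 @"$dns_ip" >/dev/null 2>&1; then
    echo "dns reachable"
    return 0
  fi
  echo "no connectivity"
  return 1
}
```

Before the ARP update, `check_connectivity <Gateway_IP> <DNS_Server_IP>` is expected to print "no connectivity" and return a non-zero status.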

5. Force ARP Update: Send gratuitous ARP packets from the new node to update the ARP cache on the gateway router (or L3 switch).

sudo arping -I eth0 -U <IP_of_New_Node>
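
Because this step is repeated for every new node, it may help to send a fixed number of packets and re-test immediately. The sketch below assumes the iputils version of `arping`, where `-c` limits the packet count; the `-c 5` value, the `ping` timeouts, and the function name `refresh_arp` are additions for illustration and are not part of the original command.

```shell
# Hypothetical helper: send a fixed number of gratuitous ARP packets, then
# re-test the gateway. Returns the exit status of the final ping.
# Usage: refresh_arp <interface> <IP_of_New_Node> <Gateway_IP>
refresh_arp() {
  iface=$1
  node_ip=$2
  gateway_ip=$3
  # -U sends unsolicited (gratuitous) ARP; -c 5 stops after five packets.
  sudo arping -c 5 -I "$iface" -U "$node_ip"
  # Re-run the connectivity test: 3 echo requests, 2-second wait for each.
  ping -c 3 -W 2 "$gateway_ip" >/dev/null 2>&1
}
```

For example, `refresh_arp eth0 <IP_of_New_Node> <Gateway_IP>` returns 0 once the gateway answers.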

6. Verify Resolution: Repeat the test from Step 4. Connectivity should now be successful.

Additional Information

This issue is fixed in SSPI 5.1.1.
