Server Pools related on NSX for any Tanzu Clusters (port 6443) remains in "In Progress"

Article ID: 394311

Updated On:

Products

vSphere with Tanzu

Issue/Introduction

Server Pools (specifically port 6443) associated with any Tanzu clusters consistently remain in an "In Progress" state and do not transition to a completed or stable state.

The update-controller sync loop on the Supervisor is stuck in an infinite loop. The same message repeats in the log:

var/log/update-controller/sync.log

YYYY-MM-DDTHH:MM:SS INFO network_setting: Network setting changed for if eth0
YYYY-MM-DDTHH:MM:SS INFO network_setting: Network setting changed for if eth0
YYYY-MM-DDTHH:MM:SS INFO network_setting: Network setting changed for if eth0
YYYY-MM-DDTHH:MM:SS INFO network_setting: Network setting changed for if eth0

The journalctl logs confirm that the routing policy rule is added and then removed shortly afterward by systemd-networkd:

# journalctl --since today

YYYY-MM-DDTHH:MM:SS <Node-id> systemd-networkd[XXXX]: eth1: Removing route: dst: XX.XX.XX.XX/XX, src: n/a, gw: n/a, prefsrc: n/a, scope: link, table: 200, proto: static, type: unicast
YYYY-MM-DDTHH:MM:SS <Node-id> systemd-networkd[XXXX]: eth1: Removing route: dst: n/a, src: n/a, gw: XX.XX.XX.XX, prefsrc: n/a, scope: global, table: 200, proto: static, type: unicast
YYYY-MM-DDTHH:MM:SS <Node-id> systemd-networkd[XXXX]: Removing routing policy rule: priority: 0, 0.0.0.0/0 -> XX.XX.XX.XX/XX, iif: n/a, oif: n/a, table: 200
YYYY-MM-DDTHH:MM:SS <Node-id> systemd-networkd[XXXX]: Removing routing policy rule: priority: 0, 0.0.0.0/0 -> XX.XX.XX.XX/XX, iif: n/a, oif: n/a, table: 200
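The repetition in sync.log can also be checked with a short script instead of by eye. The following is a minimal sketch, assuming each log line has the "timestamp LEVEL message" layout shown in the excerpt above. The function names, the search string, and the threshold of 100 are illustrative assumptions, not values taken from the product.

```python
import re
from collections import Counter

# Assumption: lines look like "<timestamp> <LEVEL> <message>", as in the excerpt.
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def count_messages(lines):
    """Return a Counter mapping each message (timestamp and level stripped) to its count."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


def looks_stuck(lines, needle="Network setting changed for if", threshold=100):
    """Heuristic only: True if messages containing `needle` occur at least `threshold` times."""
    counts = count_messages(lines)
    total = sum(n for msg, n in counts.items() if needle in msg)
    return total >= threshold


# Example usage (the path is whatever location holds sync.log on the node):
# with open(path_to_sync_log) as handle:
#     print(looks_stuck(handle))
```

A high count on its own does not prove the loop is stuck. It is only a quick signal that should be read together with the journalctl output.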

Environment

VMware vSphere Kubernetes Service on vCenter versions earlier than 8u3e (8.0.3.00500)

Cause

If no SSO domain change needs to be applied in /etc/vmware/wcp/wcp-schedext-admission-controller-user-whitelist, the SSO domain sync function returns updated=False. This value is interpreted as sync retry = True, so the update-controller sync loop retries forever.
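The retry behaviour described here can be modelled with a toy loop. This is a sketch under stated assumptions, not the actual sync.py code: `sso_domain_step`, `other_step`, `run_sync_loop`, and the `max_iterations` cap are hypothetical names introduced only for illustration. Only the `step() or retry` pattern mirrors the line shown in the workaround.

```python
def sso_domain_step(domain_changed=False):
    """Toy stand-in for the SSO domain sync step (hypothetical logic)."""
    updated = domain_changed   # nothing to apply, so updated is False
    return not updated         # updated=False is reported as retry=True


def other_step():
    """Toy stand-in for any step that completes without requesting a retry."""
    return False


def run_sync_loop(steps, max_iterations=10):
    """Repeat all steps until none requests a retry and return the number of passes.

    max_iterations exists only so that this demo terminates.
    """
    passes = 0
    retry = True
    while retry and passes < max_iterations:
        passes += 1
        retry = False
        for step in steps:
            retry = step() or retry   # same pattern as the line in sync.py
    return passes


with_sso = run_sync_loop([other_step, sso_domain_step])
without_sso = run_sync_loop([other_step])
print(with_sso, without_sso)   # prints "10 1"
```

With the SSO step in the list, the loop only stops because of the demo cap. Without it, the loop settles after one pass, which is the effect the workaround aims for.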

Resolution

To fix this issue, update vCenter (VC) to version 8u3e (8.0.3.00500).

To work around this issue, follow the steps below:

1. SSH to the Supervisor

2. Back up the target file

cp /usr/lib/vmware-wcp/update-controller/sync.py ~

3. Comment out line 521 (the call to sync_sso_domain_change)

vi /usr/lib/vmware-wcp/update-controller/sync.py
    520         # Sync if domain changes
    521         #retry = self.sync_sso_domain_change(messages) or retry # <-----  comment out this code

4. Restart the wcp-sync process and check its status

systemctl restart wcp-sync
systemctl status wcp-sync

5. Repeat steps 1 to 4 on the remaining 2 Supervisors.
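Step 3 can also be expressed as a small text transformation, which helps when the line number differs from 521 on a given build. The following is a minimal sketch, assuming the target line can be found by the call text shown in step 3. `comment_out` is a hypothetical helper, not part of the product, and it works on a string rather than editing the file in place.

```python
def comment_out(source, needle):
    """Prefix '#' to every uncommented line containing `needle`.

    Returns (new_source, changed_count). Indentation is preserved, and
    lines that are already commented out are left untouched.
    """
    out = []
    changed = 0
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if needle in stripped and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            out.append(f"{indent}#{stripped}")
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed


# Example usage on a copy of the file (review the result before replacing anything):
# text = open("sync.py.copy").read()
# patched, n = comment_out(text, "self.sync_sso_domain_change(messages)")
```

Running the helper twice changes nothing the second time, so it is safe to reapply. Review the patched text against the backup from step 2 before restarting wcp-sync.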