vSphere with Tanzu NSX-T Deployment Supervisor Cluster stuck in a configuring state when overriding networking settings on a namespace


Article ID: 323426


Products

VMware vSphere Kubernetes Service

Issue/Introduction

Symptoms:


This issue occurs only when all of the following conditions are met:

1. One or more Supervisor namespaces were created with the "Override Supervisor network settings" checkbox selected.

 

 

 


You can tell whether a namespace overrides the Supervisor network settings from the Configure tab of the namespace in the vSphere Client.

Without override: (screenshot omitted)

With override: (screenshot omitted)


2. On vCenter, /var/log/vmware/wcp/wcpsvc.log shows the cluster state flapping from CONFIGURING to RUNNING and back to CONFIGURING. You can verify this with the following command:


# grep "is changed from ConfigStatus" /var/log/vmware/wcp/wcpsvc.log

wcpsvc.log:[timestamp] info wcp [kubelifecycle/kube_instance_conditions.go:90] Config status for WCP cluster <CLUSTER_UUID> is changed from ConfigStatus CONFIGURING to ConfigStatus RUNNING
wcpsvc.log:[timestamp] info wcp [kubelifecycle/kube_instance_conditions.go:90] Config status for WCP cluster <CLUSTER_UUID> is changed from ConfigStatus RUNNING to ConfigStatus CONFIGURING
wcpsvc.log:[timestamp] info wcp [kubelifecycle/kube_instance_conditions.go:90] Config status for WCP cluster <CLUSTER_UUID> is changed from ConfigStatus CONFIGURING to ConfigStatus RUNNING
wcpsvc.log:[timestamp] info wcp [kubelifecycle/kube_instance_conditions.go:90] Config status for WCP cluster <CLUSTER_UUID> is changed from ConfigStatus RUNNING to ConfigStatus CONFIGURING
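
To get a quick sense of how often the cluster has flapped, the grep above can be wrapped in a small counting helper. This is a minimal sketch, not a VMware-provided tool: the function name count_config_flaps is hypothetical, and it assumes the log messages have exactly the format shown in the sample output above.

```shell
# count_config_flaps is a hypothetical helper, not part of vCenter.
# It counts the transitions recorded in each direction, assuming the
# message format shown in the sample wcpsvc.log output.
# The function body runs in a subshell so its variables stay local.
count_config_flaps() (
    log="${1:-/var/log/vmware/wcp/wcpsvc.log}"
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    to_running=$(grep -c "is changed from ConfigStatus CONFIGURING to ConfigStatus RUNNING" "$log" || true)
    to_configuring=$(grep -c "is changed from ConfigStatus RUNNING to ConfigStatus CONFIGURING" "$log" || true)
    echo "CONFIGURING->RUNNING: $to_running"
    echo "RUNNING->CONFIGURING: $to_configuring"
)
```

Calling count_config_flaps with no argument reads the default log path. Roughly equal, steadily growing counts in both directions are consistent with the flapping described in this condition.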



3. From inside one of the Supervisor control plane VMs, the script below reports that the node-config file is changing intermittently, and a follow-up grep shows that the changing value is EXTERNAL_IP_POOLS_LB. To run the script, SSH into a Supervisor control plane VM as root and paste the following script into a file named check-diff.sh:

 

$ vi check-diff.sh

#!/usr/bin/env bash
# Save a snapshot of a file each time its modification time changes.
if [ $# -ne 1 ]; then
    echo "Usage: check-diff.sh <file>"
    exit 1
fi
file=$1
if [ ! -f "$file" ]; then
    echo "File not found: $file"
    exit 1
fi
# Print the file's last-modification time in epoch seconds (Linux stat).
mtime() { stat -c %Y "$file"; }
currTs=$(mtime)
# Total snapshots to keep: the initial copy (update0) plus 4 detected changes.
snapshotsToCollect=5
i=1
cat "$file" > "$file.update0"
while [ "$i" -lt "$snapshotsToCollect" ]; do
  newTs=$(mtime)
  if [ "$newTs" != "$currTs" ]; then
    currTs=$newTs
    echo "Detected change in \"$file\" at $(date)"
    cat "$file" > "$file.update$i"
    i=$((i+1))
  fi
  sleep 1
done


Then make the file executable:

$ chmod +x check-diff.sh 


Then run the script against the node-config file:

$ ./check-diff.sh /dev/shm/wcp_decrypted_data/node-config 


Wait a few minutes to see whether the script reports a change. If no messages appear, the file is not changing, and you can press Ctrl+C to stop the script after 4-5 minutes. Otherwise, the script saves the initial copy of the file, detects 4 changes, and then stops on its own:

root@<SUPERVISOR_HOSTNAME> [ ~ ]# bash ./check-diff.sh /dev/shm/wcp_decrypted_data/node-config

Detected change in "/dev/shm/wcp_decrypted_data/node-config" at Thu Aug 24 21:39:28 UTC 2023
Detected change in "/dev/shm/wcp_decrypted_data/node-config" at Thu Aug 24 21:39:42 UTC 2023
Detected change in "/dev/shm/wcp_decrypted_data/node-config" at Thu Aug 24 21:39:44 UTC 2023
Detected change in "/dev/shm/wcp_decrypted_data/node-config" at Thu Aug 24 21:39:46 UTC 2023
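
Once the script has saved its snapshots (the .update0, .update1, ... files next to the original), comparing consecutive snapshots shows which lines actually changed. The following is a minimal sketch under that assumption; diff_snapshots is a hypothetical helper name, and it relies only on the snapshot naming used by check-diff.sh.

```shell
# diff_snapshots is a hypothetical helper. Given the path that was passed
# to check-diff.sh, it diffs each saved snapshot against the previous one.
# The function body runs in a subshell so its variables stay local.
diff_snapshots() (
    base="$1"
    i=1
    while [ -f "$base.update$i" ]; do
        prev=$((i - 1))
        echo "== update$prev -> update$i =="
        # diff returns non-zero when the files differ; that is expected here.
        diff "$base.update$prev" "$base.update$i" || true
        i=$((i + 1))
    done
)
```

For example, running diff_snapshots /dev/shm/wcp_decrypted_data/node-config prints one section per detected change. If only the EXTERNAL_IP_POOLS_LB line appears in the output, the rest of the file is stable.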


If the script reports changes, run grep to confirm that the change is in the EXTERNAL_IP_POOLS_LB entry:

root@<SUPERVISOR_HOSTNAME> [ ~ ]# grep -e '^EXTERNAL_IP_POOLS_LB' /dev/shm/wcp_decrypted_data/*

/dev/shm/wcp_decrypted_data/node-config.update0:EXTERNAL_IP_POOLS_LB = <IP_1>/<NETMASK>,<IP_2>/<NETMASK>,<IP_3>/<NETMASK>
/dev/shm/wcp_decrypted_data/node-config.update1:EXTERNAL_IP_POOLS_LB = <IP_2>/<NETMASK>,<IP_1>/<NETMASK>,<IP_3>/<NETMASK>
/dev/shm/wcp_decrypted_data/node-config.update2:EXTERNAL_IP_POOLS_LB = <IP_1>/<NETMASK>,<IP_3>/<NETMASK>,<IP_2>/<NETMASK>

* Notice that the order of the IP_# entries differs on each line, while the set of entries stays the same.
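
When the pool list is long, reordering is hard to spot by eye. One way to check it mechanically is to sort the entries of each snapshot and compare the results. The sketch below is illustrative only: sorted_pools and compare_pools are hypothetical helper names, and the code assumes each file holds a single EXTERNAL_IP_POOLS_LB line in the comma-separated format shown above.

```shell
# sorted_pools (hypothetical helper) prints the EXTERNAL_IP_POOLS_LB
# entries of one file as a sorted, comma-separated list.
sorted_pools() (
    grep -e '^EXTERNAL_IP_POOLS_LB' "$1" \
        | cut -d= -f2- \
        | tr -d ' ' \
        | tr ',' '\n' \
        | sort \
        | paste -sd, -
)

# compare_pools (hypothetical helper) classifies two snapshots:
#   identical          the line is byte-for-byte the same
#   reordered          same entries, different order
#   different entries  the set of entries itself changed
compare_pools() (
    raw_a=$(grep -e '^EXTERNAL_IP_POOLS_LB' "$1" || true)
    raw_b=$(grep -e '^EXTERNAL_IP_POOLS_LB' "$2" || true)
    if [ "$raw_a" = "$raw_b" ]; then
        echo "identical"
    elif [ "$(sorted_pools "$1")" = "$(sorted_pools "$2")" ]; then
        echo "reordered"
    else
        echo "different entries"
    fi
)
```

For example, compare_pools /dev/shm/wcp_decrypted_data/node-config.update0 /dev/shm/wcp_decrypted_data/node-config.update1 printing "reordered" matches the symptom described here. A result of "different entries" points to a genuine configuration change rather than this issue.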

Environment

VMware vSphere prior to 8.0 U2b

 

Cause

This issue occurs only in versions prior to VMware vSphere 8.0 U2b, and only when the "Override Supervisor network settings" option is enabled on a namespace. The number of namespaces created with this option enabled determines how frequently the issue occurs.

Resolution

This issue is fixed in vCenter Server 8.0 U2b.

Please contact VMware by Broadcom support for assistance in resolving this issue.