Supervisor upgrade stuck in pending state when deployed using DHCP

Article ID: 415358

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • None of the existing nodes is assigned the Floating IP (FIP). However, a new node spins up with a new FIP assigned by the DHCP server
  • vCenter /var/log/vmware/wcpsvc.log:
    • YYYY-MM-DD warning wcp [kubelib/retry.go:93] [opID=ContainerImageRegistryController-c654] Request to apiserver failed. Err , Endpoint http://localhost:1080/external-cert/http1/APISERVERIP/6443/api/v1/namespaces/kube-system/secrets/image-registry
  • The UI shows the Workload Supervisor as Configuring, with the error "returned non-zero exit status 1"
  • SSH to the FIP fails:
    • Failed to connect to FIP IP(x.x.x.x) - no route to host
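
The log symptom above can be checked from a shell on the vCenter appliance. The following is a minimal sketch, not an official diagnostic: the function name recent_apiserver_failures is hypothetical, and it only filters the log for the message text quoted above.

```shell
# Hypothetical helper: show the most recent apiserver request failures
# recorded in a wcpsvc log file (path passed as the first argument).
recent_apiserver_failures() {
  grep -F 'Request to apiserver failed' "$1" | tail -n 5
}

# Usage on the vCenter appliance:
# recent_apiserver_failures /var/log/vmware/wcpsvc.log
```

If matching lines appear, the IP embedded in the Endpoint URL shows which API server address wcp is still trying to reach.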

Environment

vSphere with Tanzu

Cause

In Supervisor clusters configured with DHCP, the Client Identifier is not reserved in the DHCP server. As a result, the Floating IP (FIP) assigned to the Supervisor components may change after a lease renewal, a host reboot, or the creation of a new node. A changed FIP can disrupt connectivity and destabilize Supervisor cluster operations.

Resolution

To resolve the issue, manually update the vCenter database with the new FIP.

NOTE: Take a snapshot of the vCenter before making changes in the Postgres DB.

See: VMware vCenter in Enhanced Linked Mode pre-changes snapshot (online or offline) best practice

Steps:

  • Find the IP used in the API server certificate
    • openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
  • Log in to the vCenter DB
    • sudo -u wcp psql -U wcpuser -d VCDB
  • Run the following query to confirm which FIP is stored. Use both IPs (the old management IP and the FIP provided by the DHCP server) in the query
    • SELECT cluster, instance_id, master_mgmt_ip FROM cluster_db_configs WHERE master_mgmt_ip='<old management IP>' OR master_mgmt_ip='<DHCP server FIP>';
  • Update master_mgmt_ip with the FIP provided by the DHCP server
    • UPDATE cluster_db_configs SET master_mgmt_ip='<DHCP server FIP>' WHERE master_mgmt_ip='<old management IP>' AND instance_id='<the instance id from query above>';
  • Quit from the Postgres DB
    • \q
  • Restart the wcp service
    • vmon-cli -r wcp
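
For the first step, the IP addresses can be filtered out of the openssl text output instead of being read by eye. This is a minimal sketch that assumes the certificate lists its addresses as "IP Address:" entries under the Subject Alternative Name; the function name extract_san_ips is hypothetical and it matches IPv4 addresses only.

```shell
# Hypothetical helper: print the unique IPv4 addresses listed as
# "IP Address:" entries in certificate text read from standard input.
extract_san_ips() {
  grep -oE 'IP Address:[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
    | sed 's/^IP Address://' \
    | sort -u
}

# Usage on the node that holds the certificate:
# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout | extract_san_ips
```

The addresses printed this way are the candidates to compare against the master_mgmt_ip values returned by the SELECT query.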