Troubleshooting vSphere with Tanzu (TKGS) Supervisor Control Plane VMs

Article ID: 323407

Products

VMware vSphere with Tanzu

Issue/Introduction

vSphere with Tanzu provisions three Supervisor Control Plane VMs, which act as the control plane for the Supervisor Cluster. Users may occasionally need to troubleshoot these VMs, for example to investigate an issue or to test networking.

This KB covers the following:

  • Networking details for the Supervisor Control Plane VMs.
  • How to SSH into the Supervisor Control Plane VMs.
  • How Supervisor Control Plane VMs correlate to EAM agencies in vCenter.

Resolution

Network Diagram of vSphere with Tanzu

 

How to SSH into Supervisor Control Plane VMs

First, SSH into vCenter and run /usr/lib/vmware-wcp/decryptK8Pwd.py:

root@vcenter [ ~ ]# /usr/lib/vmware-wcp/decryptK8Pwd.py
Read key from file

Connected to PSQL

Cluster: domain-c8:rg64l2-ghkl-256l-a32c-3b85d0b5a1d5
IP: 10.10.10.10
PWD: UPvd82Jc8buR9nsceMbg=
------------------------------------------------------------

root@vcenter [ ~ ]#

 

The script prints the IP address and password required to SSH into the supervisor. We recommend running SSH from vCenter itself, because this also confirms that vCenter can reach the management network on the Supervisor Control Plane (SV) VM.

The IP address the script reports is always the floating IP (FIP). If etcd is down on the Supervisor Cluster, the FIP is not assigned, and users must SSH to the actual IP (eth0) of a Supervisor Control Plane VM instead.

When connecting to the FIP, users may see an error that the SSH host key has changed. This is expected: the FIP "floats" between nodes, so the host key presented changes each time it moves. To work around it, delete the entry for the FIP from /root/.ssh/known_hosts, or delete that file entirely.
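The steps above can be sketched as a pair of small shell helpers run from the vCenter shell. This is a minimal sketch, not an official tool: the function names supervisor_ips and forget_host_key are hypothetical, and it assumes the script output keeps the "IP: <address>" layout shown above and that ssh-keygen is available on the appliance.

```shell
# Hypothetical helpers; the names are illustrative and not part of vSphere with Tanzu.

# Print every address found on an "IP:" line of decryptK8Pwd.py output (read from stdin).
supervisor_ips() {
    awk '$1 == "IP:" { print $2 }'
}

# Remove the cached SSH host key for one address from a known_hosts file.
# $1 = address (for example the FIP), $2 = known_hosts path (defaults to root's file).
forget_host_key() {
    fhk_host="$1"
    fhk_file="${2:-/root/.ssh/known_hosts}"
    # Nothing to clean up if the file does not exist yet.
    [ -f "$fhk_file" ] || return 0
    # ssh-keygen -R deletes all keys for the host and keeps a backup in <file>.old.
    ssh-keygen -R "$fhk_host" -f "$fhk_file" >/dev/null 2>&1
}

# Example usage on vCenter (commented out so that loading this file changes nothing):
#   /usr/lib/vmware-wcp/decryptK8Pwd.py | supervisor_ips
#   forget_host_key 10.10.10.10
```

Removing only the FIP entry with ssh-keygen -R, rather than deleting the whole file, leaves the cached host keys for other systems intact.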


PLEASE NOTE: On the Supervisor Control Plane VM you have permissions that can permanently damage the cluster. If VMware Support finds evidence that a customer made changes to the Supervisor Cluster from the SV VM, they may mark the cluster as unsupported and require you to redeploy the entire vSphere with Tanzu solution. Use this session only to test networking, review logs, and run kubectl logs/get/describe commands. Do not deploy, delete, or edit anything from this session without the express direction of a KB article or VMware Support.
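One way to stay within the read-only commands listed above is to route kubectl through a small guard function for the duration of the session. The sketch below is illustrative only: kubectl_ro is a hypothetical name, not a VMware-provided tool, and it simply refuses any verb other than get, describe, or logs.

```shell
# Hypothetical guard: pass only read-only verbs through to kubectl.
kubectl_ro() {
    case "$1" in
        get|describe|logs)
            kubectl "$@"
            ;;
        *)
            echo "kubectl_ro: refusing '$1'; only get, describe, and logs are allowed" >&2
            return 1
            ;;
    esac
}

# Example usage on the SV VM (commented out):
#   kubectl_ro get nodes -o wide
#   kubectl_ro get pods -A
#   kubectl_ro describe node <node-name>
#   kubectl_ro logs -n kube-system <pod-name>
```

The guard checks only the first argument, so a global flag placed before the verb (for example -n ahead of get) is also refused. It is a convenience against typos, not a substitute for the restrictions in the note above.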

 

Supervisor Control Plane VMs and their EAM Agency/VM ID Information

Each Supervisor Control Plane VM has a corresponding EAM (ESX Agent Manager) agency.

The EAM agency can be found in the Notes section when viewing the VM. In the example below, the Notes field reads "EAM Agency: vmware-vsc-apiserver-p69z67".

 

 

The VM ID can be found in the URL when the VM is selected.
In the example below, the VM ID is vm-13007.

 

 

Use the EAM agency name and the VM ID to correlate the Supervisor Control Plane VMs with errors in the vCenter logs under /var/log/vmware/vpxd, /var/log/vmware/eam, and /var/log/vmware/wcp.
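As a rough illustration of that correlation, the sketch below searches the three log directories for either identifier. The function name find_sv_vm_log_lines is hypothetical, and the agency name and VM ID in the usage line are the example values from this article. Plain grep does not read compressed files, so any compressed rotated logs are not searched.

```shell
# Hypothetical helper: list log lines that mention an EAM agency or a VM ID.
# $1 = EAM agency name, $2 = VM ID, remaining arguments = log directories
# (defaults to the three vCenter log directories named above).
find_sv_vm_log_lines() {
    agency="$1"
    vm_id="$2"
    shift 2
    if [ "$#" -eq 0 ]; then
        set -- /var/log/vmware/vpxd /var/log/vmware/eam /var/log/vmware/wcp
    fi
    # -r: recurse, -n: show line numbers, -F: literal strings,
    # -w: whole-word match so that vm-13007 does not also match vm-130071
    grep -rnFw -e "$agency" -e "$vm_id" "$@"
}

# Example usage on vCenter (commented out):
#   find_sv_vm_log_lines vmware-vsc-apiserver-p69z67 vm-13007
```

Each match is printed as file:line:text, which makes it easier to line up timestamps across the vpxd, eam, and wcp logs.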

IMPORTANT NOTE: EAM agencies can be manually deleted from the web client via Menu -> Administration -> vCenter Server Extensions -> vSphere ESX Agent Manager -> Configure. Deleting an EAM agency DELETES the corresponding Supervisor Control Plane VM, and a new one is created in its place. THIS IS NOT A VALID TROUBLESHOOTING METHOD. Do not delete EAM agencies without EXPRESS guidance from a VMware Support engineer. Depending on the versions involved and the existing health of the Supervisor Cluster, doing so can render the entire cluster unrecoverable. If VMware Support finds evidence of manual EAM agency deletion, they may mark the cluster as unsupported and require a redeploy of the entire vSphere with Tanzu solution.