Supervisor cluster upgrade might get stuck if wcp service on vCenter is restarted or crashed during upgrade

Article ID: 323436

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
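During a Supervisor Cluster upgrade, the fourth Supervisor control plane node fails to come up. In the output below it is NotReady, has no role, and runs the newer Kubernetes version: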
root@42397b89d11566df143a55066724a350 [ ~ ]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
423941ed1d520ec8cf940c2b8860840f Ready master 5h19m v1.19.1+wcp.3
42397b89d11566df143a55066724a350 Ready master 5h54m v1.19.1+wcp.3
4239d4da55ae6243c84d865842b253af NotReady <none> 3h8m v1.20.2+wcp.3
4239fe3031a588a4a0524a771ce9749b Ready master 5h19m v1.19.1+wcp.3
sc1-10-78-225-240.eng.vmware.com Ready agent 5h14m v1.19.1-sph-b0161d9
sc1-10-78-225-36.eng.vmware.com Ready agent 5h14m v1.19.1-sph-b0161d9
sc1-10-78-228-146.eng.vmware.com Ready agent 5h14m v1.19.1-sph-b0161d9
The etcd pod logs on the NotReady node show:

2021-08-24 21:22:25.962007 C | etcdmain: error validating peerURLs {ClusterID:fbb28004e70cc8d7 Members:[&{ID:86394628723018fa RaftAttributes:{PeerURLs:[https://10.78.225.103:2380] IsLearner:false} Attributes:{Name:4239fe3031a588a4a0524a771ce9749b ClientURLs:[https://10.78.225.103:2379]}} &{ID:91f24f5f2954401a RaftAttributes:{PeerURLs:[https://10.78.231.12:2380] IsLearner:false} Attributes:{Name:423941ed1d520ec8cf940c2b8860840f ClientURLs:[https://10.78.231.12:2379]}} &{ID:bb8603c19df1616c RaftAttributes:{PeerURLs:[https://10.78.239.82:2380] IsLearner:false} Attributes:{Name:42397b89d11566df143a55066724a350 ClientURLs:[https://10.78.239.82:2379]}}] RemovedMemberIDs:[]}: member count is unequal
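The two symptoms above can be checked with a small shell sketch. It assumes the default "kubectl get nodes" column layout shown above (NAME, STATUS, ...) and that the etcd log text has already been captured to a file or pipe. The function names are hypothetical helpers, not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the symptoms described in this article.

# Print the NAME of every node whose STATUS column is not exactly "Ready".
# Reads "kubectl get nodes" output on stdin; the header row is skipped.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Succeed (exit 0) if the etcd log text on stdin contains the
# "member count is unequal" validation error seen in this issue.
has_member_count_error() {
  grep -q 'member count is unequal'
}

# Example usage on a Supervisor control plane VM (read-only checks):
#   kubectl get nodes | not_ready_nodes
#   has_member_count_error < etcd.log && echo "etcd member count mismatch"
```

Because the STATUS test is an exact match, a node reported as "Ready,SchedulingDisabled" is also listed, so review the names it prints rather than acting on them blindly.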


Environment

VMware vCenter Server 7.0.x

Cause

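This occurs when the wcp service on vCenter Server is restarted or crashes during the upgrade. As the etcd log above shows, the etcd instance on the new node then fails to validate against the existing three-member etcd cluster ("member count is unequal"), and the node stays NotReady.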

Resolution

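1. Log in to the Supervisor control plane VM as root, following KB 90194.
2. Run "etcdctl member list" to list the etcd cluster members.
3. If the list includes a fourth Supervisor node, remove it with "etcdctl member remove <member ID>", using the member ID shown for that node by the previous command. Only use this step if there is a fourth Supervisor node.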
4. Delete the EAM agency for the Supervisor control plane VM in the vSphere Client:


Menu -> Administration -> vCenter Server Extensions -> vSphere ESX Agent Manager -> Configure, then select the agency of the WCP control plane VM and delete it.

Steps 2 and 3 may not apply in some cases, but steps 1 and 4 apply in all cases.
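Steps 2 and 3 can be sketched in shell as below. This is a minimal sketch, assuming the default "simple" output of "etcdctl member list" (comma-separated ID, status, name, peer URLs, client URLs, and on newer etcd versions an is-learner column) and that etcdctl is already configured with the cluster endpoints and certificates in the login session. It also assumes, as in the etcd log above, that each etcd member is named after its node. The function names are hypothetical. "etcdctl member remove" takes the hexadecimal member ID from the first column, not the node name.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2 and 3. Assumed "etcdctl member list" format:
#   <member ID>, <status>, <name>, <peer URLs>, <client URLs>[, <is learner>]

# Print the member ID of the etcd member named after the given node (VM ID).
# Usage: etcdctl member list | member_id_for_node <node-name>
member_id_for_node() {
  awk -F', ' -v node="$1" '$3 == node { print $1 }'
}

# Print how many members are listed (non-empty lines only).
# Usage: etcdctl member list | member_count
member_count() {
  awk 'NF { n++ } END { print n + 0 }'
}

# Example (review the output before removing anything):
#   etcdctl member list | member_count      # step 3 applies only if this is 4
#   etcdctl member list | member_id_for_node <NotReady node name>
#   etcdctl member remove <member ID printed above>
```

Compare the printed ID against the full member list before passing it to "etcdctl member remove"; if no fourth member exists, the lookup prints nothing and step 3 does not apply.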
 

PLEASE NOTE: When logged in to the Supervisor control plane VM, you have permissions that can permanently damage the cluster. If VMware Support finds evidence of a customer making changes to the Supervisor Cluster from the Supervisor control plane VM, they may mark your cluster as unsupported and require you to redeploy the entire vSphere with Tanzu solution. Only use this session to test networks, look at logs, and run kubectl logs/get/describe commands. Do NOT deploy, delete, or edit anything from this session.