The Supervisor Cluster is unhealthy and down, and the vCenter UI reports the following error under the Workload Management inventory: "System error occurred on Master Node with identifier returned non-zero exit status 1.."
IMPACT :
Determining logs for the Exited/Crashing containers:
crictl logs <Container_ID_Exited_etcd> :
YYYY-MM-DD HH:MM:SS I | pkg/flags: recognized and used environment variable ETCD_ENABLE_V2=true
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
YYYY-MM-DD HH:MM:SS I | etcdmain: etcd Version: 3.4.13
YYYY-MM-DD HH:MM:SS I | etcdmain: Git SHA: GitNotFound
YYYY-MM-DD HH:MM:SS I | etcdmain: Go Version: go1.15.2
YYYY-MM-DD HH:MM:SS I | etcdmain: Go OS/Arch: linux/amd64
YYYY-MM-DD HH:MM:SS I | etcdmain: setting maximum number of CPUs to 16, total number of available CPUs is 16
YYYY-MM-DD HH:MM:SS N | etcdmain: the server is already initialized as member before, starting as etcd member...
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
YYYY-MM-DD HH:MM:SS I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file =
YYYY-MM-DD HH:MM:SS C | etcdmain: tls: private key does not match public key
crictl logs <Container_ID_Exited_kube-apiserver> :
Flag --experimental-encryption-provider-config has been deprecated, use --encryption-provider-config.
Flag --kubelet-https has been deprecated, API Server connections to kubelets always use https. This flag will be removed in 1.22.
IMMDD HH:MM:SS 1 server.go:629] external host was not specified
I0822 HH:MM:SS 1 server.go:181] Version: v1.21.0+vmware.wcp.2
Error: tls: private key does not match public key
WCP logs on vCenter Server, /var/log/vmware/wcp/wcpsvc.log:
YYYY-MM-DDTHH:MM:SS debug wcp [kubelifecycle/kube_instance.go:5515] [opID=68c130d8-672ee620-db30-40af-987a-c7d025bcc8f7] Cluster is not ready yet, would retry in 1m0s time.
YYYY-MM-DDTHH:MM:SS error wcp [vclib/guestop.go:338] [opID=68ca282d-672ee620-db30-40af-987a-c7d025bcc8f7-reconcile] Kubenode guest command failed. RC: 12 8, Out: , Err: Error: tls: private key does not match public key
YYYY-MM-DDTHH:MM:SS error wcp [licensemonitor/license_event_monitor.go:259] [opID=licenseRefreshMonitor] Supervisor control plane failed: No connectivity to API Master: connectivity Get "https://<Control_Plane_IP_Address>:6443/healthz?timeout=5s": dial tcp <Control_Plane_IP_Address>:6443: connect: no route to host, config status ERROR
YYYY-MM-DDTHH:MM:SS debug wcp [notifications/notifications.go:244] [opID=66cd08d8] No notifications. seqNum: 72, Current seqNum: 71
YYYY-MM-DDTHH:MM:SS error wcp [licensemonitor/license_event_monitor.go:259] [opID=licenseRefreshMonitor] Supervisor control plane failed: No connectivity to API Master: connectivity Get "https://<Control_Plane_IP_Address>:6443/healthz?timeout=5s": dial tcp <Control_Plane_IP_Address>:6443: connect: no route to host, config status ERROR
VMware vSphere with Tanzu
The issue occurs because the etcd server key and server certificate do not match on all of the Control Plane nodes, or on any two of them. As a result, etcd quorum cannot be maintained and the containers fail to start.
"/etc/kubernetes/pki/etcd" openssl rsa -modulus -noout -in server.keyopenssl x509 -modulus -noout -in server.crt