Supervisor cluster stuck in upgrade from 1.28.3 to 1.29.7 with error "failed to retrieve current version from etcd"
Article ID: 418010


Products

Tanzu Kubernetes Runtime

Issue/Introduction

  • Supervisor upgrade is stuck in the upgrading state, with the following error message shown in the vCenter UI:

Initialized vSphere resources
Deployed Control Plane VMs
Configured Control Plane VMs
 • Configuration error (since dd/mm/yyyy, hh:mm:ss)
 • CoreDNS configuration failed on Master node with identifier ############. Details: failed to retrieve current version from etcd.
Configured Load Balancer fronting the kubernetes API Server
Configured Core Supervisor Services

  • Reviewing the wcpsvc.log on the vCenter Server shows the following error message:

/var/log/vmware/wcp/wcpsvc.log

yy-mm-ddThh:mm:ssZ debug wcp [kubelifecycle/master_node.go:466] Config status updated for Master VM VirtualMachine:vm-############. New value {"config_modify_time_ns": 1761928910708348043, "modify_time": "yy-mm-ddThh:mm:ssZ ", "conditions": [{"type": "GuestCustomized", "status": "TRUE", "reason": "", "messages": [], "severity": "", "lastTransitionTime": "yy-mm-ddThh:mm:ssZ"}, {"type": "ManagementNetworkConfigured", "status": "TRUE", "reason": "", "messages": [], "severity": "", "lastTransitionTime": "yy-mm-ddThh:mm:ssZ"}, {"type": "ConfiguredAsK8sNode", "status": "TRUE", "reason": "", "messages": [], "severity": "", "lastTransitionTime": "yy-mm-ddThh:mm:ssZ"}, {"type": "WorkloadNetworkConfigured", "status": "FALSE", "reason": "FailedWithSystemError", "messages": [{"Severity": "ERROR", "Details": {"Id": "vcenter.wcp.master.coredns.config.error", "DefaultMessage": "CoreDNS DNS setting failed on control plane <node_id>. Details: failed to retrieve current version from etcd", "Args": ["############", "failed to retrieve current version from etcd"]}}], "severity": "ERROR", "lastTransitionTime": "yy-mm-ddThh:mm:ssZ "}]}

Environment

vSphere Kubernetes Service

Cause

The current version record was removed from etcd during the upgrade by /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py.

Resolution

Log in to one of the Supervisor control plane VMs and run the following:

  1. K8S_VER=$(kubectl version | grep Server | awk '{print $3}')

    1. On some kubectl versions, the value of K8S_VER may not be as expected, for example:

      K8S_VER=$(kubectl version | grep Server | awk '{print $3}')  
      echo $K8S_VER
      version.Info{Major:"1", 


    2. If that's the case, run the following command instead:

      K8S_VER=$(kubectl version --short | awk -F': ' '/Server Version/ {print $2}')

  2. WCP_VER=$(grep 'wcp_version:' /etc/vmware/wcp/wcp_versions.yaml | head -1 | awk '{print $2}')
  3. /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py set-current-version --wcp-version "$WCP_VER" --kubernetes-version "$K8S_VER"
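The version-parsing in the steps above can be sketched as two small helper functions. This is an illustrative sketch only: the helper names are hypothetical, while the file paths and the `set-current-version` subcommand are taken from this article as-is.

```shell
#!/bin/bash
# Sketch of the resolution steps above; run on a Supervisor control plane VM.
# Helper names are illustrative; paths and the set-current-version
# subcommand are as documented in this article.
set -euo pipefail

# Extract the server version from "kubectl version --short"-style output,
# e.g. "Server Version: v1.29.7" -> "v1.29.7".
parse_server_version() {
  awk -F': ' '/Server Version/ {print $2}'
}

# Extract the first wcp_version value from wcp_versions.yaml-style content,
# e.g. "wcp_version: 1.29.7" -> "1.29.7".
parse_wcp_version() {
  grep 'wcp_version:' | head -1 | awk '{print $2}'
}

# The actual remediation, commented out so the sketch can be reviewed
# before running the real commands on the control plane VM:
# K8S_VER=$(kubectl version --short | parse_server_version)
# WCP_VER=$(parse_wcp_version < /etc/vmware/wcp/wcp_versions.yaml)
# /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py set-current-version \
#     --wcp-version "$WCP_VER" --kubernetes-version "$K8S_VER"
```

As in step 1, verify the parsed values (for example with `echo "$K8S_VER"`) before passing them to upgrade-ctl.py.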