vSphere Supervisor Cluster in Error State - Unable to resolve the vCenter Primary Network Identifier with the configured management DNS servers (control plane VM to vCenter FQDN)

Article ID: 400447

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In the vSphere Client (web UI), one or more of the following symptoms are present:

  • Workload Management shows the following symptoms:
    • The affected Supervisor cluster is in Error state.
    • Viewing the error details shows that all three Supervisor control plane VMs are deployed but remain in the Configuring state:
      • The Supervisor control plane VMs do not show their assigned numbers and instead display (To be determined)
      • no connectivity to api master at localhost:1080/external-cert/http1/<Supervisor FIP>/6443/healthz?timeout context deadline exceeded

  • In the Inventory view, the Supervisor control plane VMs are present, powered on, and have IP addresses and two network adapters assigned.

  • When viewing the Kubernetes Status of the Supervisor cluster, one or more of the following errors are observed, where values in brackets <> will vary by environment:
    • Unable to resolve the vCenter Primary Network Identifier <vCenter FQDN> with the configured management DNS servers on control plane VM <Supervisor control plane VM DNS name>. Validate that the management DNS servers '<DNS Servers>' can resolve <vCenter FQDN>

 

While connected to the vCenter Server Appliance (VCSA) over SSH, the following symptoms are present:

  • The wcpsvc log shows an error message similar to the following, where values in brackets <> will vary by environment:
    • cat /var/log/vmware/wcp/wcpsvc.log
       
    • {"type": "ManagementNetworkConfigured", "status": "FALSE", "reason": "ManagementDNSServerHostNotFound", "messages": [{"Severity": "ERROR", "Details": {"Id": "vcenter.wcp.node_state_check.mgmt_network.vcenter_pnid_host_not_found", "DefaultMessage": "Unable to resolve the vCenter Primary Network Identifier <vCenter FQDN> with the configured management DNS servers on control plane VM <Supervisor control plane VM DNS name>. Validate that the management DNS servers '<DNS Servers>' can resolve <vCenter FQDN>.", "Args": ["<vCenter FQDN>", "<Supervisor control plane VM DNS name>", "<DNS servers>"]}}], "severity": "ERROR"
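To pull only these messages out of a busy wcpsvc log, a small filter can help. This is a minimal sketch, assuming the relevant log lines contain the `ManagementDNSServerHostNotFound` reason and a `"DefaultMessage": "..."` field formatted as in the example above; the function name `show_wcp_dns_errors` is hypothetical and not part of any VMware tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the DefaultMessage text of wcpsvc log lines
# that report the ManagementDNSServerHostNotFound reason.
# Assumption: the log format matches the example shown in this article.
show_wcp_dns_errors() {
  # Default to the wcpsvc log path on the VCSA; allow another path as $1.
  local log="${1:-/var/log/vmware/wcp/wcpsvc.log}"
  # First select matching lines, then print only the DefaultMessage field.
  grep 'ManagementDNSServerHostNotFound' "$log" \
    | grep -o '"DefaultMessage": "[^"]*"'
}
```

On the VCSA, calling `show_wcp_dns_errors` with no arguments reads the default log; passing a path reads a copied or rotated log instead. The pipeline exits non-zero when no matching line is found.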

 

While connected to the Supervisor control plane VM named in the error message, the following symptoms are observed:

  • All system pods are in Running or Completed state, so the following command returns only the header line:
    • kubectl get pods -A | grep -Ev "Running|Completed"
  • nslookup to the vCenter FQDN fails:

    • nslookup <vCenter FQDN>
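The pod check above also prints the `kubectl` header line, so "healthy" means "only the header is shown". A variant that skips the header and prints nothing when every pod is Running or Completed is sketched below. It assumes the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, ...), so STATUS is the fourth field; the function name `list_unhealthy_pods` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print
# namespace/name and STATUS for every pod not in Running or Completed state.
# Assumption: default column layout, so STATUS is the 4th whitespace field.
list_unhealthy_pods() {
  # NR > 1 skips the header line; empty output means all pods look healthy.
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}
```

Usage on the Supervisor control plane VM: `kubectl get pods -A | list_unhealthy_pods`. For this issue the expected output is empty, which points away from pod failures and toward DNS resolution.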

Environment

vSphere Supervisor

Cause

One or more of the DNS servers configured for the Management Network in Workload Management cannot resolve the vCenter FQDN.

Resolution

Correct the DNS server issue that prevents the management DNS servers from resolving the vCenter FQDN.

  1. SSH to the Supervisor control plane VM named in the Kubernetes Status error.

  2. Run nslookup for the vCenter FQDN against each DNS server configured for the management network in Workload Management:
    nslookup <vCenter FQDN> <DNS server>
  3. After correcting the DNS server issues so that the vCenter FQDN resolves, perform a rolling restart of the CoreDNS pods in the Supervisor cluster:
    kubectl rollout restart deployment -n kube-system coredns
  4. Confirm that nslookups to the vCenter FQDN from the Supervisor control plane VM are now successful:
    nslookup <vCenter FQDN>
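When several management DNS servers are configured, step 2 can be run for all of them in one pass so that the failing server stands out. The sketch below assumes `nslookup` exits with a non-zero status when a lookup fails, which is the behavior of recent BIND `nslookup` builds but should be confirmed on the control plane VM. The function name `check_dns_servers` and the `NSLOOKUP_CMD` override (used only to substitute a stub for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query the vCenter FQDN against each DNS server in turn
# and print OK or FAIL per server. Returns 1 if any server fails.
# Assumption: nslookup exits non-zero when resolution fails.
check_dns_servers() {
  local fqdn="$1"
  shift
  # Hypothetical override so a stub can replace nslookup in tests.
  local cmd="${NSLOOKUP_CMD:-nslookup}"
  local server failed=0
  for server in "$@"; do
    # Query this specific server; discard output, keep only the exit status.
    if "$cmd" "$fqdn" "$server" >/dev/null 2>&1; then
      printf 'OK   %s\n' "$server"
    else
      printf 'FAIL %s\n' "$server"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_dns_servers <vCenter FQDN> <DNS server 1> <DNS server 2>`. After the restart in step 3, `kubectl rollout status deployment/coredns -n kube-system` waits until the new CoreDNS pods are ready, which avoids re-testing in step 4 while the rollout is still in progress.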