vCLS pods fail to deploy in a DRS and HA enabled cluster

Article ID: 425726

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

  • In a vSphere cluster with DRS and HA enabled, vSphere Cluster Services (vCLS) pods are not deployed.
  • Enabling and then disabling Retreat Mode successfully removes and recreates the vCLS ESX Agent; however, the vCLS pods remain absent from the cluster.
  • The following errors may be observed:

vCenter: /var/log/vmware/vpxd/vpxd.log

YYYY-MM-DDTHH:MM:SS.176Z info vpxd [...] Completed request from LRO request queue; {VclsPodCrxReconfigure(reason: 'VM power-on timeout on host-#####')}

ESXi: /var/run/log/infravisor.log 

ValidatePodCreate failed: [spec.nodeName: Invalid value: "<hostname.>": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character]

Environment

VMware vCenter 8.0 U3
VMware vSphere ESXi 8.0 U3

Cause

The ESXi host configuration contains a trailing dot at the end of the FQDN (for example, esxi01.domain.local.). A trailing dot violates the lowercase RFC 1123 subdomain format that is enforced on spec.nodeName, as shown in the infravisor.log error above. Because the hostname fails this validation, the infravisor service is unable to decode the /etc/vmware/infravisor/manifests/vcls.yaml manifest required to deploy the vCLS PodVM.
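
The rule quoted in the error can be approximated with a short shell check. This is a minimal sketch, not the infravisor implementation: the function name is_rfc1123_subdomain is hypothetical, and the pattern and 253-character limit follow the common definition of a lowercase RFC 1123 subdomain.

```shell
#!/bin/sh
# Sketch only: approximates the "lowercase RFC 1123 subdomain" rule quoted
# in the infravisor error. It is not the infravisor source code.

# Hypothetical helper: returns 0 if the name passes the check, 1 otherwise.
is_rfc1123_subdomain() {
    name="$1"
    # The whole name must be between 1 and 253 characters long.
    [ -n "$name" ] && [ "${#name}" -le 253 ] || return 1
    # Each dot-separated label may contain lowercase alphanumerics or '-',
    # and must start and end with an alphanumeric character.
    label='[a-z0-9]([-a-z0-9]*[a-z0-9])?'
    printf '%s\n' "$name" | grep -Eq "^${label}(\.${label})*\$"
}

# Compare the example FQDN with and without the trailing dot.
for candidate in "esxi01.domain.local" "esxi01.domain.local."; do
    if is_rfc1123_subdomain "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

The name with the trailing dot is rejected because the final dot is not followed by another label, so the name no longer ends with an alphanumeric character.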

Resolution

To resolve this issue, correct the hostname to a valid format, enable the pod settings, restart the Infravisor service, and re-trigger PodVM deployment:

  1. Correct the Hostname/FQDN:

    • Verify current hostname: esxcli system hostname get

    • Set the correct FQDN (ensure no trailing dots): esxcli system hostname set -f <correct_fqdn>

  2. Enable Pod Settings in ConfigStore:

    • Check current status: configstorecli config current get -c esx -g infravisor_pods -k vcls

    • If disabled, run: configstorecli config current set -c esx -g infravisor_pods -k vcls -p /pod_settings/enabled -v true

  3. Restart the Infravisor Service:

    • Check status: /etc/init.d/infravisor status

    • Restart the service to apply changes: /etc/init.d/infravisor restart

  4. Force Re-deployment:

    • Kill the existing PodVM entry: inf-cli kill -p /etc/vmware/infravisor/manifests/vcls.yaml

    • Verify deployment: inf-cli get pods -n vcls
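
The steps above can be sketched as a dry-run script that only prints the commands for review and executes nothing on the host. The helper names strip_trailing_dots and print_fix_plan are hypothetical. The example FQDN is the one from the Cause section; on a real host the current value would come from esxcli system hostname get.

```shell
#!/bin/sh
# Dry-run sketch of the resolution steps. It only PRINTS the commands listed
# in this article; nothing is executed against the host.

# Hypothetical helper: remove any trailing dots from an FQDN.
strip_trailing_dots() {
    fqdn="$1"
    # ${fqdn%.} drops one trailing dot; repeat until none remain.
    while [ "${fqdn%.}" != "$fqdn" ]; do
        fqdn="${fqdn%.}"
    done
    printf '%s\n' "$fqdn"
}

# Hypothetical helper: print the command sequence for a given current FQDN.
# The configstorecli 'set' line is only needed if the setting is disabled.
print_fix_plan() {
    fixed="$(strip_trailing_dots "$1")"
    cat <<EOF
esxcli system hostname set -f $fixed
configstorecli config current set -c esx -g infravisor_pods -k vcls -p /pod_settings/enabled -v true
/etc/init.d/infravisor restart
inf-cli kill -p /etc/vmware/infravisor/manifests/vcls.yaml
inf-cli get pods -n vcls
EOF
}

# Example input taken from the Cause section.
print_fix_plan "esxi01.domain.local."
```

Printing the plan first makes it possible to confirm the corrected FQDN before any change is made to the host.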

Additional Information

 

  • Log Locations: Logs for further investigation can be found at /var/log/vmware/vpxd/vpxd.log (vCenter) and /var/run/log/infravisor.log (ESXi).

  • Standard Compliance: Ensure all hostnames across the cluster follow RFC 1123 (lowercase alphanumeric characters, hyphens, or dots; each name must start and end with an alphanumeric character).
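
For a cluster-wide review, a list of host FQDNs (one per line) could be screened in bulk. The following is a sketch under assumptions: report_invalid_hostnames is a hypothetical helper, the sample names are illustrative, and the list of hostnames must be gathered separately.

```shell
#!/bin/sh
# Sketch only: reads one hostname per line on stdin and reports the names
# that do not look like a lowercase RFC 1123 subdomain, with a likely reason.

# Hypothetical helper, not part of any VMware tooling.
report_invalid_hostnames() {
    label='[a-z0-9]([-a-z0-9]*[a-z0-9])?'
    # grep -v keeps only the lines that fail the pattern.
    grep -Ev "^${label}(\.${label})*\$" | while IFS= read -r host; do
        case "$host" in
            *.)             printf '%s: trailing dot\n' "$host" ;;
            *[[:upper:]]*)  printf '%s: uppercase characters\n' "$host" ;;
            *)              printf '%s: other RFC 1123 violation\n' "$host" ;;
        esac
    done
}

# Illustrative input; replace with the cluster's actual host list.
printf '%s\n' \
    "esxi01.domain.local" \
    "esxi02.domain.local." \
    "ESXi03.domain.local" | report_invalid_hostnames
```

Valid names produce no output, so an empty result means every name in the list passed the check.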