Embedded vCLS machines will not deploy in vSphere 8.0 Update 3

Article ID: 378133

Products

VMware vCenter Server 8.0

Issue/Introduction

  • After upgrading to or installing vSphere 8.0 Update 3, vCLS virtual machines do not deploy.
  • No vCLS deployment tasks appear in the vSphere Client.
  • Enabling and then disabling Retreat Mode does not deploy new vCLS VMs.
  • The ESX Agent Manager log on vCenter (/var/log/vmware/eam/eam.log) does not show any errors.
  • The vpxd.log on vCenter contains entries similar to:

    [Originator@6876 sub=MoCluster opID=PodCrxMgr-domain-c1#-19098] Dumping vCLS Pod Crx host infos; domain-c1#, [{[vim.HostSystem:host-1##,esx.example-domain.com]
    [Originator@6876 sub=MoCluster opID=PodCrxMgr-domain-c1#-19098] Completed request from LRO request queue; {VclsPodCrxReconfigure(reason: 'VM power-on timeout on host-1##'), p: 00007f7790199c50, attempt: 1}, e: (null)

  • The infravisor.log (/var/run/log/infravisor.log) on the affected host contains entries similar to:

YYYY-MM-DDTHH:MM:SSZ No(5) infravisor[3553587]: time="YYYY-MM-DDTHH:MM:SS.292657Z" level=error msg="Failed to get resource from spec vcls.yaml: failed to decode pod from /etc/vmware/infravisor/manifests/vcls.yaml: ValidatePodCreate failed: [spec.nodeName: Invalid value: \"esx.\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]"

  • The hosts file (/etc/hosts) on the affected ESXi host shows:

# Do not modify this file directly, please use esxcli.
127.0.0.1       localhost.localdomain localhost
::1             localhost.localdomain localhost
192.#.#.###     esx. esx.example.com

  • Running the hostname command in an SSH session on the affected ESXi host returns:

esx.
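
The symptoms above can be checked from one SSH session on the affected host with a short POSIX sh sketch such as the following. It assumes the infravisor.log path and the escaped Invalid value: \"...\" message format shown in this KB, and that the host's grep supports -o; rejected_node_names is a hypothetical helper, not a VMware tool.

```shell
#!/bin/sh
# Hypothetical helper: print the current hostname and the nodeName values
# infravisor rejected while decoding vcls.yaml. The log path and message
# format are copied from this KB and are assumptions for other builds.

INFRAVISOR_LOG=/var/run/log/infravisor.log

# Print each distinct rejected nodeName, one per line.
# The log escapes the quotes around the value, e.g.: Invalid value: \"esx.\"
rejected_node_names() {
  grep -o 'spec\.nodeName: Invalid value: \\"[^"]*\\"' "${1:-$INFRAVISOR_LOG}" |
    sed 's/^spec\.nodeName: Invalid value: \\"\(.*\)\\"$/\1/' |
    sort -u
}

# Only report when the log is readable (pass a path to check a copied log).
if [ -r "${1:-$INFRAVISOR_LOG}" ]; then
  echo "hostname reports:    '$(hostname)'"
  echo "infravisor rejected:"
  rejected_node_names "${1:-$INFRAVISOR_LOG}"
fi
```

For example, running it on the host (or passing the path of a copied infravisor.log) should list esx. for the host in this KB; the quotes around the hostname make a trailing space visible.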

Environment

VMware vSphere ESXi 8.0 U3

Cause

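  • This issue is caused by an invalid, unsupported ESXi hostname. The infravisor.log error shows the hostname ("esx.") being rejected as the spec.nodeName of the vCLS pod, which must be a lowercase RFC 1123 subdomain: only lowercase alphanumeric characters, '-' or '.', starting and ending with an alphanumeric character.
  • In the example above, the hosts file on the ESXi host contains the unsupported entry: 192.#.#.### esx. esx.example.com
  • The offending part is the trailing "." after the first "esx" instance: esx.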
  • Other entries that may cause the same issue:
    • esx-
    • esx.example.com.
    • esx.example.com-
    • "esx.example.com " (a trailing space can cause this)

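The pattern infravisor enforces is printed in the error itself, so a candidate name can be tested before it is applied. The following POSIX sh sketch runs that regex (anchored at both ends) against each argument. The function name is_rfc1123_subdomain, the file name check-name.sh and the 253-character limit are assumptions; the limit is borrowed from Kubernetes-style DNS subdomain validation, which this message appears to follow, and is not stated in this KB. The quoted regex also rejects uppercase letters; whether infravisor lowercases the hostname before validating is not covered here.

```shell
#!/bin/sh
# Hypothetical check: does a name match the lowercase RFC 1123 subdomain
# pattern quoted in the infravisor error? The 253-character limit is an
# assumption borrowed from Kubernetes-style validation.

is_rfc1123_subdomain() {
  name=$1
  nl='
'
  # Reject empty, over-long, or multi-line values before running the regex.
  [ -n "$name" ] || return 1
  [ "${#name}" -le 253 ] || return 1
  case $name in *"$nl"*) return 1 ;; esac
  # Anchored form of the regex from the log; the C locale keeps [a-z] ASCII-only.
  printf '%s\n' "$name" |
    LC_ALL=C grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# Usage: sh check-name.sh "$(hostname)" esx.example.com
for n in "$@"; do
  if is_rfc1123_subdomain "$n"; then
    echo "valid:   '$n'"
  else
    echo "INVALID: '$n'"
  fi
done
```

On the host from this KB, sh check-name.sh "$(hostname)" prints INVALID: 'esx.', while esx and esx.example.com are reported as valid.
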
Resolution

To rectify this issue, the hostnames of any affected ESXi nodes must be corrected.

1. Enable Retreat Mode on the cluster as described in KB 316514.

2. Put the affected ESXi host into maintenance mode.

3. Open an SSH session to the same ESXi host and run the following command to change the hostname:

# esxcli system hostname set --host esx --domain example.com

Info: Here "esx" is the hostname and "example.com" is the domain, which results in the valid FQDN "esx.example.com". Do not add a trailing ".", "-" or space to either value.

4. In the same SSH session, verify that the hosts file has changed:

# cat /etc/hosts

# Do not modify this file directly, please use esxcli.
127.0.0.1       localhost.localdomain localhost
::1             localhost.localdomain localhost
192.#.#.###     esx esx.example.com

5. With the host still in maintenance mode, restart the management agents on the ESXi host:

# services.sh restart

Note: vCenter may briefly show the host as "not responding" while the host re-establishes its connection to vCenter.

6. Take the ESXi host out of maintenance mode.

7. If other hosts are affected, repeat Steps 2 to 6 on each of them.

8. Once all affected hosts are corrected, set vCLS back to system managed (revert the change made in Step 1).
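
The per-host SSH portion of this procedure (Steps 3 to 5) can be wrapped in a small POSIX sh sketch that refuses to apply a name the infravisor check would reject. The run and fix_hostname helpers, the DRY_RUN switch and the file name fix-hostname.sh are hypothetical; the esxcli, cat and services.sh commands are the ones used in the steps above. The maintenance mode guard assumes esxcli system maintenanceMode get prints Enabled when the host is in maintenance mode. Retreat Mode (Step 1) and entering or exiting maintenance mode remain manual steps.

```shell
#!/bin/sh
# Hypothetical wrapper for Steps 3-5 on ONE host, run over SSH on that host.
# Retreat Mode (Step 1) and maintenance mode (Step 2) must already be set.
# DRY_RUN=1 prints the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

fix_hostname() {
  if [ $# -ne 2 ]; then
    echo "usage: fix_hostname <host> <domain>" >&2
    return 2
  fi
  host=$1 domain=$2
  nl='
'
  case $host$domain in
    *"$nl"*) echo "refusing: name contains a newline" >&2; return 1 ;;
  esac
  # Validate the short name and the resulting FQDN against the lowercase
  # RFC 1123 pattern quoted in the infravisor error.
  re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
  for name in "$host" "$host.$domain"; do
    if ! printf '%s\n' "$name" | LC_ALL=C grep -Eq "$re"; then
      echo "refusing: '$name' is not a valid lowercase RFC 1123 name" >&2
      return 1
    fi
  done
  # Step 5 expects maintenance mode (Step 2); skip this check when dry-running.
  if [ "${DRY_RUN:-0}" != 1 ] &&
     [ "$(esxcli system maintenanceMode get)" != Enabled ]; then
    echo "refusing: host is not in maintenance mode" >&2
    return 1
  fi
  run esxcli system hostname set --host "$host" --domain "$domain"  # Step 3
  run cat /etc/hosts                                                # Step 4
  run services.sh restart                                           # Step 5
}

if [ $# -gt 0 ]; then
  fix_hostname "$@"
fi
```

For example, DRY_RUN=1 sh fix-hostname.sh esx example.com only prints the three commands; without DRY_RUN=1 it applies them, and it stops early for names such as "esx." or "example.com ".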

Additional Information

vSphere 8.0 Update 3 uses Embedded vCLS (vCLS 2.0) by default. For an overview of Embedded vCLS, see: https://blogs.vmware.com/cloud-foundation/2024/07/17/embedded-vsphere-cluster-services-overview/

Note: The hostname can also be changed from the DCUI (Direct Console User Interface).