vCLS virtual machines fail to deploy due to ESX Agent Manager certificate desynchronization

Article ID: 435829

Updated On:

Products

VMware vSAN

Issue/Introduction

The following failures are observed after rolling reboots of the cluster hosts:

  • vCLS VMs fail to power on or deploy.

  • EAM service fails to initialize or loops.

  • Inventory shows orphaned vCLS VMs.

  • Log Error: com.vmware.vim.binding.eam.fault.EamServiceNotInitialized.
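To confirm the last symptom, the EAM log can be searched for the fault name. The following is a minimal sketch, not an official procedure: the helper name count_eam_init_errors is hypothetical, and the default log path /var/log/vmware/eam/eam.log is an assumption that is not stated in this article, so pass the actual log file if it differs.

```shell
# Count the log lines that mention the EamServiceNotInitialized fault.
# Assumption: the default path below is where the EAM log lives; override it
# by passing another file as the first argument.
count_eam_init_errors() {
    local log_file="${1:-/var/log/vmware/eam/eam.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep -c prints the number of matching lines; it exits 1 when that
    # number is 0, so the exit status is deliberately ignored.
    grep -c 'EamServiceNotInitialized' "$log_file" || true
}
```

Calling count_eam_init_errors without arguments prints the number of matching lines in the assumed default log. A non-zero count after the rolling reboots is consistent with the symptoms listed above.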

Environment

VMware vCenter Server 7.0 Update 3i Build 20845200

 

Cause

The /storage/log partition filled up, which blocked vPostgres database transactions. As a result, entity locks for orphaned vCLS virtual machines remained in the database, and the vpxd-extension certificate thumbprint stored in the VMware Endpoint Certificate Store (VECS) no longer matched the thumbprint stored in the vCenter Server database (VCDB).
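Whether the partition is currently exhausted can be checked from the appliance shell with df. The sketch below is illustrative only: the function names partition_usage_pct and partition_is_full are hypothetical, and the 90 percent default threshold is an arbitrary assumption rather than a value from this article.

```shell
# Print the used-space percentage (digits only) of the filesystem that
# holds the given path. Defaults to /storage/log.
partition_usage_pct() {
    local path="${1:-/storage/log}"
    # -P forces the POSIX one-line-per-filesystem format, so the fifth
    # field of the second line is the capacity column.
    df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is at or above the threshold, non-zero otherwise
# (including when the usage could not be read).
# Assumption: 90 is only an example threshold.
partition_is_full() {
    local path="${1:-/storage/log}"
    local threshold="${2:-90}"
    local used
    used="$(partition_usage_pct "$path")"
    [ -n "$used" ] && [ "$used" -ge "$threshold" ]
}
```

For example, partition_is_full /storage/log 95 && echo "log partition nearly full" reports the condition without changing anything on the appliance.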

Resolution

 

  1. Check the available space on the vCenter Server /storage/log partition and free up space if the partition is full.

  2. Verify the EAM configuration: check that /etc/vmware-eam/features.properties contains pmm.infra=true and has 664 file permissions. If the parameter is missing, append it with echo "pmm.infra=true" >> /etc/vmware-eam/features.properties, then correct the permissions with chmod 664 /etc/vmware-eam/features.properties.

  3. Force the removal of orphaned vCLS VMs from the vPostgres database (VCDB):

    • In the vSphere Client, create a temporary folder in the VMs and Templates inventory.

    • Move the orphaned vCLS VMs into the temporary folder.

    • Delete the temporary folder to force the VCDB to execute a cascading delete, dropping the stale entity registrations.

    • Warning: If the temporary folder method fails, manual SQL intervention is required to execute DELETE statements across 15 tables for the specific orphaned VM_ID records (e.g., 4001, 4002, 4003). Consult a DBA or Broadcom Support for advanced vPostgres cleanup procedures.

  4. Synchronize the vpxd-extension certificate to restore EAM database access:

    • Connect to the vCenter Server Appliance via SSH as root.

    • Create a temporary directory: mkdir /certificate

    • Extract the certificate: /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store vpxd-extension --alias vpxd-extension --output /certificate/vpxd-extension.crt

    • Extract the key: /usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store vpxd-extension --alias vpxd-extension --output /certificate/vpxd-extension.key

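    • Update the VCDB using the script: python /usr/lib/vmware-vpx/scripts/updateExtensionCertInVC.py -e com.vmware.vim.eam -c /certificate/vpxd-extension.crt -k /certificate/vpxd-extension.key -s localhost -u <sso-admin-user> (replace <sso-admin-user> with the vCenter Single Sign-On administrator account, for example administrator@vsphere.local).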

  5. Restart the EAM service:

    • service-control --restart vmware-eam

  6. Trigger fresh vCLS deployment:

    • Navigate to vCenter Server > Configure > Advanced Settings.

    • Toggle Retreat Mode by setting config.vcls.clusters.<domain-id>.enabled to False, waiting 60 seconds, and then setting it back to True.
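The echo append in step 2 adds a second pmm.infra=true line if it is run when the parameter is already present. A minimal sketch of an idempotent variant follows; the helper name ensure_property is hypothetical and is not part of any VMware tooling.

```shell
# Append a "key=value" line to a properties file only when that exact line
# is absent, then set the file mode to 664 as step 2 requires.
ensure_property() {
    local file="$1"
    local line="$2"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        # If the file is non-empty and does not end with a newline, add one
        # first so the new entry does not merge with the last line.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file" || return 1
        fi
        printf '%s\n' "$line" >> "$file" || return 1
    fi
    chmod 664 "$file"
}
```

On the appliance it would be invoked as ensure_property /etc/vmware-eam/features.properties pmm.infra=true. Running it repeatedly leaves a single entry in the file.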

Additional Information

Removing an orphaned/stale virtual machine from the vCenter Server Database

vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by EAM issue

Disable vCLS on a cluster via Retreat Mode