vSphere DRS fails with vCLS VMs in Orphaned state due to expired vpxd-extension certificate

Article ID: 440365



Products

VMware vCenter Server

Issue/Introduction

  • Users are unable to log in to the vCenter Server Appliance (VCSA).
  • vSphere DRS (Distributed Resource Scheduler) stops functioning across the cluster.
  • The vSphere Client displays the alert: "vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs."
  • vSphere Cluster Service (vCLS) virtual machines appear in an Orphaned state.
  • The eam.log file, located at /var/log/vmware/eam/, shows authentication failures:
    com.vmware.eam.security.NotAuthenticated: Failed to authenticate extension com.vmware.vim.eam to vCenter.
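
To confirm this symptom from the appliance shell, the log can be searched for the failure message quoted above. The following is a minimal sketch, not an official procedure: the helper name `count_eam_auth_failures` is hypothetical, and the default path is the eam.log location given in this article.

```shell
#!/bin/sh
# Hypothetical helper: count EAM authentication failures in a log file.
# The default path is the eam.log location named in this article.
count_eam_auth_failures() {
    log_file="${1:-/var/log/vmware/eam/eam.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # -c prints the number of matching lines; -F treats the pattern as a fixed string.
    # grep exits non-zero when nothing matches, so "|| true" keeps the exit status clean.
    grep -c -F "Failed to authenticate extension com.vmware.vim.eam" "$log_file" || true
}

# Run against the default log only when it is present (that is, on the appliance).
if [ -r /var/log/vmware/eam/eam.log ]; then
    count_eam_auth_failures
fi
```

A count greater than zero is consistent with the issue described here, but it does not by itself prove that the certificate has expired.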

     

Environment

VMware vCenter Server 8.x

Cause

This issue occurs because the internal vpxd-extension solution user certificate has expired.

vCenter Server uses solution user certificates for internal service-to-service authentication. The ESX Agent Manager (EAM) service relies on the vpxd-extension certificate to authenticate with vCenter Server. When this certificate expires, EAM can no longer authenticate and therefore can no longer manage the vCLS VMs. Because healthy vCLS VMs are required for DRS in vSphere 7.0 Update 1 and later, including 8.x, DRS functionality is impacted when these VMs are unavailable or orphaned.
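
One way to confirm the cause is to read the vpxd-extension certificate from the VMware Endpoint Certificate Store (VECS) and check its expiry with OpenSSL. This is a minimal sketch under stated assumptions: it assumes `vecs-cli` is at `/usr/lib/vmware-vmafd/bin/vecs-cli` on the appliance and that both the store and the alias are named `vpxd-extension`. The helper name `cert_is_expired` is hypothetical.

```shell
#!/bin/sh
# Assumed location of vecs-cli on the vCenter Server Appliance.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# Hypothetical helper: reads a PEM certificate on stdin.
# Returns 0 if it has expired, 1 if it is still valid, 2 if it cannot be parsed.
cert_is_expired() {
    cert_pem=$(cat)
    # Reject input that openssl cannot parse as a certificate.
    printf '%s\n' "$cert_pem" | openssl x509 -noout >/dev/null 2>&1 || return 2
    # "-checkend 0" exits 0 when the certificate is still valid right now.
    if printf '%s\n' "$cert_pem" | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Query VECS only when vecs-cli is present (that is, on the appliance).
if [ -x "$VECS_CLI" ]; then
    pem=$("$VECS_CLI" entry getcert --store vpxd-extension --alias vpxd-extension)
    # Print the notAfter date for reference.
    printf '%s\n' "$pem" | openssl x509 -noout -enddate
    if printf '%s\n' "$pem" | cert_is_expired; then
        echo "vpxd-extension certificate has expired"
    fi
fi
```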

Resolution

Before you begin, ensure that a valid backup or snapshot of the vCenter Server Appliance exists. For guidance, see Snapshot Best practices for vCenter Server Virtual Machines.

  1. Step 1: Renew Solution User Certificates:
    • Log in to the vCenter Server Appliance via SSH.
    • Use the vCert utility or the vSphere Certificate Manager to replace the expired solution user certificates with new VMCA-signed certificates.
    • Restart all vCenter services: service-control --stop --all && service-control --start --all

  2. Step 2: Manually Update the EAM Extension Certificate:
  3. Step 3: Restart EAM Service:
    • Restart the ESX Agent Manager: service-control --stop vmware-eam && service-control --start vmware-eam
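
The EAM restart in Step 3 can be wrapped so that the service status is reported right after the restart. This is a sketch only: the helper name `restart_and_check` and the `SERVICE_CONTROL` override are hypothetical conveniences, and the only commands invoked are the `--stop`, `--start`, and `--status` options of `service-control`.

```shell
#!/bin/sh
# SERVICE_CONTROL may be overridden for a dry run, for example SERVICE_CONTROL=echo.
SERVICE_CONTROL="${SERVICE_CONTROL:-service-control}"

# Hypothetical helper: stop, start, then report the status of one service.
# Each step runs only if the previous one succeeded.
restart_and_check() {
    service="$1"
    "$SERVICE_CONTROL" --stop "$service" &&
    "$SERVICE_CONTROL" --start "$service" &&
    "$SERVICE_CONTROL" --status "$service"
}

# Act only when the command is available (that is, on the appliance).
if command -v "$SERVICE_CONTROL" >/dev/null 2>&1; then
    restart_and_check vmware-eam
fi
```

Setting `SERVICE_CONTROL=echo` before running the script prints the three commands instead of executing them, which is a simple way to review the sequence before touching a production appliance.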

Additional Information

Why certificates do not auto-renew

Internal solution user certificates are not auto-renewed by design to allow administrators to maintain control over the security chain and avoid unexpected service restarts in production environments.

Monitoring:
Administrators should monitor the Certificate Status alarm in vCenter Server. By default, this alarm triggers 30 days before a certificate expires. Configure email or SNMP notifications for this alarm so that an approaching expiry is noticed before it causes a service outage.
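
As a complement to the alarm, an exported certificate file can be checked against the same 30-day window from a script. This is a minimal sketch, not a replacement for the built-in alarm: the helper name `cert_expires_within_days` is hypothetical, and it assumes the certificate has already been saved as a PEM file.

```shell
#!/bin/sh
# Hypothetical helper: check whether the PEM certificate in file $1 expires
# within $2 days (default 30, mirroring the default alarm threshold).
# Returns 0 if it expires within the window, 1 if not, 2 if the file cannot be parsed.
cert_expires_within_days() {
    cert_file="$1"
    days="${2:-30}"
    seconds=$((days * 86400))
    # Reject files that openssl cannot parse as a certificate.
    openssl x509 -noout -in "$cert_file" >/dev/null 2>&1 || return 2
    # "-checkend N" exits 0 if the certificate is still valid N seconds from now.
    if openssl x509 -noout -checkend "$seconds" -in "$cert_file" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `cert_expires_within_days /tmp/vpxd-extension.crt && echo "renew soon"` would print a reminder once the certificate is inside the window. The file path here is only a placeholder.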

 

Related Articles:

  1. vCLS Machines in Orphaned State Due to EAM Login Failure
  2. Expiring solution user certificates cause vCenter Server to throw "Certificate Status" alarm
  3. Using vSphere Certificate Manager to Replace SSL Certificates