When performing an ESXi cluster upgrade in a VMware Cloud Foundation (VCF) environment, the upgrade task fails at the REMEDIATE stage within the SDDC Manager UI.
Symptoms: The upgrade stops progressing, and the UI displays a generic failure message.
UI Error Message: ESX upgrade using vSphere Lifecycle Manager images failed. Error: Upgrade failed, internal error.
Associated Reference Tokens: M3OE58, PC4R9S, FH7AHT.
Environment: VMware Cloud Foundation 5.x / 9.x
The root cause of this failure is significant clock skew between the SDDC Manager and the vCenter Server.
SAML Token Invalidity: When the SDDC Manager attempts to authenticate with vCenter to trigger vLCM tasks, the vCenter vpxd service rejects the request.
Authentication Error: The vpxd.log shows: SAML token validation failed. Error: ... Token start date is in the future. This occurs because the SSO-signed token has a timestamp ahead of the vCenter Server's current system time.
Task Failure: Because the token is considered "not yet valid," authentication fails with vim.fault.InvalidLogin. The LCM service therefore cannot obtain a Task ID, and the LCM logs record a Connection refused or No route to host exception.
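The validity check behind the "Token start date is in the future" error can be sketched as a simple comparison of timestamps. The helper below is hypothetical and only illustrates the logic; it is not a VMware tool, it takes Unix epoch seconds as arguments, and it ignores any clock tolerance a real validator may allow.

```shell
# Illustrative only: token_status is a hypothetical helper, not a VMware tool.
# Arguments are Unix epoch seconds:
#   $1 = token start time as stamped by the issuer
#   $2 = token expiry time
#   $3 = current time on the validating system (here, vCenter)
token_status() {
  ts_start=$1
  ts_expiry=$2
  ts_now=$3
  if [ "$ts_now" -lt "$ts_start" ]; then
    # The validator's clock is behind the issuer's clock.
    echo "rejected: token start date is in the future by $(( ts_start - ts_now ))s"
  elif [ "$ts_now" -ge "$ts_expiry" ]; then
    echo "rejected: token expired"
  else
    echo "accepted"
  fi
}
```

For example, token_status 1000300 1003900 1000000 reports a rejection with a 300-second gap: the validating clock is five minutes behind the clock that stamped the token.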
To resolve this issue, restore time synchronization across all VCF components.
Verify System Time:
Log in via SSH to the SDDC Manager, vCenter Server, and ESXi hosts.
Run the date command on each system and compare the output to identify discrepancies.
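Comparing the output by eye is error-prone when time zones differ, so it can help to compare epoch seconds instead. The following is a minimal sketch, assuming that date on each system accepts the +%s format and that SSH access is already set up; remote_epoch and skew_seconds are hypothetical helper names, and the user and host values are placeholders.

```shell
# Hypothetical helpers for comparing clocks; not part of any VMware tooling.

# remote_epoch: print the current epoch seconds of a remote system.
# $1 = SSH target such as user@host (placeholder; use your own accounts).
# Assumption: the remote `date` supports the +%s format.
remote_epoch() {
  ssh "$1" 'date +%s'
}

# skew_seconds: absolute difference between two epoch timestamps.
skew_seconds() {
  sk_diff=$(( $1 - $2 ))
  if [ "$sk_diff" -lt 0 ]; then
    sk_diff=$(( 0 - sk_diff ))
  fi
  echo "$sk_diff"
}

# Example usage (not executed here):
#   sddc=$(remote_epoch user@sddc-manager.example.com)
#   vc=$(remote_epoch user@vcenter.example.com)
#   echo "skew: $(skew_seconds "$sddc" "$vc") seconds"
```

Because each SSH call takes time to complete, a difference of a second or two is expected; a gap of minutes points to the skew described above.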
Correct NTP Configuration:
Ensure all components are pointed to the same reliable NTP server(s).
In the vCenter Management Interface (VAMI) at https://<vcenter-ip>:5480, ensure the time synchronization mode is set to NTP and that the configured servers match those used by the other components.
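Once the NTP server lists have been collected from each component by whatever means is available, a quick way to confirm they match is to normalize and compare them. This is a sketch only: normalize_servers and same_ntp_servers are hypothetical helpers, and the server names in the usage example are placeholders.

```shell
# Hypothetical helpers; the server lists are supplied by the operator.

# normalize_servers: turn a comma- or space-separated list into a
# lowercase, sorted, de-duplicated, comma-joined string.
normalize_servers() {
  echo "$1" | tr 'A-Z' 'a-z' | tr ', ' '\n\n' | sed '/^$/d' | sort -u | paste -sd, -
}

# same_ntp_servers: succeed only if every list matches the first one
# after normalization.
same_ntp_servers() {
  ns_ref=$(normalize_servers "$1")
  shift
  for ns_list in "$@"; do
    [ "$(normalize_servers "$ns_list")" = "$ns_ref" ] || return 1
  done
  return 0
}

# Example usage (placeholder server names):
#   same_ntp_servers "ntp1.example.com,ntp2.example.com" \
#                    "ntp2.example.com ntp1.example.com" \
#     && echo "NTP servers match" || echo "NTP servers differ"
```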
Restart Services:
Once time is synchronized, restart services on the vCenter Server: service-control --stop --all && service-control --start --all
Restart the LCM service on the SDDC Manager: systemctl restart lcm
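Before retrying, it may be worth confirming that the restarted services are actually running. The sketch below polls systemctl is-active for a unit; wait_active is a hypothetical helper, and the attempt count and delay are arbitrary defaults rather than recommended values.

```shell
# Hypothetical helper: wait until a systemd unit reports "active".
# $1 = unit name, $2 = max attempts (default 10), $3 = delay in seconds (default 5)
wait_active() {
  wa_unit=$1
  wa_attempts=${2:-10}
  wa_delay=${3:-5}
  wa_i=0
  while [ "$wa_i" -lt "$wa_attempts" ]; do
    if [ "$(systemctl is-active "$wa_unit" 2>/dev/null)" = "active" ]; then
      echo "$wa_unit is active"
      return 0
    fi
    wa_i=$(( wa_i + 1 ))
    sleep "$wa_delay"
  done
  echo "$wa_unit did not become active" >&2
  return 1
}

# Example usage on the SDDC Manager (not executed here):
#   wait_active lcm
```

On the vCenter Server, service-control --status --all lists which services are running and which are stopped after the restart.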
Retry Upgrade:
Return to the SDDC Manager UI and click Retry on the failed upgrade task.