VCF ESXi upgrade fails with "Internal Error" due to SAML token validation failure

Article ID: 429115

Products

VMware Cloud Foundation

Issue/Introduction

When performing an ESXi cluster upgrade in a VMware Cloud Foundation (VCF) environment, the upgrade task fails at the REMEDIATE stage within the SDDC Manager UI.

  • Symptoms: The upgrade progress stops, and the UI displays a generic failure message.

  • UI Error Message: ESX upgrade using vSphere Lifecycle Manager images failed. Error: Upgrade failed, internal error.

  • Associated Reference Tokens: M3OE58, PC4R9S, FH7AHT.

Environment

VMware Cloud Foundation 5.x / 9.x

Cause

The root cause of this failure is significant clock skew between the SDDC Manager and the vCenter Server.

  1. SAML Token Invalidity: When the SDDC Manager attempts to authenticate with vCenter to trigger vLCM tasks, the vCenter vpxd service rejects the request.

  2. Authentication Error: The vpxd.log shows: SAML token validation failed. Error: ... Token start date is in the future. This occurs because the validity start time of the SSO-signed token is later than the vCenter Server's current system time.

  3. Task Failure: Because the token is considered "not yet valid," authentication fails (vim.fault.InvalidLogin). The LCM service therefore cannot obtain a Task ID, and the LCM logs record a Connection refused or No route to host exception.
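To confirm that this signature is present, the log can be searched for the message quoted above. The following is a minimal sketch, not an official diagnostic tool: the function name count_saml_skew_errors is hypothetical, and the log path is passed as an argument because the location of vpxd.log is not stated in this article.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that carry the clock-skew signature
# quoted in the Cause section. The log path is supplied by the caller.
count_saml_skew_errors() {
    log_file="$1"
    # -F matches the pattern as a fixed string; -c prints the number of matching lines.
    grep -cF "Token start date is in the future" "$log_file"
}
```

A non-zero count from the vCenter Server's vpxd.log suggests that the failure matches this article. A count of 0 suggests a different cause.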

Resolution

To resolve this issue, time synchronization must be restored across all VCF components.

  1. Verify System Time:

    • Log in via SSH to the SDDC Manager, vCenter Server, and ESXi hosts.

    • Run the date command on each to identify discrepancies.

  2. Correct NTP Configuration:

    • Ensure all components are pointed to the same reliable NTP server(s).

    • In the vCenter Management Interface (VAMI) at https://<vcenter-ip>:5480, confirm that the time synchronization mode matches the intended time source (for example, NTP with the same servers used by the other components).

  3. Restart Services:

    • Once time is synchronized, restart services on the vCenter Server: service-control --stop --all && service-control --start --all

    • Restart the LCM service on the SDDC Manager: systemctl restart lcm

  4. Retry Upgrade:

    • Return to the SDDC Manager UI and click Retry on the failed upgrade task.
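The time comparison in step 1 can be made less error-prone by comparing epoch seconds rather than reading date output by eye. The following is a minimal sketch under stated assumptions: the function names, the placeholder hostnames in the comments, and the 5-second tolerance are illustrative and are not values taken from VCF documentation.

```shell
#!/bin/sh
# Hypothetical helper: absolute difference, in seconds, between two epoch timestamps.
clock_skew_seconds() {
    a="$1"
    b="$2"
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Hypothetical helper: exit 0 when the skew between two timestamps is within
# the tolerance given as the third argument (in seconds), non-zero otherwise.
skew_within() {
    skew=$(clock_skew_seconds "$1" "$2")
    [ "$skew" -le "$3" ]
}

# Illustrative usage (hostnames are placeholders; assumes a shell on each
# appliance where `date -u +%s` is available):
#   sddc_epoch=$(ssh vcf@sddc-manager.example.com date -u +%s)
#   vc_epoch=<output of `date -u +%s` run on the vCenter Server>
#   skew_within "$sddc_epoch" "$vc_epoch" 5 || echo "Clock skew exceeds tolerance"
```

Collect the timestamps as close together as possible, since any delay between the two readings is counted as skew.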