Remediation of Hosts in vLCM cluster fails at the cluster health check

Article ID: 390505


Products

VMware NSX VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

  • Attempting to remediate a cluster using vCenter Server’s vLCM component fails with the message: “A general system error occurred: Health Check for <cluster name>”

  • Navigating to Cluster -> Updates -> Image shows the error message: "Error invoking getSoftwareSolutionsBeingApplied API".




  • On vCenter Server, in /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server.log :

YYYY-MM-DDTHH:MM:SS error vmware-vum-server[57001] [Originator@6876 sub=EHP] Response from <nsx-manager-ip>/api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check: HTTP Status:500 'Internal Server Error'
YYYY-MM-DDTHH:MM:SS error vmware-vum-server[57001] [Originator@6876 sub=EHP] Error calling Nsxt API /api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[57001] [Originator@6876 sub=EHP] [cluster-id] A provider has finished (0 remaining).
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[57001] [Originator@6876 sub=EHP] [cluster-id] All providers have finished. Elapsed time (sec): 7
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[57001] [Originator@6876 sub=EHP] [cluster-id] [com.vmware.vpxd.healthPerspectives.ready_for_apply.ha] returned status: OK
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[57001] [Originator@6876 sub=EHP] [cluster-id] [com.vmware.vcIntegrity.lifecycle.health.internal.external_provider] returned status: NOT_OK
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[57001] [Originator@6876 sub=EHP] Entity [cluster-id] health status is: NOT_OK
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [HealthCheck 434] CheckHostHealth -  check status name = HA image constraints check -
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [HealthCheck 434] CheckHostHealth -  check status name = Executes remote health checks for service NSX Manager -
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [ApplyHelpers 539] CheckClusterHealth - calling EHP check completed. (cluster name = <cluster-name>) - (perspective = 1) - (check result status = 3) - (check timeout = 4200)
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [ApplyHelpers 565] CheckClusterHealth - check -  issues size = 0 - status = 0 -  check = com.vmware.vpxd.healthPerspectives.ready_for_apply.ha  name = HA image constraints check - description = This check verifies that the HA constraints in the image spec about to be applied are set appropriately.
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [ApplyHelpers 565] CheckClusterHealth - check -  issues size = 1 - status = 3 -  check = com.vmware.vcIntegrity.lifecycle.health.internal.external_provider  name = Executes remote health checks for service NSX Manager - description = Executes remote health checks for service NSX Manager.
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [ApplyHelpers 594] CheckClusterHealth - health check error - (cluster id = cluster-id) - (cluster name = <cluster-name>) - (perspective = 1) - (status = 3)
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. Finalizing Task
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=EHP] Deleting cached credentials for vapi session ID ####################################f357
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [ApplyHelpers 1052] Updating status with failure. -originator = vSphere Lifecycle Manager - retriable = false
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. RemediateClusterTask - FinalizeTaskResult - remaining hosts set size : 3
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. RemediateClusterTask - FinalizeTaskResult - skip remaining host(<host-fqdn>).
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. Finalize host(=host-#######)  status(=2)
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. RemediateClusterTask - FinalizeTaskResult - skip remaining host(<host-fqdn>).
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. Finalize host(=host-#######)  status(=2)
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. RemediateClusterTask - FinalizeTaskResult - skip remaining host(<host-fqdn>).
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. Finalize host(=host-#######)  status(=2)
YYYY-MM-DDTHH:MM:SS info vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. Not updating the commitId since not all the hosts in the cluster were successfully remediated.
YYYY-MM-DDTHH:MM:SS error vmware-vum-server[58078] [Originator@6876 sub=RemediateClusterTask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.RemediateClusterTask ID:#########-####-####-####-########8182. Task Failed. Error: Error:
-->    com.vmware.vapi.std.errors.error
--> Messages:
-->    com.vmware.vcIntegrity.lifecycle.TaskError.HealthCheckFailed<Health Check for '<cluster-name>' failed>
-->

  • The following errors may also be observed in vmware-vum-server.log:

    YYYY-MM-DDTHH:MM:SS info vmware-vum-server[1397653] [Originator@#### sub=EHP opID=[] Calling NSX-T API /api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check (/external-tp/http1/##.##.###.##/443/##############################/E####FA9/api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check).
    <Timestamp> info vmware-vum-server[1680039] [Originator@6876 sub=EHP opID=[] [domain-] A provider [wcp] has finished (2 remaining).
    ------------------------------------------------
    Task Failed. Error: Error:
    -->    com.vmware.vapi.std.errors.error
    --> Messages:
    -->    com.vmware.vcIntegrity.lifecycle.TaskError.HealthCheckFailed<Health Check for 'Cluster name' failed>
    -->

    YYYY-MM-DDTHH:MM:SS warning vmware-vum-server[1424181] [Originator@6876 sub=TaskStatsCollector] [taskStatsCollector 190] Task type or creation time not present
    YYYY-MM-DDTHH:MM:SS info vmware-vum-server[1680038] [Originator@6876 sub=PM.AsyncTask.RemediateClusterTask{1751}] [vciTaskBase 1496] SerializeToVimFault fault:
    --> (vmodl.fault.SystemError) {
    -->    faultCause = (vmodl.MethodFault) null,
    -->    faultMessage = (vmodl.LocalizableMessage) [
    -->       (vmodl.LocalizableMessage) {
    -->          key = "com.vmware.vcIntegrity.lifecycle.TaskError.HealthCheckFailed",
    -->          arg = (vmodl.KeyAnyValue) [
    -->             (vmodl.KeyAnyValue) {
    -->                key = "1",
    -->                value = "Cluster name"
    -->             }
    -->          ],
    -->          message = <unset>
    -->       }
    -->    ],
    -->    reason = "vLCM Task failed, see Error Stack for details."



    YYYY-MM-DDTHH:MM:SS info vmware-vum-server[2057606] [Originator@6876 sub=EHP opID=########-####-####-####-########f9fa] Calling NSX-T API /api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check (/external-tp/http1/##.###.##.##/443/##############################/api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check).
    YYYY-MM-DDTHH:MM:SS error vmware-vum-server[2057606] [Originator@6876 sub=EHP opID=########-####-####-####-########f9fa] Response from localhost/external-tp/http1/##.###.##.##/443/##############################/api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check: HTTP Status:500 'Internal Server Error'
    YYYY-MM-DDTHH:MM:SS warning vmware-vum-server[2057606] [Originator@6876 sub=EHP opID=########-####-####-####-########f9fa] Retrying on next NSX-T node due to HTTP 500.
    YYYY-MM-DDTHH:MM:SS error vmware-vum-server[2057606] [Originator@6876 sub=EHP opID=########-####-####-####-########f9fa] No reachable NSX-T node found.

  • On the NSX Manager, in /var/log/upgrade-coordinator/upgrade-coordinator.log:
     

    YYYY-MM-DDTHH:MM:SS ERROR http-nio-127.0.0.1-####-exec-4 LcmRestClient 83987 FABRIC [nsx@#### comp="nsx-manager" errorCode="MP31815" level="ERROR" subcomp="upgrade-coordinator"] Error in rest call url= //api/esx/settings/clusters/domain<domain_number>/software/solutions-being-applied-internal/com.vmware.nsxt , method= GET , response= FORBIDDEN: {"error_type":"UNAUTHORIZED","messages":[]}
    ...
    org.springframework.web.client.HttpClientErrorException$Forbidden: 403 Forbidden: "{"error_type":"UNAUTHORIZED","messages":[]}"
    YYYY-MM-DDTHH:MM:SS  INFO http-nio-127.0.0.1-####-exec-4 VLCMOperationsServiceImpl #### FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] VLCM API error details: {"error_type":"UNAUTHORIZED","messages":[]}

    YYYY-MM-DDTHH:MM:SS  INFO http-nio-127.0.0.1-####-exec-4 EHPServiceImpl #### SYSTEM [nsx@#### comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Error while calling getSoftwareSolutionsBeingApplied API. So returning NOT_OK

     

  • In the NSX Manager UI at System → Fabric → Compute Managers, the vCenter Server Registration dialog shows No/Disabled beside "Create Service Account".
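As a quick triage step, both logs can be filtered for the failure signatures quoted above. The log paths are the ones referenced in this article; the grep patterns are illustrative, not exhaustive:

```shell
#!/bin/sh
# Print lines from a log that match a failure signature. Succeeds (with no
# output) if the log does not exist on the node where this is run.
find_signatures() {
    grep -E "$2" "$1" 2>/dev/null || true
}

# On vCenter Server: NSX external-health-provider (EHP) failures.
find_signatures /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server.log \
    "HTTP Status:500|Error calling Nsxt API|HealthCheckFailed"

# On NSX Manager: authorization failures from the upgrade coordinator.
find_signatures /var/log/upgrade-coordinator/upgrade-coordinator.log \
    "UNAUTHORIZED|403 Forbidden|getSoftwareSolutionsBeingApplied"
```

Matches in both logs together point at the service-account misconfiguration described under Cause below.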

Environment

VMware NSX

VMware vCenter Server

VMware vSphere ESXi

Cause

  • When using vLCM to manage the images on hosts in vCenter Server clusters, NSX must be configured to 'Create Service Account' when registering with vCenter Server as a Compute Manager. See Prepare an NSX Cluster with vSphere Lifecycle Manager for further information.
  • This is true even if no clusters are prepared for NSX. If NSX is connected to vCenter Server as a Compute Manager and vLCM is enabled, this setting must be enabled.
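The failing health check can also be exercised directly against NSX, outside of a remediation run. The endpoint is the one quoted in the vum-server log snippets above; the helper below only builds the URL, and the commented invocation assumes POST (the usual convention for NSX `action=` APIs) with placeholder manager address and credentials:

```shell
#!/bin/sh
# Build the NSX vLCM ready-for-apply health-check URL quoted in the
# vmware-vum-server.log snippets above. $1 is the NSX Manager address.
nsx_health_url() {
    echo "https://$1/api/v1/vlcm/esx/health/cluster/perspectives/ready-for-apply/status?action=check"
}

# Example (placeholders; POST is an assumption based on NSX action-API
# conventions). In this issue the call returns HTTP 500:
#   curl -sk -u admin -X POST "$(nsx_health_url nsx-manager.example.com)"
```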

Resolution

  1. In the NSX GUI, navigate to System → Fabric → Compute Managers.
  2. Select the checkbox beside the appropriate vCenter Server.
  3. Click Edit.
  4. Ensure all connection information is correct.
  5. Click the slider beside Create Service Account and ensure it reads Yes.
  6. Click Save.
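The setting can also be checked from the NSX REST API: GET /api/v1/fabric/compute-managers returns each registered Compute Manager, including its create_service_account flag. The extraction helper below is a minimal sketch that avoids a jq dependency; the manager address and credentials in the commented example are placeholders:

```shell
#!/bin/sh
# Pull the create_service_account flags out of a compute-managers API
# response supplied on stdin.
show_service_account_flags() {
    grep -Eo '"create_service_account" *: *(true|false)'
}

# Example (placeholders for the manager address and credentials):
#   curl -sk -u admin "https://<nsx-manager>/api/v1/fabric/compute-managers" \
#       | show_service_account_flags
```

After applying the steps above, each entry should report true.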

Note: If the 'UNAUTHORIZED' or '403 Forbidden' errors (as shown in the log snippets above) are received, ensure the account used to connect to vCenter Server has sufficient privileges.

Additional Information