Remediation of a cluster failed as compliance check reported the host as 'non-compliant' when using Cisco HSM for vLCM
search cancel

Remediation of a cluster failed as compliance check reported the host as 'non-compliant' when using Cisco HSM for vLCM

book

Article ID: 407331

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • When performing remediation of a single-image cluster through Lifecycle Manager in vCenter Server, the process fails during the final compliance check for each host in the cluster.
  • When a manual compliance check is run, the host then becomes compliant and remediation can be triggered for the remaining hosts in the cluster.
  • Issue is only seen when Cisco HSM is used to manage 'firwmare and driver addon' on the cluster.

  • Reviewing Update manager logs at /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server.log, the below error is seen:

    YYYY-MM-DDTHH:MM:SS info vmware-vum-server[1280836] [Originator@6876 sub=ServiceProvider] [EmbeddedPyServiceProvider 1834] HSM Task Info JSON String: {"operationStatusCode": 200, "messages": [], "id": "task-########", "description": "Task task-########", "action": "SCAN", "startTime": "YYYY-MM-DD HH:MM:SS +0000 UTC m=+#####.######", "hosts": ["host-#####"], "status"
    : "SUCCEEDED", "progress": 100, "complianceScanMap": {"host-#####": {"fwCompliance": "NON-COMPLIANT", "fwStageStatus": "STAGED", "internalIntegrity": "NON-COMPLIANT", "deviations": [{"systemComponent": {"name": "ServerFirmware Package", "id": "HUU", "description": "ServerFirmware Package", "type": "OTHER"}, "compliance": "NON-COMPLIANT", "currentVersion": "5.2(2.240074)", "targetVersion": "5.2
    (2.240074)", "messages": ["Deviation in HUU component."]}], "impact": {"maintenanceModeRequired": true, "preImageUpdateRequired": true, "intermediateRebootRequired": true, "intermediateRebootHwResetRequired": true, "postImageUpdateRequired": true, "finalRebootRequired": false, "finalRebootHwResetRequired": false}, "messages": ["The 'Distributable' Configured in Intersight supports the model of
    YYYY-MM-DDTHH:MM:SS info vmware-vum-server[09853] [Originator@6876 sub=Telemetry] [TelemetryManager 261] Sending telemetry data: {"@type":"pman_error_report","taskId":"#######-####-####-####-######|#######-####-####-####-########","entityId":"######-####-####-####-######|host-#####","parentTaskId":"","errorMessageId":"com.vmware.vcIntegrity.lifecycle.RemediateClus
    terTask.HostNotCompliantAfterRemediation","errorMessage":"After host 'ESXi_FQDN' remediation completed, compliance check reported host as 'non-compliant'. The image on the host does not match the image set for the cluster. Retry the cluster remediation operation.","errorTime":"HH:MM:SS"}

  • Reviewing the HSM logs at the same time at /var/log/vmware/vmware-updatemgr/vum-server/hsm-service.log shows that the POST_IMAGE_UPDATE check for the host has succeeded, thereby triggering the compliance check scan.

    YYYY-MM-DDT:HH:MM:SS info vmware-vum-server[207251] [Originator@6876 sub=Hsl::RemediationManager opID=#######-####-####-####-######] [RemediationManager 352] PerformHwUpdate called action: POST_IMAGE_UPDATE for host: host-##### with host name: ESXi_FQDN
    YYYY-MM-DDT:HH:MM:SS [Dummy-54]hsmService:419 [INFO] Retrieved information of task task-########: {"id":"task-########","description":"Task task-######## Intersight WF:","action":"POST_IMAGE_UPDATE","startTime":"YYYY-MM-DD HH:MM:SS,"hosts":["host-#####"],"status":"SUCCEEDED","progress":100,"operationStatusCode":200,"messages":["POSTUPDATE passed for host host-#####"]}

Environment

VMware vCenter Server 8.x
VMware ESXi 8.x

Cause

The Cisco HSM reports the POST_IMAGE_UPDATE status as "succeeded" despite the blade discovery process at the HSM still being in progress.
Consequently, vCenter initiates a host compliance check, which immediately fails.

Resolution

Cisco is aware of this issue and are working towards a fix.

In the meantime, the below workaround can be attempted:
Once the blade discovery process completes on Cisco HSM, manually run a compliance check on the host where remediation has failed. The host would then be compliant with the cluster image.