Remediating host in vLCM cluster with Hostbased Service VM Deployment fails after 95%

Article ID: 324674

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • Remediation of the host stalls at 95% and then fails once the 70-minute timeout expires.
  • The error is displayed in the vCenter Server (VC) UI. The VC logs contain more details about the error.


Environment

VMware NSX-T

Cause

This issue occurs when an unprepared or non-compliant ESX host is added to a vLCM Cluster where Hostbased Service VMs are deployed. Once the Remediation reaches 95%, a deadlock occurs between EAM (ESX Agent Manager) and NSX.

Resolution

This is a known issue affecting VMware NSX-T 3.x.
Currently, there is no resolution.

Workaround:
To work around this issue, use one of these options:

Option 1 - If Remediation has already started or failed
  1. Wait for the Remediation to fail after the 70-minute timeout. Then manually check Compliance of the Cluster and confirm that the ESX is Compliant (the status of the ESX changes from "Remediation Failed" to "Compliant").
  2. Manually re-apply the TNP (Transport Node Profile) from NSX Manager to the ESX. Service VM Deployment then starts automatically for the ESX.
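
The order of the two steps above matters: the TNP is re-applied only after the Compliance check reports "Compliant". A minimal Python sketch of that ordering follows. It does not call any real vCenter or NSX API. `check_compliance` and `reapply_tnp` are hypothetical callables that the operator would have to supply (for example, wrappers around whatever tooling is in use), and the retry count and delay are assumptions, not documented values.

```python
import time
from typing import Callable


def recover_after_failed_remediation(
    host: str,
    check_compliance: Callable[[str], str],  # hypothetical: returns the host's status text
    reapply_tnp: Callable[[str], None],      # hypothetical: re-applies the TNP to the host
    attempts: int = 5,                       # assumed retry count, not a documented value
    delay_seconds: float = 30.0,             # assumed delay between checks
) -> bool:
    """Re-apply the TNP only once the host reports "Compliant".

    Returns True if the TNP was re-applied, False if the host never
    became Compliant within the allowed number of checks.
    """
    for attempt in range(attempts):
        status = check_compliance(host)
        if status == "Compliant":
            # Step 2: re-applying the TNP also triggers Service VM deployment.
            reapply_tnp(host)
            return True
        # Status is still something else, e.g. "Remediation Failed".
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Passing the two operations in as callables keeps the sketch independent of any particular SDK; if the function returns False, the host should be investigated before the TNP is re-applied.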

Option 2 - Avoid the 70-minute timeout by putting the ESX in Maintenance Mode before Remediating it
  1. Place the newly added ESX into Maintenance Mode.
  2. Remediate the ESX using vLCM.
  3. The Remediation will still fail at 95%, but this time it fails without waiting for the 70-minute timeout.
  4. Manually take the ESX out of Maintenance Mode after the Remediation fails.
  5. Manually check Compliance of the Cluster and confirm that the ESX is Compliant (the status of the ESX changes from "Remediation Failed" to "Compliant").
  6. Manually re-apply the TNP (Transport Node Profile) from NSX Manager to the ESX. Service VM Deployment then starts automatically for the ESX.
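
The six steps of Option 2 can be sketched as a single ordered sequence. The Python below is illustrative only and makes no real vCenter, vLCM, or NSX calls: every operation is passed in as a hypothetical callable, and `RemediationFailed` is an invented exception type standing in for however the chosen tooling reports the expected failure at 95%.

```python
from typing import Callable


class RemediationFailed(Exception):
    """Hypothetical error raised by the remediate callable when vLCM reports failure."""


def remediate_in_maintenance_mode(
    host: str,
    enter_maintenance_mode: Callable[[str], None],  # hypothetical wrapper
    remediate: Callable[[str], None],               # hypothetical wrapper
    exit_maintenance_mode: Callable[[str], None],   # hypothetical wrapper
    check_compliance: Callable[[str], str],         # hypothetical: returns status text
    reapply_tnp: Callable[[str], None],             # hypothetical wrapper
) -> bool:
    """Run the Option 2 steps in order; return True if the TNP was re-applied."""
    enter_maintenance_mode(host)          # step 1
    try:
        remediate(host)                   # step 2
    except RemediationFailed:
        pass                              # step 3: the failure at 95% is expected
    finally:
        exit_maintenance_mode(host)       # step 4: always leave Maintenance Mode
    if check_compliance(host) != "Compliant":   # step 5
        return False
    reapply_tnp(host)                     # step 6: also starts Service VM deployment
    return True
```

The `finally` clause is a design choice of this sketch: it takes the host out of Maintenance Mode even if the remediation call raises an unexpected error, so the host is not left in Maintenance Mode by accident.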