NSX Malware Prevention Service VM Fails To Register With NSX
Article ID: 317199
Products
VMware NSX
Issue/Introduction
Symptoms:
One or more Malware Prevention Service VMs (SVMs) show a ‘Service Health Status’ of Down after deployment.
Scenario 1:
SVM deployment fails while powering on the SVM because of insufficient resources on the host (CPU or memory).
Deployment Status is shown as Down in the NSX UI, and the message “Insufficient resources. Cannot deploy agent on host due to insufficient resources (cpu or memory)” appears in the list of issues.
In addition, it takes more than 60 minutes to resolve the issue.
Scenario 2:
SVM deployment is successful, but the SVM Deployment Health Status remains Down even after the deployment is resolved. The deployment status details show the Security Hub service as down.
Relevant Logs Location:
On the SVM, as well as in the SVM support bundle, the following log entries appear in /var/log/syslog:
nsx@6876 comp="nsx-mps-svm" subcomp="python" username="root" level="ERROR" errorCode="('CLI110',)"] POST /napp/api/v1/platform/trust-management/certificates returned status: 403#012b'{"module_name":"common-services","error_message":"The credentials were incorrect or the account specified has been locked.","error_code":403}'
Failed to register SVM to NSX
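To check a syslog file or an extracted support bundle for these entries, a small script along the following lines may help. This is a minimal sketch: the marker strings are copied from the sample log lines above, while the function names find_registration_failures and scan_syslog are hypothetical helpers and not part of any NSX tooling.

```python
# Substrings copied from the sample log entries in this article.
FAILURE_MARKERS = (
    "Failed to register SVM to NSX",
    "/napp/api/v1/platform/trust-management/certificates returned status: 403",
)


def find_registration_failures(lines):
    """Return (line_number, text) pairs for lines containing a failure marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_syslog(path="/var/log/syslog"):
    """Scan a syslog file (on the SVM or extracted from a support bundle)."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as handle:
        return find_registration_failures(handle)
```

Calling scan_syslog() on the SVM, or scan_syslog(path) with the path to the syslog copy inside an extracted support bundle, returns an empty list when neither marker is present.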
Scenario 3:
On a cluster with more than 50-60 hosts, a new Malware Prevention service deployment or an upgrade of the Malware Prevention service is triggered, and the deployment takes more than an hour to complete.
The deployment completes successfully, but the health status of a few of the SVMs is Down.
In the vCenter UI, for an SVM with health status Down, the ‘Monitor’ -> ‘Tasks’ screen shows more than an hour between the ‘Deploy OVF template’ and ‘Power On virtual machine’ tasks.
Environment
VMware NSX-T Data Center
VMware NSX-T Data Center 4.x
VMware NSX-T Data Center 3.x
Cause
This happens because the certificate used to register the SVM with NSX has expired or is otherwise rejected as invalid. There can be multiple reasons for this:
Case 1: The NTP server(s) configured on NSX Manager and on the ESXi host(s) are different. If the NTP servers on NSX Manager and the ESXi hosts do not match, the certificate validity start time can be out of sync with the NSX Application Platform (NAPP) NTP settings. When such a certificate is registered, the NAPP trust manager returns an error that the certificate is invalid.
Case 2: For various reasons (e.g. insufficient CPU or memory resources), the SVM might not be able to power on on the host. If resolving this issue takes more than 30 minutes, the certificate created for SVM-NSX communication might expire, and there will be a communication error when the SVM finally powers on.
Case 3: On a scale cluster with more than 50-60 hosts, depending on network and storage latency, it might take more than an hour for the SVM to be deployed and powered on on a few hosts. This causes authentication errors in communication between the SVM and NSX.
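The three cases above reduce to the same check: the certificate is presented at a time that falls outside its validity window, as seen by the verifier's clock. The following sketch illustrates that check only. The 60-minute validity period, the 5-minute clock offset, and the 90-minute delay are hypothetical values chosen for illustration, and certificate_status is a hypothetical helper, not the NAPP trust manager's actual logic.

```python
from datetime import datetime, timedelta, timezone


def certificate_status(not_before, not_after, now):
    """Classify a certificate at time 'now', as seen by the verifier's clock."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical values for illustration; the real validity period is not stated here.
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
not_before = issued
not_after = issued + timedelta(minutes=60)

# Case 1 (assumed form): the verifier's clock runs 5 minutes behind the issuer's.
skewed_result = certificate_status(not_before, not_after, issued - timedelta(minutes=5))

# Cases 2 and 3: the SVM powers on 90 minutes after the certificate was created.
delayed_result = certificate_status(not_before, not_after, issued + timedelta(minutes=90))

print(skewed_result)
print(delayed_result)
```

In both situations the certificate itself is unchanged; only the moment at which it is evaluated moves outside the window, which is why keeping NTP in sync and reducing the time to power-on address the problem.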
Resolution
Workaround:
Ensure that the NSX NTP settings and the ESXi NTP settings are in sync.
Manually delete the problematic SVM from the ESXi or vCenter UI (ignore the warnings about the VM being an EAM-managed entity).
Go to the Deployments page in the NSX UI and resolve the corresponding deployment(s).
For Scenario 3, on a cluster with more than 50 hosts, trigger the deployment in batches of 50 hosts while keeping the remaining hosts in maintenance mode. After the deployment succeeds, take the remaining hosts out of maintenance mode.
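For the Scenario 3 workaround, the batches can be planned ahead of time from the cluster's host list. The sketch below only splits a list of host names into groups of at most 50. It does not call any NSX or vCenter API, and the host names and the batch_hosts helper are hypothetical.

```python
def batch_hosts(hosts, batch_size=50):
    """Split a list of host names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


# Hypothetical cluster of 120 hosts.
cluster_hosts = [f"esx-{index:03d}.example.local" for index in range(1, 121)]
batches = batch_hosts(cluster_hosts)

for number, batch in enumerate(batches, start=1):
    # Hosts outside the current batch stay in maintenance mode during deployment.
    print(f"Batch {number}: {len(batch)} hosts, {batch[0]} to {batch[-1]}")
```

Each batch is deployed with all hosts outside it kept in maintenance mode, as described in the workaround step above.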
Resolution:
There is no fix available for this issue. Workaround steps need to be followed for the resolution.
Additional Information
Impact to Customer:
The NSX Malware Prevention functionality will not work on the specific host. The SVM deployment health status will be down.