MPS service down alarm in NSX due to incorrect DNS server configuration

Products

VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

NSX alarm: Malware Prevention Health — Service Status Down High on one or more transport nodes.
The alarm persists after being manually resolved, even though the deployment is successful.
In the SVM CLI, two or more Docker containers on the SVM are in a CrashLoopBackOff / restart loop.
- docker ps -a (look for the restart time in the output)
Container logs show repeated: nginx: [emerg] host not found in upstream "<SSP-FQDN>"
- docker logs <container-name>
systemd-resolved logs show degraded feature-set warnings (UDP/TCP fallback) for the configured DNS server.
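
To gauge how often a container is hitting the resolution failure, the log output can be filtered for the nginx error string quoted above. The following is a minimal sketch, not part of the product; the function name count_upstream_errors is hypothetical.

```shell
# Count nginx "host not found in upstream" errors in log text read from stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function's exit status at zero while still printing the count (0).
count_upstream_errors() {
    grep -c 'host not found in upstream' || true
}
```

Assumed usage on the SVM: docker logs <container-name> 2>&1 | count_upstream_errors. The 2>&1 is there because docker logs replays the container's stderr stream on stderr. A count that keeps growing between runs suggests the container is still failing to resolve the SSP FQDN.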

Environment

SVM 5.0
SVM 5.1
SVM 5.1.1

Cause

The SVM's systemd-resolved is configured with an incorrect or stale DNS server IP, typically the result of an outdated NSX IP pool DNS setting that was applied when the SVM was initially deployed. Because the SVM cannot resolve the SSP FQDN, the nginx reverse-proxy container and other containers fail on startup and enter a crash loop, causing the MPS service to report as down.

Resolution

NSX IP Pool Update

If the NSX IP pool configuration (for example, the DNS server setting) is wrong or outdated, update the IP pool used for the SVM and redeploy the SVM. Once the deployment is successful, the alarm should clear automatically after some time.

Login to SVM

Connect to the affected SVM via SSH using the default credentials:

Username: root
Password: <default-password>

Note: You will be prompted to change the password upon first login.

Step 1 — Verify IP Pool Misconfiguration on the SVM

SSH to the affected SVM as root, then run the following commands to confirm the IP assignment is incorrect:

ip addr show
ip route show
cat /etc/netplan/*.yaml        # or equivalent network config path
resolvectl status
nc -vz <SSP> 443
nc -vz <SSP-message> 9092
curl -k https://<nsx-manager-fqdn>/api/v1/node

Expected result: If the SVM's IP address does not fall within the expected pool range, the gateway is unreachable, or the NSX Manager API returns a connection error, IP pool misconfiguration is confirmed. Proceed to Step 2.
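
The range comparison described above can be done by hand, or sketched in shell as follows. This is an illustrative helper, not a VMware tool; the function names ip_to_int and ip_in_range are hypothetical, and it assumes IPv4 addresses written without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed (exit 0) if address $1 lies between $2 and $3, inclusive.
ip_in_range() {
    local ip start end
    ip=$(ip_to_int "$1")
    start=$(ip_to_int "$2")
    end=$(ip_to_int "$3")
    [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}
```

Assumed usage, with the SVM address taken from ip addr show and the range taken from the network design document: ip_in_range <svm-ip> <correct-start-ip> <correct-end-ip> && echo "in pool" || echo "outside pool".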

Step 2 — Identify and Correct the NSX IP Pool in NSX Manager

2a. Locate the Misconfigured IP Pool

Log in to NSX Manager UI (https://<nsx-manager-fqdn>).
Navigate to Networking → IP Management → IP Address Pools.
Identify the IP pool assigned to SVM deployment (commonly named for the service or cluster it serves).
Verify the following fields against the correct network design document:

Field	Expected Value
IP Range	<correct-start-ip> – <correct-end-ip>
Gateway	<correct-gateway-ip>
Prefix Length	<correct-prefix> (e.g., 24)
DNS Server(s)	<correct-dns-server-ip>
DNS Suffix	<correct-domain-suffix>

2b. Update the IP Pool

Click Edit on the identified IP pool.
Update all incorrect fields to match the expected values from the table above.
Click Save and confirm the changes.

Note: Changes to the IP pool do not automatically re-IP existing SVMs. A redeploy is required (Step 3).

Step 3 — Redeploy the SVM

3a. From NSX Manager UI

Navigate to Security → Service Deployments (or System → Service Deployments depending on NSX version).
Locate the affected SVM deployment associated with the updated IP pool.
Select the deployment and click Redeploy (or Resolve if shown as an error state).
Confirm the action when prompted.
Monitor the deployment status — it should progress through: Deploying → Configuring → Success.

3b. Monitor Redeployment Progress

Track the deployment status in NSX Manager:

Security → Service Deployments → [Deployment Name] → Status

You can also monitor the SVM boot and configuration from the vCenter console if direct SSH is not yet available.

Individual SVM DNS Update

Login to SVM

Connect to the affected SVM via SSH using the default credentials:

Username: root
Password: <default-password>

Note: You will be prompted to change the password upon first login.

Step 1 — Verify DNS Misconfiguration on the SVM

SSH to the affected SVM as root, then run the following commands:

systemctl status systemd-resolved
resolvectl status
nslookup <SSP-FQDN>
nslookup <SSP-FQDN> <correct-dns-server-ip>

Expected result: If nslookup <SSP-FQDN> fails but resolves successfully when pointed directly at the correct DNS server, DNS misconfiguration is confirmed. Proceed to Step 2.
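
It can also help to see which DNS servers systemd-resolved is actually using and whether the correct one is among them. The sketch below parses resolvectl status output; the function names are hypothetical, and it assumes the layout in which servers appear on a single "DNS Servers:" line separated by spaces (older systemd versions list them one per line, which this does not handle).

```shell
# Print the unique DNS servers found on "Current DNS Server:" and
# "DNS Servers:" lines of resolvectl status output read from stdin.
# "Fallback DNS Servers:" lines are deliberately ignored.
list_dns_servers() {
    awk -F': ' '/^ *(Current )?DNS Servers?:/ {
        n = split($2, s, " ")
        for (i = 1; i <= n; i++) print s[i]
    }' | sort -u
}

# Succeed (exit 0) if server $1 appears in the status output on stdin.
dns_server_configured() {
    list_dns_servers | grep -qxF "$1"
}
```

Assumed usage: resolvectl status | dns_server_configured <correct-dns-server-ip> || echo "correct DNS server is not configured".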

Step 2 — Correct the DNS Server Configuration

Open the resolved configuration file:

vim /etc/systemd/resolved.conf

Locate the [Resolve] section. Uncomment and update the DNS= line with the correct DNS server IP:

[Resolve]
DNS=<correct-dns-server-ip>

Save and exit, then restart the service:

systemctl restart systemd-resolved

Validate that the FQDN now resolves correctly:

nslookup <SSP-FQDN>
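
If the same edit has to be repeated on several SVMs, the manual change to the DNS= line could be scripted. The following is a minimal sketch under the assumption that the file already contains a DNS= line (commented or not); the function name set_resolved_dns is hypothetical. It keeps a .bak copy of the original file and does not restart the service.

```shell
# Set the DNS= entry in a resolved.conf-style file.
#   $1 = path to the config file, $2 = DNS server IP
# Only lines starting with "DNS=" or "#DNS=" are rewritten, so entries
# such as FallbackDNS= and DNSSEC= are left untouched.
set_resolved_dns() {
    local conf dns
    conf=$1
    dns=$2
    if ! grep -qE '^#?DNS=' "$conf"; then
        echo "no DNS= line found in $conf" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak"                       # keep a backup copy
    sed -E "s|^#?DNS=.*|DNS=$dns|" "$conf.bak" > "$conf"
}
```

Assumed usage as root: set_resolved_dns /etc/systemd/resolved.conf <correct-dns-server-ip>, followed by systemctl restart systemd-resolved and the nslookup validation shown above.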

Step 3 — Restart Affected Docker Containers

Check the current status of all containers:

docker ps -a

If any containers are in a restart loop, restart each affected container, for example the reverse-proxy container:

docker restart nsx-lastline-<container-name>

Monitor container status until all are healthy:

watch docker ps -a

Note: If containers do not stabilize after the restart, reboot the SVM:
reboot
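
As an alternative to reading the watch output by eye, the container list can be filtered for anything that is not running. This is a rough sketch; the function name unhealthy_containers is hypothetical, and it assumes tab-separated name and status fields as input.

```shell
# Read "name<TAB>status" lines on stdin and print the names of containers
# whose status does not start with "Up " (for example Restarting or Exited).
unhealthy_containers() {
    awk -F'\t' '$2 !~ /^Up / { print $1 }'
}
```

Assumed usage: docker ps -a --format '{{.Names}}\t{{.Status}}' | unhealthy_containers. Empty output means every container currently reports Up. A container in a restart loop can briefly show Up between restarts, so repeat the check a few times before treating the SVM as stable.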

Step 4 — Verify Service Health

Confirm the following once all steps are complete:

Check	Expected Result
Container status	All containers show Up in docker ps -a
NSX UI	MPS alarms have cleared
DNS resolution	SSP FQDN resolves correctly via nslookup
Log validation	Logs reflect HTTP 200 responses