Issue
The Metrics Delivery Failure alarm in the NSX UI indicates that metrics are failing to transmit from NSX components (Manager, Transport Nodes, or Edge Nodes) to the Security Services Platform (SSP).
Impact
Maintenance Window
Not required. Most remediation steps are non-disruptive.
However, restarting key services (e.g. SHA, proton, netopad) may briefly impact metric collection.
vDefend SSP 5.0 or later
There are multiple possible root causes for this alarm. The most common are:
Certificate mismatch between NSX and SSP
Outdated/stale trust entities in authserver
API certificate on NSX Manager rotated but not refreshed in SSP
Transport Node (TN) or Edge certificate mismatch
Network/firewall or DNS issues preventing TN to SSP communication
When a Metrics Delivery Failure alarm is raised in NSX → SSP integration, the troubleshooting depends on the status code included in the alarm.
In the NSX UI, expand the Metrics Delivery Failure alarm.
The alarm description includes a status code (e.g. UNAUTHENTICATED, UNAVAILABLE, PERMISSION_DENIED).
This status code is critical — it determines the troubleshooting path.
Note the affected node(s) and the status code.
The possible status codes are:
UNAUTHENTICATED – Certificate sync/authentication issue
UNAVAILABLE or DEADLINE_EXCEEDED – Network/DNS/firewall issue
PERMISSION_DENIED – Authorization failure on SSP side
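The status-code triage above can be sketched as a small lookup. This is only an illustration of the mapping listed in this article; the function name and the fallback message are hypothetical and not part of NSX or SSP.

```python
# Mapping copied from the status-code list above.
TROUBLESHOOTING_PATHS = {
    "UNAUTHENTICATED": "Certificate sync/authentication issue",
    "UNAVAILABLE": "Network/DNS/firewall issue",
    "DEADLINE_EXCEEDED": "Network/DNS/firewall issue",
    "PERMISSION_DENIED": "Authorization failure on SSP side",
}


def troubleshooting_path(status_code: str) -> str:
    """Return the troubleshooting category for an alarm status code."""
    key = status_code.strip().upper()
    # The fallback text is an assumption for codes this article does not list.
    return TROUBLESHOOTING_PATHS.get(
        key, "Unknown status code; re-check the alarm description"
    )
```

Note the node(s) together with the returned category so that each node is followed up on the right path.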
This usually indicates a certificate synchronization issue between NSX and SSP.
The SSP cannot authenticate metrics sent by NSX nodes (Manager, Edge, or TN).
NSX API certificate was replaced but not updated in SSP truststore.
Transport Node (TN) or Edge node certificate changed after SSP deployment.
SSP Authserver pod is missing one or more NSX trust entities.
SHA agent running with stale certificates.
Sometimes stale config causes this. You can refresh it:
Retrieve current config:
Copy the full JSON response.
Disable metrics temporarily:
Change "enabled": true → "enabled": false.
Wait 1–2 minutes.
Re-enable metrics:
Change "enabled": false → "enabled": true.
Send PATCH again.
This forces NSX Manager and SSP to refresh their metric delivery config.
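The disable/re-enable edit on the copied JSON can be sketched as below. This is a minimal illustration that assumes the retrieved config is a JSON object with a top-level "enabled" key; the real payload shape and the API endpoint are not shown in this article, and the helper name is hypothetical.

```python
import copy


def with_enabled(config: dict, enabled: bool) -> dict:
    """Return a copy of the config with only the "enabled" flag changed."""
    if "enabled" not in config:
        # Assumption: the flag sits at the top level of the JSON response.
        raise KeyError('"enabled" not found at the top level of the config')
    updated = copy.deepcopy(config)  # leave the original response untouched
    updated["enabled"] = enabled
    return updated
```

Build the disabled body with with_enabled(config, False), send it, wait 1–2 minutes, then send with_enabled(config, True). Working on a copy keeps every other field of the response exactly as it was retrieved.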
If still failing, continue below.
On impacted TN / Edge Node:
Get Node UUID:
Save Node Certificate:
On NSX Manager:
Get messaging client certificates:
Match client_id with the Node UUID noted earlier.
Check if the certificate matches host-cert.pem.
On SSP UI:
Navigate to System → Certificates.
Locate certificate named:
NSX_UA_TN <NODE_UUID> or
NSX_UA_EDGE <NODE_UUID>.
Export this certificate.
Compare with host-cert.pem.
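Comparing certificates by eye is error-prone, so one option is to compare SHA-256 fingerprints of the decoded certificates instead. The sketch below uses only the Python standard library and assumes each input holds exactly one PEM certificate (not a chain); the function names are illustrative.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text: str) -> str:
    """SHA-256 hex digest of the DER bytes of a single PEM certificate."""
    # strip() tolerates leading/trailing whitespace from copy-paste or export.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def same_certificate(pem_a: str, pem_b: str) -> bool:
    """True when both PEM texts encode the same certificate bytes."""
    return pem_fingerprint(pem_a) == pem_fingerprint(pem_b)
```

For example, read the saved host-cert.pem and the certificate exported from the SSP UI into strings and pass both to same_certificate. Differences in line endings or surrounding whitespace do not affect the result, because the comparison is made on the decoded bytes.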
On SSP Installer (Authserver validation):
List authserver pod:
Restart it:
After restart, check logs for cert sync:
To view full certificate in logs:
Validate if the certificate matches the one saved earlier.
NSX_UA_TN)
If trust entity is missing in authserver config:
Find line:
Add missing entity:
Authserver will restart and sync certs.
Validate new node’s cert (/etc/vmware/nsx/host-cert.pem) exists in SSP trust manager.
Restart authserver pod to refresh:
Step 1: Check which certificate SHA agent is using on the host (from NSX Manager side)
On NSX 4.2+, you can directly ask the SHA process about its certificates:
This shows the root certificate and node certificate that SHA is currently using.
Alternatively, if you can’t run that, you can check syslog for SHA startup messages:
Those lines show which certificate/profile SHA used when connecting to SSP.
If the logs have been rotated, these startup messages may no longer be available.
Step 2: Get the API certificate from NSX Manager
Every NSX Manager node has its own API certificate (used for management/API communication).
For checking it:
Log in to NSX Manager UI → System > Certificates.
Find the API certificate that belongs to your Manager node (you identify the right Manager node by UUID).
Copy that cert’s UUID.
Then query it via API:
This returns:
Full cert (pem_encoded)
Thumbprint (leaf_certificate_sha_256_thumbprint)
Who is using it (used_by section → service_types: "API")
Compare the node certificate in use by SHA agent (from step 1) vs. the API certificate currently installed on NSX Manager (from step 2).
If the NSX Manager’s API certificate was recently rotated or replaced, SHA might still be holding the old certificate, in which case the SHA agent cannot authenticate to NSX Manager/SSP correctly.
If the SHA cert in use ≠ the current API cert:
Restart SHA agent so it re-fetches the updated certificate:
If that doesn’t help (SHA is still stuck), restart proton:
proton is the higher-level security framework service that manages SHA and related processes; restarting it forces a re-registration of trust.
Each Transport Node (ESXi/Edge) has its own node certificate.
The SHA agent uses that cert to prove its identity to SSP.
If the TN’s cert changes (for example after a rotation), the SHA agent might still be trying to use the old cert, which no longer matches → authentication fails.
Step 1 – Check which certificate SHA agent is actually using
On NSX 9.0 or higher:
On a Transport Node (ESXi host):
On an Edge node:
On NSX below 9.0:
Search the nsx-syslog logs instead:
(You may need to unzip archived nsx-syslog bundles first.)
If logs are rotated, you may not find it.
Step 2 – Check what the current node certificate really is
This shows the effective Transport Node certificate.
Step 3 – Compare
If the SHA agent is using a different cert than the current TN cert → mismatch detected.
Step 4 – Fix the mismatch
Restart services so SHA re-reads the correct certificate:
Restart SHA agent:
Restart exporter:
Step 5 – If still not fixed
Do a full sync of trust between NSX and SSP:
Restart proton (leader/common agent on NSX Manager):
→ This forces NSX Manager to re-sync all certs to transport nodes.
→ Wait a few minutes for the sync to complete.
Restart authserver on SSP side:
→ This makes SSP reload the updated certificate from trust manager.
Check network/firewall
Make sure there’s no firewall blocking traffic from TN → SSP FQDN on TCP 443.
Reference required ports: Broadcom Ports Guide.
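Where Python is available on a machine in the same network segment as the affected node, the reachability check above can be sketched as follows. This is an assumption-laden illustration: it only shows whether the SSP FQDN resolves and accepts a TCP connection from the machine running the script, which is not necessarily the Transport Node itself. The function name and return strings are hypothetical.

```python
import socket


def check_reachability(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return "dns_failed", "tcp_failed", or "ok" for host:port."""
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"  # the name did not resolve
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)  # TCP handshake only, no TLS
                return "ok"
        except OSError:
            continue  # try the next resolved address, if any
    return "tcp_failed"  # resolved, but no address accepted the connection
```

A "dns_failed" result points at name resolution, while "tcp_failed" points at a firewall or routing problem between the node's network and SSP.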
Check SSP registration info (from NSX Manager API):
Look at the response:
ingress_ip_address → should match your SSP FQDN.
Confirm this matches what nodes are actually using.
Validate DNS resolution
On the reported TN, ensure DNS resolves the ingress_ip_address (FQDN) to the correct SSP address.
If many nodes are impacted
Check for a manager disconnection alarm: nsx_application_platform_communication.manager_disconnected
→ Fix that first, because it breaks communication for all TNs.
Check envoy logs (SSP ingress proxy)
Get envoy pod name:
View envoy logs:
Look for API response flags
Example log:
UAEX= UnauthorizedExternalService
→ usually means the auth-server pod is down.
Check auth-server pod status
If it’s not running → contact support for deeper investigation.
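When the envoy logs are long, scanning them for the response flag can be scripted. The sketch below simply looks for the UAEX token as a whole word in each line; it assumes nothing about the rest of the log layout, and the function name is hypothetical.

```python
import re

# Flag meaning as given in this article.
FLAG_MEANINGS = {"UAEX": "UnauthorizedExternalService"}


def find_response_flags(log_line: str) -> list:
    """Return the known response flags that appear as whole words in a line."""
    return [
        flag
        for flag in FLAG_MEANINGS
        if re.search(rf"\b{re.escape(flag)}\b", log_line)
    ]


def lines_with_flag(log_text: str, flag: str = "UAEX") -> list:
    """Return every log line that carries the given flag."""
    return [
        line for line in log_text.splitlines()
        if flag in find_response_flags(line)
    ]
```

Feed the saved envoy log output to lines_with_flag to list only the requests that were rejected with UAEX.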
If above checks don’t resolve:
On NSX Manager / Edge:
On ESXi host: