Title: Agent that sends metrics and status to NSX+ is unhealthy.
Event ID: nsxplus_communication.metrics_agent_unhealthy
Added in release: 4.1.1
Alarm Description: The agent responsible for sending metrics and status to NSX+ (Metrics agent) on the specified NSX Manager node is unable to send data to NSX+.
- Purpose - Monitor Metrics Agent health status.
- Impact - This failure results in stale or missing metrics data and status on NSX+.
Resolution: The Metrics agent can become unhealthy for the following reasons:
- Case 1 - The failure reason says 'Metrics agent is not connected to NSX+'. For NSX+ issues, the VMware SRE team will already have been notified and the issue will be addressed soon.
- Case 2 - The failure reason says 'Unable to get Metrics agent health status'. This indicates that the Metrics agent is not working properly. Log in to the NSX Manager's root shell and check the agent status with the NSX CLI command 'service nsx-metrics-agents status'. If the status is not Active/Running, restart the Metrics agent with the NSX CLI command 'service nsx-metrics-agents restart'.
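The Case 2 check-then-restart sequence can be sketched as a small script run from the NSX Manager root shell. This is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that a healthy service reports a systemd-style 'active (running)' line, which this alarm text does not confirm. Adjust the match to whatever the status command actually prints on your node.

```python
import subprocess

SERVICE = "nsx-metrics-agents"  # service name taken from the alarm text


def is_running(status_output):
    # Assumption: a healthy service prints a systemd-style "active (running)" line.
    return "active (running)" in status_output.lower()


def ensure_running(run=subprocess.run):
    """Check the service once and restart it if it is not running.

    Returns True if a restart was issued, False if the service was already up.
    The `run` parameter exists only so the command runner can be swapped in tests.
    """
    status = run(["service", SERVICE, "status"], capture_output=True, text=True)
    if is_running(status.stdout):
        return False
    run(["service", SERVICE, "restart"], capture_output=True, text=True, check=True)
    return True


if __name__ == "__main__":
    print("restart issued" if ensure_running() else "service already running")
```

The restart call uses check=True so that a failed restart raises an error instead of passing silently; the status call does not, because a stopped service may legitimately return a non-zero exit code.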
API Reference: The following NSX APIs can be used to check the status of the Metrics agent on NSX when debugging.
- Realization Status - The following API returns realization info for Metrics agent entities on NSX. Look for 'state' in the API response. If the state is 'REALIZED' for all entities, the Metrics agent configuration has been successfully realized on NSX.
GET https://nsx-mgr-ip-address/policy/api/v1/infra/realized-state/realized-entities?intent_path=/infra/sites/agents/metrics
- Consolidated Status - Use the following API to view the status of the Metrics agent on NSX. Look for the 'consolidated_status' field in the API response. Ideally, the status should be 'SUCCESS'.
GET https://nsx-mgr-ip-address/policy/api/v1/infra/realized-state/status?intent_path=/infra/sites/agents/metrics
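The two API checks above can be scripted roughly as follows. This is a hedged sketch using only the Python standard library. The URLs and the 'state' and 'consolidated_status' field names come from this alarm text; everything else is an assumption: the helper names are hypothetical, HTTP basic authentication is assumed, realized entities are assumed to sit in a 'results' list, and 'consolidated_status' is assumed to be either a plain string or an object nesting a string of the same name. Verify the response shapes against your NSX API reference before relying on it.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request

INTENT_PATH = "/infra/sites/agents/metrics"  # intent path from the alarm text


def build_url(manager, endpoint):
    """Build one of the two realized-state URLs ('realized-entities' or 'status')."""
    query = urllib.parse.urlencode({"intent_path": INTENT_PATH}, safe="/")
    return f"https://{manager}/policy/api/v1/infra/realized-state/{endpoint}?{query}"


def fetch_json(url, user, password, verify_tls=True):
    """GET a URL with HTTP basic auth (an assumption) and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    context = ssl.create_default_context()
    if not verify_tls:  # only for lab setups with self-signed certificates
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def all_realized(payload):
    # Assumption: entities are listed under "results", each with a "state" field.
    entities = payload.get("results", [])
    return bool(entities) and all(e.get("state") == "REALIZED" for e in entities)


def consolidated_status(payload):
    # Assumption: the field is a string, or an object nesting a string of the same name.
    value = payload.get("consolidated_status")
    if isinstance(value, dict):
        value = value.get("consolidated_status")
    return value
```

A typical use would be to call fetch_json(build_url("nsx-mgr-ip-address", "realized-entities"), user, password) and pass the result to all_realized, then repeat with the "status" endpoint and consolidated_status. An empty entity list is treated as not realized, so a missing configuration is not mistaken for a healthy one.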
Maintenance window required for remediation? No