Remediate NSX Application Platform Health Alarm Due to Service Status Being Down

Article ID: 320808

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:

The NSX Application Platform Health alarm triggers when any pod enters a CrashLoopBackOff state.
The alarm message indicates "service status down" without specifying which specific pod is affected or providing actionable insights.
The intelligence of the system is degraded as a result of the alarm.
The alarm message indicates "service status down" with component K8_Platform_Service.

Environment

VMware NSX 4.1.1

NAPP 4.1.2.1

Resolution

This issue is addressed in version 4.2.0 of the NSX Application Platform.

Workaround:

To mitigate the issue, follow these steps:

Identify pods that are not in the Running or Succeeded (completed) phase:
napp-k get pods --field-selector status.phase!=Running,status.phase!=Succeeded,metadata.namespace!=kube-system,metadata.namespace!=vmware-system-csi,metadata.namespace!=vmware-system-auth,metadata.namespace!=vmware-system-cloud-provider --all-namespaces
Delete the affected pod:
napp-k delete pod <pod-name> -n <namespace>
If the issue persists, perform a rollout restart of the owning StatefulSet or Deployment:
napp-k rollout restart <statefulset|deployment> <service_name> -n <namespace>
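The first step above can be complemented by a filter on the STATUS column, since a pod in CrashLoopBackOff can still report the phase Running and may therefore not appear in the field-selector output. The following is a minimal sketch, not part of the official procedure: the helper name list_unhealthy_pods is hypothetical, and it assumes the default column layout of the pod listing (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper (not part of NSX or NAPP tooling).
# Reads the output of `napp-k get pods --all-namespaces` on stdin and prints
# "<namespace> <pod-name> <status>" for every pod whose STATUS column is
# neither Running nor Completed.
# Assumption: default column order NAMESPACE NAME READY STATUS RESTARTS AGE.
list_unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

# Usage on the NSX Manager, where napp-k is available:
#   napp-k get pods --all-namespaces | list_unhealthy_pods
```

Each line printed by the helper gives the namespace and pod name needed for the delete and rollout restart commands.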

Additionally, if the alarm message indicates "service status down" with component K8_Platform_Service, proceed with the following recommended workaround.

Workaround:

To mitigate this issue, check the status of the services by calling the following API, which returns the NAPP feature health details. Any API client, such as Postman, can be used:

GET https://<nsx>/napp/api/v1/platform/monitor/feature/health

Alternatively, use the following curl command to obtain the NAPP health details. Run it on the NSX Manager as root; curl prompts for the admin password:

curl -k -u admin -X GET https://nsxmanager.eng.vmware.com/napp/api/v1/platform/monitor/feature/health >> /tmp/napp01.txt

This command appends the output to the file napp01.txt under /tmp on the NSX Manager.

Next Steps Based on Output:

If all services are in the "Ready" state
The alarm can be safely cleared, and the issue should be considered resolved.

If any service status is "Down"
Please refer to the following knowledge base article for detailed troubleshooting steps:
Troubleshooting Alarms and Performance Issues in NSX Application Platform
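The check described above can be roughly automated against the saved file. This is only a sketch under a stated assumption: the response schema is not shown in this article, so the helper simply searches for the quoted status string "Down" rather than parsing specific fields. The function name napp_health_has_down is hypothetical.

```shell
# Hypothetical helper (not part of NSX or NAPP tooling).
# Succeeds (exit status 0) when the given file contains the quoted string
# "Down" in any letter case, and fails otherwise.
# Assumption: service statuses appear in the response as quoted strings such
# as "Ready" or "Down"; the real response schema is not documented here.
napp_health_has_down() {
  grep -qi '"down"' "$1"
}

# File written by the curl command in this article.
report="/tmp/napp01.txt"
if [ -f "$report" ]; then
  if napp_health_has_down "$report"; then
    echo "At least one service reports Down"
  else
    echo "No service reports Down"
  fi
fi
```

Because the curl command appends to the file, delete or rename an older /tmp/napp01.txt before collecting fresh output so that stale results are not matched.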

Implementing this workaround should help in resolving the NSX Application Platform Health alarm triggered by service status being down until the system is updated to version 4.2.0.
