Remediate NSX Application Platform Health Alarm Due to Service Status Being Down

Article ID: 320808

Updated On:

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:
  1. The NSX Application Platform Health alarm triggers when any pod enters the CrashLoopBackOff state.
  2. The alarm message indicates "service status down" without identifying the affected pod or providing actionable detail.
  3. The intelligence of the system is degraded as a result of the alarm.
  4. The alarm message indicates "service status down" with component K8_Platform_Service.


Environment

VMware NSX 4.1.1
NAPP 4.1.2.1

Resolution

This issue is addressed in version 4.2.0 of the NSX Application Platform.


Workaround:

To mitigate the issue, follow these steps:

  1. Identify pods that are not in the Running or Succeeded (completed) state:
    napp-k get pods --field-selector status.phase!=Running,status.phase!=Succeeded,metadata.namespace!=kube-system,metadata.namespace!=vmware-system-csi,metadata.namespace!=vmware-system-auth,metadata.namespace!=vmware-system-cloud-provider --all-namespaces

  2. Delete the affected pod:   napp-k delete pod <pod-name> -n <namespace>
  3. If the issue persists, perform a rollout restart of the StatefulSet or Deployment that owns the pod:   napp-k rollout restart <statefulset|deployment> <service_name> -n <namespace>
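When step 1 returns several pods, typing each delete command by hand is error-prone. The following is a minimal sketch of a helper that turns the listing into ready-to-review delete commands. It assumes that napp-k passes the standard kubectl `--no-headers` flag through, so that each output line starts with the namespace followed by the pod name. The function name `print_delete_commands` is hypothetical and not part of NAPP. The helper only prints commands; it does not delete anything.

```shell
# Hypothetical helper: reads the output of
#   napp-k get pods ... --all-namespaces --no-headers
# on stdin and prints one delete command per listed pod.
# Assumption: column 1 is the namespace and column 2 is the pod name,
# which is the kubectl layout when --all-namespaces is used.
print_delete_commands() {
  awk 'NF >= 2 { printf "napp-k delete pod %s -n %s\n", $2, $1 }'
}
```

To use it, append `--no-headers` to the command from step 1 and pipe the result into `print_delete_commands`, then review the printed lines before running any of them.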

 

Additionally, if the alarm message indicates "service status down" with component K8_Platform_Service, use the following workaround.

 

Workaround:

To mitigate this issue, check the status of the services by calling the NAPP feature health API from any API client (for example, Postman):

GET https://<nsx>/napp/api/v1/platform/monitor/feature/health  

 

You can also retrieve the NAPP health details with the following curl command, run as root on the NSX Manager:

 

curl -k -X GET -u admin https://nsxmanager.eng.vmware.com/napp/api/v1/platform/monitor/feature/health >> /tmp/napp01.txt

 

The command above appends the output to the file /tmp/napp01.txt on the NSX Manager.

 

Next Steps Based on Output:

  • If all services are in the "Ready" state:
    The alarm can be safely cleared, and the issue should be considered resolved.

 

 

 

This workaround should help resolve the NSX Application Platform Health alarm triggered by service status being down until the system is upgraded to version 4.2.0.

 

 

 
