Remediate NSX Application Platform Health Alarm Due to Service Status Being Down

Article ID: 320808

Updated On:

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:
  1. The NSX Application Platform Health alarm triggers when any pod enters the CrashLoopBackOff state.
  2. The alarm message reports "service status down" without identifying the affected pod or providing actionable details.
  3. The system's intelligence features are degraded while the alarm is active.


Environment

VMware NSX 4.1.1

Resolution

This issue is addressed in version 4.2.0 of the NSX Application Platform.


Workaround:

To mitigate the issue, follow these steps:

  1. Identify pods that are not in the Running or Succeeded (completed) phase:
    napp-k get pods --field-selector status.phase!=Running,status.phase!=Succeeded,metadata.namespace!=kube-system,metadata.namespace!=vmware-system-csi,metadata.namespace!=vmware-system-auth,metadata.namespace!=vmware-system-cloud-provider --all-namespaces

  2. Delete the affected pod:
    napp-k delete pod <pod-name> -n <namespace>

  3. If the issue persists, perform a rollout restart of the Deployment or StatefulSet that owns the pod:
    napp-k rollout restart <statefulset|deployment> <service_name> -n <namespace>
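
The lookup in step 1 can be wrapped in small shell helpers, as in the sketch below. The function names (build_selector, crashloop_pods, list_unhealthy_pods) and the KUBECTL variable are hypothetical and not part of NSX or kubectl. The sketch assumes napp-k accepts the same arguments as kubectl, as the commands above suggest. The second helper is included because, in general Kubernetes behavior, a crash-looping pod often still reports phase Running, so the phase filter in step 1 may not list it; filtering on the STATUS column can catch that case.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of NSX or kubectl.

# Namespaces excluded by the command in step 1.
EXCLUDED_NAMESPACES="kube-system vmware-system-csi vmware-system-auth vmware-system-cloud-provider"

# Build the --field-selector value used in step 1.
build_selector() {
  local selector="status.phase!=Running,status.phase!=Succeeded"
  local ns
  for ns in $EXCLUDED_NAMESPACES; do
    selector="${selector},metadata.namespace!=${ns}"
  done
  printf '%s\n' "$selector"
}

# Read "get pods --all-namespaces --no-headers" output on stdin and print
# "<namespace> <pod>" for each pod whose STATUS column is CrashLoopBackOff.
crashloop_pods() {
  awk '$4 == "CrashLoopBackOff" { print $1, $2 }'
}

# Step 1 as a function. KUBECTL defaults to napp-k (assumed to be on PATH).
list_unhealthy_pods() {
  "${KUBECTL:-napp-k}" get pods --all-namespaces --field-selector "$(build_selector)"
}
```

After sourcing the file, list_unhealthy_pods runs the step 1 query, and napp-k get pods --all-namespaces --no-headers | crashloop_pods prints the namespace and name of each crash-looping pod, which can then be passed to the delete command in step 2.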

 

This workaround should clear the NSX Application Platform Health alarm caused by a service being down until the system is upgraded to version 4.2.0.

Additional Information

Impact/Risks:

Because the alarm message does not identify the affected pod, troubleshooting is harder and the overall intelligence of the system is impacted. The steps in this article make the failure easier to debug.