Troubleshooting Alarms and Performance Issues in NSX Application Platform

Article ID: 366906

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Alarms and performance issues can arise from factors such as resource contention, configuration errors, software bugs, or infrastructure problems. This article provides guidance for identifying, understanding, and resolving alarms and performance issues within NSX Application Platform (NAPP) to help keep the platform stable and reliable.

Environment

This KB applies to environments that use NSX Application Platform (NAPP).

Cause

See the Resolution section of this article and look up the alarm by its Summary.

Resolution

1.1. Cluster Alarms:

 1. Cluster CPU high alarm.

  • Component Name: nsx_application_platform_health.cluster_cpu_usage_high
  • Summary: NSX Application Platform cluster CPU usage is high.
  • Description: The CPU usage of NSX Application Platform cluster {napp_cluster_id} is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the System Load field of individual services to see which service is under pressure. See if the load can be reduced. If more computing power is required, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 2. Cluster CPU very high alarm.

  • Component Name: nsx_application_platform_health.cluster_cpu_usage_very_high
  • Summary: NSX Application Platform cluster CPU usage is very high.
  • Description: The CPU usage of NSX Application Platform cluster {napp_cluster_id} is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the System Load field of individual services to see which service is under pressure. See if the load can be reduced. If more computing power is required, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 3. Cluster memory high alarm.

  • Component Name: nsx_application_platform_health.cluster_memory_usage_high
  • Summary: NSX Application Platform cluster memory usage is high
  • Description: The memory usage of NSX Application Platform cluster {napp_cluster_id} is above the high threshold value of {system_usage_threshold}%
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Memory field of individual services to see which service is under pressure. See if the load can be reduced. If more memory is required, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 4. Cluster memory very high alarm.

  • Component Name: nsx_application_platform_health.cluster_memory_usage_very_high
  • Summary: NSX Application Platform cluster memory usage is very high
  • Description:  The memory usage of NSX Application Platform cluster {napp_cluster_id} is above the very high threshold value of {system_usage_threshold}%
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Memory field of individual services to see which service is under pressure. See if the load can be reduced. If more memory is required, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 5. Cluster disk usage high alarm.

  • Component Name: nsx_application_platform_health.cluster_disk_usage_high
  • Summary: NSX Application Platform cluster disk usage is high
  • Description:  The disk usage of NSX Application Platform cluster {napp_cluster_id} is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Storage field of individual services to see which service is under pressure. See if the load can be reduced. If more disk storage is required, click on the Scale Out button to request more resources. If the Data Storage service is under strain, you can instead click on the Scale Up button to increase the disk size.
  • Release Introduced:  3.2.0


 6. Cluster disk usage very high alarm.

  • Component Name: nsx_application_platform_health.cluster_disk_usage_very_high
  • Summary: NSX Application Platform cluster disk usage is very high
  • Description: The disk usage of NSX Application Platform cluster {napp_cluster_id} is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Storage field of individual services to see which service is under pressure. See if the load can be reduced. If more disk storage is required, click on the Scale Out button to request more resources. If the Data Storage service is under strain, you can instead click on the Scale Up button to increase the disk size.
  • Release Introduced:  3.2.0


 7. Cluster status degraded alarm.

  • Component Name: nsx_application_platform_health.napp_status_degraded
  • Summary: NSX Application Platform cluster overall status is degraded
  • Description: NSX Application Platform cluster {napp_cluster_id} overall status is degraded
  • Recommended Action: Get more information from alarms of nodes and services.
  • Release Introduced:  3.2.0


 8. Cluster status down alarm.

  • Component Name: nsx_application_platform_health.napp_status_down
  • Summary: NSX Application Platform cluster overall status is down
  • Description: NSX Application Platform cluster {napp_cluster_id} overall status is down.
  • Recommended Action: Get more information from alarms of nodes and services.
  • Release Introduced:  3.2.0

 9. Flow storage growth high alarm.

  • Component Name: nsx_application_platform_health.flow_storage_growth_high
  • Summary: Analytics and Data Storage disk usage is growing faster than expected
  • Description: Analytics and Data Storage disks are expected to be full in {predicted_full_period} days, less than current data retention period {current_retention_period} days.
  • Recommended Action: Connect fewer transport nodes or set narrower private IP ranges to reduce the number of unique flows. Filter out broadcast and/or multicast flows. Scale out the Analytics and Data Storage services to get more storage.
  • Release Introduced:  4.1.1
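
The flow storage growth alarm compares a predicted time-to-full with the current data retention period. The following is a minimal sketch of that comparison, assuming a simple linear growth model. How the platform actually computes {predicted_full_period} is not described in this article, and the function names and figures below are hypothetical.

```python
def days_until_full(capacity_gb: float, used_gb: float, growth_gb_per_day: float) -> float:
    """Estimate the days left before the disks fill at a constant growth rate."""
    if growth_gb_per_day <= 0:
        # Usage is flat or shrinking, so the disks never fill under this model.
        return float("inf")
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_day


def growth_is_high(predicted_full_days: float, retention_days: float) -> bool:
    """Mirror the alarm condition: disks fill before the retention period ends."""
    return predicted_full_days < retention_days


# Hypothetical figures, for illustration only.
predicted = days_until_full(capacity_gb=1000, used_gb=700, growth_gb_per_day=20)
print(predicted)                      # 15.0
print(growth_is_high(predicted, 30))  # True
```

In this example the disks would fill in 15 days against a 30-day retention period, which is the situation the recommended actions (fewer transport nodes, narrower private IP ranges, filtering, or scale out) are meant to address.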

1.2. Node Alarms:

 1. Node CPU high alarm.

  • Component Name: nsx_application_platform_health.node_cpu_usage_high
  • Summary: NSX Application Platform node CPU usage is high
  • Description: The CPU usage of NSX Application Platform node {napp_node_name} is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the System Load field of individual services to see which service is under pressure. See if load can be reduced. If only a small minority of the nodes have high CPU usage, by default, Kubernetes will reschedule services automatically. If most nodes have high CPU usage and load cannot be reduced, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 2. Node CPU very high alarm.

  • Component Name: nsx_application_platform_health.node_cpu_usage_very_high
  • Summary: NSX Application Platform node CPU usage is very high
  • Description: The CPU usage of NSX Application Platform node {napp_node_name} is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the System Load field of individual services to see which service is under pressure. See if load can be reduced. If only a small minority of the nodes have high CPU usage, by default, Kubernetes will reschedule services automatically. If most nodes have high CPU usage and load cannot be reduced, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 3. Node memory high alarm.

  • Component Name: nsx_application_platform_health.node_memory_usage_high
  • Summary: NSX Application Platform node memory usage is high
  • Description: The memory usage of NSX Application Platform node {napp_node_name} is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Memory field of individual services to see which service is under pressure. See if load can be reduced. If only a small minority of the nodes have high Memory usage, by default, Kubernetes will reschedule services automatically. If most nodes have high Memory usage and load cannot be reduced, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 4. Node memory very high alarm.

  • Component Name: nsx_application_platform_health.node_memory_usage_very_high
  • Summary: NSX Application Platform node memory usage is very high
  • Description: The memory usage of NSX Application Platform node {napp_node_name} is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Memory field of individual services to see which service is under pressure. See if load can be reduced. If only a small minority of the nodes have high Memory usage, by default, Kubernetes will reschedule services automatically. If most nodes have high Memory usage and load cannot be reduced, click on the Scale Out button to request more resources.
  • Release Introduced:  3.2.0


 5. Node disk usage high alarm.

  • Component Name: nsx_application_platform_health.node_disk_usage_high
  • Summary:  NSX Application Platform node disk usage is high
  • Description: The disk usage of NSX Application Platform node {napp_node_name} is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Storage field of individual services to see which service is under pressure. Clean up unused data or logs to free up disk resources and see if the load can be reduced. If more disk storage is required, Scale Out the service under pressure. If the Data Storage service is under strain, you can instead click on the Scale Up button to increase the disk size.
  • Release Introduced:  3.2.0


 6. Node disk usage very high alarm.

  • Component Name: nsx_application_platform_health.node_disk_usage_very_high
  • Summary: NSX Application Platform node disk usage is very high
  • Description: The disk usage of NSX Application Platform node {napp_node_name} is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services and check the Storage field of individual services to see which service is under pressure. Clean up unused data or logs to free up disk resources and see if the load can be reduced. If more disk storage is required, Scale Out the service under pressure. If the Data Storage service is under strain, you can instead click on the Scale Up button to increase the disk size.
  • Release Introduced:  3.2.0


 7. Node status degraded alarm.

  • Component Name: nsx_application_platform_health.node_status_degraded
  • Summary: NSX Application Platform node status is degraded.
  • Description: NSX Application Platform node {napp_node_name} is degraded.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Resources to check which node is degraded. Check network, memory and CPU usage of the node. Reboot the node if it is a worker node.
  • Release Introduced:  3.2.0


 8. Node status down alarm.

  • Component Name: nsx_application_platform_health.node_status_down
  • Summary: NSX Application Platform node status is down
  • Description: NSX Application Platform node {napp_node_name} is not running
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Resources to check which node is down. Check network, memory and CPU usage of the node. Reboot the node if it is a worker node.
  • Release Introduced:  3.2.0
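
The node CPU and memory guidance above distinguishes between a small minority of busy nodes (Kubernetes reschedules services automatically) and most nodes being busy (scale out). The sketch below illustrates that decision. It assumes "most nodes" means more than half, which this article does not define, and the node names and threshold are hypothetical.

```python
from typing import Mapping


def recommend_node_action(node_usage: Mapping[str, float], high_threshold: float) -> str:
    """Suggest an action from per-node usage percentages (CPU or memory)."""
    # Nodes whose usage is above the high threshold.
    hot = [name for name, pct in node_usage.items() if pct > high_threshold]
    if not hot:
        return "no action"
    # Assumption: "most nodes" is interpreted as more than half of the nodes.
    if len(hot) * 2 > len(node_usage):
        return "scale out"
    return "wait for rescheduling"


# Hypothetical node names and usage values.
print(recommend_node_action({"worker-1": 91, "worker-2": 40, "worker-3": 35}, 80))
# wait for rescheduling
print(recommend_node_action({"worker-1": 91, "worker-2": 88, "worker-3": 35}, 80))
# scale out
```

Even when the sketch returns "scale out", first check whether the load itself can be reduced, as the recommended actions state.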

1.3. Data Storage Alarms:

 1. Data Storage service CPU high alarm.

  • Component Name: nsx_application_platform_health.datastore_cpu_usage_high
  • Summary: Data Storage service CPU usage is high
  • Description: The CPU usage of Data Storage service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services or the Data Storage service.
  • Release Introduced:  3.2.0


 2. Data Storage service CPU very high alarm.

  • Component Name: nsx_application_platform_health.datastore_cpu_usage_very_high
  • Summary: Data Storage service CPU usage is very high
  • Description: The CPU usage of Data Storage service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services or the Data Storage service.
  • Release Introduced:  3.2.0


 3. Data Storage service memory high alarm.

  • Component Name: nsx_application_platform_health.datastore_memory_usage_high
  • Summary: Data Storage service memory usage is high
  • Description: The memory usage of Data Storage service is above the high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services or the Data Storage service.
  • Release Introduced:  3.2.0


 4. Data Storage service memory very high alarm.

  • Component Name: nsx_application_platform_health.datastore_memory_usage_very_high
  • Summary: Data Storage service memory usage is very high
  • Description: The memory usage of Data Storage service is above the very high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services or the Data Storage service.
  • Release Introduced:  3.2.0


 5. Data Storage service disk usage high alarm.

  • Component Name: nsx_application_platform_health.datastore_disk_usage_high
  • Summary: Data Storage service disk usage is high
  • Description: The disk usage of Data Storage service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out or scale up the data storage service
  • Release Introduced:  3.2.0


 6. Data Storage service disk usage very high alarm.

  • Component Name: nsx_application_platform_health.datastore_disk_usage_very_high
  • Summary: Data Storage service disk usage is very high
  • Description: The disk usage of Data Storage service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out or scale up the data storage service
  • Release Introduced:  3.2.0
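
Each resource in this section has a "high" and a "very high" alarm, raised when usage is above the corresponding {system_usage_threshold} value. The following sketch shows how a usage percentage could be mapped to the matching Data Storage component name. The threshold values used here are hypothetical, since the actual values are not listed in this article.

```python
from typing import Optional


def classify_usage(usage_pct: float, high: float, very_high: float) -> str:
    """Return 'normal', 'high', or 'very_high' for a usage percentage."""
    if not 0 <= high < very_high <= 100:
        raise ValueError("expected 0 <= high < very_high <= 100")
    if usage_pct > very_high:
        return "very_high"
    if usage_pct > high:
        return "high"
    return "normal"


def datastore_alarm(resource: str, usage_pct: float, high: float, very_high: float) -> Optional[str]:
    """Build the Data Storage component name, or None if no threshold is crossed."""
    if resource not in ("cpu", "memory", "disk"):
        raise ValueError("resource must be cpu, memory, or disk")
    level = classify_usage(usage_pct, high, very_high)
    if level == "normal":
        return None
    return f"nsx_application_platform_health.datastore_{resource}_usage_{level}"


# Hypothetical thresholds of 75% (high) and 85% (very high).
print(datastore_alarm("disk", 88, 75, 85))
# nsx_application_platform_health.datastore_disk_usage_very_high
```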

 

1.4. Messaging Service Alarms:


 1. Messaging service CPU high alarm.

  • Component Name: nsx_application_platform_health.messaging_cpu_usage_high
  • Summary: Messaging service CPU usage is high.
  • Description: The CPU usage of Messaging service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services or the Messaging service
  • Release Introduced:  3.2.0


 2. Messaging service CPU very high alarm.

  • Component Name: nsx_application_platform_health.messaging_cpu_usage_very_high
  • Summary: Messaging service CPU usage is very high.
  • Description: The CPU usage of Messaging service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services or the Messaging service
  • Release Introduced:  3.2.0


 3. Messaging service memory high alarm.

  • Component Name: nsx_application_platform_health.messaging_memory_usage_high
  • Summary: Messaging service memory usage is high
  • Description: The memory usage of Messaging service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action:  Scale out all services or the Messaging service.
  • Release Introduced:  3.2.0


 4. Messaging service memory very high alarm.

  • Component Name: nsx_application_platform_health.messaging_memory_usage_very_high
  • Summary: Messaging service memory usage is very high
  • Description: The memory usage of Messaging service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action:  Scale out all services or the Messaging service.
  • Release Introduced:  3.2.0


 5. Messaging service disk usage high alarm.

  • Component Name: nsx_application_platform_health.messaging_disk_usage_high
  • Summary: Messaging service disk usage is high
  • Description: The disk usage of Messaging service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Clean up files not needed. Scale out all services or the Messaging service
  • Release Introduced:  3.2.0


 6. Messaging service disk usage very high alarm.

  • Component Name: nsx_application_platform_health.messaging_disk_usage_very_high
  • Summary: Messaging service disk usage is very high
  • Description: The disk usage of Messaging service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Clean up files not needed. Scale out all services or the Messaging service
  • Release Introduced:  3.2.0

1.5. Analytics Service Alarms:

 1. Analytics service CPU high alarm.

  • Component Name: nsx_application_platform_health.analytics_cpu_usage_high
  • Summary: Analytics service CPU usage is high.
  • Description: The CPU usage of Analytics service is above the high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services or the Analytics service.
  • Release Introduced:  3.2.0


 2. Analytics service CPU very high alarm.

  • Component Name: nsx_application_platform_health.analytics_cpu_usage_very_high
  • Summary: Analytics service CPU usage is very high.
  • Description: The CPU usage of Analytics service is above the very high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services or the Analytics service.
  • Release Introduced:  3.2.0


 3. Analytics service memory high alarm.

  • Component Name: nsx_application_platform_health.analytics_memory_usage_high
  • Summary: Analytics service memory usage is high
  • Description: The memory usage of Analytics service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action:  Scale out all services or the Analytics service
  • Release Introduced:  3.2.0


 4. Analytics service memory very high alarm.

  • Component Name: nsx_application_platform_health.analytics_memory_usage_very_high
  • Summary: Analytics service memory usage is very high
  • Description: The memory usage of Analytics service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action:  Scale out all services or the Analytics service
  • Release Introduced:  3.2.0


 5. Analytics service disk usage high alarm.

  • Component Name: nsx_application_platform_health.analytics_disk_usage_high
  • Summary: Analytics service disk usage is high.
  • Description: The disk usage of Analytics service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Clean up files not needed. Scale out all services or the Analytics service.
  • Release Introduced:  3.2.0


 6. Analytics service disk usage very high alarm.

  • Component Name: nsx_application_platform_health.analytics_disk_usage_very_high
  • Summary: Analytics service disk usage is very high
  • Description: The disk usage of Analytics service is above the very high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services or the Analytics service
  • Release Introduced:  3.2.0

1.6. Config DB Service Alarms:

 1. Config DB service CPU high alarm.

  • Component Name: nsx_application_platform_health.configuration_db_cpu_usage_high
  • Summary: Configuration Database service CPU usage is high.
  • Description: The CPU usage of Configuration Database service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services.
  • Release Introduced:  3.2.0


 2. Config DB service CPU very high alarm.

  • Component Name: nsx_application_platform_health.configuration_db_cpu_usage_very_high
  • Summary: Configuration Database service CPU usage is very high.
  • Description: The CPU usage of Configuration Database service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services.
  • Release Introduced:  3.2.0


 3. Config DB service memory high alarm.

  • Component Name: nsx_application_platform_health.configuration_db_memory_usage_high
  • Summary: Configuration Database service memory usage is high
  • Description:  The memory usage of Configuration Database service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 4. Config DB service memory very high alarm.

  • Component Name: nsx_application_platform_health.configuration_db_memory_usage_very_high
  • Summary: Configuration Database service memory usage is very high.
  • Description: The memory usage of Configuration Database service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services.
  • Release Introduced:  3.2.0

 5. Config DB service disk usage high alarm.

  • Component Name: nsx_application_platform_health.configuration_db_disk_usage_high
  • Summary: Configuration Database service disk usage is high
  • Description: The disk usage of Configuration Database service is above the high threshold value of {system_usage_threshold}%
  • Recommended Action: Clean up files not needed. Scale out all services.
  • Release Introduced:  3.2.0


 6. Config DB service disk usage very high alarm.

  • Component Name: nsx_application_platform_health.configuration_db_disk_usage_very_high
  • Summary: Configuration Database service disk usage is very high
  • Description:  The disk usage of Configuration Database service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Clean up files not needed. Scale out all services.
  • Release Introduced:  3.2.0

 

1.7. Metrics Alarms:

 1. Metrics service CPU high alarm.

  • Component Name: nsx_application_platform_health.metrics_cpu_usage_high
  • Summary: Metrics service CPU usage is high
  • Description:  The CPU usage of Metrics service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services.
  • Release Introduced:  3.2.0


 2. Metrics service CPU very high alarm.

  • Component Name: nsx_application_platform_health.metrics_cpu_usage_very_high
  • Summary: Metrics service CPU usage is very high
  • Description: The CPU usage of Metrics service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 3. Metrics service memory high alarm.

  • Component Name: nsx_application_platform_health.metrics_memory_usage_high
  • Summary: Metrics service memory usage is high
  • Description: The memory usage of Metrics service is above the high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 4. Metrics service memory very high alarm.

  • Component Name: nsx_application_platform_health.metrics_memory_usage_very_high
  • Summary: Metrics service memory usage is very high
  • Description: The memory usage of Metrics service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 5. Metrics service disk usage high alarm.

  • Component Name: nsx_application_platform_health.metrics_disk_usage_high
  • Summary: Metrics service disk usage is high.
  • Description: The disk usage of Metrics service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Follow the steps at https://knowledge.broadcom.com/external/article?legacyId=93274
  • Release Introduced:  3.2.0


 6. Metrics service disk usage very high alarm.

  • Component Name: nsx_application_platform_health.metrics_disk_usage_very_high
  • Summary: Metrics service disk usage is very high.
  • Description: The disk usage of Metrics service is above the very high threshold value of {system_usage_threshold}%
  • Recommended Action: Follow the steps at https://knowledge.broadcom.com/external/article?legacyId=93274
  • Release Introduced:  3.2.0

1.8. Platform Alarms:

 1. Platform service CPU high alarm.

  • Component Name:  nsx_application_platform_health.platform_cpu_usage_high
  • Summary:  Platform Services service CPU usage is high
  • Description: The CPU usage of Platform Services service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 2. Platform service CPU very high alarm.

  • Component Name:  nsx_application_platform_health.platform_cpu_usage_very_high
  • Summary: Platform Services service CPU usage is very high
  • Description: The CPU usage of Platform Services service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 3. Platform service memory high alarm.

  • Component Name:  nsx_application_platform_health.platform_memory_usage_high
  • Summary: Platform Services service memory usage is high
  • Description: The memory usage of Platform Services service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 4. Platform service memory very high alarm.

  • Component Name: nsx_application_platform_health.platform_memory_usage_very_high
  • Summary: Platform Services service memory usage is very high
  • Description: The memory usage of Platform Services service is above the very high threshold value of {system_usage_threshold}%
  • Recommended Action: Scale out all services
  • Release Introduced:  3.2.0


 5. Platform service disk usage high alarm.

  • Component Name: nsx_application_platform_health.platform_disk_usage_high
  • Summary: Platform Services service disk usage is high
  • Description: The disk usage of Platform Services service is above the high threshold value of {system_usage_threshold}%.
  • Recommended Action: Invoke the following command to get the disk usage: `napp-k exec -it $(napp-k get pods | grep cluster | cut -d ' ' -f 1) -c cluster-api -- sh -c 'kubectl df-pv'` Clean up files not needed. Scale out all services.
  • Release Introduced:  3.2.0


 6. Platform service disk usage very high alarm.

  • Component Name: nsx_application_platform_health.platform_disk_usage_very_high
  • Summary: Platform Services service disk usage is very high
  • Description: The disk usage of Platform Services service is above the very high threshold value of {system_usage_threshold}%.
  • Recommended Action: Clean up files not needed. Scale out all services.
  • Release Introduced:  3.2.0

1.9. Service Status Alarm:

 1. Service status degraded alarm.

  • Component Name: nsx_application_platform_health.service_status_degraded
  • Summary: Service status is degraded.
  • Description: Service {napp_service_name} is degraded. The service may still be able to reach a quorum while pods associated with {napp_service_name} are not all stable. Resources consumed by these unstable pods may be released.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services to check which service is degraded. Invoke the NSX API GET /napp/api/v1/platform/monitor/feature/health to check which specific service is degraded and the reason behind it. Invoke the following CLI command to restart the degraded service if necessary: `kubectl rollout restart <statefulset/deployment> <service_name> -n <namespace>` Degraded services can function correctly but performance is sub-optimal.
  • Release Introduced:  3.2.0


 2. Service status down alarm.

  • Component Name: nsx_application_platform_health.service_status_down
  • Summary: Service status is down
  • Description: Service {napp_service_name} is not running.
  • Recommended Action: In the NSX UI, navigate to System | NSX Application Platform | Core Services to check which service is down. Invoke the NSX API GET /napp/api/v1/platform/monitor/feature/health to check which specific service is down and the reason behind it. Follow the steps at https://knowledge.broadcom.com/external/article?legacyId=96890
  • Release Introduced:  3.2.0
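
The two service status alarms above refer to the health API and, for degraded services, to the kubectl rollout restart command. The sketch below only assembles the request and the command arguments so they can be reviewed before use. The manager host, service name, and namespace are hypothetical placeholders, and authentication is deliberately not shown because it depends on the environment.

```python
from typing import List
from urllib.request import Request

HEALTH_PATH = "/napp/api/v1/platform/monitor/feature/health"


def build_health_request(manager_host: str) -> Request:
    """Prepare (but do not send) the GET request for the health API."""
    return Request(
        f"https://{manager_host}{HEALTH_PATH}",
        headers={"Accept": "application/json"},
        method="GET",
    )


def build_restart_command(kind: str, service_name: str, namespace: str) -> List[str]:
    """Assemble the rollout restart command from this article as an argument list."""
    if kind not in ("statefulset", "deployment"):
        raise ValueError("kind must be 'statefulset' or 'deployment'")
    return ["kubectl", "rollout", "restart", kind, service_name, "-n", namespace]


# Hypothetical host, service, and namespace names.
request = build_health_request("nsx-manager.example.com")
print(request.get_method(), request.full_url)
print(" ".join(build_restart_command("deployment", "example-service", "example-namespace")))
```

Restart a service only if necessary: as noted above, a degraded service can still function correctly, with sub-optimal performance.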


 3. Manager disconnected alarm.

  • Component Name: nsx_application_platform_communication.manager_disconnected
  • Summary: The NSX Application Platform cluster is disconnected from the NSX management cluster
  • Description: The NSX Application Platform cluster {napp_cluster_id} is disconnected from the NSX management cluster.
  • Recommended Action: Check whether the manager cluster certificate, manager node certificates, Kafka certificate, and ingress certificate match on both NSX Manager and the NSX Application Platform cluster. Check the expiration dates of these certificates to make sure they are valid. Check the network connection between NSX Manager and the NSX Application Platform cluster and resolve any network connection failures.
  • Release Introduced:  3.2.0
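
One of the checks for the manager disconnected alarm is certificate expiration. As a rough illustration, the sketch below computes the days remaining from a certificate end date in the text form printed by `openssl x509 -noout -enddate` (for example, `notAfter=Jun  1 00:00:00 2030 GMT`). How the certificates are exported from NSX Manager or the NSX Application Platform cluster is not covered here, and the date used is hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days left until a certificate end date (negative if expired)."""
    # Accept either the bare date or the "notAfter=<date>" form.
    date_text = not_after.split("=", 1)[-1]
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(date_text), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


# Hypothetical end date and reference time.
reference = datetime(2030, 5, 2, tzinfo=timezone.utc)
print(days_until_expiry("notAfter=Jun  1 00:00:00 2030 GMT", reference))  # 30.0
```

In practice, pass `datetime.now(timezone.utc)` as the reference time and repeat the check for each certificate named in the recommended action.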

1.10. Kafka lag alarms


 1. Data processing slow in Kafka topic Raw Flow.

  • Component Name: nsx_application_platform_communication.delay_detected_in_messaging_rawflow
  • Summary: Slow data processing detected in messaging topic Raw Flow.
  • Description: The number of pending messages in the messaging topic Raw Flow is above the pending message threshold of {napp_messaging_lag_threshold}.
  • Recommended Action: Add nodes and then scale up the NSX Application Platform cluster. If the bottleneck can be attributed to a specific service, for example, the Analytics service, then scale up that service when the new nodes are added. If you are unable to scale out the cluster immediately, try one of the other options in https://knowledge.broadcom.com/external/article?legacyId=91932
  • KB link: https://knowledge.broadcom.com/external/article?legacyId=91932
  • Release Introduced:  3.2.0


 2. Data processing slow in Kafka topic Over Flow.

  • Component Name: nsx_application_platform_communication.delay_detected_in_messaging_overflow
  • Summary: Slow data processing detected in messaging topic Over Flow.
  • Description: The number of pending messages in the messaging topic Over Flow is above the pending message threshold of {napp_messaging_lag_threshold}
  • Recommended Action: Add nodes and then scale up the NSX Application Platform cluster. If the bottleneck can be attributed to a specific service, for example, the Analytics service, then scale up that service when the new nodes are added. If you are unable to scale out the cluster immediately, try one of the other options in https://knowledge.broadcom.com/external/article?legacyId=91932
  • KB link:  https://knowledge.broadcom.com/external/article?legacyId=91932
  • Release Introduced:  3.2.0
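
Both alarms above fire when the number of pending messages in a topic is above {napp_messaging_lag_threshold}. The sketch below uses the usual Kafka definition of consumer lag (latest offset minus committed offset, summed over partitions) to illustrate the comparison. Whether the platform computes the value this way is an assumption, and the offsets and threshold shown are hypothetical.

```python
from typing import Mapping


def pending_messages(end_offsets: Mapping[int, int], committed_offsets: Mapping[int, int]) -> int:
    """Sum the per-partition gap between the latest and the committed offsets."""
    return sum(
        max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    )


def lag_is_high(pending: int, threshold: int) -> bool:
    """Mirror the alarm condition: pending messages are above the threshold."""
    return pending > threshold


# Hypothetical offsets for a two-partition topic.
pending = pending_messages({0: 1500, 1: 1200}, {0: 1400, 1: 900})
print(pending)                    # 400
print(lag_is_high(pending, 350))  # True
```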

 

1.11. TN alarms

 1. TN flow exp disconnected alarm.

  • Component Name: nsx_application_platform_communication.tn_flow_exp_disconnected
  • Summary: A Transport node is disconnected from its NSX Messaging Broker
  • Description: The flow exporter on Transport node {entity_id} is disconnected from its messaging broker {messaging_broker_info}. Data collection is affected.
  • Recommended Action: Restart the messaging service if it is not running. Resolve the network connection failure between the Transport node flow exporter and its NSX messaging broker.
  • Release Introduced:  3.2.0

 2. TN flow exp disconnected on DPU alarm.

  • Component Name: nsx_application_platform_communication.tn_flow_exp_disconnected_on_dpu
  • Summary: A Transport node is disconnected from its NSX messaging broker
  • Description: The flow exporter on Transport node {entity_id} DPU {dpu_id} is disconnected from its messaging broker {messaging_broker_info}. Data collection is affected.
  • Recommended Action: Restart the messaging service if it is not running. Resolve the network connection failure between the Transport node flow exporter and its NSX messaging broker.
  • Release Introduced:  4.0.0
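
When investigating a network connection failure toward the messaging broker, a basic TCP reachability test can help separate network problems from service problems. The following is a generic sketch, assuming a host with Python available on a network path comparable to the transport node's. The broker address and port are placeholders to be taken from {messaging_broker_info} in the alarm.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call with placeholder values (not executed here):
# broker_reachable("<broker-address>", <broker-port>)
```

A successful connection only shows that the port is reachable. It does not validate certificates or confirm that the messaging service is healthy.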