NSX Application Platform Node Disk Usage High/Very High Alarm
Article ID: 378730
Products
VMware vDefend Firewall, VMware vDefend Firewall with Advanced Threat Prevention
Issue/Introduction
NSX Application Platform Node Disk Usage High Alarm or NSX Application Platform Node Disk Usage Very High Alarm is in the Open state with the following description.
Feature Name: NSX Application Platform Health
Event Type: Node Disk Usage High or Very High
Description: The disk usage of NSX Application Platform node {napp_node_name} is above the high threshold value of {system_usage_threshold}%.
Environment
All NSX Application Platform (NAPP) versions
Resolution
Identify the Type of Node:
First, check whether the node under pressure is a Worker Node or a Control Plane Node. You can find this information in the Alarm Description by expanding the details of the generated alarm. Note down the node name.
Navigate to System → NSX Application Platform → Resources.
Hover over each node name to display its full name and locate the one matching the node name noted from the Alarm Description.
To differentiate the node type:
A Control Plane Node is explicitly labeled below the node name.
A Worker Node has no type label.
For Worker Node Alarm:
Step 1: Add Nodes to the Kubernetes Cluster:
If disk usage is high, add one or more nodes to the existing Kubernetes cluster (contact your Kubernetes provider for assistance).
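For example, if NAPP runs on a vSphere with Tanzu (Tanzu Kubernetes Grid Service) guest cluster, the worker pool can typically be scaled from the Supervisor Cluster. The following is a sketch only; the cluster name, namespace, and node count are placeholders, and the exact resource kind and field path depend on your TKG API version:
# Run from the Supervisor Control Plane; scales the worker pool to 4 nodes
kubectl patch tanzukubernetescluster <napp-cluster-name> -n <namespace> \
  --type merge -p '{"spec":{"topology":{"workers":{"count":4}}}}'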
Step 2: Free Up Disk Space by Deleting Pods:
On the NSX Manager root shell, run the following command to locate the relevant pods:
napp-k get pods -o wide | grep <napp_node_name>
Replace <napp_node_name> with the node name found in the open alarm.
Delete some of the pods from the list to allow Kubernetes to reschedule them on the newly added node:
napp-k delete pod <pod-name>
The following pods are recommended for deletion if present in the list:
cluster-api
monitor
Commands to retrieve the pod names:
For cluster-api pod:
napp-k get pod -l=app.kubernetes.io/name=cluster-api
For monitor pod:
napp-k get pod -l=app.kubernetes.io/name=monitor,cluster-api-client=true
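As a shortcut, the lookup and deletion can be combined in one command; this assumes the label selector matches only the pod(s) you intend to delete:
# Deletes the cluster-api pod(s) returned by the label selector above
napp-k delete $(napp-k get pod -l app.kubernetes.io/name=cluster-api -o name)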
Step 3: Check the disk space:
After the pods are deleted, Kubernetes should eventually reclaim disk space by garbage-collecting unused container images, which are often the main cause of high disk usage.
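To confirm that the node has recovered, you can check its DiskPressure condition, which should report False. Replace <napp_node_name> with the node name from the alarm:
# The Conditions section of the node should show DiskPressure = False
napp-k describe node <napp_node_name> | grep -i diskpressure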
If the above steps do not resolve the issue, please contact your Kubernetes service provider or Broadcom support for further assistance in clearing disk space from the Kubernetes nodes.
For Control Plane Node Alarm:
Step 1: Identify the Node Under Pressure:
In the Resources tab of the NSX Application Platform UI, check the "Storage" field for each control plane node. Hover over each node name to reveal the full name and identify the control plane node that is under pressure.
Step 2: Access the Control Plane of the Cluster Node:
Log in to the Supervisor Cluster:
SSH into the vCenter Server Appliance, log in as root, and switch to shell mode:
shell
Retrieve the Supervisor Control Plane (SCP) IP address and credentials:
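On vSphere with Tanzu deployments this is commonly done with the decryptK8Pwd.py utility on the vCenter Server Appliance; verify the path for your vCenter version:
# Prints the Supervisor Control Plane node IP and its root password
/usr/lib/vmware-wcp/decryptK8Pwd.py
Then SSH into the Supervisor Control Plane node with those credentials; the kubectl commands in the next step are run from there:
ssh root@<supervisor-control-plane-ip>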
Step 3: SSH into the Guest Cluster Control Plane Node:
List the virtual machines in the Supervisor Cluster to identify the Guest Cluster control plane node's IP:
kubectl get vm -A -o wide
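If the vmware-system-user password for the Guest Cluster node is also needed, it is typically stored as a secret in the cluster's Supervisor namespace. The secret and key names below follow the common TKGS convention and should be verified for your environment:
# Prints the SSH password for the Guest Cluster nodes
kubectl get secret <napp-cluster-name>-ssh-password -n <namespace> \
  -o jsonpath='{.data.ssh-passwordkey}' | base64 -d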
SSH into the Guest Cluster control plane node:
ssh vmware-system-user@<control-plane-node-ip>
Enter the password obtained earlier to access the Guest Cluster control plane.
Step 4: Check Disk Usage on the Control Plane Node:
Change to the /var/log directory and run the following command to find large directories, such as journal logs:
du -h --max-depth=1
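To surface the largest directories first, the same output can be sorted, for example:
# Largest entries under /var/log first
du -h --max-depth=1 /var/log | sort -rh | head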
If journal logs are consuming space, reduce their size:
journalctl --vacuum-size=500M
You can also limit the retention period of journal logs to the last 2 days:
journalctl --vacuum-time=2d
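Before and after vacuuming, you can check how much space the journal occupies:
journalctl --disk-usage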
Step 5: Verify Disk Usage:
After performing these steps, wait 5-10 minutes and check the disk usage again in the NSX Application Platform UI under the Resources tab. The alarm should auto-resolve once usage falls below the threshold.
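You can also confirm directly on the control plane node that overall usage has dropped, for example:
# Shows used and available space on the root filesystem
df -h /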