Edge Disk Usage Very High on / partition affecting Control Plane connectivity and DHCP services.

Article ID: 427787

Products

VMware NSX

Issue/Introduction

Edge Health Alarm: Edge Disk Usage Very High

The disk usage for the Edge node disk partition / has reached 84%, which is at or above the very high threshold value of 80%.

An NSX Edge Node reports a critical health alarm indicating high disk utilization on the root (/) partition (/dev/sda3). In this scenario, utilization exceeding 80% resulted in the following symptoms:

  • Failure of VMs to obtain DHCP IP addresses.

  • Disruption of Central Control Plane (CCP) connectivity.

  • Tunnels remaining in a "Down" state even after attempting Maintenance Mode cycles.

  • Coincident issues with VCD–NSX communication due to certificate mismatches.
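To reproduce the alarm condition from a shell, the Python sketch below computes a df-style Use% (used / (used + available), rounded up) for the filesystem holding / and compares it with the 80% very high threshold quoted in the alarm. It assumes a Python 3.9+ interpreter is available where it runs. NSX's own alarm logic may compute utilization differently, so treat the result as an approximation, not the value the Edge reports.

```python
import shutil

# Threshold quoted in the alarm text: "very high" is 80%.
VERY_HIGH_THRESHOLD_PCT = 80


def usage_percent(used: int, free: int) -> int:
    """df-style Use%: used / (used + available to non-root users), rounded up."""
    visible = used + free
    if visible == 0:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises.
    return (100 * used + visible - 1) // visible


def check_partition(path: str = "/") -> tuple[int, bool]:
    """Return (Use%, at_or_above_threshold) for the filesystem containing path."""
    usage = shutil.disk_usage(path)  # total, used, free in bytes
    pct = usage_percent(usage.used, usage.free)
    return pct, pct >= VERY_HIGH_THRESHOLD_PCT


if __name__ == "__main__":
    pct, alarming = check_partition("/")
    print(f"/ is at {pct}% (at or above {VERY_HIGH_THRESHOLD_PCT}%: {alarming})")
```

Rounding up mirrors how df reports Use%, so a partition that is 80.1% full is reported as 81% and flagged.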

 

 

Environment

VMware NSX

Cause

The issue was driven by three primary factors:

  1. Disk Space Exhaustion: The root (/) partition reached 84% utilization due to large, unrotated files in the /journal and syslog directories. An attempt to move these files was unsuccessful: the data remained within the same partition, resulting in a "filled bin" scenario.

  2. Known Version Bug: NSX version 4.2.1 is impacted by a known JDK-related issue that can prevent services from recovering gracefully after disk space is reclaimed or management communication is interrupted.

  3. VCD Communication Break: A renewed VCD certificate failed to apply to one of the four VCD cells, breaking the underlying management sync between VCD and NSX.
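Moving a file to another directory on the same filesystem only renames it; its data blocks stay allocated on that partition, so utilization does not drop. A minimal Python sketch for checking, before a move, whether the destination is on a different filesystem (compared by device ID) follows. The function names are illustrative, not part of any NSX tooling.

```python
import os


def same_filesystem(src: str, dest_dir: str) -> bool:
    """True if src and dest_dir are on the same mounted filesystem (same st_dev)."""
    return os.stat(src).st_dev == os.stat(dest_dir).st_dev


def move_would_free_space(src: str, dest_dir: str) -> bool:
    """A move only reclaims space on src's partition if dest_dir is elsewhere."""
    return not same_filesystem(src, dest_dir)


# Usage (hypothetical paths):
#   move_would_free_space("/var/log/syslog.1", "/tmp/cleanup")
# False means the move would leave the data on the same partition.
```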

Resolution

1. Restore VCD-NSX Management Sync

  • Ensure the renewed VCD certificate is applied consistently across all VCD cells.

  • Reconnect VCD to NSX via the VCD Service Provider Admin portal to validate the credential and certificate handshake.
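One way to confirm that the renewed certificate is in place on every cell is to compare the SHA-256 fingerprint each cell presents over TLS. The Python sketch below assumes each cell is reachable directly on port 443 (not only through a load balancer) and uses placeholder hostnames; it flags any cell whose fingerprint differs from the one most cells present. It does not validate the certificate chain.

```python
import hashlib
import ssl
from collections import Counter

# Placeholder hostnames: replace with the address of each individual VCD cell.
VCD_CELLS = [
    "vcd-cell-1.example.com",
    "vcd-cell-2.example.com",
    "vcd-cell-3.example.com",
    "vcd-cell-4.example.com",
]


def fetch_pem(host: str, port: int = 443) -> str:
    """Return the PEM certificate the host presents (the chain is not validated)."""
    return ssl.get_server_certificate((host, port))


def fingerprint(pem: str) -> str:
    """SHA-256 fingerprint of a PEM certificate, as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def mismatched_cells(pems: dict[str, str]) -> list[str]:
    """Cells whose certificate differs from the one presented by most cells."""
    prints = {cell: fingerprint(pem) for cell, pem in pems.items()}
    if not prints:
        return []
    majority, _count = Counter(prints.values()).most_common(1)[0]
    return sorted(cell for cell, fp in prints.items() if fp != majority)


# Usage (needs network access to every cell):
#   pems = {cell: fetch_pem(cell) for cell in VCD_CELLS}
#   print(mismatched_cells(pems))
```

With four cells, a single stale cell shows up clearly. If the cells split evenly between two certificates, the "majority" is arbitrary, so compare the fingerprints against the renewed certificate directly.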

2. Clear Edge Node Disk Space

  • Log in to the affected NSX Edge Node CLI as root.

  • Identify large files in the /journal and /var/log directories.

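  • Action: Move unrotated journal and syslog files to off-box storage, or to a temporary test directory on a different partition, to reduce /dev/sda3 utilization below the 80% threshold. Moving them to another directory on the same / partition does not free any space.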

  • Verify cleanup with df -h.
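The "identify large files" step can be sketched in Python as below. It walks the directories named in this article, skips anything that is not a regular file, and lists the largest files first. The directory list and the top_n value are assumptions to adjust for the Edge in question.

```python
import heapq
import os
import stat

# Directories named in this article; adjust to what exists on the Edge.
SEARCH_DIRS = ["/journal", "/var/log"]


def largest_files(dirs, top_n=10):
    """Return up to top_n (size_bytes, path) pairs, largest first."""
    found = []
    for base in dirs:
        # os.walk silently yields nothing for a missing or unreadable base dir.
        for root, _subdirs, files in os.walk(base):
            for name in files:
                path = os.path.join(root, name)
                try:
                    info = os.lstat(path)  # do not follow symlinks
                except OSError:
                    continue  # removed (e.g. rotated) or unreadable mid-walk
                if stat.S_ISREG(info.st_mode):
                    found.append((info.st_size, path))
    return heapq.nlargest(top_n, found)


if __name__ == "__main__":
    for size, path in largest_files(SEARCH_DIRS):
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

Using lstat and skipping non-regular files keeps symlinks and device nodes from inflating or duplicating the list.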

3. Address JDK Issue (NSX 4.2.1)

If disk cleanup does not immediately restore tunnel or DHCP status:

  • Perform a rolling reboot of the NSX Manager cluster (one manager at a time).

  • This clears the JDK-related hang and forces a fresh reconciliation of the Edge nodes.
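The one-at-a-time rule is the important part of the rolling reboot: the next Manager is not touched until the previous one is back and the cluster is stable. The Python sketch below expresses only that ordering. reboot and is_stable are hypothetical hooks supplied by the operator (the sketch calls no NSX API), and is_stable should confirm that the node actually went down and rejoined, not merely that it still answers before the reboot takes effect. The timeout and poll interval are illustrative.

```python
import time
from typing import Callable, Iterable


def rolling_reboot(
    managers: Iterable[str],
    reboot: Callable[[str], None],
    is_stable: Callable[[str], bool],
    timeout_s: float = 1800.0,
    poll_s: float = 30.0,
) -> list[str]:
    """Reboot managers one at a time; never start the next until the last is stable.

    reboot and is_stable are hypothetical operator-supplied hooks.
    Returns the managers rebooted, in order.
    """
    done = []
    for node in managers:
        reboot(node)
        deadline = time.monotonic() + timeout_s
        while not is_stable(node):
            if time.monotonic() >= deadline:
                # Stop rather than reboot another Manager while one is still down.
                raise TimeoutError(f"{node} did not become stable; halting")
            time.sleep(poll_s)
        done.append(node)
    return done
```

Raising on timeout, instead of moving on, avoids taking a second Manager down while the cluster has already lost one.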

4. Post-Reboot Validation

  • Edge Sharding Behavior: During the rolling reboot, an Edge node may briefly report a "Failed" state. This is expected: the Edge is attempting to "shard" (re-establish a heartbeat) to a different available Manager in the cluster. Communication is restored automatically once the Managers are stable.

  • Verify that VMs are successfully receiving DHCP IP addresses and that ICMP connectivity (Ping) is restored.
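Because an Edge may briefly show "Failed" while it moves to another Manager, a validation check should wait for a sustained healthy state rather than react to a single poll. A minimal Python polling sketch follows. get_state is a hypothetical hook (for example, a wrapper around whatever status check, DHCP test, or ping test is already in use), and the "SUCCESS" label and timing values are assumptions.

```python
import time
from typing import Callable


def wait_for_recovery(
    get_state: Callable[[], str],
    healthy: str = "SUCCESS",
    settle_polls: int = 3,
    timeout_s: float = 900.0,
    poll_s: float = 20.0,
) -> bool:
    """Return True once get_state() reports `healthy` on settle_polls consecutive polls.

    A transient non-healthy state (such as a brief "Failed" during the Manager
    reboot) only resets the streak; False is returned only on timeout.
    """
    streak = 0
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_state() == healthy:
            streak += 1
            if streak >= settle_polls:
                return True
        else:
            streak = 0
        time.sleep(poll_s)
    return False
```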

 
