The disk usage on Bare Metal Edge /image disk partition goes high while generating the support bundle.
Article ID: 345923
Products
VMware NSX
Issue/Introduction
This article explains how to recover when the /image partition on a Bare Metal Edge fills up while a support bundle is being generated.
Symptoms:
Disk utilization on the Edge spikes during support bundle collection.
High disk usage alarms are generated when utilization reaches or exceeds the 90% threshold.
Relevant log location:
/var/log/
2022-xx-xxTxx:xx:xx.xxxZ nsx-edge NSX 2430 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="edge_health" eventType="edge_disk_usage_very_high" eventSev="critical" eventState="On"] The disk usage for the Edge node disk partition /image has reached 100% which is at or above the very high threshold value of 90%.
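The symptom can be confirmed from a shell by comparing a partition's usage with the 90% threshold. The following is a minimal sketch, not an NSX tool: the helper names `usage_percent` and `above_threshold` are hypothetical, and `/` is used as the target so the sketch runs on any system. On an Edge node the path of interest would be /image.

```shell
#!/bin/sh
# Hypothetical helper: print the usage percentage (integer, no % sign) of the
# filesystem that holds the given path, read from POSIX-format `df -P` output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: exit 0 when usage is at or above the given threshold.
above_threshold() {
    [ "$(usage_percent "$1")" -ge "$2" ]
}

# Assumption: "/" stands in for /image so the sketch is runnable anywhere.
target="/"
if above_threshold "$target" 90; then
    echo "$target is at or above 90% ($(usage_percent "$target")%)"
else
    echo "$target is below 90% ($(usage_percent "$target")%)"
fi
```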
Environment
VMware NSX-T Data Center
Cause
The /image partition holds several kinds of files. For example, it houses the nsx file-store, which temporarily stores support bundles and also stores capture files.
During an upgrade, the /image partition is also used to temporarily store the unpacked upgrade bundle files, which could be many GB in size.
It is therefore expected that /image usage increases while a support bundle is being generated. An alarm indicates that either extra files in the nsx file-store are consuming space or the temporary bundle files are large. The var/dir is where the temp bundle files are stored.
In versions earlier than 3.2.2, /var/log/lb/access.log can grow very large. The file is meant to be rotated, but log rotation does not work for it, so disk usage runs high every time a support bundle is generated.
The logrotate configuration for the load balancer (/etc/logrotate.d/nsx-edge-lb) rotates "/var/log/lb/*/*/*.log". That pattern does not match access.log, which sits directly in the "/var/log/lb/" directory.
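The pattern mismatch can be reproduced outside NSX with ordinary shell globbing, which follows the same rule that `*` does not cross a `/`. The sketch below builds a throwaway directory tree. The subdirectory names `a/b` and the file `x.log` are placeholders, not real NSX paths.

```shell
#!/bin/sh
# Build a temporary tree that mimics the layout described above.
# "a/b" and "x.log" are placeholder names chosen for illustration.
root=$(mktemp -d)
mkdir -p "$root/lb/a/b"
touch "$root/lb/access.log" "$root/lb/a/b/x.log"

# Expand a pattern with the same shape as the one in the logrotate config.
matched=$(cd "$root" && ls -d lb/*/*/*.log 2>/dev/null)
echo "pattern lb/*/*/*.log matches:"
echo "$matched"

# access.log sits directly under lb/, so the three-level pattern skips it.
case "$matched" in
    *access.log*) covered=yes ;;
    *)            covered=no ;;
esac
echo "access.log covered: $covered"
rm -rf "$root"
```

Only the file two directories below `lb/` is matched, which is why a log written directly into `/var/log/lb/` is never picked up for rotation.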
Resolution
This issue is fixed in NSX-T 3.2.2 and later, where /var/log/lb/access.log is rotated daily and whenever its size reaches 10M.
Workaround:
If access.log continues to grow as described in this article, clear it with the following command:
echo > /var/log/lb/access.log
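The effect of this redirect can be checked safely on a stand-in file before it is run against the real log. The sketch below is illustrative only: `size_bytes` and `clear_log` are hypothetical helper names, and the temporary file takes the place of access.log.

```shell
#!/bin/sh
# Hypothetical helper: print a file's size in bytes.
size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: empty a file in place with the same redirect as the
# workaround. The file is truncated rather than deleted, so it keeps its
# path and inode.
clear_log() {
    echo > "$1"
}

# Demonstrate on a temporary stand-in file, not the real access.log.
log=$(mktemp)
dd if=/dev/zero of="$log" bs=1024 count=1024 2>/dev/null   # 1 MiB of filler
before=$(size_bytes "$log")
clear_log "$log"
after=$(size_bytes "$log")
echo "before: $before bytes, after: $after bytes"
rm -f "$log"
```

After the redirect the file holds only the single newline written by `echo`, so nearly all of the space is released while the file itself remains in place.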
Additional Information
Impact/Risks:
Disk read/write I/O is impacted.
The system may perform poorly when disk utilization exceeds the threshold.