Title: Disk usage is high for Malware Prevention feature on Service VM or on NSX Edge
Event ID: malware_prevention_health.service_disk_usage_high, malware_prevention_health.service_disk_usage_very_high
Added in release: NSX 4.1.2
Alarm Description
- Symptom: The Alarms dashboard in the NSX UI shows "Service Disk Usage is High" or "Service Disk Usage is very High".
The alarm description names the host or edge transport node whose disk usage is high and states the purpose of the affected disk: DISK_PURPOSE_STATE_FILE or DISK_PURPOSE_FILE_SUBMISSION.
- Cause: This alarm is raised when one of the disk volumes used for file analysis by the Malware Prevention feature on the host or edge transport node reaches the threshold for the high disk usage alarm (default 80%) or the very high disk usage alarm (default 90%).
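The threshold logic described in the cause can be sketched as follows. This is only an illustration in Python, not NSX code: the function name classify_disk_usage is hypothetical, and treating "reaches" as an inclusive comparison (>=) is an assumption. The 80% and 90% defaults and the event IDs come from this article.

```python
# Illustrative sketch only; NSX evaluates these thresholds internally.
HIGH_THRESHOLD_PERCENT = 80       # default for the "high" alarm (from this article)
VERY_HIGH_THRESHOLD_PERCENT = 90  # default for the "very high" alarm (from this article)


def classify_disk_usage(used_percent,
                        high=HIGH_THRESHOLD_PERCENT,
                        very_high=VERY_HIGH_THRESHOLD_PERCENT):
    """Return the event ID a usage percentage would map to, or None.

    Assumption: a threshold is "reached" when usage is greater than or
    equal to it. The very-high check runs first so that it takes priority.
    """
    if used_percent >= very_high:
        return "malware_prevention_health.service_disk_usage_very_high"
    if used_percent >= high:
        return "malware_prevention_health.service_disk_usage_high"
    return None
```

Under these assumptions, a volume at 85% usage maps to the high alarm and a volume at 90% or above maps to the very high alarm.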
Resolution:
Resolution for Host Transport Node
- For a host transport node, migrate some of the VMs that are generating many file events away from the affected host to reduce the load on the SVM deployed on that host.
One way to identify these VMs is to open the following screen in the NSX Manager UI:
Security > Malware Prevention > All Files
Look for VMs on the ESX host that have many file events in progress, then vMotion those VMs.
- Another option is to reduce the Malware Prevention file retention period on the affected host.
- First, log in to the SVM deployed on the affected host transport node by following the procedure at https://docs.vmware.com/en/VMware-NSX/4.1/administration/GUID-E2DAD6E5-0984-41FB-BF6A-9BD8C288683B.html
- Reduce the Malware Prevention file retention period on the affected host. For example, reduce it from the default of ~4 hours to ~1 hour. To do this, add the following lines to the file /config/vmware/nsx-lastline-rapid/conf.d/override.conf:
STALE_DEFAULT_FILE_TIMEOUT_SECONDS=3900
STALE_STATE_FILE_TIMEOUT_SECONDS=3600
- After modifying the file, restart the nsx-lastline-rapid service with the following command:
restart service nsx-lastline-rapid
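The override edit above can be sketched as a small text transformation. This is a minimal sketch in Python under stated assumptions: it assumes override.conf consists of plain KEY=VALUE lines (the article only shows the two lines to add), and the names apply_overrides and RETENTION_OVERRIDES are hypothetical. It operates on text only and does not touch the file or the service; editing the file by hand as described above is equally valid.

```python
# Keys and values taken from this article (3900 s = 65 min, 3600 s = 60 min).
RETENTION_OVERRIDES = {
    "STALE_DEFAULT_FILE_TIMEOUT_SECONDS": 3900,
    "STALE_STATE_FILE_TIMEOUT_SECONDS": 3600,
}


def apply_overrides(existing_text, overrides):
    """Return config text in which each override key appears exactly once.

    Assumption: the file is a flat list of KEY=VALUE lines. Existing lines
    for an override key are replaced in place (later duplicates are dropped);
    keys not yet present are appended at the end. Other lines are kept as-is.
    """
    seen = set()
    out = []
    for line in existing_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in overrides:
            if key not in seen:
                out.append(f"{key}={overrides[key]}")
                seen.add(key)
            # A repeated definition of the same key is skipped.
        else:
            out.append(line)
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Applying the function twice yields the same text, so repeating the change does not add duplicate lines. The same two settings are used in the edge transport node procedure.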
Resolution for Edge Transport Node
Reduce the Malware Prevention file retention period on the affected edge.
- First, log in to the Edge appliance over SSH.
- Reduce the Malware Prevention file retention period. For example, reduce it from the default of ~4 hours to ~1 hour. To do this, add the following lines to the file /config/vmware/nsx-lastline-rapid/conf.d/override.conf:
STALE_DEFAULT_FILE_TIMEOUT_SECONDS=3900
STALE_STATE_FILE_TIMEOUT_SECONDS=3600
- After modifying the file, disable and then re-enable Malware Prevention on all Tier-1 routers of the affected edge transport node in the NSX Manager UI:
Security > Policy Management > IDS/IPS & Malware Prevention > Settings