SE_DISK_HIGH Alerts - SE 100% disk utilization due to bloated Pcap Keylog text file(s)

Article ID: 379101

Products

VMware Avi Load Balancer

Issue/Introduction

The Service Engine (SE) raises continuous SE_DISK_HIGH alerts because disk usage is over the 90% threshold.

Example Alerts:

AviController: AVI : [Avi-se-ekwio: reason: System-high] At 2024-09-22 19:35:00+00:00 event SE_DISK_HIGH occurred on object Avi-se-ekwio in tenant admin as Disk usage over Threshold 90 % Current 90 %.
X.X.X.X   AVI : [Avi-se-ekwio: reason: System-high] At 2024-09-22 19:35:00+00:00 event SE_DISK_HIGH occurred on object Avi-se-ekwio in tenant admin as Disk usage over Threshold 90 % Current 90 %.

Commands used to identify disk utilization and the largest directories in the SE filesystem:

sudo df -kh

sudo du 2> >(grep -v '^du: cannot \(access\|read\)' >&2) -h / | sort -rh | head -n 30

The /var/lib/avi/log/pcap directory will be very large when this issue occurs.

In this directory (/var/lib/avi/log/pcap), the *keylog.txt files will be in the high MiB or GiB range, with an up-to-date last-written timestamp.
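The check described above can be sketched as a small shell helper that lists the keylog files largest first, with their last-modified time. This is only an illustration: the function name list_keylog_files is hypothetical, the default path and file pattern are taken from this article, and it assumes GNU find is available on the SE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the product): list keylog files in a
# pcap directory, largest first, with size in bytes and last-modified time.
# The default directory and the *keylog.txt pattern come from this article.
list_keylog_files() {
  local dir="${1:-/var/lib/avi/log/pcap}"
  # GNU find -printf: %s = size in bytes, %T... = modification time, %p = path
  find "$dir" -maxdepth 1 -type f -name '*keylog.txt' \
    -printf '%s\t%TY-%Tm-%Td %TH:%TM\t%p\n' | sort -rn
}
# Example (run as root on the SE): list_keylog_files /var/lib/avi/log/pcap
```

A file near the top of the list whose timestamp keeps advancing after the capture was disabled matches the symptom described here.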

Environment

Affects Versions: 22.1.x, 30.1.x, 30.2.x

Cause

This issue occurred after taking a Virtual Service traffic capture on a GSLB DNS Virtual Service with DEBUG_VS_HM_ONLY mode (health monitor traffic only).

Why was a keylog file created for a DNS VS with no SSL enabled?

  • Here, "SSL traffic" refers to the backend HTTPS health monitor (HM) traffic. When a traffic capture is enabled on a DNS VS and active HTTPS health monitors are configured in GSLB services, the health monitor probe traffic is also captured. The capture records the session keys and creates the keylog txt files.

Why did the keylog file continue to grow after the capture was disabled?

  • During HM-only traffic capture, the SE does not close the file descriptor (fd) for the keylog file. When the SE receives an RPC from the Controller to halt the capture, it transfers the files and attempts to delete them. However, the keylog text file is still being modified at the same time, which causes the MD5 checksum to fail in the Avi SCP transfer. Avi tries to delete the files but does not release the fd, resulting in a leak.

Also, after the keylog files were deleted, the disk utilization stayed at 100%. The SE VM had to be rebooted from vCenter to recover the SE.

  • In this scenario, even after the files are deleted, a reboot of the SE is required to free the fd. This is why the free disk space did not increase.
  • This issue only occurs during HM-only traffic capture.
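
On Linux, deleting a file only removes its directory entry. The disk blocks stay allocated until the last open file descriptor is closed, which is why df can keep reporting 100% after the keylog files are gone. A minimal sketch for spotting such deleted-but-open files through /proc follows. It assumes root access and a standard Linux /proc layout on the SE, and the function name and default "keylog" pattern are illustrative, not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print open file descriptors that still point at a
# deleted file whose path contains the given pattern (default: keylog).
list_deleted_open_files() {
  local pattern="${1:-keylog}"
  local fd target
  for fd in /proc/[0-9]*/fd/*; do
    # Skip descriptors that vanish or cannot be read
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      # The kernel appends " (deleted)" to the link target of unlinked files
      *"$pattern"*" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
    esac
  done
}
# Example (run as root on the SE): list_deleted_open_files keylog
```

Any line printed identifies a process (by the PID in the /proc path) that still holds a deleted keylog file open. Per this article, the supported way to release that space is to reboot the SE.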

Resolution

A fix for this issue will be delivered in upcoming GA releases or patch releases of VMware Avi Load Balancer.  Please look for Bug ID AV-219297 in the product release notes.

Workaround(s) to recover the Service Engine:

Manually delete the *keylog.txt files from the Service Engine directory /var/lib/avi/log/pcap and hard reboot the SE VM or Instance.
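
The manual deletion step can be sketched as follows, assuming shell access to the SE as root and GNU find. The function name delete_keylog_files is hypothetical, and the default path and file pattern are taken from this article. Deletion alone does not free the space in this scenario, so the hard reboot is still required afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete keylog files from the pcap directory,
# printing each file as it is removed.
delete_keylog_files() {
  local dir="${1:-/var/lib/avi/log/pcap}"
  # -print lists each matching file before -delete removes it (GNU find)
  find "$dir" -maxdepth 1 -type f -name '*keylog.txt' -print -delete
}
# Example (run as root on the SE): delete_keylog_files /var/lib/avi/log/pcap
```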

Preventative Workaround(s):

  1. Close/disable the HM-only packet capture via the GUI (Operations > Traffic Capture > Traffic Capture).
  2. Then initiate packet capture again in either HM with data mode or in data-only mode via the GUI. This should release the file descriptor and free up the system disk after the packet capture is closed/disabled again.