NSX Intelligence appliance /var/log partition 100% full due to kafka-authorizer "Denied Operation" events
Article ID: 324404
Products
VMware NSX
Issue/Introduction
Symptoms:
NSX Intelligence appliance becomes unmanageable
ESXi hosts can no longer establish new connections to NSX Intelligence appliance
/var/log partition is 100% full
The size of the /var/log/kafka directory exceeds 28GB
Issue seen in NSX-T 3.1.0 and earlier, and in NSX Intelligence 1.2
/var/log/kafka-authorizer.log is filled with events similar to the following:
2020-11-30T19:33:30.924ZUTC INFO data-plane-kafka-request-handler-5 logger - Principal = User:UID=18163772-95fe-44ef-a713-0890e4842c29,CN=VMware-NSX-Host,1.2.840.113549.1.9.1=#161b73736c2d63657274696669636174657340766d776172652e636f6d,O=VMware\, Inc.,L=Palo Alto,ST=California,C=US is Denied Operation = Describe from host = 20.20.210.120 on resource = Topic:LITERAL:
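The symptoms above can be confirmed from the appliance shell. The following is a minimal sketch, assuming standard Linux tools (df, du, grep) are available on the appliance; the function names count_denied and report_log_usage are hypothetical helpers, not part of NSX. Both are read-only.

```shell
#!/bin/sh
# Hypothetical helper: count "Denied Operation" events in a log file.
# The file path is passed as the first argument; nothing is modified.
count_denied() {
    grep -c 'Denied Operation' "$1"
}

# Hypothetical helper: show how full the partition holding the given
# directory is, and the total size of that directory.
# Defaults to the /var/log/kafka directory named in this article.
report_log_usage() {
    dir="${1:-/var/log/kafka}"
    df -h "$dir"
    du -sh "$dir"
}
```

For example, run report_log_usage with no arguments to check the partition and directory size, and count_denied with the path to the kafka-authorizer log on your appliance. A large count that keeps growing between runs is consistent with this issue.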
Environment
VMware NSX-T Data Center 3.x
VMware NSX-T
VMware NSX-T Data Center 2.x
VMware NSX-T Data Center
VMware NSX-T Data Center 2.5.x
Cause
The guest metrics topic is not currently sent as part of the config message. However, the Kafka client was still creating a guest metrics Kafka topic object. As a result, the broker floods the kafka-authorizer logs on the Intelligence appliance.
The issue was observed in a scale setup in a stable state after 26 days.
Without the fix, the kafka-authorizer continuously logs "Denied Operation" statements, which floods the logs.
Resolution
Fixed in NSX-T 3.1.1
Workaround:
Free up space in the NSX Intelligence /var/log partition with the following steps:
Navigate to the /var/log/kafka folder: "cd /var/log/kafka"
Check the file sizes with "ls -lh" and look for large files (approx. 790M each) whose names start with kafka-authorizer.log.
Delete the oldest files.
Sample file for deletion:
-rw-r--r-- 1 kafka kafka 0 May 15 2020 kafka-authorizer.log.2020-05-15
This is a short-term measure and may need to be repeated frequently, depending on how quickly the logs grow.
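Because the cleanup may need repeating, the steps above can be sketched as a small script. This is an illustration under stated assumptions, not a supported tool: it assumes rotated files carry the date suffix shown in the sample (kafka-authorizer.log.YYYY-MM-DD), so that sorting by name is also sorting by age, and that find supports -maxdepth. The function names and the "keep the newest N" policy are hypothetical; the article only says to delete the oldest files.

```shell
#!/bin/sh
# Hypothetical helper: list rotated kafka-authorizer logs, oldest first.
# Assumes a date suffix (kafka-authorizer.log.YYYY-MM-DD), so a plain
# name sort is also a chronological sort.
list_rotated_logs() {
    dir="${1:-/var/log/kafka}"
    find "$dir" -maxdepth 1 -type f -name 'kafka-authorizer.log.*' | sort
}

# Hypothetical helper: delete all rotated logs except the newest $2
# (default 2, an arbitrary choice for this sketch).
# The active kafka-authorizer.log (no suffix) is never matched.
prune_rotated_logs() {
    dir="${1:-/var/log/kafka}"
    keep="${2:-2}"
    total=$(list_rotated_logs "$dir" | wc -l)
    remove=$((total - keep))
    [ "$remove" -gt 0 ] || return 0
    list_rotated_logs "$dir" | head -n "$remove" | while IFS= read -r f; do
        rm -f -- "$f"
    done
}
```

Run list_rotated_logs first to review which files would be affected, then call prune_rotated_logs with the directory and the number of recent files to keep, for example prune_rotated_logs /var/log/kafka 2.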