ESXi hosts can no longer establish new connections to the NSX Intelligence appliance
The /var/log partition is 100% full
The size of the /var/log/kafka directory exceeds 28 GB
The issue is seen in NSX-T 3.1.0 and earlier, and in NSX Intelligence 1.2
/var/log/kafka-authorizer.log is filled with events similar to the following:
020-11-30T19:33:30.924ZUTC INFO data-plane-kafka-request-handler-5 logger - Principal = User:UID=18163772-####-####-####-########c29,CN=VMware-NSX-Host,1.2.840.113549.1.9.1=#161b73736c2d63657274696669636174657340766d776172652e636f6d,O=VMware\, Inc.,L=Palo Alto,ST=California,C=US is Denied Operation = Describe from host = <host> on resource = Topic:LITERAL:
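The symptoms above can be checked from a shell on the appliance. The following is a minimal sketch, not an official VMware tool: the function names log_usage and count_denied are hypothetical, and the log path is passed as an argument because the exact location may differ between builds.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of NSX Intelligence.

# Show how full the filesystem holding the given directory is,
# followed by the total size of the directory itself.
log_usage() {
    df -h "$1"
    du -sh "$1"
}

# Print the number of lines containing "Denied Operation" in one log file.
count_denied() {
    grep -c 'Denied Operation' "$1"
}

# Example calls (paths taken from the symptoms above):
#   log_usage /var/log/kafka
#   count_denied /var/log/kafka/kafka-authorizer.log
```

A count that rises quickly between two runs suggests the authorizer log is still being flooded.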
Environment
VMware NSX-T Data Center 3.x
VMware NSX-T
VMware NSX-T Data Center 2.x
VMware NSX-T Data Center
VMware NSX-T Data Center 2.5.x
Cause
The guest metrics topic is not currently sent as part of the config message. However, the Kafka client was still creating the guest metrics Kafka topic object. This causes the broker to flood the kafka-authorizer logs on the NSX Intelligence appliance.
The issue is seen in a scale setup in a stable state after 26 days.
Without the fix, the kafka-authorizer continuously logs "Denied Operation" statements, flooding the logs.
Resolution
This issue is fixed in VMware NSX-T 3.1.1.
Workaround:
Free up space in the NSX Intelligence /var/log folder with the following steps:
Navigate to the /var/log/kafka folder: "cd /var/log/kafka"
Check the file sizes with "ls -lh" and look for large files (approximately 790M). The file names start with kafka-authorizer.log
Delete the oldest files.
Sample file for deletion:
-rw-r--r-- 1 kafka kafka 0 May 15 2020 kafka-authorizer.log.2020-05-15
This is a short-term measure and may need to be repeated frequently, depending on how quickly the logs grow.
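Because the cleanup may need repeating, the manual steps could be wrapped in a small script. The sketch below is illustrative only: list_rotated, prune_rotated, and the DRY_RUN variable are hypothetical names, and it assumes rotated files carry a date suffix like the sample above (kafka-authorizer.log.YYYY-MM-DD), so that sorting by name also sorts by age. The active kafka-authorizer.log file is never matched.

```shell
#!/bin/sh
# Sketch only: function names, DRY_RUN, and the keep count are assumptions.

# List rotated kafka-authorizer logs in a directory, oldest first.
# Assumes the date suffix sorts lexicographically in age order.
list_rotated() {
    for f in "$1"/kafka-authorizer.log.*; do
        [ -e "$f" ] && basename "$f"
    done | sort
}

# Delete all but the newest $2 rotated logs in directory $1.
# Prints each file name it selects; with DRY_RUN=1 nothing is deleted.
prune_rotated() {
    dir="$1"
    keep="$2"
    total=$(list_rotated "$dir" | wc -l | tr -d ' ')
    remove=$((total - keep))
    [ "$remove" -gt 0 ] || return 0
    list_rotated "$dir" | head -n "$remove" | while read -r name; do
        echo "$name"
        [ "${DRY_RUN:-0}" = "1" ] || rm -f -- "$dir/$name"
    done
}

# Example calls:
#   DRY_RUN=1; prune_rotated /var/log/kafka 3   # preview only
#   DRY_RUN=0; prune_rotated /var/log/kafka 3   # delete, keeping the newest 3
```

Running with DRY_RUN=1 first makes it possible to review the list before anything is removed.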