NSX Intelligence - Kafka logs fill up ephemeral storage causing kafka pods to be restarted
Article ID: 319077
Updated On:
Products
VMware NSX
Issue/Introduction
Symptoms: In NSX Intelligence 3.2.1 or 4.0.1, a Kafka pod restarts with the message:

"Pod ephemeral local storage usage exceeds the total limit of containers 1Gi"

To find this message in the Kubernetes events, run:

kubectl get events -n nsxi-platform | grep kafka

The kafkaServer-gc.log.x and server.log.x files fill up the /opt/kafka/logs directory, which is limited to 1 GB. Once the directory is full, the Kafka pod restarts and all files in this directory are removed.

To view the contents of /opt/kafka/logs, run the following command, where {kafka-pods} is the name of a Kafka pod:

kubectl exec -it -n nsxi-platform {kafka-pods} -- ls -lh /opt/kafka/logs
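To check how close each Kafka pod is to the 1 GB limit, the per-pod listing above can be wrapped in a small loop. The following is a minimal sketch, not an official procedure. The function name kafka_log_usage is hypothetical, and the sketch assumes that Kafka pod names contain the string "kafka" and that the du utility is available inside the Kafka container.

```shell
# Print the total size of /opt/kafka/logs for every Kafka pod in a namespace.
# Assumptions: pod names contain "kafka"; "du" exists in the container image.
kafka_log_usage() {
  ns="${1:-nsxi-platform}"
  kubectl get pods -n "$ns" -o name | grep kafka | while read -r pod; do
    printf '%s: ' "$pod"
    # </dev/null keeps kubectl exec from consuming the pod list on stdin
    kubectl exec -n "$ns" "$pod" -- du -sh /opt/kafka/logs </dev/null
  done
}
```

Run kafka_log_usage with no arguments to use the nsxi-platform namespace, or pass a different namespace as the first argument.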
Resolution
This issue will be resolved in a future NSX Intelligence release.
Workaround: Disable Kafka file logging to prevent pod restarts caused by the ephemeral storage filling up. Console logging remains available.

Step 1. Change the Kafka log4j ConfigMap so that logging goes to the console only.
a. Run: kubectl edit configmap -n nsxi-platform kafka-log4j-configuration
b. Remove kafkaAppender from this line:
   log4j.rootLogger=INFO, stdout, kafkaAppender
c. The new line should look like this:
   log4j.rootLogger=INFO, stdout
d. Save and exit.

Step 2. Add the entry below to the Kafka StatefulSet object to stop the kafkaServer-gc.log files from being written.
a. Run: kubectl edit statefulset kafka -n nsxi-platform
b. Go to the "env" section of the container and add the following key/value pair:
   containers:
   - env:
     - name: EXTRA_ARGS
       value: -name kafkaServer
c. Save and exit. This restarts the Kafka pods automatically, one by one.
d. To verify the change, run: kubectl describe statefulset kafka -n nsxi-platform
   The following entry should appear under Environment:
   Environment:
     EXTRA_ARGS: -name kafkaServer
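For environments where interactive editing is inconvenient, the two steps above can be approximated from the command line. This is a hedged sketch rather than the documented procedure. The function names strip_kafka_appender and set_kafka_extra_args are hypothetical. The first function only previews the text change from Step 1 on a stream and does not modify the ConfigMap. The second uses kubectl set env, which by default applies the variable to every container in the pod template; if the StatefulSet has more than one container, restricting it with the -c option is likely needed.

```shell
# Step 1 preview: remove kafkaAppender from the rootLogger line (stdin -> stdout).
# Other lines, including the appender definitions, are left untouched.
strip_kafka_appender() {
  sed -E '/^[[:space:]]*log4j\.rootLogger=/ s/,[[:space:]]*kafkaAppender//'
}

# Step 2 alternative: set EXTRA_ARGS without opening an editor, then wait
# for the rolling restart of the Kafka pods to finish.
# Assumption: applying the variable to all containers is acceptable.
set_kafka_extra_args() {
  ns="${1:-nsxi-platform}"
  kubectl set env statefulset/kafka -n "$ns" EXTRA_ARGS='-name kafkaServer'
  kubectl rollout status statefulset/kafka -n "$ns"
}
```

For example, piping the original line through strip_kafka_appender shows the expected result of Step 1 before the ConfigMap is edited:

echo 'log4j.rootLogger=INFO, stdout, kafkaAppender' | strip_kafka_appender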
Additional Information
Impact/Risks: During the automatic Kafka pod restart, users cannot perform activities such as monitoring network traffic or running recommendations.