NSX Intelligence - Kafka logs fill up ephemeral storage, causing Kafka pods to restart

Article ID: 319077

Products

VMware NSX

Issue/Introduction

Symptoms:
In NSX Intelligence 3.2.1 or 4.0.1, Kafka pods restart with the message: "Pod ephemeral local storage usage exceeds the total limit of containers 1Gi".
To find this message in the Kubernetes events, run:
kubectl get events -n nsxi-platform | grep kafka
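As a sketch, the grep above can be narrowed so that only the relevant eviction events are printed. The function name kafka_storage_events is hypothetical, and the match assumes the event text contains the message exactly as quoted in the symptom description.

```shell
#!/bin/sh
# Hypothetical helper: filters `kubectl get events` output (read on stdin)
# down to Kafka events that report the ephemeral-storage limit being exceeded.
# Assumption: the event text contains the message quoted in this article.
kafka_storage_events() {
  grep kafka | grep 'ephemeral local storage usage exceeds'
}
```

Usage: kubectl get events -n nsxi-platform | kafka_storage_events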
 
"kafkaServer-gc.log.x and server.log.x" are filling up /opt/kafka/logs directory which is limited to 1 GB. Once the directory is full, kafka-pod will restart and clean all the files in this directory. View /opt/kafka/logs contents with command:
kubectl exec -it -n nsxi-platform {kafka-pods} -- ls -lh /opt/kafka/logs
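To see how close each broker is to the limit without reading a full file listing, a sketch like the following prints the total size of /opt/kafka/logs per pod. It assumes the broker pods follow the default StatefulSet naming (kafka-0, kafka-1, ...) and that the container image provides du; both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report the size of /opt/kafka/logs in each Kafka broker pod.
# Assumptions: pods are named kafka-<ordinal> (StatefulSet "kafka") and
# the container image ships `du`. Function names are hypothetical.
NS=nsxi-platform

# Reads `kubectl get pods -o name` output on stdin, keeps only the
# StatefulSet pods (pod/kafka-<ordinal>) and prints bare pod names.
kafka_broker_pods() {
  grep -E '^pod/kafka-[0-9]+$' | sed 's|^pod/||'
}

# Prints "<pod> <size>" for the logs directory of every broker pod.
kafka_log_usage() {
  for pod in $(kubectl get pods -n "$NS" -o name | kafka_broker_pods); do
    # du -sh prints "<size><TAB><path>"; keep only the size column.
    size=$(kubectl exec -n "$NS" "$pod" -- du -sh /opt/kafka/logs | cut -f1)
    echo "$pod $size"
  done
}
```

Usage: kafka_log_usage. A size approaching 1 GB indicates that a restart is imminent.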

Resolution

This issue will be resolved in a future NSX Intelligence release.

Workaround:
Disable Kafka file logging so that the pod's ephemeral storage no longer fills up and triggers a restart. Console (stdout) logging remains available.
 
  Step 1. Change the Kafka log4j ConfigMap so that Kafka logs to the console only.
         a. kubectl edit configmap -n nsxi-platform kafka-log4j-configuration
         b. Remove ", kafkaAppender" from the line: log4j.rootLogger=INFO, stdout, kafkaAppender
         c. The edited line should read: log4j.rootLogger=INFO, stdout
         d. Save and exit.
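Where an interactive editor is inconvenient, Step 1 can be sketched as a pipeline that rewrites only the rootLogger line and replaces the ConfigMap. This is a hypothetical helper, not part of the product; it saves a backup of the original ConfigMap in the current directory and assumes the line appears exactly as log4j.rootLogger=INFO, stdout, kafkaAppender.

```shell
#!/bin/sh
# Sketch: non-interactive alternative to Step 1 (hypothetical helper names).
# Only the root logger line is changed; every other line passes through.
drop_kafka_appender() {
  sed 's/log4j\.rootLogger=INFO, stdout, kafkaAppender/log4j.rootLogger=INFO, stdout/'
}

# Backs up the live ConfigMap, then replaces it with the edited copy.
# Assumption: the rootLogger line is not wrapped across lines in the YAML.
apply_console_only_logging() {
  kubectl get configmap kafka-log4j-configuration -n nsxi-platform -o yaml \
    > kafka-log4j-configuration.bak.yaml &&
    drop_kafka_appender < kafka-log4j-configuration.bak.yaml |
    kubectl replace -f -
}
```

Pipelines bind more tightly than && in the shell, so the replace runs only if the backup was written. Check the rootLogger line in the ConfigMap afterwards before moving on to Step 2.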
 
  Step 2. Add the following environment variable to the kafka StatefulSet to stop writing kafkaServer-gc.log files.
        a.  kubectl edit statefulset kafka -n nsxi-platform
        b.  In the container's "env" section, add the following name/value pair:
           containers:
           - env:
             - name: EXTRA_ARGS
               value: "-name kafkaServer"
        c. Save and exit. The Kafka pods restart automatically, one at a time.
        d. To verify the change, run:
           kubectl describe statefulset kafka -n nsxi-platform
 
The output should show the following entry under Environment:
 
            Environment:
              EXTRA_ARGS:  -name kafkaServer
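A likely explanation for why Step 2 works, assuming the image starts the broker with Apache Kafka's stock bin/kafka-server-start.sh: that script sets EXTRA_ARGS to "-name kafkaServer -loggc" only when the variable is unset, and the -loggc flag is what enables the kafkaServer-gc.log file. Supplying EXTRA_ARGS without -loggc therefore turns GC file logging off. The sketch below reproduces that default expansion and adds a jsonpath query that reads the value back from the StatefulSet; all function names are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helpers). Assumption: the broker is launched by the
# stock kafka-server-start.sh, which uses this same default expansion.
effective_extra_args() {
  # ${VAR-default}: the default applies only when EXTRA_ARGS is unset.
  args=${EXTRA_ARGS-'-name kafkaServer -loggc'}
  printf '%s\n' "$args"
}

# Succeeds when the effective arguments still contain -loggc.
gc_log_enabled() {
  case " $(effective_extra_args) " in
    *" -loggc "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the EXTRA_ARGS value set on the kafka StatefulSet (empty if unset).
statefulset_extra_args() {
  kubectl get statefulset kafka -n nsxi-platform \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="EXTRA_ARGS")].value}'
}
```

After Step 2, statefulset_extra_args should print -name kafkaServer.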

Additional Information

Impact/Risks:
While the Kafka pods restart automatically, users cannot perform activities such as monitoring network traffic or running recommendations.