ADE does not read from all Kafka partitions on Kafka PVC recreation

Article ID: 386195

Products

WatchTower

Issue/Introduction

If ADE does not read from all Kafka partitions after the Kafka PVC is recreated, all of the following symptoms are present:

  • Gaps in green highway data
  • Green highway data completely missing for some metrics
  • Reduced throughput of the ADE service

Environment

WatchTower 1.2

Cause

If the Kafka PVC is deleted and recreated while all of the WatchTower (WT) services are up and running, without first bringing the WT services down, the ADE service does not automatically consume from all of the partitions of its input topic. The Kafka PVC holds the message offsets, the partitions of each topic, and similar metadata that Kafka consumers depend on. ADE is one of these consumers and needs this information to function as expected; deleting the PVC makes the consumers lose it. This failure mode can occur only if someone deletes the Kafka PVC.

     Kafka PVC Deletion

         Deleting the Kafka PVC is a non-standard operation and should never be performed by the customer. Deleting it results in data loss.

Resolution

Determination:

This failure mode can be identified by comparing the creation timestamp of the Kafka PVC with the times at which the Ingestor and ADE pods started, as shown by the lastTransitionTime of their pod conditions. If the Kafka PVC was created more recently than the Ingestor and ADE services, the system is definitely in this failure mode.

     Kafka PVC Timestamp

         kubectl -n "${NAMESPACE}" get pvc common-service-kafka-pvc-kafka-0 -o jsonpath="{.metadata.creationTimestamp}"

     Pod Ready Timestamp

         kubectl -n "${NAMESPACE}" get pod data-insights-ingestor-... -o jsonpath="{range .status.conditions[*]}{.type}{','}{.lastTransitionTime}{'\n'}{end}"
         kubectl -n "${NAMESPACE}" get pod ml-insights-profiler-ade-0 -o jsonpath="{range .status.conditions[*]}{.type}{','}{.lastTransitionTime}{'\n'}{end}"
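The comparison described above can be scripted. The following is a minimal bash sketch, not part of the product: the function names ts_to_num and pvc_newer_than_pods are hypothetical, and it assumes every input is a UTC timestamp in the fixed YYYY-MM-DDTHH:MM:SSZ form that kubectl prints for creationTimestamp and lastTransitionTime.

```shell
#!/usr/bin/env bash
# Sketch: check whether the Kafka PVC was (re)created after the Ingestor and
# ADE pods became Ready. Assumes UTC timestamps of the form
# YYYY-MM-DDTHH:MM:SSZ, which is how kubectl prints these fields.

# ts_to_num: drop every non-digit, e.g. 2024-05-01T10:15:30Z -> 20240501101530,
# so two timestamps in that fixed format can be compared as integers.
ts_to_num() {
  printf '%s\n' "${1//[^0-9]/}"
}

# pvc_newer_than_pods PVC_TS POD_TS...
# Returns 0 (failure mode detected) if PVC_TS is later than every POD_TS,
# 1 if it is not, and 2 if fewer than two arguments are given.
pvc_newer_than_pods() {
  if (( $# < 2 )); then
    echo "usage: pvc_newer_than_pods PVC_TS POD_TS..." >&2
    return 2
  fi
  local pvc_num pod_num pod_ts
  pvc_num=$(ts_to_num "$1")
  shift
  for pod_ts in "$@"; do
    pod_num=$(ts_to_num "$pod_ts")
    if (( pvc_num <= pod_num )); then
      return 1
    fi
  done
  return 0
}

# Example (hypothetical variable names; the Ingestor pod name is elided in
# the article, so substitute the real one when collecting INGESTOR_TS):
#   PVC_TS=$(kubectl -n "${NAMESPACE}" get pvc common-service-kafka-pvc-kafka-0 \
#     -o jsonpath='{.metadata.creationTimestamp}')
#   ADE_TS=$(kubectl -n "${NAMESPACE}" get pod ml-insights-profiler-ade-0 \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')
#   pvc_newer_than_pods "$PVC_TS" "$INGESTOR_TS" "$ADE_TS" && echo "failure mode"
```

Stripping the punctuation avoids depending on GNU date for parsing. It is only valid because all inputs share the same fixed-width UTC format; timestamps with offsets or fractional seconds would need real date parsing.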


Resolution:

If this failure mode is encountered, perform the following steps to fix the issue:

1.  Scale down both the Ingestor and ADE services.

     Scale down services

         kubectl -n "${NAMESPACE}" scale deployment data-insights-ingestor --replicas=0
         kubectl -n "${NAMESPACE}" scale statefulset ml-insights-profiler-ade --replicas=0

2.  Scale up the Ingestor service and wait for it to reach the Ready state.

     Scale up Ingestor service

         kubectl -n "${NAMESPACE}" scale deployment data-insights-ingestor --replicas=1

         # Wait until READY shows 1/1
         kubectl -n "${NAMESPACE}" get deployment data-insights-ingestor

         NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
         data-insights-ingestor   1/1     1            1           2m

3.  Scale up the ADE service.

     Scale up ADE service

         kubectl -n "${NAMESPACE}" scale statefulset ml-insights-profiler-ade --replicas=1
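The three resolution steps can also be run as one sequence. This is a minimal bash sketch under stated assumptions: NAMESPACE is set, the current kubectl context points at the WatchTower cluster, and kubectl rollout status is used in place of manually watching the READY column. The function name resync_ade_partitions and the 300s default timeout are hypothetical choices, not part of the article.

```shell
#!/usr/bin/env bash
# Sketch only: runs the three resolution steps in order and stops on the
# first failure, so ADE is never scaled up before the Ingestor is Ready.
# Assumptions: NAMESPACE is set and kubectl targets the WatchTower cluster.

resync_ade_partitions() {          # hypothetical helper name
  if [[ -z "${NAMESPACE:-}" ]]; then
    echo "NAMESPACE is not set" >&2
    return 1
  fi
  local ns="$NAMESPACE"
  local timeout="${1:-300s}"       # arbitrary example timeout

  # Step 1: scale down both the Ingestor and ADE services.
  kubectl -n "$ns" scale deployment data-insights-ingestor --replicas=0 || return 1
  kubectl -n "$ns" scale statefulset ml-insights-profiler-ade --replicas=0 || return 1

  # Step 2: scale up the Ingestor and block until its rollout completes
  # (kubectl exits non-zero if it does not finish within the timeout).
  kubectl -n "$ns" scale deployment data-insights-ingestor --replicas=1 || return 1
  kubectl -n "$ns" rollout status deployment data-insights-ingestor --timeout="$timeout" || return 1

  # Step 3: only now scale up ADE, so it starts after the Ingestor is Ready.
  kubectl -n "$ns" scale statefulset ml-insights-profiler-ade --replicas=1
}
```

For example, running resync_ade_partitions 600s allows up to ten minutes for the Ingestor rollout. Because each command is followed by || return 1, a failed or timed-out step ends the sequence before ADE is started.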