ADE does not read from all Kafka partitions on Kafka PVC recreation

Article ID: 386195

Updated On: 01-21-2025

Products

WatchTower

Issue/Introduction

When ADE does not read from all Kafka partitions after the Kafka PVC is recreated, all of the following symptoms are present:

Gaps in green highway data
Green highway data completely missing for some metrics
Reduced throughput of the ADE service

Environment

WatchTower 1.2

Cause

If the Kafka PVC is deleted and recreated while all WatchTower (WT) services are up and running, without first bringing the WT services down, the ADE service does not automatically consume from all of the partitions on its input topic. The Kafka PVC holds the state that consumers of Kafka topics rely on, such as message offsets and the partitions of each topic. ADE is one of these consumers and needs this information to function as expected. Deleting the PVC causes the consumers to lose this information. This failure mode can only occur if someone deletes the Kafka PVC.

Kafka PVC Deletion

Deletion of the Kafka PVC is a non-standard operation and should never be performed by the customer. Its deletion will result in lost data.

Resolution

Determination:

This failure mode can be identified by comparing the creation timestamp of the Kafka PVC with the timestamps at which the Ingestor and ADE pods last changed state (the lastTransitionTime of their status conditions, such as Ready). If the Kafka PVC was created after the services came up, the system is definitely in this failure mode.

Kafka PVC Timestamp

kubectl -n "${NAMESPACE}" get pvc common-service-kafka-pvc-kafka-0 -o jsonpath="{.metadata.creationTimestamp}"

Pod Ready Timestamp

kubectl -n "${NAMESPACE}" get pod data-insights-ingestor-... -o jsonpath="{range .status.conditions[*]}{.type}{','}{.lastTransitionTime}{'\n'}{end}"
kubectl -n "${NAMESPACE}" get pod ml-insights-profiler-ade-0 -o jsonpath="{range .status.conditions[*]}{.type}{','}{.lastTransitionTime}{'\n'}{end}"
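The comparison can also be scripted. The bash sketch below is hypothetical: the function names `ts_to_number`, `is_newer`, and `check_failure_mode` are illustrative and not part of WatchTower, and it assumes `NAMESPACE` is set. It relies on Kubernetes reporting these timestamps in RFC 3339 UTC form with whole seconds (for example `2025-01-21T10:15:00Z`), so stripping the non-digit characters yields integers that sort in time order. It reads only the `Ready` condition, which is an assumption about which transition matters most.

```shell
# Sketch (bash): compare the Kafka PVC creation time with a pod's Ready time.
# Function names are illustrative; KUBECTL may be overridden with a stub.
KUBECTL="${KUBECTL:-kubectl}"

# Turn an RFC 3339 UTC timestamp such as 2025-01-21T10:15:00Z into a
# sortable integer (20250121101500). Returns 1 on unexpected input.
ts_to_number() {
  local digits="${1//[!0-9]/}"
  if (( ${#digits} != 14 )); then
    echo "unexpected timestamp: '$1'" >&2
    return 1
  fi
  echo "$digits"
}

# Exit 0 if timestamp $1 is later than timestamp $2, 1 if not, 2 on bad input.
is_newer() {
  local a b
  a="$(ts_to_number "$1")" || return 2
  b="$(ts_to_number "$2")" || return 2
  (( a > b ))
}

# Exit 0 if the Kafka PVC was created after pod $1 last became Ready.
check_failure_mode() {
  local pod="$1" pvc_ts ready_ts rc=0
  pvc_ts="$("$KUBECTL" -n "${NAMESPACE}" get pvc common-service-kafka-pvc-kafka-0 \
    -o jsonpath='{.metadata.creationTimestamp}')" || return 2
  ready_ts="$("$KUBECTL" -n "${NAMESPACE}" get pod "$pod" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')" || return 2
  is_newer "$pvc_ts" "$ready_ts" || rc=$?
  case "$rc" in
    0) echo "$pod: Kafka PVC ($pvc_ts) is newer than Ready ($ready_ts); failure mode" ;;
    1) echo "$pod: Kafka PVC ($pvc_ts) predates Ready ($ready_ts)" ;;
  esac
  return "$rc"
}
```

For example, `check_failure_mode ml-insights-profiler-ade-0` checks the ADE pod; the Ingestor pod name has to be looked up in the cluster because it varies per deployment.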

Resolution:

If the system is in this failure mode, perform the following steps to fix the issue:

1. Scale down both the Ingestor and ADE services

Scale down services

kubectl -n "${NAMESPACE}" scale deployment data-insights-ingestor --replicas=0
kubectl -n "${NAMESPACE}" scale statefulset ml-insights-profiler-ade --replicas=0
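Before moving on to step 2, it can help to confirm that both services have actually scaled down. As a hedged sketch, the bash function below polls each workload's `.status.replicas` until it reports zero. The name `wait_for_zero_replicas`, the 5-second poll interval, and the 120-second default timeout are illustrative assumptions. A Deployment omits that status field once it reaches zero, so an empty value is treated as 0. Because it checks the controller status rather than individual pods, a pod may still be finishing its termination grace period when the function returns.

```shell
# Sketch (bash): wait until a workload's controller reports zero replicas.
# Names, poll interval and timeout are illustrative; KUBECTL may be a stub.
KUBECTL="${KUBECTL:-kubectl}"

# $1: kind/name, e.g. deployment/data-insights-ingestor
# $2: timeout in seconds (default 120)
wait_for_zero_replicas() {
  local target="$1" timeout="${2:-120}" waited=0 replicas
  while (( waited < timeout )); do
    replicas="$("$KUBECTL" -n "${NAMESPACE}" get "$target" \
      -o jsonpath='{.status.replicas}')" || return 2
    # A Deployment drops .status.replicas when it is 0, so empty means 0.
    if (( ${replicas:-0} == 0 )); then
      echo "$target reports 0 replicas"
      return 0
    fi
    sleep 5
    (( waited += 5 ))
  done
  echo "timed out waiting for $target to scale down" >&2
  return 1
}
```

For example, after the two scale commands, run `wait_for_zero_replicas deployment/data-insights-ingestor` and `wait_for_zero_replicas statefulset/ml-insights-profiler-ade`.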

2. Scale up the Ingestor service and wait for it to reach the Ready state

Scale up Ingestor service

kubectl -n "${NAMESPACE}" scale deployment data-insights-ingestor --replicas=1

# Wait until it is READY
kubectl -n "${NAMESPACE}" get deployment data-insights-ingestor

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
data-insights-ingestor   1/1     1            1           2m
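Instead of re-running `kubectl get deployment` by hand, the wait can be scripted. The built-in `kubectl rollout status deployment/data-insights-ingestor --timeout=300s` is one option. The bash sketch below instead polls the same ready/desired pair that the READY column shows (`.status.readyReplicas` and `.spec.replicas`). The function names `all_ready` and `wait_for_ready`, the 5-second poll interval, and the 300-second default timeout are illustrative assumptions.

```shell
# Sketch (bash): wait until a Deployment's ready/desired replicas match,
# mirroring the READY column. Names and timings are illustrative.
KUBECTL="${KUBECTL:-kubectl}"

# Exit 0 if a "ready/desired" value such as "1/1" shows every desired
# replica ready, 1 if not yet, 2 if the value cannot be parsed.
all_ready() {
  local ready="${1%%/*}" desired="${1##*/}"
  ready="${ready:-0}"   # .status.readyReplicas is omitted while it is 0
  [[ "$ready" =~ ^[0-9]+$ && "$desired" =~ ^[0-9]+$ ]] || return 2
  (( desired > 0 && ready == desired ))
}

# $1: deployment name, $2: timeout in seconds (default 300)
wait_for_ready() {
  local deploy="$1" timeout="${2:-300}" waited=0 state rc
  while (( waited < timeout )); do
    state="$("$KUBECTL" -n "${NAMESPACE}" get deployment "$deploy" \
      -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')" || return 2
    rc=0
    all_ready "$state" || rc=$?
    if (( rc == 0 )); then
      echo "$deploy is READY ($state)"
      return 0
    elif (( rc == 2 )); then
      echo "unexpected READY value: '$state'" >&2
      return 2
    fi
    sleep 5
    (( waited += 5 ))
  done
  echo "timed out waiting for $deploy to become READY" >&2
  return 1
}
```

For example, `wait_for_ready data-insights-ingestor` returns once the Deployment reports `1/1`, after which step 3 can proceed.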

3. Scale up the ADE service

Scale up ADE service

kubectl -n "${NAMESPACE}" scale statefulset ml-insights-profiler-ade --replicas=1
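To confirm that the ADE pod has come back, a bash sketch like the one below polls the pod's `Ready` condition. It assumes the StatefulSet's single replica is named `ml-insights-profiler-ade-0`, as in the determination commands above; the function name `wait_for_pod_ready`, the poll interval, and the timeout are illustrative. A Ready pod shows that ADE has started, not that it is consuming from every partition, so the symptoms listed under Issue/Introduction should still be rechecked afterwards.

```shell
# Sketch (bash): wait until a pod's Ready condition is "True".
# Names and timings are illustrative; KUBECTL may be overridden with a stub.
KUBECTL="${KUBECTL:-kubectl}"

# $1: pod name, $2: timeout in seconds (default 300)
wait_for_pod_ready() {
  local pod="$1" timeout="${2:-300}" waited=0 status
  while (( waited < timeout )); do
    # Errors such as NotFound (pod not created yet) are ignored and retried.
    status="$("$KUBECTL" -n "${NAMESPACE}" get pod "$pod" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)" || status=""
    if [[ "$status" == "True" ]]; then
      echo "$pod is Ready"
      return 0
    fi
    sleep 5
    (( waited += 5 ))
  done
  echo "timed out waiting for $pod to become Ready" >&2
  return 1
}
```

For example, run `wait_for_pod_ready ml-insights-profiler-ade-0` right after scaling the StatefulSet up.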
