How to drain Kafka topics in WatchTower (WT) in preparation for upgrade or reinstallation of Kafka


Article ID: 436676


Updated On:

Products

WatchTower WatchTower Platform

Issue/Introduction

This article applies in the following situations:

  1. Upgrade from WatchTower 1.3 to WatchTower 1.4.
  2. WatchTower 1.4 onwards, increasing the number of Kafka replicas. 

If you are using zArchitecture Bridge, this KB is not applicable.

Environment

WatchTower 1.3 and above

Cause

  1. Upgrade from WatchTower 1.3 to WatchTower 1.4.
    • It is recommended to stop the data feed and ensure that all existing data on the Kafka bus has been fully consumed before upgrading. This prevents metric data loss during the upgrade.

  2. WatchTower 1.4 onwards, increasing the number of Kafka replicas for High Availability. 
    • By default, WatchTower uses a single Kafka replica to minimize hardware requirements. To enable High Availability, you must increase the Kafka replication factor. The most reliable way to apply these configuration changes is to recreate the underlying Persistent Volume Claims (PVCs). Because this process deletes the existing volumes, all data should be processed beforehand to avoid metric data loss.

Resolution

Instructions on how to drain Kafka topics in WatchTower (WT) in preparation for upgrade or reinstallation of Kafka

Note: These instructions are restricted to the Mainframe data feed via the Message Service Server. This process is not applicable for a data feed via zArchitecture Bridge.

Scale the datastream-hub-deployment down to 0 replicas. This action stops WatchTower from receiving new metric data from the Mainframe, allowing the data already on the Kafka bus to be processed.

kubectl scale -n <namespace> deploy datastream-hub-deployment --replicas 0
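
To confirm that the scale-down has taken effect before continuing, a check along the following lines may help. This is a minimal sketch and not part of the WatchTower tooling: the function name hub_scaled_down is hypothetical, and it assumes kubectl is already configured for the cluster and that the deployment is named as in the command above.

```shell
# Hypothetical helper (not shipped with the WatchTower installer).
# Succeeds when datastream-hub-deployment requests 0 replicas and
# reports no ready replicas in the given namespace.
hub_scaled_down() {
  ns="$1"
  # .spec.replicas is the desired count set by "kubectl scale".
  desired=$(kubectl get deploy datastream-hub-deployment -n "$ns" \
    -o jsonpath='{.spec.replicas}') || return 2
  # .status.readyReplicas is omitted when no pod is ready,
  # so an empty value is treated as 0.
  ready=$(kubectl get deploy datastream-hub-deployment -n "$ns" \
    -o jsonpath='{.status.readyReplicas}') || return 2
  [ "$desired" = "0" ] && [ "${ready:-0}" = "0" ]
}
```

For example, `hub_scaled_down <namespace> && echo "datastream-hub is stopped"` prints the message only once no replicas remain.
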
  1. Run the diagnostics.sh script that came with the WatchTower installer (refer to the technical documentation for detailed usage) to confirm that the data flow from the Mainframe has stopped.
    1. Issue the following command:
      sh diagnostics.sh summary -n <namespace>
    2. Check the output. If it still shows data flow, re-run the diagnostics script after 5 minutes, and repeat until the summary reports that the data flow has stopped:
      Diagnostics Summary
      -------------------------------------------------------------------------
      PROBLEMS
      -------------------------------------------------------------------------
      PROBLEM: There is no data flow to WatchTower.
      ROOT CAUSE: There is no data flow to WatchTower as no active subscriptions are found.
      RECOMMENDATION: Follow the CCS Message Server technical documentation to configure or troubleshoot the subscriptions.
  2. Wait for the Kafka clients to consume all the existing data from the Kafka topics. Run the diagnostics.sh script (Diagnostics Utility) that comes with the WatchTower installer to make sure the existing data on the Kafka topics has been processed.
    1. Issue the following command:
      sh diagnostics.sh summary -n <namespace>
    2. Check the throughput.txt file generated by the script and confirm that each pod shows "Average Moving Rate of 0" for at least the last 10 minutes, as shown below:
      -------------------------------------------------------------------------
      Pod: ml-insights-profiler-ade-0
      Average Moving Rate of 0 for the last x minutes.
      -------------------------------------------------------------------------
      -------------------------------------------------------------------------
      Pod: ml-insights-profiler-alarm-manager-77dbb5c766-lq4tw
      Average Moving Rate of 0 for the last y minutes.
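
The two checks above can be scripted roughly as follows. This is a sketch under stated assumptions, not part of the Diagnostics Utility: the function names dataflow_stopped and kafka_drained are hypothetical, the summary is assumed to be written to standard output, and every rate line in throughput.txt is assumed to follow the pattern "Average Moving Rate of <rate> for the last <n> minutes." seen in the sample. Adjust the patterns if your output differs.

```shell
# Hypothetical helpers (not shipped with the WatchTower installer).

# Reads a diagnostics summary on stdin; succeeds when it reports
# that there is no data flow to WatchTower.
dataflow_stopped() {
  grep "PROBLEM: There is no data flow to WatchTower" >/dev/null
}

# kafka_drained FILE [MIN_MINUTES]
# Succeeds when every pod listed in FILE shows a rate of 0 for at
# least MIN_MINUTES (default 10). Prints the pods that do not.
kafka_drained() {
  awk -v min="${2:-10}" '
    $1 == "Pod:" { pod = $2; pods++; next }
    $1 == "Average" && $2 == "Moving" && $3 == "Rate" {
      # Assumed fields: Average Moving Rate of <rate> for the last <n> minutes.
      if ($5 ~ /^0+(\.0+)?$/ && $9 + 0 >= min) ok++
      else printf "not drained: %s (rate %s, last %s minutes)\n", pod, $5, $9
    }
    END { exit !(pods > 0 && ok == pods) }
  ' "$1"
}
```

A possible usage is `sh diagnostics.sh summary -n <namespace> | dataflow_stopped` for the first check, followed by `kafka_drained throughput.txt` for the second, with the path to throughput.txt adjusted to wherever the script wrote it.
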