GemFire on Kubernetes: Proper Way to Perform Rolling Restarts

Article ID: 412397

Products

VMware Tanzu GemFire

Issue/Introduction

Tanzu GemFire clusters are sometimes restarted by manually deleting pods. While this may appear to work, it is effectively a forceful termination and can cause data loss, temporary unavailability, and improper cluster rebalancing.

The recommended approach is to perform rolling restarts using labels on the GemFireCluster CRD, which allows the Operator to gracefully shut down and restart each locator and server in an ordered fashion.

Environment

  • Tanzu GemFire on Kubernetes 

  • GemFireCluster CRD managed via Operator

  • Kubernetes cluster with standard Pod lifecycle management

Cause

Deleting pods manually (kubectl delete pod <podname>) is a forceful operation and does not allow the GemFire Operator to manage the shutdown and startup sequence of locators and servers. This can result in:

  • Non-graceful termination of servers

  • Loss of in-memory data not yet persisted to disk

  • Potential cluster configuration inconsistencies

  • Temporary unavailability of cluster services

Resolution

Use labels on the GemFireCluster CRD to trigger rolling restarts in a controlled and safe manner. The Operator detects label changes and gracefully restarts nodes in order.

Steps for a Rolling Restart:

  1. Check cluster health before initiating a rolling restart. Rolling restarts are intended for clusters that are healthy.

  2. Edit the GemFireCluster CRD to add or update a label under spec.locators.labels and/or spec.servers.labels. Example:

    apiVersion: gemfire.vmware.com/v1
    kind: GemFireCluster
    metadata:
      name: <CLUSTER-NAME>
    spec:
      image: <IMAGE-NAME>
      locators:
        # Adding or changing a label here triggers a rolling restart of the locators
        labels:
          environment: production
          tier: premium
      servers:
        # Adding or changing a label here triggers a rolling restart of the servers
        labels:
          environment: production
          tier: premium

  3. Save the changes. The Operator detects the new or updated label and performs a rolling restart of the locators and servers in an ordered and graceful manner.

  4. Verify that all locators and servers restart successfully and that the cluster returns to a healthy state.
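The label change in the steps above can also be applied from the command line rather than through an editor. The following is a minimal sketch, not an official procedure: the script, its variables (CLUSTER_NAME, NAMESPACE, LABEL_KEY, APPLY), the default cluster name, and the restart-id label key are all hypothetical, and it assumes the resource can be addressed as gemfirecluster by kubectl. It sets a label whose value is the current Unix timestamp, so each run produces a label change for the Operator to act on. By default it only prints the command; set APPLY=1 to run it against the cluster.

```shell
#!/bin/sh
# Hypothetical helper: change a label on a GemFireCluster resource so that
# the Operator performs a rolling restart. All names below are illustrative.

CLUSTER_NAME="${CLUSTER_NAME:-my-gemfire}"   # assumed cluster name
NAMESPACE="${NAMESPACE:-default}"            # assumed namespace
LABEL_KEY="${LABEL_KEY:-restart-id}"         # assumed label key

# Print a JSON merge patch that sets <key>=<value> under both
# spec.locators.labels and spec.servers.labels.
build_patch() {
  printf '{"spec":{"locators":{"labels":{"%s":"%s"}},"servers":{"labels":{"%s":"%s"}}}}' \
    "$1" "$2" "$1" "$2"
}

# A Unix timestamp is a valid label value and differs on every run.
PATCH="$(build_patch "$LABEL_KEY" "$(date +%s)")"

if [ "${APPLY:-0}" = "1" ]; then
  # Custom resources do not support kubectl's default strategic merge patch,
  # so a JSON merge patch is requested explicitly.
  kubectl -n "$NAMESPACE" patch gemfirecluster "$CLUSTER_NAME" --type merge -p "$PATCH"
  # Watch the pods cycle; stop with Ctrl-C once all are Running and Ready.
  kubectl -n "$NAMESPACE" get pods --watch
else
  # Dry run: show the command without touching the cluster.
  echo "kubectl -n $NAMESPACE patch gemfirecluster $CLUSTER_NAME --type merge -p '$PATCH'"
fi
```

To restart only the locators or only the servers, the patch would include just the corresponding spec.locators.labels or spec.servers.labels entry.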

Additional Information

Notes:

  • Adding or updating a label triggers the restart; there is no need to delete pods manually.

  • Manual pod deletion is equivalent to force-killing the server and should be avoided except in emergencies.

  • This method preserves cluster state, ensures proper rebalancing, and minimizes downtime.
