Administrators sometimes attempt to restart a Tanzu GemFire cluster by manually deleting its pods. While this may appear to work, it is effectively a forceful termination and can cause issues such as abrupt loss of in-memory data, temporary unavailability, and improper cluster rebalancing.
The recommended approach is to perform a rolling restart by updating labels on the GemFireCluster CRD, which allows the GemFire Operator to gracefully shut down and restart each node in an ordered fashion.
Tanzu GemFire on Kubernetes
GemFireCluster CRD managed via Operator
Kubernetes cluster with standard Pod lifecycle management
Deleting pods manually (kubectl delete pod <podname>) is a forceful operation and does not allow the GemFire Operator to manage the shutdown and startup sequence of locators and servers. This can result in:
Non-graceful termination of servers
Loss of in-memory data not yet persisted to disk
Potential cluster configuration inconsistencies
Temporary unavailability of cluster services
Use labels on the GemFireCluster CRD to trigger rolling restarts in a controlled and safe manner. The Operator detects label changes and gracefully restarts nodes in order.
Steps for a Rolling Restart:
Check cluster health before initiating a rolling restart. Rolling restarts are intended for clusters that are healthy.
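For example, you can confirm that all pods are running and that the cluster members are up before proceeding. The namespace, cluster name, and locator pod name below are placeholders and assume the Operator's default StatefulSet naming; adjust them to your environment:

kubectl get pods -n <NAMESPACE>
kubectl get gemfireclusters -n <NAMESPACE>
kubectl exec -n <NAMESPACE> <CLUSTER-NAME>-locator-0 -- gfsh -e "connect" -e "list members"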
Edit the GemFireCluster CRD to add or update a label under spec.locators.labels and/or spec.servers.labels. Example:
apiVersion: gemfire.vmware.com/v1
kind: GemFireCluster
metadata:
  name: <CLUSTER-NAME>
spec:
  image: <IMAGE-NAME>
  locators:
    labels:
      environment: production
      tier: premium
  servers:
    labels:
      environment: production
      tier: premium
Save the changes. The Operator will detect the new/updated label and perform a rolling restart of locators and servers in an ordered and graceful manner.
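As an alternative to editing the manifest, the same label change can be applied with kubectl. This is a minimal sketch; the label key restart-trigger and its value are arbitrary placeholders, and any new or changed label value causes the Operator to begin the rolling restart:

kubectl patch gemfirecluster <CLUSTER-NAME> -n <NAMESPACE> --type merge \
  -p '{"spec":{"locators":{"labels":{"restart-trigger":"v1"}},"servers":{"labels":{"restart-trigger":"v1"}}}}'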
Verify that all nodes restart successfully and the cluster returns to a healthy state.
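For example, you can watch the pods being terminated and recreated one at a time, then confirm that the cluster reports a healthy status again (same placeholders as above):

kubectl get pods -n <NAMESPACE> -w
kubectl get gemfirecluster <CLUSTER-NAME> -n <NAMESPACE>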
Notes:
Adding or updating a label triggers the restart; no need to delete pods manually.
Manual pod deletion is equivalent to force-killing the server and should be avoided except in emergency scenarios.
This method preserves cluster state, ensures proper rebalancing, and minimizes downtime.