Administrators sometimes attempt to restart a Tanzu GemFire cluster by manually deleting its pods. While this may appear to work, it is effectively a forceful termination and can cause issues such as abrupt loss of in-memory data, temporary unavailability, and improper cluster rebalancing.
The recommended approach is to perform a rolling restart by updating labels on the GemFireCluster CRD, which allows the GemFire Operator to gracefully shut down and restart each node in an ordered fashion.
Tanzu GemFire on Kubernetes
GemFireCluster CRD managed via Operator
Kubernetes cluster with standard Pod lifecycle management
Deleting pods manually (kubectl delete pod <podname>) is a forceful operation and does not allow the GemFire Operator to manage the shutdown and startup sequence of locators and servers. This can result in:
Non-graceful termination of servers
Loss of in-memory data not yet persisted to disk
Potential cluster configuration inconsistencies
Temporary unavailability of cluster services
Use labels on the GemFireCluster CRD to trigger rolling restarts in a controlled and safe manner. The Operator detects label changes and gracefully restarts nodes in order.
Steps for a Rolling Restart:
Check cluster health before initiating a rolling restart. Rolling restarts are intended for clusters that are healthy.
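For example, you can confirm that all pods are running and that the cluster members are up before proceeding. The namespace, cluster name, and locator pod name below are placeholders and assume the Operator's default StatefulSet naming; adjust them to your environment:

kubectl get pods -n <NAMESPACE>
kubectl get gemfireclusters -n <NAMESPACE>
kubectl exec -n <NAMESPACE> <CLUSTER-NAME>-locator-0 -- gfsh -e "connect" -e "list members"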
Edit the GemFireCluster CRD to add or update a label under spec.locators.labels and/or spec.servers.labels. Example:
apiVersion: gemfire.vmware.com/v1
kind: GemFireCluster
metadata:
  name: <CLUSTER-NAME>
spec:
  image: <IMAGE-NAME>
  locators:
    labels:
      environment: production
      tier: premium
  servers:
    labels:
      environment: production
      tier: premium
Save the changes. The Operator will detect the new/updated label and perform a rolling restart of locators and servers in an ordered and graceful manner.
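As an alternative to editing the manifest, the same label change can be applied with kubectl. This is a minimal sketch; the label key restart-trigger and its value are arbitrary placeholders, and any new or changed label value causes the Operator to begin the rolling restart:

kubectl patch gemfirecluster <CLUSTER-NAME> -n <NAMESPACE> --type merge \
  -p '{"spec":{"locators":{"labels":{"restart-trigger":"v1"}},"servers":{"labels":{"restart-trigger":"v1"}}}}'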
Verify that all nodes restart successfully and the cluster returns to a healthy state.
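For example, you can watch the pods being terminated and recreated one at a time, then confirm that the cluster reports a healthy status again (same placeholders as above):

kubectl get pods -n <NAMESPACE> -w
kubectl get gemfirecluster <CLUSTER-NAME> -n <NAMESPACE>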
Notes:
Adding or updating a label triggers the restart; no need to delete pods manually.
Manual pod deletion is equivalent to force-killing the server and should be avoided except in emergency scenarios.
This method preserves cluster state, ensures proper rebalancing, and minimizes downtime.