During planned physical network switch maintenance, customers often ask whether it is acceptable to keep Tanzu Kubernetes Grid (Multi-Cloud) (TKGm) Management and Workload Cluster nodes running while other virtual machines remain online, or whether these Kubernetes nodes should be shut down.
This article provides best-practice recommendations to prevent etcd corruption, control-plane instability, and cluster reconciliation issues following network outages.
VMware Tanzu Kubernetes Grid
Unlike regular virtual machines, TKGm cluster nodes host critical Kubernetes components, such as the control plane and etcd, which are highly sensitive to network interruptions.
If these nodes remain powered on during a network outage, communication loss among etcd members or control-plane nodes may lead to data corruption, reconciliation delays, or cluster instability after the network is restored.
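Where nodes were left running through an outage, a few quick checks can confirm whether all nodes rejoined and the etcd and control-plane pods recovered cleanly. This is a minimal sketch, assuming an admin kubeconfig for the affected cluster; the label selectors follow the standard kubeadm conventions that TKGm control-plane nodes use:

   # Assumes an admin kubeconfig for the affected cluster
   kubectl get nodes
   kubectl get pods -n kube-system -l component=etcd
   kubectl get pods -n kube-system -l tier=control-plane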
It is strongly recommended to gracefully shut down all TKGm Management and Workload Cluster nodes before initiating physical network switch maintenance.
If shutting the nodes down is not feasible, an alternative approach is to pause the clusters prior to maintenance and unpause them after the network has been fully restored, as outlined in the steps below.
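The following steps are a minimal sketch based on the standard Cluster API pause mechanism (the spec.paused field on each Cluster resource). The cluster name and namespace shown are placeholders, the commands are run against the Management Cluster context, and the exact procedure should be confirmed against the official documentation linked at the end of this article.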
1. Check the cluster pause status (the command returns no output if the cluster is not paused):
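   # Placeholders: replace <cluster-name> and <namespace> with your own values
   kubectl get cluster <cluster-name> -n <namespace> -o jsonpath='{.spec.paused}'

2. Pause the cluster before starting the switch maintenance:

   kubectl patch cluster <cluster-name> -n <namespace> --type merge -p '{"spec":{"paused": true}}'

3. Unpause the cluster after the network has been fully restored so that reconciliation resumes:

   kubectl patch cluster <cluster-name> -n <namespace> --type merge -p '{"spec":{"paused": false}}'

Repeat the check from step 1 to confirm the value is now false.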
For official documentation and detailed procedures, refer to:
TKG 2.5 – Shut Down and Restart Clusters
TKG 2.5 – Cluster Lifecycle Operations