This guide provides the steps to safely perform offline defragmentation on an etcd cluster, one node at a time. This process is used to reclaim space when the database size (dbSize) is significantly larger than its actual usage (dbSizeInUse).
Important: This is a high-risk maintenance operation. Ensure you have a complete, verified backup of your etcd cluster or snapshot before proceeding.
Symptoms:
kubectl commands fail with "no route to host" errors.
Impact/Risks:
The etcd container restarts frequently in the cluster, making the control plane unstable until the fragmentation is addressed.
Phase 1: Preparation
Before modifying any etcd node, you must prepare the cluster and verify its health.
1. Pause Cluster Reconciliation Run this kubectl command from your management cluster to prevent the cluster controller from making conflicting changes during maintenance.
kubectl patch cluster <Cluster> -n <Namespace> --type merge -p '{"spec":{"paused": true}}'
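To confirm the patch took effect, you can read the field back (using the same placeholders):
kubectl get cluster <Cluster> -n <Namespace> -o jsonpath='{.spec.paused}'
# Expected output: true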
2. Access a Control Plane Node SSH into any one of your control plane (CP) nodes.
3. Set up etcdctl Alias The etcdctl utility is typically located inside the etcd container image rather than on the host. This alias makes it accessible. You will need to re-create the alias in each new SSH session you open on each node.
Note: The snapshot path may vary based on your container runtime and configuration.
alias etcdctl='/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
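As a quick check that the alias resolves to a working binary, print the client version:
etcdctl version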
4. Check Cluster Health Verify that all members are healthy and identify the leader and followers.
# List all members and their status
etcdctl -w table member list
# Check health and database size for all endpoints
etcdctl -w table endpoint status --cluster
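If your etcdctl version supports it (v3.4+), you can also run a per-endpoint health probe across the cluster:
# Check the health of every member
etcdctl -w table endpoint health --cluster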
5. Check for Fragmentation Run the following command to see the percentage of wasted space. A high percentage indicates a need for defragmentation.
etcdctl endpoint status --cluster -w json | jq '.[] | ((.Status.dbSize - .Status.dbSizeInUse)/.Status.dbSize)*100'
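If you prefer output labeled per endpoint, a variant of the same jq filter (the formatting here is just one option) is:
etcdctl endpoint status --cluster -w json | jq -r '.[] | "\(.Endpoint): \(((.Status.dbSize - .Status.dbSizeInUse) / .Status.dbSize) * 100 | floor)% reclaimable"'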
6. Compact Revision History Before defragmenting, compact the history. First, get the current revision number from any healthy endpoint:
# Example of getting the revision from the first endpoint
REVISION=$(etcdctl endpoint status -w json | jq '.[0].Status.header.revision')
echo $REVISION
Now, use that revision number to compact the database. This discards all history prior to this revision.
etcdctl compaction $REVISION
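By default the command returns once the compaction is accepted; if you want it to block until old revisions are physically removed from the backend, etcdctl accepts a --physical flag:
etcdctl compaction --physical $REVISION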
Phase 2: Defragment Each Node
Perform these steps on one node at a time, starting with the followers. Defragment the leader last.
WARNING: Do not proceed to the next node until the current node has successfully rejoined the cluster and the cluster is healthy.
1. Stop Kubelet and etcd Pod On the CP node you are servicing:
# Stop kubelet to prevent it from restarting static pods
systemctl stop kubelet
# Manually stop the etcd and kube-apiserver containers
crictl rm -f $(crictl ps --label io.kubernetes.container.name=etcd -q)
crictl rm -f $(crictl ps --label io.kubernetes.container.name=kube-apiserver -q)
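As a sanity check before touching the data directory, confirm both containers are gone; these commands should print no container IDs:
crictl ps --label io.kubernetes.container.name=etcd -q
crictl ps --label io.kubernetes.container.name=kube-apiserver -q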
2. Back Up etcd Data Create a backup of this specific node's etcd data directory.
mkdir -p /root/etcdbkp
cp -a /var/lib/etcd /root/etcdbkp/
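A quick verification of the copy before proceeding (paths assume the backup location used above):
# The backup should be roughly the same size as the live data directory
du -sh /var/lib/etcd /root/etcdbkp/etcd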
3. Defragment the Database Run the etcdctl defrag command, pointing it to the data directory.
# Set the alias again if this is a new session or script
alias etcdctl='/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
# Run defragmentation
etcdctl defrag --data-dir /var/lib/etcd/
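Note: On etcd v3.5 and later, offline defragmentation is also provided by the standalone etcdutl binary, and etcdctl defrag with --data-dir is deprecated there. If etcdutl is present at the same image path, the equivalent command is:
etcdutl defrag --data-dir /var/lib/etcd/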
4. Restart Services Start the kubelet, which will in turn restart the etcd and kube-apiserver static pods.
systemctl start kubelet
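It can take a minute for the static pods to be recreated. You can watch for the containers to reappear before checking cluster health:
crictl ps --label io.kubernetes.container.name=etcd
crictl ps --label io.kubernetes.container.name=kube-apiserver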
5. Verify Cluster Health Wait a few moments, then check the cluster status from the same node.
# Set the alias again if needed
alias etcdctl='/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
# Check that all members are healthy
etcdctl -w table member list
# Check that the dbSize for this node is now reduced
etcdctl -w table endpoint status --cluster
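You can also re-run the fragmentation check from the preparation phase; the percentage for the node you just serviced should now be close to zero:
etcdctl endpoint status --cluster -w json | jq '.[] | ((.Status.dbSize - .Status.dbSizeInUse)/.Status.dbSize)*100'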
6. Repeat for Other Nodes Once you have confirmed the node is healthy and has rejoined the cluster, repeat Phase 2 (Steps 1-5) for the remaining follower nodes. After all followers are complete, perform the procedure on the leader node.
Phase 3: Post-Maintenance
After all nodes have been successfully defragmented and the cluster is fully healthy:
1. Disarm etcd Alarms From any CP node, disarm any alarms that may have been triggered during maintenance.
# Set the alias again if needed
alias etcdctl='/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
etcdctl alarm disarm
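To confirm the cluster is now alarm-free, list the active alarms; the command prints nothing when none are set:
etcdctl alarm list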
2. Resume Cluster Reconciliation Run this kubectl command from your management cluster to re-enable cluster reconciliation.
kubectl patch cluster <Cluster> -n <Namespace> --type merge -p '{"spec":{"paused": false}}'