This KB explains how to defragment the etcd database to reduce its size. Defragmentation removes the free space holes from the storage.
Symptoms:
The kube-apiserver logs contain "etcdserver: mvcc: database space exceeded", which indicates storage space exhaustion.
Running the etcdctl command from a control plane VM to get the endpoint status (etcdctl endpoint status -w ) reports "alarm:NOSPACE" in the result.
Environment
VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid 2.x
Cause
The etcd keyspace data has exceeded the storage size limit. The default storage size limit is 2 GB.
Resolution
To resolve the issue, the etcd db needs to be defragmented one node at a time in offline mode. Due to known issues in etcd v3.5.0 - 3.5.5, online defragmentation is not recommended.
Target the Management Cluster and pause cluster reconciliation
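The pause command itself is not shown in this article. As a rough sketch, assuming the cluster is managed by Cluster API and that reconciliation is controlled through the spec.paused field of the Cluster object, the merge patch could be built as follows. The helper name pause_patch and the placeholder <cluster-name> are hypothetical, and a namespace flag may also be needed.

```shell
# Hypothetical helper: print the merge patch that sets spec.paused
# on a Cluster API Cluster object (assumption: spec.paused is the pause switch).
pause_patch() {
  case "$1" in
    true|false)
      printf '{"spec":{"paused":%s}}\n' "$1"
      ;;
    *)
      echo "usage: pause_patch true|false" >&2
      return 1
      ;;
  esac
}

# Example usage (not run here; <cluster-name> is a placeholder):
# kubectl patch cluster <cluster-name> --type merge -p "$(pause_patch true)"
```

Passing false instead of true would build the patch for enabling reconciliation again at the end of the procedure.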
alias etcdctl='/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
Retrieve the etcd cluster status and identify the etcd Leader and Followers. If there is a significant difference between dbSize and dbSizeInUse, the database needs to be defragmented.
etcdctl -w table member list
etcdctl -w table endpoint --cluster status
etcdctl endpoint status -w json | jq '.[]' | jq .
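To put a number on the gap between dbSize and dbSizeInUse, the share of the database file that defragmentation could reclaim can be computed from the two values. This is a minimal sketch: the helper name reclaimable_pct is hypothetical, and the jq paths in the usage comment assume the JSON output exposes the fields as .Status.dbSize and .Status.dbSizeInUse.

```shell
# Hypothetical helper: print the percentage of the db file that is not in use.
# Arguments: DB_SIZE DB_SIZE_IN_USE (both in bytes, as reported by endpoint status).
reclaimable_pct() {
  db_size=$1
  in_use=$2
  # Guard against a zero or missing size to avoid division by zero.
  if [ "${db_size:-0}" -le 0 ]; then
    echo 0
    return 0
  fi
  # Integer arithmetic: (unused bytes * 100) / total bytes, rounded down.
  echo $(( (db_size - in_use) * 100 / db_size ))
}

# Example usage (assumed field names, not run here):
# etcdctl endpoint status -w json \
#   | jq -r '.[] | "\(.Status.dbSize) \(.Status.dbSizeInUse)"' \
#   | while read -r size used; do reclaimable_pct "$size" "$used"; done
```

For example, a 2 GiB database file with 512 MiB in use gives 75, meaning roughly three quarters of the file is free space.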
Start etcd and confirm that all members are in the cluster and synchronised, and that the defragmented member now reports a reduced db size. Do not proceed to the next node if etcd is not healthy.
systemctl start kubelet
etcdctl -w table member list
etcdctl -w table endpoint --cluster status
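Because etcd can take some time to come back after kubelet starts, the health check can be retried rather than run once. The following is a minimal sketch of such a retry loop. The helper names wait_until and etcd_healthy are hypothetical, and the retry count and delay are arbitrary illustrative values.

```shell
# Hypothetical helper: run a command until it succeeds.
# Arguments: MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage (not run here). The etcdctl alias is only expanded in an
# interactive shell, so the check is wrapped in a function defined there:
# etcd_healthy() { etcdctl endpoint health --cluster; }
# wait_until 30 10 etcd_healthy || echo "etcd is not healthy, do not proceed"
```

If the loop gives up, the node should be investigated before moving on to the next member.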
Repeat the above 4 steps on the other Follower and finally on the Leader
Disarm the alarm.
etcdctl alarm disarm
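To confirm that the alarm is cleared, the output of etcdctl alarm list can be checked for the NOSPACE marker quoted in the Symptoms section. This is a minimal sketch, and the helper name has_nospace_alarm is hypothetical.

```shell
# Hypothetical helper: succeed if the given alarm listing mentions NOSPACE.
# Argument: the text printed by "etcdctl alarm list".
has_nospace_alarm() {
  printf '%s\n' "$1" | grep -q 'alarm:NOSPACE'
}

# Example usage (not run here):
# if has_nospace_alarm "$(etcdctl alarm list)"; then
#   echo "NOSPACE alarm is still active"
# else
#   echo "no NOSPACE alarm reported"
# fi
```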
Target the Management Cluster and enable cluster reconciliation
History should be periodically compacted to avoid performance degradation and eventual storage space exhaustion. Compaction leaves free space inside the backend database, so the keyspace should then be defragmented, which can be done manually using etcdctl. Defragmentation releases this storage space back to the file system.
Impact/Risks: kubectl commands will not work and will show "no route to host". The etcd container will also restart frequently in the cluster.