The kube-apiserver may fail, with the error message "etcdserver: mvcc: database space exceeded" appearing in the kube-apiserver logs.
The etcd logs may also repeatedly show the error message "etcdserver: no space".
VMware Tanzu Kubernetes Grid Integrated Edition
The default quota for the etcd db file size is 2GB. When the db file reaches this quota, etcd reports errors because it can no longer write to the db.
In this situation, the disk space consumed by the db file must be reclaimed.
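To see how close a member is to the quota before it trips, the db size can be compared against the limit. The following is a minimal sketch, not part of the documented procedure: the helper names `db_size_bytes` and `db_usage_percent` are hypothetical, the quota is assumed to be 2 GiB (2147483648 bytes), and the `dbSize` field is assumed to appear in the JSON output of `etcdctl endpoint status -w json`.

```shell
# Hypothetical helper: read `etcdctl endpoint status -w json` output on stdin
# and print the first "dbSize" value (in bytes).
db_size_bytes() {
  grep -Eo '"dbSize":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Hypothetical helper: print db usage as an integer percentage of the quota.
# $1 = db size in bytes, $2 = quota in bytes (assumed default: 2 GiB).
db_usage_percent() {
  size=$1
  quota=${2:-2147483648}
  echo $(( size * 100 / quota ))
}

# Example usage on a master node (not executed here):
#   size=$(/var/vcap/jobs/etcd/bin/etcdctl endpoint status -w json | db_size_bytes)
#   echo "etcd db is at $(db_usage_percent "$size")% of the quota"
```

A value approaching 100 suggests that the compaction and defragmentation steps described below are due.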
How to reclaim the disk space consumed by the etcdserver's DB files
Log into any master node, and execute the following command. When it's done, copy the snapshot.db file to a safe place.
$ /var/vcap/jobs/etcd/bin/etcdctl snapshot save snapshot.db
etcd uses an MVCC (multi-version concurrency control) mechanism to manage the keyspace. It never removes data on its own; instead, it always appends new data, even when a key/value is deleted. Compacting the revision history therefore avoids eventual storage space exhaustion. Log in to any master node and execute the following commands.
Note that you only need to execute the commands one time on one master node.
First, run the command below to get the latest revision:
# /var/vcap/jobs/etcd/bin/etcdctl endpoint status -w json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*'
Second, run the command below to compact away old revisions:
# /var/vcap/jobs/etcd/bin/etcdctl compact 2638877 # 2638877 is the revision returned by the previous command
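The two commands above can be chained so that the revision does not have to be copied by hand. This is only a sketch under stated assumptions: `latest_revision` and `compact_to_latest` are hypothetical helper names, the `ETCDCTL` variable is an illustrative override that is not defined by TKGI, and when several endpoints are reported only the first revision is used.

```shell
# Hypothetical helper: read `etcdctl endpoint status -w json` output on stdin
# and print the first "revision" value found.
latest_revision() {
  grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Hypothetical helper: look up the latest revision and compact up to it.
# ETCDCTL is an assumed override; the default is the wrapper path used above.
compact_to_latest() {
  etcdctl_bin=${ETCDCTL:-/var/vcap/jobs/etcd/bin/etcdctl}
  rev=$("$etcdctl_bin" endpoint status -w json | latest_revision)
  case "$rev" in
    ''|*[!0-9]*)
      # Refuse to run compact without a purely numeric revision.
      echo "could not determine the latest revision" >&2
      return 1
      ;;
  esac
  "$etcdctl_bin" compact "$rev"
}

# Example usage on one master node (not executed here):
#   compact_to_latest
```

The numeric check is there so that a failed status query does not result in `compact` being called with an empty or malformed argument.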
For each master node, run the following commands.
## Once "monit summary" shows etcd is running, repeat the steps on the next master node. No need to worry about errors in the etcd logs for now.
Note that etcd 3.5 (TKGI 1.13) provides a new tool, etcdutl, which should be used at step 3; unfortunately, the etcdutl binary is not deployed for now and will be included in TKGI 1.14. You can still use "etcdctl --defrag" as of TKGI 1.21, but if there is a need to use etcdutl, it is located at /var/vcap/packages/etcd/bin/etcdutl.
Log in to any master node and execute the commands below. Note that you only need to execute the commands one time on one master node.
## Step 1: List all alarms
# /var/vcap/jobs/etcd/bin/etcdctl alarm list
## Step 2: Disarm all alarms
# /var/vcap/jobs/etcd/bin/etcdctl alarm disarm
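As an optional check, the alarm list can be inspected before disarming so that the disarm only runs when a space alarm is actually raised. This is a rough sketch: `has_nospace_alarm` is a hypothetical helper, and it assumes `etcdctl alarm list` prints lines of the form `memberID:<id> alarm:NOSPACE`.

```shell
# Hypothetical helper: succeed if the `etcdctl alarm list` output on stdin
# contains a NOSPACE alarm (assumed format: "memberID:<id> alarm:NOSPACE").
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Example usage on one master node (not executed here):
#   if /var/vcap/jobs/etcd/bin/etcdctl alarm list | has_nospace_alarm; then
#     /var/vcap/jobs/etcd/bin/etcdctl alarm disarm
#   fi
```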