Error message "etcdserver: mvcc: database space exceeded" when creating resource in the K8s cluster

Article ID: 298690

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

By default, etcd limits the size of its db file to a 2 GB quota. Once the db file reaches that quota, creating a resource in the K8s cluster fails with the error message "etcdserver: mvcc: database space exceeded". To recover, you need to reclaim the disk space consumed by the db file.
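
To see how close a member is to the quota before the error appears, the dbSize field reported by etcdctl endpoint status -w json can be compared against the 2 GB default. The following is a minimal sketch, assuming the default quota of 2147483648 bytes is in effect; the helper names extract_db_size and quota_percent are hypothetical and not part of etcdctl.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of etcdctl.

# Default etcd backend quota (2 GiB) in bytes; adjust if the quota was changed.
DEFAULT_QUOTA_BYTES=2147483648

# Read `etcdctl endpoint status -w json` output on stdin and print the
# first dbSize value (in bytes).
extract_db_size() {
  grep -Eo '"dbSize":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Print the percentage of the quota used by a db of the given size,
# rounded down. $1 = db size in bytes, $2 = quota in bytes (optional).
quota_percent() {
  echo $(( $1 * 100 / ${2:-$DEFAULT_QUOTA_BYTES} ))
}

# Example usage on a master node:
#   size=$(/var/vcap/jobs/etcd/bin/etcdctl endpoint status -w json | extract_db_size)
#   echo "db file uses $(quota_percent "$size")% of the quota"
```

A value approaching 100 suggests that the reclaim procedure below is due.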

Environment

Product Version: 1.11

Resolution

How to reclaim the disk space consumed by the etcdserver's DB files

Follow the steps below to reclaim the disk space consumed by the db file.

Step 1: Back up the db

Log in to any master node and run the following command. When it finishes, copy the snapshot.db file to a safe place.

$ /var/vcap/jobs/etcd/bin/etcdctl snapshot save snapshot.db


Step 2: Compact

etcd uses an MVCC (multi-version concurrency control) mechanism to manage the keyspace. It does not overwrite data in place: every change, including the deletion of a key/value, is appended as a new revision, so the history keeps growing. Compacting that history prevents the storage space from eventually being exhausted. Log in to any master node and run the following commands. Note that you only need to run them once, on one master node.

First, run the command below to get the latest revision:

# /var/vcap/jobs/etcd/bin/etcdctl endpoint status -w json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*'


Second, run the command below to compact away old revisions, replacing 2638877 with the revision returned by the previous command:

# /var/vcap/jobs/etcd/bin/etcdctl compact 2638877
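
The two commands above can be chained so that the revision does not have to be copied by hand. This is a minimal sketch, assuming a POSIX shell on the master node; the function names latest_revision and compact_latest and the ETCDCTL variable are illustrative and not part of TKGI or etcd.

```shell
#!/bin/sh
# Sketch only: function names and the ETCDCTL variable are hypothetical.

# Path to etcdctl as used in this article; can be overridden by the caller.
ETCDCTL=${ETCDCTL:-/var/vcap/jobs/etcd/bin/etcdctl}

# Read `etcdctl endpoint status -w json` output on stdin and print the
# first revision number found.
latest_revision() {
  grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Look up the latest revision and compact away everything older than it.
# Returns non-zero without compacting if no revision could be read.
compact_latest() {
  rev=$("$ETCDCTL" endpoint status -w json | latest_revision)
  if [ -z "$rev" ]; then
    echo "could not determine the latest revision" >&2
    return 1
  fi
  "$ETCDCTL" compact "$rev"
}

# Example usage (run once, on one master node):
#   compact_latest
```

Refusing to compact when no revision is found avoids passing an empty argument to etcdctl compact.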


Step 3: Defragment

Note that you need to run the following commands on each master node, one node at a time so that the etcd cluster keeps quorum.

## Step 1: Stop etcd
# monit stop etcd
 
## Step 2: Back up /var/vcap/store/etcd/member
# cp -r /var/vcap/store/etcd/member path-to-somewhere
 
## Step 3: Defragment
# /var/vcap/jobs/etcd/bin/etcdctl defrag --data-dir /var/vcap/store/etcd
 
## Step 4: Change ownership
# chown vcap:vcap /var/vcap/store/etcd/member/snap/db
 
## Step 5: Start etcd
# monit start etcd

Note that etcd 3.5 (TKGI 1.13) provides a new tool, etcdutl, which must be used instead of etcdctl for the defragment command in Step 3 above. Unfortunately, the etcdutl binary is not deployed for now; it is planned for inclusion in TKGI 1.14.
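
To confirm that defragmentation actually shrank the db file, its size can be recorded before the defragment command and compared afterwards. This is a minimal sketch that uses the db path from the ownership step above; the helper names file_size_bytes and bytes_reclaimed are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the size of a file in bytes.
file_size_bytes() {
  # Arithmetic expansion strips the padding some wc implementations add.
  echo $(( $(wc -c < "$1") ))
}

# Print how many bytes were freed, given the sizes before and after.
bytes_reclaimed() {
  echo $(( $1 - $2 ))
}

# Example usage on a master node, while etcd is stopped:
#   db=/var/vcap/store/etcd/member/snap/db
#   before=$(file_size_bytes "$db")
#   ... run the defragment command ...
#   after=$(file_size_bytes "$db")
#   echo "reclaimed $(bytes_reclaimed "$before" "$after") bytes"
```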


Step 4: Disarm all alarms

Log in to any master node and run the commands below:

## Step 1: List all alarms
# /var/vcap/jobs/etcd/bin/etcdctl alarm list
 
## Step 2: Disarm all alarms
# /var/vcap/jobs/etcd/bin/etcdctl alarm disarm
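
A quick way to check whether the space alarm is still raised is to look for NOSPACE in the output of alarm list. This is a minimal sketch, assuming the alarm is printed in the form memberID:<id> alarm:NOSPACE; the function name has_nospace_alarm is hypothetical.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Read `etcdctl alarm list` output on stdin.
# Exit status 0 if a NOSPACE alarm is listed, non-zero otherwise.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Example usage on a master node:
#   if /var/vcap/jobs/etcd/bin/etcdctl alarm list | has_nospace_alarm; then
#     echo "NOSPACE alarm is still active"
#   fi
```

Running the check again after disarming should report no active alarm.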