kubectl commands fail in the affected vSphere Kubernetes cluster.
etcdctl commands fail with "context deadline exceeded", for example:
etcdctl endpoint health --cluster=true -w table
etcdctl endpoint status --cluster=true -w table
The etcd and kube-apiserver containers crash and restart in a loop.
The etcd log reports a fatal "discovery failed" error similar to the following:
<timestamp>.357959735Z stderr F {"level":"fatal","ts":"YYYY-MM-DDTHH:MM:SS.XXXZ","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"wal: max entry size limit exceeded, recBytes: 908, fileSize(15368192) - offset(15368064) - padBytes(4) = entryLimit(124)","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:32\nruntime.main\n\truntime/proc.go:250"}
The WAL files in /var/lib/etcd/member/wal/ stop being updated on the affected node, whereas on a healthy node these files are continuously updated. Their modification times show that etcd stopped writing WAL files when the issue occurred.
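To check whether the WAL files are still being updated, the segments can be listed newest first with their modification times. This is a minimal sketch assuming a POSIX shell with ls and grep; list_wal_files, newest_wal and the WAL_DIR override are hypothetical names introduced here, not part of etcd or VMware tooling. It only shows timestamps and does not decide which file is damaged.

```shell
#!/bin/sh
# Hypothetical helpers (not part of etcd or VMware tooling).
# WAL_DIR defaults to the etcd WAL path used in this article.
WAL_DIR=${WAL_DIR:-/var/lib/etcd/member/wal}

# list_wal_files DIR: long listing of *.wal files, newest first (-t sorts by mtime).
list_wal_files() {
  ls -lt "$1" | grep '\.wal$'
}

# newest_wal DIR: print only the name of the most recently modified *.wal file.
newest_wal() {
  ls -t "$1" | grep '\.wal$' | head -n 1
}

if [ -d "$WAL_DIR" ]; then
  list_wal_files "$WAL_DIR"
  echo "Most recently modified segment: $(newest_wal "$WAL_DIR")"
fi
```

Comparing the newest timestamp with the time the crash loop started gives a quick indication of when etcd stopped writing.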
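The error is raised while etcd reads the WAL at startup: the next entry in the current WAL segment declares a size (recBytes = 908 bytes) that is larger than the space left in the segment file (entryLimit = fileSize - offset - padBytes = 124 bytes). This indicates that the entry was only partially written, so etcd refuses to start.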
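The arithmetic in the error message can be reproduced to see why etcd rejects the entry: 15368192 - 15368064 - 4 leaves 124 bytes, but the entry claims 908. The sketch below assumes a POSIX shell with sed; wal_field, wal_entry_limit and wal_entry_check are hypothetical helper names, not etcd functions, and they only parse the numbers printed in the log message.

```shell
#!/bin/sh
# Hypothetical helpers (not part of etcd): recompute the entryLimit check
# from the numbers printed in the "max entry size limit exceeded" message.

# wal_field MSG NAME: print the number following "NAME(" or "NAME: " in MSG.
wal_field() {
  printf '%s\n' "$1" | sed -n "s/.*$2[(: ]*\([0-9][0-9]*\).*/\1/p"
}

# wal_entry_limit FILE_SIZE OFFSET PAD_BYTES: bytes left in the segment.
wal_entry_limit() {
  echo $(( $1 - $2 - $3 ))
}

# wal_entry_check MSG: compare recBytes with the recomputed entryLimit.
wal_entry_check() {
  rec=$(wal_field "$1" recBytes)
  limit=$(wal_entry_limit "$(wal_field "$1" fileSize)" \
                          "$(wal_field "$1" offset)" \
                          "$(wal_field "$1" padBytes)")
  if [ "$rec" -gt "$limit" ]; then
    echo "recBytes=$rec > entryLimit=$limit: entry does not fit in the remaining segment space"
  else
    echo "recBytes=$rec <= entryLimit=$limit: entry fits"
  fi
}

MSG='wal: max entry size limit exceeded, recBytes: 908, fileSize(15368192) - offset(15368064) - padBytes(4) = entryLimit(124)'
wal_entry_check "$MSG"
```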
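One possible reason for etcd to stop writing WAL files in /var/lib/etcd/member/wal/ and end up in a crash loop is insufficient disk space on the Supervisor nodes. To free up space, remove the compressed audit logs and the archived journal files older than two days: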
cd /var/log/vmware/audit
rm *log.gz
journalctl --vacuum-time=2d
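Before and after running the cleanup commands above, the usage of the filesystems that hold the etcd data and the logs can be checked. This is a minimal sketch assuming a POSIX shell with df and awk; disk_used_pct is a hypothetical helper and the 90% THRESHOLD is an assumed example value, not a limit documented by VMware or etcd.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): print the used-space percentage
# of the filesystem holding PATH as an integer, from POSIX "df -P" output.
disk_used_pct() {
  # Line 2 of "df -P PATH" describes that filesystem; field 5 is the
  # capacity column, e.g. "87%". The trailing "%" is stripped.
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# THRESHOLD is an assumed example value for flagging nearly full filesystems.
THRESHOLD=90
for p in /var/lib/etcd /var/log; do
  [ -d "$p" ] || continue
  pct=$(disk_used_pct "$p")
  if [ "$pct" -ge "$THRESHOLD" ]; then
    echo "$p: ${pct}% used (at or above ${THRESHOLD}%)"
  else
    echo "$p: ${pct}% used"
  fi
done
```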
If the issue persists after sufficient disk space has been freed, follow the steps below:
In the /var/lib/etcd/member/wal/ directory, move the WAL file <filename>.wal to /root/:
mv /var/lib/etcd/member/wal/<filename>.wal /root/
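The mv step above can be wrapped with basic safety checks so that a mistyped path or an existing file in the destination is caught before anything is moved. This is a minimal sketch assuming a POSIX shell; move_wal_file is a hypothetical helper, and it does not determine which WAL file should be moved.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcd or VMware tooling).
# Usage: move_wal_file SRC_FILE DEST_DIR
# Example (replace <filename> with the actual segment name):
#   move_wal_file "/var/lib/etcd/member/wal/<filename>.wal" /root
move_wal_file() {
  src=$1
  dest_dir=$2
  name=$(basename "$src")
  # Stop if the source WAL file does not exist.
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  # Stop rather than overwrite a file with the same name in DEST_DIR.
  if [ -e "$dest_dir/$name" ]; then
    echo "already exists: $dest_dir/$name" >&2
    return 1
  fi
  mv "$src" "$dest_dir/" || return 1
  echo "moved $name to $dest_dir"
}
```

Keeping the moved file in /root/ rather than deleting it leaves the original segment available if it is needed later.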