Symptoms:
When deploying, scaling, or upgrading database clusters (PostgreSQL or MySQL) with Data Services Manager (DSM), the new workload cluster nodes sometimes fail to join the cluster.
This manifests as the following status condition on the DB cluster:
"internal error creating Kubernetes cluster: number of ready replicas differ: expected=3, actual=2: error provisioning Kubernetes cluster"
When looking at the VM console, you may see an error message like this:
"etcdserver: re-configuration failed due to not enough started members"
Environment:
VCF and Data Services Manager (DSM) 2.1
Cause:
This occurs because the second member added to the new etcd cluster had not fully started by the time the third member was added. etcd rejects a membership change unless a quorum of the existing members has started, which produces the "not enough started members" error shown above.
Resolution:
To remediate this, delete the problematic node and let Cluster API automatically provision a replacement:
1. On the DSM provider VM, list the IP addresses of the workload cluster nodes and identify the problematic node:

kubectl get ipaddress -A
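Example output (illustrative only; names, columns, and addresses will differ in your environment):

NAMESPACE       NAME                   ADDRESS
mysql-default   mysql-01-abcde-xyz12   10.10.0.21
mysql-default   mysql-01-abcde-xyz34   10.10.0.22
mysql-default   mysql-01-abcde-xyz56   10.10.0.23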
2. SSH to one of the healthy control plane nodes using the provisioner key:

ssh -i /opt/vmware/tdm-provider/provisioner/sshkey capv@<ip-of-node>
3. On the node, find the ID of the etcd container:

crictl ps | grep etcd

4. Open a shell inside the etcd container:

crictl exec -it <container-id> sh
5. List the etcd members and identify the member that is not started:

etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member list
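Healthy members are reported as started; the problematic member typically shows as unstarted with an empty name. Illustrative example (IDs and addresses are placeholders):

1609b7f15d4fa8ea, started, mysql-01-abcde-xyz12, https://10.10.0.21:2380, https://10.10.0.21:2379, false
6f89b2c1d3e4a5b7, started, mysql-01-abcde-xyz34, https://10.10.0.22:2380, https://10.10.0.22:2379, false
9a1c3e5f7b9d2468, unstarted, , https://10.10.0.23:2380, , false

The first field is the member ID to use in the next step.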
6. Remove the unstarted member, using its member ID from the previous step:

etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member remove <bad-member-id>
7. Exit the container and the node, then from the DSM provider VM delete the Machine object for the bad node. The namespace corresponds to the DB cluster; mysql-default is an example:

kubectl delete machine -n mysql-default <name-of-bad-machine>
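Cluster API then reconciles the cluster and provisions a replacement node automatically. You can watch the replacement Machine come up (namespace assumed as in the step above):

kubectl get machines -n mysql-default -w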
8. Edit the DB cluster resource (mysql-01 is an example name; for a PostgreSQL cluster, edit the equivalent resource):

kubectl edit mysqlcluster mysql-01
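Once the replacement node has joined, the DB cluster should report the expected number of ready replicas. A quick check, assuming the same resource kind as above:

kubectl get mysqlcluster -A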
This issue is scheduled to be fixed in DSM version 2.1.1.