How to remove and add etcd members to an existing etcd cluster in VMware Postgres

Article ID: 296383

Products

VMware Tanzu Greenplum

Issue/Introduction

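This article explains how to add a member to, and remove a member from, an existing etcd cluster without cluster downtime. The example adds postgres_node_3 to a cluster whose other members are postgres_node_1 and postgres_node_2.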
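Removing a member is run from any healthy member with `etcdctl member remove <memberID>`, where the ID comes from `etcdctl member list`. The sketch below assumes the member list format shown in the resolution (`<id>: name=<name> peerURLs=...`); `member_id_for` is a hypothetical helper, not part of etcdctl. Note that a member still shown as `[unstarted]` has no `name=` field, so it cannot be looked up by name.

```shell
# Sketch: print the member ID for a given member name, reading
# `etcdctl member list` output on stdin.
# Assumes the "<id>: name=<name> peerURLs=..." format used in this article.
# member_id_for is a hypothetical helper name, not an etcdctl command.
member_id_for() {
  local name="$1"
  # Compare whole fields so "postgres_node_1" does not match "postgres_node_10",
  # then strip the trailing colon from the ID field and print it.
  awk -v want="name=${name}" '
    { for (i = 2; i <= NF; i++) if ($i == want) { sub(/:$/, "", $1); print $1; next } }
  '
}
```

Possible usage: `etcdctl member remove "$(etcdctl member list | member_id_for postgres_node_3)"`, then stop etcd on the removed host. If that host later rejoins, its old data directory must not be reused, as described in the resolution.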

Environment

Product Version: 11.6

Resolution

Run the following commands on one of the existing members (postgres_node_1 or postgres_node_2). To add postgres_node_3, run:
[postgres@postgres_node_1 ~]$ etcdctl member add postgres_node_3 http://10.193.102.53:2380
Added member named postgres_node_3 with ID 147cd88c6f326499 to cluster

ETCD_NAME="postgres_node_3"
ETCD_INITIAL_CLUSTER="postgres_node_3=http://10.193.102.53:2380,postgres_node_2=http://10.193.102.52:2380,postgres_node_1=http://10.193.102.51:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

[postgres@postgres_node_1 ~]$ etcdctl member list
147cd88c6f326499[unstarted]: peerURLs=http://10.193.102.53:2380
3cdccba04ff14dd2: name=postgres_node_2 peerURLs=http://10.193.102.52:2380 clientURLs=http://10.193.102.52:2379 isLeader=false
6acc57bdaa99224e: name=postgres_node_1 peerURLs=http://10.193.102.51:2380 clientURLs=http://10.193.102.51:2379 isLeader=true
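To spot members that have been added but not yet started, the `[unstarted]` marker in the member list can be filtered out. This is a minimal sketch assuming the `<id>[unstarted]: peerURLs=...` format shown above; `unstarted_members` is a hypothetical helper name.

```shell
# Sketch: print the IDs of members that were registered with
# `etcdctl member add` but have not started yet.
# Assumes the "<id>[unstarted]: peerURLs=..." line format shown above.
# unstarted_members is a hypothetical helper name.
unstarted_members() {
  # Keep only lines with the [unstarted] marker and print the hex ID before it.
  sed -n 's/^\([0-9a-f]*\)\[unstarted]:.*/\1/p'
}
```

Possible usage: `etcdctl member list | unstarted_members`. Empty output after the new node is started suggests every registered member has joined.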

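The new member is registered but not yet started. Before starting it, edit the etcd.yml file on the postgres_node_3 host and change the initial cluster state from:
ETCD_INITIAL_CLUSTER_STATE="new"
to
ETCD_INITIAL_CLUSTER_STATE="existing"
The initial cluster setting on postgres_node_3 should also list all three members, matching the ETCD_INITIAL_CLUSTER value printed by member add. In etcd's YAML configuration file these settings are written as the initial-cluster-state and initial-cluster keys; the ETCD_* names are the equivalent environment variables.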
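The edit can be scripted. This is a sketch only: `set_existing_state` is a hypothetical helper that handles both the environment-style line shown above and the YAML `initial-cluster-state` key, and it keeps a `.bak` copy of the original file. It does not touch the initial cluster list, which still needs to be checked by hand.

```shell
# Sketch: switch the initial cluster state from "new" to "existing" in a config
# file before starting the joining member. Handles both the environment-style
# line shown in this article and the YAML key read by `etcd --config-file`.
# set_existing_state is a hypothetical helper; it leaves a .bak copy behind.
set_existing_state() {
  local cfg="$1"
  cp "$cfg" "$cfg.bak"
  # Write the edited content back to the original path from the backup copy,
  # which avoids relying on GNU- or BSD-specific sed -i behavior.
  sed -e 's/^ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/' \
      -e "s/^initial-cluster-state:[[:space:]]*['\"]\{0,1\}new['\"]\{0,1\}[[:space:]]*\$/initial-cluster-state: 'existing'/" \
      "$cfg.bak" > "$cfg"
}
```

Possible usage on postgres_node_3: `set_existing_state ~/etcd.yml`, then review the file before starting etcd.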

If you skip this change, the etcd log file on postgres_node_3 shows errors like the following when you try to start etcd:
2020-04-15 22:14:05.293323 E | rafthttp: failed to find member 6acc57bdaa99224e in cluster 57fa18439694a8f7
2020-04-15 22:14:05.293516 E | rafthttp: failed to find member 6acc57bdaa99224e in cluster 57fa18439694a8f7

On postgres_node_3, you must also move the old etcd data directory aside; otherwise etcd logs the following and the server stops:
2020-04-15 22:19:57.453943 E | etcdserver: the member has been permanently removed from the cluster
2020-04-15 22:19:57.453955 I | etcdserver: the data-dir used by this member must be removed.
2020-04-15 22:19:57.454123 I | etcdserver: aborting publish because server is stopped
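Moving the directory aside, rather than deleting it, keeps the old state available for inspection. The sketch below assumes etcd's default data directory name, `<member name>.etcd` in the working directory, which matches the `postgres_node_3.etcd` directory in this example; if etcd.yml sets a different data-dir, pass that path instead. `move_data_dir_aside` is a hypothetical helper name.

```shell
# Sketch: rename a stale etcd data directory with a timestamp suffix so a
# previously removed member can start with a clean data-dir.
# Assumes the directory path is passed in (e.g. postgres_node_3.etcd).
# move_data_dir_aside is a hypothetical helper name.
move_data_dir_aside() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    # Nothing to move; report on stderr and succeed.
    echo "no data directory at $dir; nothing to move" >&2
    return 0
  fi
  local dest="${dir}.back.$(date +%Y%m%d%H%M%S)"
  mv "$dir" "$dest"
  # Print the new location so the caller can log or inspect it.
  echo "$dest"
}
```

Possible usage on postgres_node_3, with etcd stopped: `move_data_dir_aside ~/postgres_node_3.etcd`.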

After the directory is moved, etcd starts successfully and postgres_node_3 appears in the member list as a started member:
[postgres@postgres_node_3 ~]$ mv postgres_node_3.etcd postgres_node_3.etcd.back2
[postgres@postgres_node_3 ~]$ etcd --config-file etcd.yml > etcd_logfile 2>&1 &
[1] 28303
[postgres@postgres_node_3 ~]$ 
[postgres@postgres_node_3 ~]$ etcdctl member list
3cdccba04ff14dd2: name=postgres_node_2 peerURLs=http://10.193.102.52:2380 clientURLs=http://10.193.102.52:2379 isLeader=false
67e8357c22476fad: name=postgres_node_3 peerURLs=http://10.193.102.53:2380 clientURLs=http://10.193.102.53:2379 isLeader=false
6acc57bdaa99224e: name=postgres_node_1 peerURLs=http://10.193.102.51:2380 clientURLs=http://10.193.102.51:2379 isLeader=true
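The final check can also be expressed as a small test that the new member has a name and client URLs and that no member is still unstarted. This sketch assumes the member list format shown above; `member_joined` is a hypothetical helper name.

```shell
# Sketch: succeed only if the named member appears as a started member
# (its line carries name=<name> and clientURLs=) and no member in the
# list is still marked [unstarted].
# Assumes the member list format shown above; hypothetical helper name.
member_joined() {
  local name="$1" list
  list="$(cat)"
  # Any member still waiting to start means the join is not complete.
  if printf '%s\n' "$list" | grep -q '\[unstarted]'; then
    return 1
  fi
  # The trailing space after the name prevents prefix matches.
  printf '%s\n' "$list" | grep -q "name=${name} .*clientURLs="
}
```

Possible usage: `etcdctl member list | member_joined postgres_node_3 && echo "postgres_node_3 has joined"`.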


For more information, see the Documentation directory of the etcd-io/etcd repository on GitHub.