[gpadmin@dw-greenplum-1 ~]$ gpcc start
Starting the gpcc agents and webserver…
2019/03/18 16:47:43 Agent successfully started on 3/3 hosts
2019/03/18 16:47:43 View Greenplum Command Center at http://dw-greenplum-1:28080
[gpadmin@dw-greenplum-1 ~]$ gpcc status
2019/03/18 16:47:49 GPCC webserver: running
2019/03/18 16:47:49 GPCC agents: 1/3 agents running
2019/03/18 16:47:49 Agent is stopped on dw-greenplum-3
2019/03/18 16:47:49 Agent is stopped on dw-greenplum-2
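When several segment hosts are affected, it can help to pull their names out of the status output so you know which agent logs to read. The following is a minimal sketch. stopped_agent_hosts is a hypothetical helper, and it assumes the status lines keep the "Agent is stopped on <host>" wording shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list hosts whose gpcc agent is stopped, based on the
# "Agent is stopped on <host>" lines printed by `gpcc status`.
# It reads the status text on stdin, so a saved copy works too.
stopped_agent_hosts() {
  # The host name is the last whitespace-separated field of each matching line.
  awk '/Agent is stopped on/ { print $NF }'
}

# Usage on the master, as gpadmin:
#   gpcc status | stopped_agent_hosts
```

Given the status output above, it prints dw-greenplum-3 and dw-greenplum-2, which are the two hosts whose agent.log is worth checking.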
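gpcc start reports that the agents started on 3/3 hosts. However, gpcc status shows that the agents on dw-greenplum-2 and dw-greenplum-3 have stopped. The agent log on dw-greenplum-3 explains why:
[gpadmin@dw-greenplum-3 logs]$ cat agent.log
2019/02/17 00:06:40 connect to rpc server dw-greenplum-1:8899
2019/02/17 00:06:43 Agent cannot start due to no RPC connectionrpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.16.76.149:8899: connect: no route to host"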
The agent log shows that the segment server could not connect to 172.16.76.149:8899 ("no route to host"), so the problem is probably related to this IP address. Pinging 172.16.76.149 from the segment server also fails. A first guess is that the segment server does not know which host this IP address belongs to.
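The two checks described above can be scripted. This is a sketch under a few assumptions. rpc_target_from_log and probe_target are hypothetical helper names. The target is assumed to be an IPv4 address in host:port form, `timeout` is assumed to come from GNU coreutils, and the TCP probe relies on bash's /dev/tcp redirection. A failed ping is not conclusive on its own because ICMP may be filtered, so the sketch also probes the port.

```bash
#!/usr/bin/env bash
# Sketch: extract the address the agent last tried to dial from agent.log,
# then test it with ping and with a TCP connection to the RPC port.

# Print the last "host:port" that follows "dial tcp" in the log text on stdin.
rpc_target_from_log() {
  grep -oE 'dial tcp [^ :]+:[0-9]+' | tail -n 1 | sed -E 's/^dial tcp //'
}

# Probe an IPv4 host:port (assumption: no IPv6 literals).
# `timeout` stops an unreachable address from hanging the TCP check.
probe_target() {
  local target="$1"
  local host="${target%:*}" port="${target##*:}"
  if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
    echo "ping $host: ok"
  else
    echo "ping $host: FAILED"
  fi
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp $host:$port: ok"
  else
    echo "tcp $host:$port: FAILED"
  fi
}

# Usage on a segment host, from the directory that holds agent.log:
#   probe_target "$(rpc_target_from_log < agent.log)"
```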
To test this guess, check /etc/hosts on the segment server. The IP address is listed there, and it is the master's (mdw) address:
[root@dw-greenplum-2 ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
172.16.76.150 dw-greenplum-2 sdw1
172.16.76.151 dw-greenplum-3 sdw2
172.16.76.149 dw-greenplum-1 mdw
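To see which address a hosts file gives for a name, you can reproduce the lookup with awk. hosts_lookup is a hypothetical helper. It reads only a hosts-format file and ignores DNS and /etc/nsswitch.conf. On a live host, `getent hosts NAME` is a more faithful check of what the agent will actually resolve.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IP that a hosts-format file maps NAME to.
# It returns the first matching line, roughly like a static /etc/hosts lookup,
# and does not consult DNS or nsswitch.conf.
hosts_lookup() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v name="$name" '
    # Strip comments, then compare every name field (field 1 is the IP).
    { sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "$file"
}

# Usage on a segment host (the agent log dials "dw-greenplum-1:8899"):
#   hosts_lookup dw-greenplum-1
#   hosts_lookup mdw
```

Against the segment file above, hosts_lookup dw-greenplum-1 prints 172.16.76.149. This is the same address that appears in the agent log.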
So the segment server does know 172.16.76.149: its /etc/hosts maps dw-greenplum-1 (mdw) to that address. The address still cannot be reached, so the most likely explanation is that no server in the cluster uses 172.16.76.149 anymore.
To confirm this, check the network interfaces (ifconfig) and the /etc/hosts file on the master:
[root@dw-greenplum-1 logs]# ifconfig | less
eth0      Link encap:Ethernet  HWaddr 00:0C:29:B0:25:F7
          inet addr:172.16.76.152  Bcast:172.16.76.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feb0:25f7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:264197 errors:0 dropped:0 overruns:0 frame:0
          TX packets:174656 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:251842110 (240.1 MiB)  TX bytes:73770998 (70.3 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:196812 errors:0 dropped:0 overruns:0 frame:0
          TX packets:196812 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:341777495 (325.9 MiB)  TX bytes:341777495 (325.9 MiB)

[root@dw-greenplum-1 logs]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
172.16.76.152 dw-greenplum-1 mdw
172.16.76.150 dw-greenplum-2 sdw1
172.16.76.151 dw-greenplum-3 sdw2
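Comparing the two outputs by eye works for three hosts, but a small filter makes the check explicit. local_ipv4s and has_local_ip are hypothetical helpers that parse ifconfig output from stdin. They accept both the older net-tools format shown above ("inet addr:ADDR") and the newer "inet ADDR  netmask ..." layout. On systems with iproute2, `ip -4 addr show` reports the same addresses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the IPv4 addresses found in ifconfig output on stdin.
# Handles "inet addr:172.16.76.152" (old net-tools) and "inet 172.16.76.152" (newer).
# "inet6" lines do not match because the pattern requires a space after "inet".
local_ipv4s() {
  grep -oE 'inet (addr:)?[0-9]+(\.[0-9]+){3}' | sed -E 's/^inet (addr:)?//'
}

# Succeed if the given IP is one of the addresses in the ifconfig output on stdin.
# grep without -q reads all input, so the upstream commands never get SIGPIPE.
has_local_ip() {
  local_ipv4s | grep -xF -- "$1" >/dev/null
}

# Usage on the master:
#   ifconfig | has_local_ip 172.16.76.149 || echo "172.16.76.149 is not assigned here"
```

For the master's output above, has_local_ip succeeds for 172.16.76.152 and fails for 172.16.76.149.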
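The master's actual IP address is 172.16.76.152, not 172.16.76.149. That is the cause of the issue: the master's IP address changed, but the /etc/hosts files on the segment servers still map dw-greenplum-1 (mdw) to the old address. As a result, the agents try to reach the RPC server on port 8899 at an address that no longer exists.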
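Because the root cause is hosts files that disagree across the cluster, it can be worth checking every host's copy at once. check_hosts_agree is a hypothetical helper. It assumes you have already copied each host's /etc/hosts to a local file, however you choose to collect them. It prints the address each file gives for a name and returns a non-zero status if the addresses differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_hosts_agree NAME FILE...
# Prints "file<TAB>ip" for each hosts-format FILE and returns 1 if the files
# map NAME to different addresses (a missing entry counts as a difference).
check_hosts_agree() {
  local name="$1"; shift
  local file ip first="" status=0
  for file in "$@"; do
    ip="$(awk -v name="$name" '
      { sub(/#.*/, "") }
      { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$file")"
    ip="${ip:-<missing>}"
    printf '%s\t%s\n' "$file" "$ip"
    if [ -z "$first" ]; then
      first="$ip"          # remember the first file's answer as the reference
    elif [ "$ip" != "$first" ]; then
      status=1
    fi
  done
  return "$status"
}

# Usage with local copies named after each host:
#   check_hosts_agree dw-greenplum-1 mdw.hosts sdw1.hosts sdw2.hosts
```

Run against the master's and dw-greenplum-2's files shown above, the helper would report 172.16.76.152 for the first file and 172.16.76.149 for the second, and return 1.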
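After the mdw entry in /etc/hosts on the segment servers was updated to 172.16.76.152, gpcc status reports all three agents running:
[gpadmin@dw-greenplum-1 ~]$ gpcc status
2019/03/18 16:57:52 GPCC webserver: running
2019/03/18 16:57:53 GPCC agents: 3/3 agents running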