The master log shows the error message "Unexpected internal error (cdbgang.c:1447)". For example:
2014-05-01 18:24:33.948410 PDT,"dbuser","db",p45271,th-1144576256,"xx.xx.xx.xx","12322",2014-05-01 18:07:57 PDT,34925338,con40629,cmd450,seg-1,,dx4430905,x34925338,sx1,"ERROR","XX000","Unexpected internal error (cdbgang.c:1447)",,,,,,"some SQL",0,,"cdbgang.c",1447,"Stack trace: 1 0xa6fdf9 postgres (elog.c:468) 2 0xa74202 postgres elog_internalerror (elog.c:279) 3 0xb9bb4a postgres allocateGang (cdbgang.c:1519) 4 0x705c6d postgres AssignGangs (execUtils.c:1691) 5 0x6ebceb postgres ExecutorStart (execMain.c:549) 6 0x915ff9 postgres PortalStart (pquery.c:873) 7 0x90c704 postgres (postgres.c:2451) 8 0x91067d postgres PostgresMain (postgres.c:4928) 9 0x876181 postgres (postmaster.c:6801) 10 0x87c2c0 postgres PostmasterMain (postmaster.c:2346) 11 0x7811ba postgres main (main.c:212) 12 0x7f28b96edcdd libc.so.6 __libc_start_main (??:0) 13 0x47cae9 postgres (??:0) "
The primary segment logs show the error message "Interconnect error segment lost contact with master (recv)". For example:
2014-05-01 13:47:14.163434 PDT,"dbuser","db",p53867,th527841024,"192.168.17.125","39149",2014-05-01 13:46:25 PDT,46549276,con29197,cmd33,seg11,slice6,dx2195689,x46549276,sx1,"ERROR","58M01","Interconnect error segment lost contact with master (recv)",,,,,,"someSQL"
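To check whether a log contains either symptom, the CSV log rows can be filtered on their message field. The following Python sketch assumes the field layout of the two sample lines above, where the message is the 19th comma-separated field (index 18); the names `find_symptoms`, `SYMPTOMS`, and `SAMPLE_LOG` are illustrative and are not part of Greenplum.

```python
import csv
import io

# Message prefixes quoted in this article (master and primary segment logs).
# The cdbgang.c line number is left off so that other line numbers also match.
SYMPTOMS = (
    "Unexpected internal error (cdbgang.c:",
    "Interconnect error segment lost contact with master",
)

# Assumption: the message is field index 18, as in the sample lines above.
MESSAGE_FIELD = 18


def find_symptoms(log_text):
    """Return (timestamp, message) for each CSV log row matching a symptom."""
    hits = []
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) <= MESSAGE_FIELD:
            continue  # not a full log row
        message = row[MESSAGE_FIELD]
        if message.startswith(SYMPTOMS):
            hits.append((row[0], message))
    return hits


# Two rows trimmed from the examples above.
SAMPLE_LOG = (
    '2014-05-01 18:24:33.948410 PDT,"dbuser","db",p45271,th-1144576256,'
    '"xx.xx.xx.xx","12322",2014-05-01 18:07:57 PDT,34925338,con40629,cmd450,'
    'seg-1,,dx4430905,x34925338,sx1,"ERROR","XX000",'
    '"Unexpected internal error (cdbgang.c:1447)",,,,,,"some SQL"\n'
    '2014-05-01 13:47:14.163434 PDT,"dbuser","db",p53867,th527841024,'
    '"192.168.17.125","39149",2014-05-01 13:46:25 PDT,46549276,con29197,cmd33,'
    'seg11,slice6,dx2195689,x46549276,sx1,"ERROR","58M01",'
    '"Interconnect error segment lost contact with master (recv)",,,,,,"someSQL"\n'
)

if __name__ == "__main__":
    for timestamp, message in find_symptoms(SAMPLE_LOG):
        print(timestamp, message)
```

Using the `csv` module rather than a plain substring search keeps quoted fields (such as the SQL text or a multi-line stack trace) from producing false matches.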
A possible cause of this problem is that the master's address in gp_segment_configuration resolves to the master's public IP address instead of a private IP address on the network the segments use.
For example, the master's address is gp-prd-rpt-master, and /etc/hosts maps that name to the public IP address 10.111.111.111:
template1=# select * from gp_segment_configuration where content=-1;
 dbid | content | role | preferred_role | mode | status | port |     hostname      |      address      | replication_port | san_mounts
------+---------+------+----------------+------+--------+------+-------------------+-------------------+------------------+------------
    1 |      -1 | p    | p              | s    | u      | 5432 | gp-prd-rpt-master | gp-prd-rpt-master |                  |
$ cat /etc/hosts|grep master
10.111.111.111 gp-prd-rpt-master.xxx.com gp-prd-rpt-master
192.168.16.125 gp-prd-rpt-master-1
192.168.17.125 gp-prd-rpt-master-2
By contrast, all segment addresses point to private IP addresses. For example, for content 11 the primary's address, gp-prd-rpt-ss06-2, maps to 192.168.17.116:
template1=# select * from gp_segment_configuration where content=11;
 dbid | content | role | preferred_role | mode | status | port  |        hostname         |      address      | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-------------------------+-------------------+------------------+------------
   13 |      11 | p    | p              | s    | u      | 40001 | gp-prd-rpt-ss06.xxx.com | gp-prd-rpt-ss06-2 |            42001 |
   41 |      11 | m    | m              | s    | u      | 41001 | gp-prd-rpt-ss08.xxx.com | gp-prd-rpt-ss08-1 |            43001 |
(2 rows)
$ cat /etc/hosts|grep -i gp-prd-rpt-ss06-2
192.168.17.116 gp-prd-rpt-ss06-2
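The mismatch can be checked offline from the /etc/hosts contents. This Python sketch assumes names are resolved from /etc/hosts with the first matching line winning (roughly how a files-based lookup behaves) and that the private network consists of the 192.168.16.0/24 and 192.168.17.0/24 subnets, inferred from the example entries above; `resolve_from_hosts`, `on_private_network`, and `PRIVATE_NETS` are hypothetical names. Note that 10.111.111.111 is itself in an RFC 1918 range, so the `is_private` flag of Python's `ipaddress` module cannot tell the two networks apart; the sketch compares against explicit subnets instead.

```python
import ipaddress

# Assumption: the cluster's private network is made up of the two subnets
# seen in this article's /etc/hosts examples. Replace with your own.
PRIVATE_NETS = [
    ipaddress.ip_network("192.168.16.0/24"),
    ipaddress.ip_network("192.168.17.0/24"),
]


def resolve_from_hosts(hosts_text, name):
    """Return the IP on the first hosts line that lists `name`, or None."""
    wanted = name.lower()  # host names are matched case-insensitively
    for line in hosts_text.splitlines():
        fields = line.partition("#")[0].split()  # drop comments
        if len(fields) >= 2 and wanted in (n.lower() for n in fields[1:]):
            return fields[0]
    return None


def on_private_network(ip):
    """True if `ip` falls inside one of PRIVATE_NETS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_NETS)


if __name__ == "__main__":
    hosts = (
        "10.111.111.111 gp-prd-rpt-master.xxx.com gp-prd-rpt-master\n"
        "192.168.16.125 gp-prd-rpt-master-1\n"
        "192.168.17.125 gp-prd-rpt-master-2\n"
        "192.168.17.116 gp-prd-rpt-ss06-2\n"
    )
    for name in ("gp-prd-rpt-master", "gp-prd-rpt-ss06-2"):
        ip = resolve_from_hosts(hosts, name)
        status = "private" if ip and on_private_network(ip) else "NOT private"
        print(name, ip, status)
```

With the example entries, gp-prd-rpt-master resolves to 10.111.111.111 and is reported as not private, while gp-prd-rpt-ss06-2 resolves to 192.168.17.116 on the private network.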
1. In /etc/hosts, map the master's address (gp-prd-rpt-master) to one of the master's private IP addresses. For example, move the gp-prd-rpt-master alias to the 192.168.16.125 line:
$ cat /etc/hosts|grep master
10.111.111.111 gp-prd-rpt-master.xxx.com
192.168.16.125 gp-prd-rpt-master-1 gp-prd-rpt-master
192.168.17.125 gp-prd-rpt-master-2
2. Copy the updated /etc/hosts file to all hosts in the cluster, for example with scp.
3. Restart Greenplum Database, for example with gpstop -r.
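Step 1 amounts to moving one alias from one /etc/hosts line to another. The Python sketch below does that on the file's text only; `move_alias` is a hypothetical helper, it rejoins each entry's fields with single spaces, and writing the result back (as root), copying it to the other hosts, and restarting are left to steps 2 and 3.

```python
def move_alias(hosts_text, alias, target_ip):
    """Return hosts_text with `alias` moved onto the line for `target_ip`.

    Hypothetical helper: entries are rejoined with single spaces, and a line
    left with no host names after removing the alias is dropped.
    """
    out = []
    found_target = False
    for line in hosts_text.splitlines():
        body, hash_sign, comment = line.partition("#")
        fields = body.split()
        if not fields:
            out.append(line)  # blank or comment-only line: keep verbatim
            continue
        ip, names = fields[0], [n for n in fields[1:] if n != alias]
        if ip == target_ip:
            names.append(alias)
            found_target = True
        if not names:
            continue  # the alias was this line's only name
        new_line = " ".join([ip] + names)
        if hash_sign:
            new_line += " #" + comment  # re-attach any trailing comment
        out.append(new_line)
    if not found_target:
        raise ValueError(f"no hosts entry for {target_ip}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = (
        "10.111.111.111 gp-prd-rpt-master.xxx.com gp-prd-rpt-master\n"
        "192.168.16.125 gp-prd-rpt-master-1\n"
        "192.168.17.125 gp-prd-rpt-master-2\n"
    )
    print(move_alias(before, "gp-prd-rpt-master", "192.168.16.125"), end="")
```

Applied to the original master entries, the output matches the corrected /etc/hosts shown in step 1. Raising an error when the target IP has no entry avoids silently removing the alias without adding it anywhere.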