gpsmon agent not starting on some segments in VMware Tanzu Greenplum

Article ID: 296501

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

The gpsmon process may fail to start on some segment hosts. Counting the gpmmon and gpsmon processes on each host shows 0 where the agent is missing:
=> ps -ef | egrep "gpmmon|gpsmon" | egrep -v grep | wc -l
[idb151] 0
[idb174] 0
[idb183] 1
[idb155] 0
[idb156] 0
(...)

At some point, gpmmon lost its connection to these segment hosts and tried to restart gpsmon on each of them:
2020-04-26 19:21:45|:-LOG: Connection to idb155 lost.  Restarting gpsmon.
2020-04-26 19:21:45|:-LOG: Connection to idb151 lost.  Restarting gpsmon.
2020-04-26 19:21:45|:-LOG: Connection to idb174 lost.  Restarting gpsmon.
2020-04-26 19:21:45|:-LOG: Connection to idb156 lost.  Restarting gpsmon.
(...)


Environment

Product Version: 5.24

Cause

The gpmmon log shows that gpsmon exits because it cannot bind its UDP socket:
2020-04-24 19:27:11|:-LOG: HOSTNAME = 'idb156'
2020-04-24 19:27:11|:-FATAL: [INTERNAL ERROR gpsmon.c:1347] unable to bind udp socket
        error 98 (Address already in use)
        ... exiting

However, netstat did not show port 8888 in use when it was checked on the affected host:
[gpadmin@idb156]-/var/gpadmin # netstat -nalp | grep 8888
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
[gpadmin@idb156]-/var/gpadmin #

The cause is that gpfdist was started with a port range of 8000 to 9000, which includes the gpsmon port 8888:
gpdb-2020-04-26_135525.csv:2020-04-26 19:21:37|INFO|started gpfdist -p 8000 -P 9000 -f ""/home/gpadmin/gpfdist.log"" -t 30 -m 1000000

With many external tables accessing gpfdist, the ports it used within that range eventually overlapped with the gpsmon port 8888, so gpsmon could not bind to it when gpmmon restarted it.
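
The overlap described above comes down to a range check: a port is at risk whenever it lies between the values passed to gpfdist with -p and -P. A minimal sketch of that check follows. The function name port_in_range is hypothetical and not part of Greenplum; the numbers are the ones from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum).
# Usage: port_in_range PORT MIN MAX
# Succeeds (exit status 0) when PORT lies inside the inclusive range MIN..MAX.
port_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Values from this article: gpsmon port 8888, gpfdist -p 8000 -P 9000.
if port_in_range 8888 8000 9000; then
    echo "conflict: 8888 is inside 8000-9000"
else
    echo "ok: 8888 is outside 8000-9000"
fi
```

With the values from this article the check reports a conflict. Running the same check against a candidate range before starting gpfdist is one way to catch the problem early.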


Resolution

1. Run gpfdist with a port range that does not include 8888, for example 9000 - 10000.
2. Change the configuration of gpsmon to use a port outside the gpfdist range.
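
The two options above could look roughly like the following sketch. It makes several assumptions that should be verified against the documentation for your Greenplum version: that the gpperfmon_port server configuration parameter controls the port used by the monitoring agents, that 18888 is a free port (it is an arbitrary example value), and that a cluster restart is required for the change to take effect. The run wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: prints each command by default.
# Set DRY_RUN=0 to actually execute the commands on a Greenplum host.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Option 1: start gpfdist on a port range that excludes 8888.
# Keep the other gpfdist options from your existing invocation.
run gpfdist -p 9000 -P 10000

# Option 2: move the monitoring agents off port 8888.
# Assumption: gpperfmon_port is the relevant parameter; 18888 is an example value.
run gpconfig -c gpperfmon_port -v 18888
# Assumption: the new value is applied only after a restart.
run gpstop -ar
```

Only one of the two options is needed. Option 1 is usually less disruptive because it does not require restarting the database.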