After a segment host crashes and becomes unreachable on the network, Greenplum Command Center (GPCC) fails to start.
GPCC 6.15
When executing the gpcc start command, the process may hang and eventually fail. A review of the GPCC logs indicates that the Agent Manager initialization has been aborted, often accompanied by SSH connection timeouts or host unreachable errors pointing to the downed segment host.
This issue stems from how GPCC validates the database cluster topology during its startup routine:
Catalog Query: During initialization (StartAgentManager), GPCC dynamically reads the database's gp_segment_configuration system catalog to generate a complete list of all segment hosts.
Lack of Status Filtering: GPCC does not currently filter this list by segment status (the status column of gp_segment_configuration, where 'u' means up and 'd' means down). Every host in the catalog is included, even when the database has already marked all segments on that host as down.
SSH Verification: For every host retrieved from the catalog, GPCC attempts to establish an SSH connection to resolve the operating system hostname.
Initialization Abort: Because the crashed host is offline, the SSH connection attempt times out or fails. GPCC treats this network failure as a critical error and immediately aborts the entire Agent Manager initialization, halting the GPCC startup.
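
The sequence above can be modeled with a short Python sketch. This is an illustration of the described behavior, not GPCC's actual code: SegmentRow, distinct_hosts, hosts_with_no_up_segments, ssh_port_reachable, and start_agent_manager are hypothetical names, and a TCP connection to port 22 stands in for the real SSH hostname lookup. Only the hostname and status columns of gp_segment_configuration are assumed.

```python
import socket
from typing import Callable, Iterable, List, NamedTuple


class SegmentRow(NamedTuple):
    """One row of gp_segment_configuration, reduced to the two columns used here."""
    hostname: str
    status: str  # 'u' = up, 'd' = down


def distinct_hosts(rows: Iterable[SegmentRow]) -> List[str]:
    """Return every host in the catalog, in first-seen order, ignoring status."""
    hosts: List[str] = []
    for row in rows:
        if row.hostname not in hosts:
            hosts.append(row.hostname)
    return hosts


def hosts_with_no_up_segments(rows: Iterable[SegmentRow]) -> List[str]:
    """Return hosts on which the database has marked every segment as down."""
    rows = list(rows)
    hosts_with_up = {row.hostname for row in rows if row.status == "u"}
    return [host for host in distinct_hosts(rows) if host not in hosts_with_up]


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Assumed stand-in for the SSH step: can a TCP connection to the SSH port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, and DNS failures
        return False


def start_agent_manager(
    rows: Iterable[SegmentRow],
    probe: Callable[[str], bool] = ssh_port_reachable,
) -> List[str]:
    """Model of the startup check: a single unreachable host aborts everything."""
    verified: List[str] = []
    for host in distinct_hosts(rows):  # no status filter is applied here
        if not probe(host):
            raise RuntimeError(
                f"Agent Manager initialization aborted: {host} is unreachable"
            )
        verified.append(host)
    return verified
```

In this model, a host whose segments are all marked 'd' still appears in distinct_hosts, so start_agent_manager raises as soon as the probe fails for it. Comparing the output of hosts_with_no_up_segments with the host named in the GPCC log is one way to confirm that the failing host is the crashed one.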