Why is metrics_collector started much later than other processes on a vSphere mirrorless cluster after a virtual machine failure?

Article ID: 382287

Updated On:

Products

VMware Tanzu Greenplum Greenplum vSphere with Tanzu

Issue/Introduction

After a VM failure in a Greenplum on vSphere mirrorless cluster, the reset VM and its failed primaries were restarted automatically as expected, but the metrics collector did not start until much later.

Environment

Greenplum on vSphere. 

Cause

There is a metrics collector checker. This checker runs every 15 minutes and checks the metrics_collector status for each segment. If the collector is not started on a segment, the checker calls

`gpmetrics.metrics_collector_restart_worker(<seg_id>)` on Greenplum 6, or `gpcc_mc.metrics_collector_restart_worker(<seg_id>)` on Greenplum 7.

This brings the metrics_collector background worker back up.
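The checker behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the GPCC implementation: the helper names `restart_statement` and `check_once`, the `running` status map, and the `execute` callback are hypothetical, and invoking the function through a SQL `SELECT` is an assumption that this article does not confirm. Only the schema names, the function name, and the 15-minute interval come from the text.

```python
from typing import Callable, Dict, List

# Interval between checker passes, as stated in this article.
CHECK_INTERVAL_SECONDS = 15 * 60

# Schema that holds the restart function for each Greenplum major version,
# as named in this article.
SCHEMA_BY_VERSION = {6: "gpmetrics", 7: "gpcc_mc"}


def restart_statement(gp_major: int, seg_id: int) -> str:
    """Build the restart call for one segment.

    Assumption: the function is invoked through a SQL SELECT.
    """
    schema = SCHEMA_BY_VERSION[gp_major]
    return f"SELECT {schema}.metrics_collector_restart_worker({int(seg_id)});"


def check_once(running: Dict[int, bool], gp_major: int,
               execute: Callable[[str], None]) -> List[int]:
    """Run one checker pass (hypothetical helper).

    `running` maps a segment ID to whether its collector is up.
    Returns the segment IDs for which a restart was issued.
    """
    missing = sorted(seg for seg, up in running.items() if not up)
    for seg in missing:
        execute(restart_statement(gp_major, seg))
    return missing


if __name__ == "__main__":
    issued: List[str] = []
    # Segment 11 has no collector running, so only it gets a restart call.
    check_once({10: True, 11: False}, 6, issued.append)
    print(issued)
```

Because a pass only happens once per interval, a collector that goes down just after a pass is not noticed until the next one, which is why the delay can approach 15 minutes.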

You can see this in the GPCC webserver log:

```
2024-11-18 14:34:03 [INFO][MCChecker] Found segments missing metrics collector: 11
2024-11-18 14:34:03 [INFO][MCChecker] Restarted metrics collector successfully for segment=11
```

Resolution

This is functioning as designed. R&D is looking into reducing the 15-minute interval between checks.