"Metrics collector: pbuff is null" ERROR continuously occurs even after running gprecoverseg -r for rebalance



Article ID: 297044


Products

VMware Tanzu Greenplum

Issue/Introduction

A disk and RAID controller failure in the GPDB cluster caused segment failures and query errors. After the disk and RAID controller were replaced, the cluster was returned to normal by running gprecoverseg for incremental recovery and gprecoverseg -r for rebalancing. However, the following error still occurs continually:
2023-10-20 16:11:44.445605 KST,"gpmon","gpperfmon",p15569,th1319127168,"xxx.xxx.x.xx","47030",2023-10-20 15:50:59 KST,0,con12158,cmd1108,seg-1,,,,sx1,"ERROR","XX000","Metrics collector: pbuff is null. This might be due to recovery of a segment. Consider restart GPDB and recreate extension (metrics_collector_gp.c:324) (seg12 xxx.xxx.x.xx:6000 pid=1082) (metrics_collector_gp.c:324)",,,,,,"select gpmetrics.metrics_collector_restart_worker(12)",0,,"metrics_collector_gp.c",324,"Stack trace:
The error above actually started when the segment failed, before the hardware was replaced and gprecoverseg was run. It can also be reproduced easily with the following steps:

1) Forcibly shut down one of the segment hosts on which mirror instances are running, using shutdown -h now.
2) Wait for about 10 to 20 minutes.
3) Power the segment host back on.
4) Run gprecoverseg (incremental recovery), then gprecoverseg -r (rebalance).
5) Check $MASTER_DATA_DIRECTORY/pg_log and $GPCC_HOME/logs/webserver*.log, and monitor the GPCC Web UI.
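For step 5, a minimal Python sketch that scans the master CSV logs for this error and reports which segment raised it. It assumes the log files match $MASTER_DATA_DIRECTORY/pg_log/*.csv and that the failing segment appears in the message as "(segN host:port pid=...)", as in the sample line above; the function name find_pbuff_errors is hypothetical, not part of any Greenplum tool.

```python
import glob
import os
import re
from typing import Iterable, List, Tuple

# Substring of the error message shown in this article.
PBUFF_ERROR = "Metrics collector: pbuff is null"

# Assumption based on the sample log line: the failing segment is reported
# as "(segN host:port pid=...)" inside the message text.
SEG_PATTERN = re.compile(r"\(seg(\d+) ")


def find_pbuff_errors(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, segment number) for every matching log line.

    The segment number is -1 when no "(segN ...)" marker is found.
    """
    hits = []
    for line in lines:
        if PBUFF_ERROR not in line:
            continue
        # The first comma-separated field of a GPDB CSV log line is the timestamp.
        timestamp = line.split(",", 1)[0]
        match = SEG_PATTERN.search(line)
        hits.append((timestamp, int(match.group(1)) if match else -1))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: scan every CSV log under the master's pg_log directory.
    log_dir = os.path.join(os.environ.get("MASTER_DATA_DIRECTORY", "."), "pg_log")
    for path in sorted(glob.glob(os.path.join(log_dir, "*.csv"))):
        with open(path, errors="replace") as f:
            for ts, seg in find_pbuff_errors(f):
                print(f"{path}: {ts} seg{seg}")
```

Scanning line by line is enough here because the error text sits on the first line of each log entry; the multi-line stack trace that follows is simply skipped.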

What is the cause, and how can it be resolved?

Environment

Product Version: 6.23
OS: RHEL or CentOS 7

Resolution

This issue is resolved in GPDB 6.25.0. It is caused by a known bug, described in the release notes [1], in which segments were incorrectly started in utility mode after recovery:

Cluster Management (issue number: N/A)
Resolves an issue where, after recovery, Greenplum segments were incorrectly started in utility mode. The segments are now started in the correct mode (execute).

As a workaround, restart the database with gpstop -r. For a permanent fix, upgrade Greenplum Database to at least 6.25.1 (the currently downloadable release) or to the latest version, which includes the fixes for known issues to date.
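To decide which path applies to a given cluster, a minimal Python sketch that classifies the output of SELECT version() against the versions named above (fix in 6.25.0, 6.25.1 or later recommended). It assumes the output contains a substring of the form "Greenplum Database 6.x.y"; the helper names parse_gp_version and advice are hypothetical.

```python
import re
from typing import Optional, Tuple

# Versions taken from this article: the fix shipped in 6.25.0,
# and 6.25.1 or later is the recommended upgrade target.
FIXED_IN = (6, 25, 0)
RECOMMENDED = (6, 25, 1)

# Assumed shape of `SELECT version();` output, for example
# "PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:...)".
GP_VERSION = re.compile(r"Greenplum Database (\d+)\.(\d+)\.(\d+)")


def parse_gp_version(version_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) or return None if not recognized."""
    m = GP_VERSION.search(version_text)
    if m is None:
        return None
    major, minor, patch = (int(x) for x in m.groups())
    return (major, minor, patch)


def advice(version_text: str) -> str:
    """Map a version string to the workaround or fix described above."""
    v = parse_gp_version(version_text)
    if v is None:
        return "unknown version"
    if v < FIXED_IN:
        return "affected: restart with gpstop -r as a workaround, then upgrade to 6.25.1 or later"
    if v < RECOMMENDED:
        return "fix included; 6.25.1 or later is still recommended"
    return "fix included"
```

The version text can be obtained with psql -At -c "select version()" on the master and passed to advice(). Tuple comparison keeps the check simple and orders versions correctly, for example (6, 23, 0) < (6, 25, 0).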

[1] VMware Tanzu Docs, VMware Greenplum 6 Release Notes: https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/relnotes-release-notes.html