gprecoverseg times out "failure: timeout Retrying no 1 failure: OtherTransitionInProgress failure: OtherTransitionInProgress"

Article ID: 296507

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

In Greenplum versions 4 and 5, gprecoverseg uses the pg_changetracking log to determine which changes must be applied to a segment during recovery.

In one case, a restore filled up the segment directories and put the database into recovery mode, so the changetracking log grew large. In that case only the mirrors were affected. The primaries were intact and never failed, so this was not a double-fault scenario.

gprecoverseg runs for an hour before it times out, even when run as gprecoverseg -i <filename> and even if that file points to only one segment. If after an hour you get an error like "failure: timeout Retrying no 1 failure: OtherTransitionInProgress failure: OtherTransitionInProgress", gprecoverseg has timed out: it has run for an hour and made no progress. One way to confirm this is that gpstate -e shows no change.

The likely root cause can be confirmed on a segment host by checking the pg_changetracking log directory. If that directory is 5-7 GB in size, it is the cause of gprecoverseg timing out.
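
As a rough way to run that check, the following sketch prints the size of the pg_changetracking directory for one or more segment data directories. It assumes the directory sits directly under each segment data directory. The function name changetracking_size is hypothetical and not part of Greenplum.

```shell
# Print the size of pg_changetracking for each segment data directory given.
# Assumption: the directory lives at <segment_data_dir>/pg_changetracking.
changetracking_size() {
    for segdir in "$@"; do
        ctdir="$segdir/pg_changetracking"
        if [ -d "$ctdir" ]; then
            # du -sk reports the total size in 1 KB blocks, then the path
            du -sk "$ctdir"
        else
            echo "not found: $ctdir" >&2
        fi
    done
}
```

Run it on the segment host with the segment data directories as arguments and compare the reported size with the 5-7 GB range above (roughly 5,000,000 KB and up).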

Environment

Product Version: 5.25

Resolution

First, kill any leftover gprecoverseg processes by stopping the affected mirror segment:
 pg_ctl -D </directory/ofMirror/segNo> stop -m fast
 
Next, run gprecoverseg -o <filename> to create a file listing all the segments that need to be recovered. Open that file. In Vim, you can run:
:sort
:%s!^!#!

This sorts the lines by host and comments out every line. Then uncomment the first line and between 1 and 10 of the segment lines, so the file looks like this:
filespaceOrder=<hdd_fs:ssd_fs>
sdw1:5012:/data/mirror/gpseg21
#sdw1:5013:/data/mirror/gpseg22
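
For anyone not using Vim, the sort-and-comment step can be sketched in shell. The function name prepare_recovery_file and both file arguments are hypothetical. LC_ALL=C is set so the sort is a plain byte-order sort regardless of locale.

```shell
# Sort the lines of a gprecoverseg -o output file and comment out every line.
# $1: input file written by gprecoverseg -o (hypothetical name)
# $2: output file to write
prepare_recovery_file() {
    # LC_ALL=C gives a locale-independent, byte-order sort;
    # sed then prefixes every line with '#'
    LC_ALL=C sort "$1" | sed 's/^/#/' > "$2"
}
```

The resulting file has every line commented out, ready for the first line and a small batch of segment lines to be uncommented by hand.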

Then run a full recovery with that file:
gprecoverseg -i <filename> -F

This clears out the changetracking log and rewrites the segment. If the recovery goes well and quickly, remove the # in front of more lines and run gprecoverseg in bigger batches. Do not forget to remove or comment out the lines for segments that have already been recovered.
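
The batching described above can be sketched with a small helper that activates one batch at a time. It assumes the file has already been sorted and fully commented out, and that its first line is the filespaceOrder header as in the example; if the file has no such header, the first segment line would always be left active. The name make_batch and its arguments are hypothetical.

```shell
# Print a copy of a fully commented recovery file with one batch activated.
# $1: sorted, fully commented file (first line assumed to be the header)
# $2: index of the first segment line in the batch (1 = first segment line)
# $3: number of segment lines in the batch
make_batch() {
    awk -v start="$2" -v count="$3" '
        NR == 1 { sub(/^#/, ""); print; next }   # header line is always active
        {
            seg = NR - 1                          # segment line index
            if (seg >= start && seg < start + count) sub(/^#/, "")
            print                                 # other lines stay commented
        }
    ' "$1"
}
```

For example, make_batch recover_all.conf 1 5 > batch1.conf leaves the header and the first five segment lines active, and make_batch recover_all.conf 6 5 > batch2.conf activates the next five while keeping the already recovered ones commented out. Each batch file can then be passed to gprecoverseg -i <filename> -F.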