Pivotal documentation explains why having a standby master is important. This article explains how easy it is to initialize a standby master and how to perform a failover or failback test.
Follow the instructions below to initialize a standby master and how to perform a failover or failback test.
1. Verify the gp_segment_configuration before adding a standby master.
From the output, observe that the current HDB master is hdm1.hadoop.local. You know DBID 1 is the primary master because the master instances always have a content id of "-1" and the role is set to "p" (primary). In this output, there is not reference to a standby master. This means you must initialize a standby master.
[gpadmin@hdm1 ~]$ psql psql (8.2.15) Type "help" for help. gpadmin=# select * from pg_catalog.gp_segment_configuration ; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+-------------------+-------------------+------------------+------------ 1 | -1 | p | p | s | u | 5432 | hdm1.hadoop.local | hdm1.hadoop.local | | 2 | 0 | p | p | s | u | 40000 | hdw1.hadoop.local | hdw1.hadoop.local | | 4 | 2 | p | p | s | u | 40000 | hdw2.hadoop.local | hdw2.hadoop.local | | 6 | 4 | p | p | s | u | 40000 | hdw3.hadoop.local | hdw3.hadoop.local | | 3 | 1 | p | p | s | u | 40001 | hdw1.hadoop.local | hdw1.hadoop.local | | 5 | 3 | p | p | s | u | 40001 | hdw2.hadoop.local | hdw2.hadoop.local | | 7 | 5 | p | p | s | u | 40001 | hdw3.hadoop.local | hdw3.hadoop.local | | (7 rows)
2. Start by initializing or adding a HDB standby master:
[gpadmin@hdm1 ~]$ gpinitstandby -s hdm2.hadoop.local The filespace locations on the master must be mapped to locations on the standby. These locations must be empty on the standby master host. The default provided is the location of the filespace on the master. In most cases the defaults can be used. The exception is the pg_system filespace which must be in the same location on both the master and standby master. 20140704:00:07:11:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Validating environment and parameters for standby initialization... 20140704:00:07:12:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Checking for filespace directory /data1/master/gpseg-1 on hdm2.hadoop.local 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Checking for filespace directory /data1/master/dfs/gpseg-1 on hdm2.hadoop.local 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:------------------------------------------------------ 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum standby master initialization parameters 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:------------------------------------------------------ 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum master hostname = hdm1.hadoop.local 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum master data directory = /data1/master/gpseg-1 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum master port = 5432 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum standby master hostname = hdm2.hadoop.local 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum standby master port = 5432 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum standby master data directory = /data1/master/gpseg-1 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum update system catalog = On 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Greenplum stop database mode = smart 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:------------------------------------------------------ 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:- Filespace locations 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:------------------------------------------------------ 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-pg_system -> /data1/master/gpseg-1 20140704:00:07:14:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-dfs_system -> /data1/master/dfs/gpseg-1 Do you want to continue with standby master initialization? Yy|Nn (default=N): > y 20140704:00:07:34:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Stopping database... 20140704:00:07:45:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Updating pg_hba.conf file... 20140704:00:07:46:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Starting master in utility mode... 20140704:00:07:52:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Adding standby master to catalog... 20140704:00:07:52:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Database catalog updated successfully. 20140704:00:07:52:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Stopping master... 20140704:00:07:56:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Forcing changed blocks to disk for filespace /data1/master/dfs/gpseg-1... 20140704:00:07:57:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Filespace directory does not exist on hdm2.hadoop.local 20140704:00:07:57:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Creating hdm2.hadoop.local:/data1/master/dfs/gpseg-1 20140704:00:07:58:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Copying filespace directory to hdm2.hadoop.local 20140704:00:07:58:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Forcing changed blocks to disk for filespace /data1/master/gpseg-1... 20140704:00:07:59:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Filespace directory does not exist on hdm2.hadoop.local 20140704:00:07:59:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Creating hdm2.hadoop.local:/data1/master/gpseg-1 20140704:00:08:00:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Copying filespace directory to hdm2.hadoop.local 20140704:00:08:06:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Updating filespace flat files 20140704:00:08:06:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Updating filespace flat files 20140704:00:08:06:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Try to create temporary directories 20140704:00:08:07:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Removing pg_hba.conf backup... 20140704:00:08:08:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Starting database in production mode... 20140704:00:08:39:705040 gpinitstandby:hdm1:gpadmin-[INFO]:-Successfully created standby master on hdm2.hadoop.local
3. Verify if the standby master has been added successfully.
DBID 1 on host hdm1.hadoop.local is the current active master instance and DBID 8 on host hdm2.hadoop.local is the current standby instance. The DBID 8 role is set to "m" (mirror).
[gpadmin@hdm1 ~]$ psql psql (8.2.15) Type "help" for help. gpadmin=# select * from pg_catalog.gp_segment_configuration ; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+-------------------+-------------------+------------------+------------ 1 | -1 | p | p | s | u | 5432 | hdm1.hadoop.local | hdm1.hadoop.local | | 2 | 0 | p | p | s | u | 40000 | hdw1.hadoop.local | hdw1.hadoop.local | | 4 | 2 | p | p | s | u | 40000 | hdw2.hadoop.local | hdw2.hadoop.local | | 6 | 4 | p | p | s | u | 40000 | hdw3.hadoop.local | hdw3.hadoop.local | | 3 | 1 | p | p | s | u | 40001 | hdw1.hadoop.local | hdw1.hadoop.local | | 5 | 3 | p | p | s | u | 40001 | hdw2.hadoop.local | hdw2.hadoop.local | | 7 | 5 | p | p | s | u | 40001 | hdw3.hadoop.local | hdw3.hadoop.local | | 8 | -1 | m | m | s | u | 5432 | hdm2.hadoop.local | hdm2.hadoop.local | | (8 rows) gpadmin=# select * from pg_catalog.gp_master_mirroring ; summary_state | detail_state | log_time | error_message ---------------+--------------+------------------------+--------------- Synchronized | | 2014-07-04 00:08:38-07 | (1 row)
4. Test failover and make the current active master failover to standby. Stop the master instance to simulate a failure at master side.
[gpadmin@hdm1 ~]$ gpstop -m 20140704:00:11:53:705417 gpstop:hdm1:gpadmin-[INFO]:-Starting gpstop with args: -m 20140704:00:11:53:705417 gpstop:hdm1:gpadmin-[INFO]:-Gathering information and validating the environment... 20140704:00:11:54:705417 gpstop:hdm1:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20140704:00:11:54:705417 gpstop:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master... 20140704:00:11:55:705417 gpstop:hdm1:gpadmin-[INFO]:-Greenplum Version: 'postgres (HAWQ) 4.2.0 build 1' 20140704:00:11:55:705417 gpstop:hdm1:gpadmin-[INFO]:-There are 0 connections to the database 20140704:00:11:55:705417 gpstop:hdm1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart' 20140704:00:11:55:705417 gpstop:hdm1:gpadmin-[INFO]:-Master host=hdm1.hadoop.local 20140704:00:11:55:705417 gpstop:hdm1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=smart 20140704:00:11:55:705417 gpstop:hdm1:gpadmin-[INFO]:-Master segment instance directory=/data1/master/gpseg-1 [gpadmin@hdm1 ~]$ gpstate -f 20140704:00:10:26:705333 gpstate:hdm1:gpadmin-[INFO]:-Starting gpstate with args: -f 20140704:00:10:27:705333 gpstate:hdm1:gpadmin-[INFO]:-local HAWQ Version: 'postgres (HAWQ) 4.2.0 build 1' 20140704:00:10:28:705333 gpstate:hdm1:gpadmin-[INFO]:-master HAWQ Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.2.0.1 build 8119) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 23 2014 16:12:32' 20140704:00:10:28:705333 gpstate:hdm1:gpadmin-[INFO]:-Obtaining Segment details from master... 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:-Standby master details 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:----------------------- 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:- Standby address = hdm2.hadoop.local 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:- Standby data directory = /data1/master/gpseg-1 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:- Standby port = 5432 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:- Standby PID = 106500 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:- Standby status = Standby host passive 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:-------------------------------------------------------------- 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:--gp_master_mirroring table 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:-------------------------------------------------------------- 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:--Summary state: Synchronized 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:--Detail state: 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:--Log time: 2014-07-04 00:08:38-07 20140704:00:10:31:705333 gpstate:hdm1:gpadmin-[INFO]:--------------------------------------------------------------
Activate the standby master (Using the flag -c <hostname> while activating will initialize a new standby HDB master).
[gpadmin@hdm2 ~]$ source /usr/local/hawq/greenplum_path.sh [gpadmin@hdm2 ~]$ export MASTER_DATA_DIRECTORY=/data1/master/gpseg-1 [gpadmin@hdm2 ~]$ gpactivatestandby -d /data1/master/gpseg-1 220140704:00:13:40:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Examining log file /data1/master/gpseg-1/pg_log/gpdb-2014-07-04_000823.csv for warnings and errors... 20140704:00:13:41:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-list index out of range 20140704:00:13:41:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:- 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:------------------------------------------------------ 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Master data directory = /data1/master/gpseg-1 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-gpsyncmaster running = yes 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Last log entry time = 2014-07-04 00:11:56.124739 PDT 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:- 0:01:46.504265 ago 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Create new standby master = no 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Force standby activation = no 20140704:00:13:43:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:------------------------------------------------------ Do you want to continue with standby master activation? Yy|Nn (default=N): > y 20140704:00:13:45:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Stopping gpsync process... 20140704:00:13:48:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Successfully shutdown sync process 20140704:00:13:55:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Starting standby master database in utility mode... 20140704:00:14:00:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Reading current configuration... 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Updating catalog... 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Database catalog updated successful 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-/data1/master/gpseg-1/gp_dbid - write_gp_dbid 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-found existing file 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-removed existing file 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-opening new file 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-wrote dbid: 1 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-setting read only 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-verifying file 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Updating filespace flat files 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Updating filespace flat files 20140704:00:14:08:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Stopping database... 20140704:00:14:21:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Starting database in production mode... 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:------------------------------------------------------ 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-The activation of the standby master has completed successfully. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-hdm2.hadoop.local is now the new primary master. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-You will need to update your user access mechanism to reflect 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-the change of master hostname. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Do not re-start the failed master while the fail-over master is 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-operational, this could result in database corruption! 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-MASTER_DATA_DIRECTORY is now /data1/master/gpseg-1 if 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-this has changed as a result of the standby master activation, remember 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-to change this in any startup scripts etc, that may be configured 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-to set this value. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-New standby master not initialized! 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-MASTER_PORT is now 5432, if this has changed, you 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-may need to make additional configuration changes to allow access 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-to the Greenplum instance. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Refer to the Administrator Guide for instructions on how to re-activate 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-the master to its previous state once it becomes available. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-Query planner statistics must be updated on all databases 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-following standby master activation. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:-When convenient, run ANALYZE against all user databases. 20140704:00:14:39:106592 gpactivatestandby:hdm2.hadoop.local:gpadmin-[INFO]:------------------------------------------------------
Verify if the standby master is now acting as the active master.
From the output, observe that DBID 1 on host hdm2.hadoop.local is the current master instance because the content id is "-1" and role is set to "p".
[gpadmin@hdm2 ~]$ psql psql (8.2.15)
Note: Type "help" for help.
gpadmin=# select * from pg_catalog.gp_segment_configuration ; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+-------------------+-------------------+------------------+------------ 2 | 0 | p | p | s | u | 40000 | hdw1.hadoop.local | hdw1.hadoop.local | | 4 | 2 | p | p | s | u | 40000 | hdw2.hadoop.local | hdw2.hadoop.local | | 6 | 4 | p | p | s | u | 40000 | hdw3.hadoop.local | hdw3.hadoop.local | | 3 | 1 | p | p | s | u | 40001 | hdw1.hadoop.local | hdw1.hadoop.local | | 5 | 3 | p | p | s | u | 40001 | hdw2.hadoop.local | hdw2.hadoop.local | | 7 | 5 | p | p | s | u | 40001 | hdw3.hadoop.local | hdw3.hadoop.local | | 1 | -1 | p | p | s | u | 5432 | hdm2.hadoop.local | hdm2.hadoop.local | | (7 rows)
5. Test failback.
Similar to failover, ensure that you remove or rename directories in $MASTER_DATA_DIRECTORY (/data1/master/gpseg-1 & /data1/master/dfs/gpseg-1 in this example) so that gpactivatestandby does not fail due to existing directories.
6. Add back a standby master.
Take action in step 5 on new standby master first, then run "gpinitstandby -s <hostname>" again to add back a standby master.
NOTE: if there is already a standby master but it is in a state of "out-of-synchronizaiton", "gpinitstandby -n" could be used to resynchronize standby master with master.