Data Repository (DR) will not start properly after one of the nodes went down in CA Performance Management (CAPM)


Article ID: 213757



Products

CA Performance Management - Usage and Administration

Issue/Introduction

After an ungraceful shutdown, the DR fails to restart and shows the following error, or one similar to it:

*** Starting database: drdata ***

        Starting nodes: 

                v_drdata_node0001 (xxx.xxx.xxx.111)

                v_drdata_node0002 (xxx.xxx.xxx.222)

                v_drdata_node0003 (xxx.xxx.xxx.333)

Error: the vertica process for the database is running on the following hosts:

xxx.xxx.xxx.222

This may be because the process has not completed previous shutdown activities. Please wait and retry again.

Database start up failed.  Processes still running. 

Attempting to stop the Vertica DB via adminTools returns a message that the DB is not running. This indicates that a process failed to start or stop gracefully and is now running out of sync with the rest of the Vertica process stack.

Environment

DX NetOps : CAPM 3.7.x and later

Cause

On DR node 2 (v_drdata_node0002) in the example above, the spread process is still running, while no other node is running it:

dradmin   17610      1  0 Mar23 ?        00:21:51 /opt/vertica/spread/sbin/spread -c /mnt/ext4path/catalog/drdata/v_drdata_node0001_catalog/spread.conf -D /opt/vertica/spread/tmp

If all nodes go down ungracefully (for example, in a sudden shutdown caused by a power outage) and are then restarted, the spread process may not start correctly, which prevents the system from shutting down and restarting cleanly.
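To find which node still has a stray spread process, the ps listing can be filtered on each DR node. The following is a minimal sketch, assuming the usual ps -ef column layout seen in the output above (PID in column 2, command in column 8). The function name spread_pids is hypothetical and not part of Vertica or CAPM.

```shell
#!/bin/sh
# Sketch only: read `ps -ef` output on stdin and print the PID of every
# process whose command (column 8) is an executable named "spread".
# Assumes the standard `ps -ef` layout: UID PID PPID C STIME TTY TIME CMD.
spread_pids() {
    awk '$8 ~ /\/spread$/ { print $2 }'
}

# Example, run as the DR admin user on each DR node:
#   ps -ef | spread_pids
```

A PID printed on only one node, while the database is reported as not running, matches the situation described above.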

Resolution

Stop the errant Vertica node (v_drdata_node0002 in the example above) via:

/opt/vertica/bin/adminTools -> Advanced Menu -> Stop Vertica on Host

adminTools then displays a warning that stopping the host may cause data loss.

This is the standard warning shown whenever a host is stopped. The only data at risk is data that is currently being processed (i.e., data already written to the DB is not affected). In this scenario, however, it is unlikely that any valid data is being processed, because the DB is not functioning well enough to accept incoming data from the Data Aggregator (DA).
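The menu-driven stop above, together with the database restart, can also be run from the command line. The following is a sketch rather than a documented CAPM procedure: it assumes the adminTools stop_host and start_db tools behave on your Vertica release as they do in the menu, and the helper names run and restart_dr are hypothetical. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: stop Vertica on one host, then start the database.
# ADMINTOOLS path is taken from the article; override it if yours differs.
ADMINTOOLS=${ADMINTOOLS:-/opt/vertica/bin/adminTools}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = host still running a stale Vertica process, $2 = database name
restart_dr() {
    host=$1
    db=$2
    # Assumed command-line equivalent of "Stop Vertica on Host".
    run "$ADMINTOOLS" -t stop_host -s "$host" || return 1
    # Start the database once the host has stopped.
    run "$ADMINTOOLS" -t start_db -d "$db"
}

# Example (dry run), as the DR admin user:
#   restart_dr xxx.xxx.xxx.222 drdata
```

Review the printed commands first, then set DRY_RUN=0 to execute them. If the database is password protected, start_db may prompt for the password; check each tool's built-in help on your release before relying on these options.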

Once Vertica stops on the host, you can then restart the DB: