Vertica Database Does Not Start After Server Reboot

Article ID: 191613

Products

CA Infrastructure Management
CA Performance Management - Usage and Administration
DX NetOps

Issue/Introduction

The Vertica database was shut down using adminTools. After a server restart, Vertica does not start. The adminTools.log shows the following:

Then, at about 14:15, the node does not start:

2020-03-28 14:14:30.262 admintools/2307:0x7f28252d0700 [adminExec.getRestartPolicy] <INFO> found restartpolicy dict
2020-03-28 14:14:30.262 admintools/2307:0x7f28252d0700 [commandLineCtrl.commandHost] <INFO> executing start for DB capm (policy: ksafe); host 122.123.123.123  node v_drdata_node0001
2020-03-28 14:14:30.262 admintools/2307:0x7f28252d0700 [commandLineCtrl.commandHost] <INFO> spawn: /opt/vertica/bin/vertica ['/opt/vertica/bin/vertica', '--status', '-D', '/data/verticaDB/catalog/capm/v_drdata_node0001_catalog']
2020-03-28 14:14:33.211 admintools/2307:0x7f28252d0700 [commandLineCtrl.commandHost] <WARNING> hostdown: 1 after 1 tries, return code 0
2020-03-28 14:14:33.212 admintools/2307:0x7f28252d0700 [commandLineCtrl.commandHost] <WARNING> ksafe but DB not up, skipping
2020-03-28 14:14:33.212 admintools/2307:0x7f28252d0700 [commandLineCtrl.commandHost] <INFO> should have started a DB, but didn't
2020-03-28 14:14:33.212 admintools/2307:0x7f28252d0700 [commandLineCtrl.commandHost] <INFO> overall status: 0 
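
The log records above can be checked for mechanically. The following is a minimal Python sketch, assuming the record layout seen in this excerpt (timestamp, process/thread, bracketed component, level in angle brackets, message). The function name `find_skipped_starts` and the list of marker strings are illustrative choices taken from this excerpt only, not an exhaustive list of what admintools may log.

```python
import re

# One admintools.log record, modelled on the excerpt above:
# "<date> <time> admintools/<pid>:<thread> [<component>] <LEVEL> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\S+ \[(?P<component>[^\]]+)\] <(?P<level>\w+)> (?P<message>.*)$"
)

# Messages from the excerpt that show the automatic start was skipped.
SKIP_MARKERS = (
    "ksafe but DB not up, skipping",
    "should have started a DB, but didn't",
)


def find_skipped_starts(lines):
    """Return (timestamp, message) pairs for records showing a skipped restart."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and any(marker in match["message"] for marker in SKIP_MARKERS):
            hits.append((match["ts"], match["message"]))
    return hits
```

To use it, pass the lines of the log file, for example `find_skipped_starts(open(path))`, where `path` points to adminTools.log on your system (commonly under /opt/vertica/log, but confirm the location for your installation).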

Environment

All supported releases

Cause

The cause was an incorrectly set K-Safe restart policy, as seen in the admintools.conf file:

[Database:capm]
restartpolicy = 1
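
To inspect this setting without opening the file by hand, the section can be read with Python's standard `configparser`. This is a sketch only: it assumes admintools.conf is plain INI-style text as in the excerpt above, and the helper name `read_restart_policy` is hypothetical.

```python
import configparser


def read_restart_policy(conf_text, database):
    """Return the restartpolicy value of [Database:<database>], or None if absent."""
    # strict=False tolerates repeated keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    section = f"Database:{database}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "restartpolicy", fallback=None)


# Sample text copied from the excerpt above.
sample = """
[Database:capm]
restartpolicy = 1
"""
print(read_restart_policy(sample, "capm"))
```

On a real system the text would come from the admintools.conf file itself (commonly /opt/vertica/config/admintools.conf; confirm the path for your installation).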


The "Always" restart policy was not taking effect on a single-node cluster. This was fixed by creating an SSH key, which is needed even in single-node installations:
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/InstallationGuide/TroubleshootingTheInstall/EnableSecureShellSSHLogins.htm
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/InstallationGuide/MCClusterInstall/CreateAPrivateKeyFile.htm

Also, if your database stops abruptly when the Tuple Mover process is still running, the DB will not shut down cleanly.

Resolution

According to the Vertica restart policy documentation, set the restart policy to “K-Safe” for a multi-node cluster and to “Always” for a single-node environment.

https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/AdministratorsGuide/AdminTools/SettingTheRestartPolicy.htm
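
As an illustration of the rule above, the sketch below picks a policy from the node count and builds the corresponding admintools command line. It assumes the `set_restart_policy` tool with its `-d` (database) and `-p` (policy) options and the policy values `never`, `ksafe`, and `always` described in the linked documentation; verify these against the tool's help output on your release. The admintools path and both function names are assumptions for this example.

```python
def recommended_policy(node_count):
    """Map cluster size to a restart policy: single node -> always, otherwise ksafe."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return "always" if node_count == 1 else "ksafe"


def build_set_policy_command(database, node_count,
                             admintools="/opt/vertica/bin/admintools"):
    """Build (but do not run) the admintools argument list for the policy change."""
    return [
        admintools,
        "-t", "set_restart_policy",
        "-d", database,
        "-p", recommended_policy(node_count),
    ]


# Example: a single-node database named capm, as in this article.
print(" ".join(build_set_policy_command("capm", 1)))
```

The function only returns the argument list. Running it, for example with `subprocess.run(cmd, check=True)` as the database administrator user, is left to the operator so the command can be reviewed first.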

Additional Information

The Restart Policy enables you to determine whether or not nodes in a K-Safe database are automatically restarted when they are rebooted. Since this feature does not automatically restart nodes if the entire database is DOWN, it is not useful for databases that are not K-Safe.

Never — Nodes are never restarted automatically.
K-Safe — Nodes are automatically restarted if the database cluster is still UP. This is the default setting. /* option for production k-safety enabled (KSAFE(1) or KSAFE(2)) */
Always — Node on a single node database is restarted automatically. /* option for test and development environments (one node clusters KSAFE(0)) */

Note: Always does not work if a single-node database was not shut down cleanly or crashed.

Best Practice
1. Vertica requires that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 or 2. Node counts for each K-safety level:
- 1 node (K-safety 0)
- 3 or more nodes (K-safety 1)
- 5 or more nodes (K-safety 2)
2. Non-production databases do not have to be K-safe and can be set to 0.
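
The node counts in the list above follow the pattern that K-safety K needs at least 2K+1 nodes, with 2 as the highest value. A small Python sketch of that arithmetic follows. The function name `max_ksafety` is hypothetical, and treating 2-node and 4-node clusters as K=0 and K=1 respectively is an assumption derived from the pattern, since the list does not mention those sizes.

```python
def max_ksafety(node_count):
    """Highest K-safety a cluster of node_count nodes can support (0, 1, or 2)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # K-safety K needs at least 2K + 1 nodes, so K <= (nodes - 1) // 2, capped at 2.
    return min(2, (node_count - 1) // 2)


for nodes in (1, 3, 5):
    print(nodes, "node(s): K-safety up to", max_ksafety(nodes))
```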

*** Get K-Safety: disabled (value 0) or enabled (value 1 or 2) ***
=> SELECT GET_DESIGN_KSAFE();
=> SELECT current_fault_tolerance FROM system;

*** K-Safety Disabled (0) ***
=> SELECT MARK_DESIGN_KSAFE(0);

*** K-Safety Enabled (1 or 2) ***
=> SELECT MARK_DESIGN_KSAFE(1);
=> SELECT MARK_DESIGN_KSAFE(2);