DESCRIPTION:
Unable to reinstall the Data Repository
CONFIGURATION:
2.5
3 node Vertica cluster
DETAILS:
The Data Repository and Vertica were previously installed, then uninstalled, and are now being reinstalled.
The installation of the Data Repository/Vertica fails when it tries to create the database. The install log shows the following errors:
Starting Vertica on all nodes. Please wait, databases with large catalogs may take a while to initialize.
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
ERROR: Database did not start cleanly on initiator node!
Stopping all nodes
Error: Database did not start cleanly on initiator node! Stopping all nodes
CAUSE:
A temporary file named /tmp/4803, created by Spread and owned by the dradmin user, was left behind by the previous install.
RESOLUTION:
1. On each of the 3 Vertica nodes, verify that Spread is running: ps -eaf | grep spread
2. On each of the 3 Vertica nodes, verify password-less SSH for the root and dradmin users between the three nodes:
Example: ssh dradmin@localhost pwd
This example verifies that dradmin can SSH into node1 from node1 itself. Repeat the test on all nodes, for both the root and dradmin users.
3. Check for a file named /tmp/4803 on all three nodes. When Vertica was uninstalled or shut down, it may not have removed that file. The install may then have tried (and failed) to create the database as a single node.
If the file is present on one or more nodes, remove it, then go back to node1 and open the admintools UI. Select the option to drop a database. There may be no database listed; if one is listed and admintools allows you to drop it, drop it and exit admintools.
4. Run the install script on node1 again. Because the temporary file left behind by Spread is no longer present, the install should be able to start Spread and continue through database creation.
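The checks in steps 1 through 3 can be gathered into a small script and run on each node before retrying the install. The following is a minimal sketch, not part of the product: the function names and the host names passed to run_checks are hypothetical, and it assumes bash and a standard OpenSSH client. It only reports what it finds; removing /tmp/4803 and dropping the database are still done by hand as described above.

```shell
#!/usr/bin/env bash
# Sketch of the pre-reinstall checks from steps 1-3.
# Function names and host names are illustrative assumptions.

# Path of the leftover Spread file named in this article.
SPREAD_TMP_FILE="/tmp/4803"

# Succeeds if a Spread process is visible.
# The [s] in the pattern keeps grep from matching its own command line.
spread_running() {
    ps -eaf | grep -q '[s]pread'
}

# Succeeds if password-less SSH works for user@host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
ssh_passwordless() {
    local user="$1" host="$2"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" pwd >/dev/null 2>&1
}

# Succeeds if the given file exists (defaults to /tmp/4803).
stale_file_present() {
    [ -e "${1:-$SPREAD_TMP_FILE}" ]
}

# Run on each node as the user under test (root, then dradmin),
# passing the host names of all cluster nodes as arguments.
run_checks() {
    local host
    if spread_running; then
        echo "spread: running"
    else
        echo "spread: NOT running"
    fi
    for host in "$@"; do
        if ssh_passwordless "$(id -un)" "$host"; then
            echo "ssh to $host: ok"
        else
            echo "ssh to $host: FAILED"
        fi
    done
    if stale_file_present; then
        echo "stale file: $SPREAD_TMP_FILE is present"
    else
        echo "stale file: not present"
    fi
}
```

After sourcing the script, a call such as `run_checks node1 node2 node3` (placeholder host names) prints one status line per check. Run it once as root and once as dradmin on every node so that each user and node combination is covered.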
ADDITIONAL INFORMATION:
If PAM (Pluggable Authentication Modules) is in use on the network, you may need to have the dradmin user removed from the PAM authentication server so that the ID is local only. This allows the /tmp/4803 socket file to be created with the correct user and permissions. Otherwise, you may find that the /tmp/4803 file is owned by a numeric user ID (this value is the PAM account number for dradmin).
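One way to spot the ownership problem described above is to compare the numeric owner of /tmp/4803 with the UID of the dradmin account, and to confirm that dradmin is defined in the local passwd file. This is a rough sketch under stated assumptions: the function names are hypothetical, the `stat -c` syntax is GNU coreutils (as found on typical Linux hosts), and a local account is taken to mean an entry in /etc/passwd.

```shell
#!/usr/bin/env bash
# Sketch: inspect the owner of the Spread socket file and check
# whether an account is defined locally. Function names are illustrative.

# Succeeds if the user has an entry in the local passwd file
# (defaults to /etc/passwd; a different file can be passed for testing).
is_local_user() {
    local user="$1" passwd_file="${2:-/etc/passwd}"
    grep -q "^${user}:" "$passwd_file"
}

# Succeeds if the file's numeric owner equals the expected UID.
# Uses GNU stat syntax; returns 2 if the file cannot be read.
owner_uid_matches() {
    local path="$1" expected_uid="$2" file_uid
    file_uid=$(stat -c '%u' "$path") || return 2
    [ "$file_uid" = "$expected_uid" ]
}

# Example usage on a node (not executed here):
#   is_local_user dradmin || echo "dradmin is not in /etc/passwd"
#   owner_uid_matches /tmp/4803 "$(id -u dradmin)" || echo "unexpected owner"
```

If the second check fails while Spread is running, the socket file was probably created under a different account ID than the local dradmin user, which matches the symptom described in this section.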