Reinstall of the Data Repository fails

Article ID: 32367

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration

Issue/Introduction

DESCRIPTION:

Unable to reinstall the Data Repository

 

CONFIGURATION:

Release 2.5

3-node Vertica cluster

 

DETAILS:

*The Data Repository and Vertica were previously installed, then uninstalled, and are now being reinstalled.

The Data Repository/Vertica installation fails while creating the database. The install log shows the following errors:


Starting Vertica on all nodes. Please wait, databases with large catalogs may take a while to initialize.
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
Node Status: v_drdata_node0001: (DOWN)
ERROR: Database did not start cleanly on initiator node!
Stopping all nodes
Error: Database did not start cleanly on initiator node! Stopping all nodes

 

CAUSE:

A temporary file, /tmp/4803, created by Spread and owned by the dradmin user, was left behind by the previous installation.



Environment

Release: IMDAGG99000-2.5-Infrastructure Management-Data Aggregator

Resolution

1. On each of the three Vertica nodes, verify that Spread is running: ps -eaf | grep spread

2. On each of the three Vertica nodes, verify passwordless SSH for the root and dradmin users between all three nodes. For example:

   ssh dradmin@localhost pwd

Run on node1, this verifies that dradmin can SSH into node1 from node1 itself. Repeat the test from every node to every node (including itself), for both root and dradmin.

3. Check for a file named /tmp/4803 on all three nodes. When Vertica was uninstalled or shut down, it may not have removed that file, and the installer may then have tried (and failed) to create the database as a single node.

*If the file was present on one or more nodes, remove it, then return to node1 and open the admintools UI. Select the option to drop the database, even if no database is listed. If a database is listed and admintools allows you to drop it, drop it, then exit admintools.

4. Run the install script on node1 again. With the leftover Spread file gone, the installer should be able to start Spread and complete the database creation.
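The checks in steps 1 to 3 can be gathered into a rough pre-install script. This is a minimal bash sketch, not a product tool, and it rests on assumptions: Linux with GNU coreutils `stat`, a Spread daemon whose process name is exactly `spread` (it uses `pgrep -x` in place of the `ps -eaf | grep spread` shown above, so the check cannot match itself), and placeholder host names `node1 node2 node3` standing in for your three Vertica nodes. Run it on each node, once as root and once as dradmin. It only reports; removing /tmp/4803 and dropping the database in admintools stay manual.

```shell
#!/usr/bin/env bash
# Pre-install sketch for resolution steps 1-3, run on ONE Vertica node.
# Run it on every node, once as root and once as dradmin, e.g.:
#   ./dr_precheck.sh node1 node2 node3   # hypothetical script and host names

# Assumption: the leftover Spread socket file is /tmp/4803, as in this article.
SPREAD_SOCKET=${SPREAD_SOCKET:-/tmp/4803}

# Step 1: succeed if a process named exactly "spread" runs on this node.
# pgrep -x matches the process name, so it never matches grep or this script.
spread_running() {
  pgrep -x spread >/dev/null
}

# Step 2: succeed if the current user can run a command on host $1 over SSH
# without a password. BatchMode=yes makes ssh fail instead of prompting;
# -n keeps ssh from reading this script's stdin.
passwordless_ssh_ok() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Step 3: print "<owner name> <owner uid>" for the file at $1
# (default: $SPREAD_SOCKET); print nothing if the file does not exist.
leftover_socket_owner() {
  local path=${1:-$SPREAD_SOCKET}
  if [ -e "$path" ]; then
    stat -c '%U %u' "$path"
  fi
}

report() {
  local node owner
  if spread_running; then
    echo "spread: running"
  else
    echo "spread: NOT running"
  fi
  for node in "$@"; do
    if passwordless_ssh_ok "$node"; then
      echo "ssh $(id -un)@$node: ok"
    else
      echo "ssh $(id -un)@$node: FAILED (password prompt, unknown host key, or unreachable)"
    fi
  done
  owner=$(leftover_socket_owner)
  if [ -n "$owner" ]; then
    echo "$SPREAD_SOCKET present, owner: $owner (see step 3)"
  else
    echo "$SPREAD_SOCKET: not present"
  fi
}

# Run the checks only when host names are given on the command line.
if [ "$#" -gt 0 ]; then
  report "$@"
fi
```

Because `BatchMode=yes` makes ssh fail rather than prompt, an SSH failure can also mean the remote host key has not been accepted yet; connect once interactively with a plain `ssh` and rerun the script.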

 

ADDITIONAL INFORMATION:

If PAM (Pluggable Authentication Modules) is used on the network, you may need to have the dradmin user removed from the PAM authentication server so that the account exists only locally. This allows the /tmp/4803 socket file to be created with the correct owner and permissions. Otherwise, you may find that /tmp/4803 is owned by a numeric user ID (the PAM account number for dradmin) rather than by dradmin.
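To look for this symptom on a node, a second minimal bash sketch checks whether dradmin has an entry in the local /etc/passwd and whether /tmp/4803 is owned by a uid with no resolvable user name (GNU `stat` prints `UNKNOWN` for the owner name in that case). It assumes Linux with GNU coreutils, and it treats "has a local /etc/passwd entry" as a proxy for "the account exists only locally", which is an assumption; your site's PAM and name-service setup may differ.

```shell
#!/usr/bin/env bash
# Sketch: check for the PAM-related symptom on ONE node.
# Assumptions: Linux with GNU coreutils stat; socket file /tmp/4803.
# Usage (hypothetical script name): ./check_owner.sh dradmin

# Succeed if user $1 has an entry in the local passwd file
# ($2, default /etc/passwd). An account that exists only on a
# PAM/directory server has no entry here.
is_local_user() {
  awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Print the numeric uid that owns file $1 when that uid has no resolvable
# user name (GNU stat prints UNKNOWN for %U then); otherwise print nothing.
numeric_owner() {
  [ -e "$1" ] || return 0
  if [ "$(stat -c '%U' "$1")" = UNKNOWN ]; then
    stat -c '%u' "$1"
  fi
}

# Report only when a user name is given on the command line.
if [ "$#" -gt 0 ]; then
  if is_local_user "$1"; then
    echo "$1: defined in /etc/passwd (local account)"
  else
    echo "$1: not in /etc/passwd (possibly a PAM/directory-only account)"
  fi
  uid=$(numeric_owner /tmp/4803)
  if [ -n "$uid" ]; then
    echo "/tmp/4803 is owned by unresolved numeric uid $uid"
  fi
fi
```

Comparing against /etc/passwd directly, rather than `getent passwd`, is deliberate: `getent` would also return accounts served by a directory, which is exactly what this check is trying to tell apart.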