You may see symptoms such as the following.
The Analysis page on the collector hangs (the update circle keeps running) and shows the error:
"Could not open database connection. Error in SqlCommand at Execute:"
On the SA Admin page the collector status is stopped and it cannot be started (action failed).
On the MTP Admin page, under <Maintenance> - <Processes>, you can see that all processes are stopped. When you try to start them, this message appears:
"Request to maintenance daemon failed. Make sure nqmaintd is running".
Try starting the nqmaintd daemon (/etc/init.d/nqmaintd) and then see whether you can start all processes in the web GUI.
Even then, the database may still fail to start.
Under <Administration> - <Maintenance> - <Database Status> there is the error
"Database usage not available because the database is down" and the DB "Metrics" has Status "UNKNOWN".
After a reboot, the DB status remains "INITIALIZING".
In this situation it is sometimes necessary to drop and recreate the DB.
If you have any doubts at all about the state of your MTP database please raise a case with CA support. However, if you feel your status is as above, proceed.
You may have to drop the database manually and ensure that ALL Vertica processes are stopped before starting up a new installation.
Where the procedure calls for manually killing a process, use the command kill -9 <pid>
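To find the <pid> values, the output of ps -ef can be filtered so that only the Vertica server process is listed and the grep command itself is left out. The following is a minimal sketch, not a supported tool: the function name vertica_pids is hypothetical, and it assumes the standard ps -ef column layout (PID in column 2, command in column 8) and the /opt/vertica/bin/vertica path used elsewhere in this article.

```shell
#!/bin/sh
# Sketch only: vertica_pids is a hypothetical helper, not part of the product.
# It reads `ps -ef` style output on stdin and prints the PID (column 2) of
# every line whose command (column 8) is the Vertica server binary.
# Matching on the command column skips lines such as "grep vertica".
vertica_pids() {
    awk '$8 == "/opt/vertica/bin/vertica" { print $2 }'
}
```

Typical use would be ps -ef | vertica_pids, followed by kill -9 <pid> for each PID printed. No output means that no Vertica server process was found.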
The following section gives additional troubleshooting steps for cases where the attempt to recreate the Vertica database is not successful.
- Even though Vertica runs on localhost, we have sometimes seen problems when there are oddities/inconsistencies with DNS resolution of the MTP host name. The best way we've found to resolve this is to add an entry in the /etc/hosts file to make sure the MTP host name is properly resolved.
- On the MTP, add an entry to the /etc/hosts file for the MTP's IP address and hostname.
Note: you must have root privileges to edit this file; when logged in using the netqos account, you will need to prefix the edit command with sudo: sudo vi /etc/hosts
An example entry, giving the IP address, then the fully qualified domain name, then the short name, would be: 00.00.0.00 mtp1.netqos.local mtp1 (replace 00.00.0.00 with the MTP's actual IP address).
- Confirm that the MTP's IP address and hostname can now be resolved consistently by running the following commands:
nslookup <ip address>
ping -a <hostname>
- Stop the processes that directly access the Vertica database.
sudo /sbin/service nqwatchdog stop
sudo /sbin/service nqinspectoragentd stop
- Manually drop the database (including making sure that the /nqxfs/vertica/capture folder is removed).
su - dbadmin -c "/opt/vertica/bin/adminTools -t drop_db -d capture"
(When prompted for a password, enter 'dbadmin'.) Note that the syntax of this command differs from the others: it uses 'su -' (su followed by a dash) rather than sudo.
sudo rm -r /nqxfs/vertica/capture
- Confirm that there is no Vertica database process running. To find whether a Vertica process is running, use the following command:
ps -ef | grep vertica
If the Vertica process is running, it will display a line similar to the following:
dbadmin 9047 1 2 Jul02 ? 04:25:30 /opt/vertica/bin/vertica -C capture -D /nqxfs/vertica/capture/v_capture_node0001_catalog -h 127.0.0.1 -p 5433
If there is a process running, kill it using the Linux kill <pid> command (e.g. kill 9047 would kill the above process). If the process does not exit, use kill -9 <pid>.
Note: If there was a Vertica process running that you had to kill, repeat the manual drop step above to ensure that the database has been dropped.
- Restart the Vertica spread daemon.
sudo /sbin/service spreadd stop
sudo /sbin/service spreadd start
- Recreate the database.
sudo /opt/NetQoS/install/setupVertica.sh --new
- Restart nqinspectoragentd and nqwatchdog.
sudo /sbin/service nqinspectoragentd start
sudo /sbin/service nqwatchdog start
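For the /etc/hosts step above, it can help to confirm what the file actually maps the MTP host name to before relying on nslookup or ping. The following is a minimal sketch under stated assumptions: the function name hosts_entry_for is hypothetical, the file is assumed to use the usual "address name [alias ...]" layout, and the check only reads the file and does not exercise DNS.

```shell
#!/bin/sh
# Sketch only: hosts_entry_for is a hypothetical helper, not part of the product.
# Usage: hosts_entry_for NAME [HOSTS_FILE]
# Prints the address of every line in the hosts file (default /etc/hosts)
# that lists NAME as a host name or alias. Comments are ignored and the
# comparison is case-insensitive.
hosts_entry_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                  # strip trailing comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(n)) { print $1; break }
        }
    ' "$file"
}
```

For example, hosts_entry_for mtp1 and hosts_entry_for mtp1.netqos.local should each print the same single IP address. No output, or more than one address, suggests that the entry is missing or inconsistent.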
The steps above should successfully drop and recreate the Vertica database.
If in doubt about the status of your MTP, please raise a support case before considering the recreate steps.
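For review purposes, the command sequence in this article can be collected into one script that by default only prints what it would run. This is a sketch under stated assumptions, not a supported tool: the names run, recreate_capture_db and DRY_RUN are hypothetical, the commands are copied from the steps above, and the manual check for a leftover Vertica process is left as a comment because it requires judgment.

```shell
#!/bin/sh
# Sketch only: run, recreate_capture_db and DRY_RUN are hypothetical names.
# With DRY_RUN=1 (the default) each command is printed with a "+ " prefix
# and nothing is executed. With DRY_RUN=0 the commands are really run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Quoting of arguments is not preserved in the printed line.
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

recreate_capture_db() {
    # Stop the processes that directly access the Vertica database.
    run sudo /sbin/service nqwatchdog stop
    run sudo /sbin/service nqinspectoragentd stop
    # Drop the database and remove its folder.
    run su - dbadmin -c "/opt/vertica/bin/adminTools -t drop_db -d capture"
    run sudo rm -r /nqxfs/vertica/capture
    # Manual step, not automated here: check `ps -ef | grep vertica`,
    # kill any remaining Vertica process, and repeat the drop if needed.
    # Restart the spread daemon, then recreate the database.
    run sudo /sbin/service spreadd stop
    run sudo /sbin/service spreadd start
    run sudo /opt/NetQoS/install/setupVertica.sh --new
    # Restart the processes stopped at the beginning.
    run sudo /sbin/service nqinspectoragentd start
    run sudo /sbin/service nqwatchdog start
}
```

Calling recreate_capture_db with DRY_RUN left at 1 prints the nine commands in order, so they can be compared with the steps above before anything is executed. The sketch does not stop on errors, so running it with DRY_RUN=0 is no substitute for following the steps one at a time.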