The ADA Multi-Port Monitor stores short-term application performance metrics in a Vertica database. This database can fail or become corrupt for several reasons and may need to be rebuilt. If a rebuild is necessary, it is best to completely remove Vertica and rebuild it.
The most common causes of Vertica corruption are:
1) The Multi-Port Monitor was rebooted, crashed, or halted without the database being in a 'STOP' state.
2) The /nqxfs filesystem became corrupt.
3) The Multi-Port Monitor was upgraded to 10.3 (known issue).
To rebuild Vertica, perform the following steps:
1) Make sure the MTP host name resolves consistently. Even though Vertica runs on localhost, we have sometimes seen problems when DNS resolution of the MTP host name is inconsistent. The best way we’ve found to resolve this is to add an entry to the /etc/hosts file so that the MTP host name always resolves properly.
a) On the MTP, add an entry to the /etc/hosts file for the MTP’s IP address and hostname. Note: you must have root privileges to edit this file; when logged in using the netqos account, you will need to prefix the edit command with sudo:
sudo vi /etc/hosts
An example entry, giving the IP address, then the fully qualified domain name, then the short name:
10.0.0.1 mtp1.example.local mtp1
b) Confirm that the MTP’s IP address and hostname now resolve consistently by running the following commands:
hostname
nslookup <ip address>
nslookup <hostname>
ping -a <hostname>
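As a rough sketch, the /etc/hosts check in step 1 could also be scripted. The helper name hosts_has_name below is hypothetical and is not part of the MTP tooling; it only assumes the standard hosts file layout of an IP address followed by one or more names, as in the example entry above.

```shell
#!/bin/sh
# hosts_has_name FILE NAME
# Hypothetical helper: succeeds (exit 0) if NAME appears as a host name or
# alias on any non-comment line of FILE, and fails (exit 1) otherwise.
hosts_has_name() {
    hosts_file=$1
    name=$2
    awk -v name="$name" '
        { sub(/#.*/, "") }   # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}

# Example (run on the MTP):
#   hosts_has_name /etc/hosts "$(hostname)" && echo "entry found"
```

Field 1 (the IP address) is skipped on purpose, so the check only matches names, not addresses.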
2) Stop the processes that directly access the Vertica database.
sudo /opt/NetQoS/scripts/stopprocs.sh
3) Confirm that there are no Vertica database processes running. To find out whether a Vertica process is running, use the following command:
ps -ef | grep vertica
If the Vertica process is running, it will display a line similar to the following:
dbadmin 9047 1 2 Jul02 ? 04:25:30 /opt/vertica/bin/vertica -C capture -D /nqxfs/vertica/capture/v_capture_node0001_catalog -h 127.0.0.1 -p 5433
If there is a process running, kill it using the Linux kill -9 <pid> command (e.g. kill -9 9047 would kill the above process).
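The grep in step 3 can also match its own command line, so a small filter may make the process ID easier to pick out. The sketch below is only an illustration: vertica_pids is a hypothetical helper, and it assumes the ps -ef column layout shown in the sample line above (PID in column 2, command in column 8).

```shell
#!/bin/sh
# vertica_pids
# Hypothetical helper: reads "ps -ef" output on stdin and prints the PID of
# every line whose command (column 8) is exactly the Vertica binary.
vertica_pids() {
    awk '$8 == "/opt/vertica/bin/vertica" { print $2 }'
}

# Example (run on the MTP):
#   ps -ef | vertica_pids
```

Any PID it prints can then be passed to the kill command described above.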
4) Manually drop the database, and make sure that the Vertica capture folder is removed afterward. Type the following command by hand; do not copy and paste it:
su - dbadmin -c "/opt/vertica/bin/adminTools -t drop_db -d capture"
When prompted for a password, enter ‘dbadmin’. Note that the syntax of this command differs from the others: it begins with ‘su -’ (su followed by a dash).
For versions prior to 10.6, run:
sudo rm -r /nqxfs/vertica/capture
For 10.6 and later, run:
sudo rm -r /nqxfs/nq_runtime/vertica/capture
5) Restart the Vertica spreadd daemon. **This step is for 10.3 and below ONLY**
sudo /sbin/service spreadd stop
sudo /sbin/service spreadd start
6) Recreate the database.
sudo /opt/NetQoS/install/setupVertica.sh -new
7) Restart the MTP services.
sudo /opt/NetQoS/scripts/startprocs.sh
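Because two of the steps depend on the installed version, it may help to print the command sequence for a given version before running anything. The following is a dry-run sketch only: rebuild_plan is a hypothetical helper, it assumes version strings of the form MAJOR.MINOR (for example 10.3), and it echoes the commands from steps 2 through 7 without executing them. Step 1 and the manual kill in step 3 are left out because they require judgment.

```shell
#!/bin/sh
# rebuild_plan VERSION
# Hypothetical dry-run helper: prints the rebuild commands for VERSION.
# Nothing is executed. Assumes VERSION looks like MAJOR.MINOR (e.g. 10.3).
rebuild_plan() {
    version=$1
    case $version in
        *.*) ;;                          # already MAJOR.MINOR
        *) version="$version.0" ;;       # treat "10" as "10.0"
    esac
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}

    echo 'sudo /opt/NetQoS/scripts/stopprocs.sh'
    echo 'ps -ef | grep vertica'
    echo 'su - dbadmin -c "/opt/vertica/bin/adminTools -t drop_db -d capture"'

    # The capture folder location differs for 10.6 and later.
    if [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 6 ]; }; then
        echo 'sudo rm -r /nqxfs/nq_runtime/vertica/capture'
    else
        echo 'sudo rm -r /nqxfs/vertica/capture'
    fi

    # The spreadd restart applies to 10.3 and below only.
    if [ "$major" -lt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -le 3 ]; }; then
        echo 'sudo /sbin/service spreadd stop'
        echo 'sudo /sbin/service spreadd start'
    fi

    echo 'sudo /opt/NetQoS/install/setupVertica.sh -new'
    echo 'sudo /opt/NetQoS/scripts/startprocs.sh'
}

# Print the plan for a 10.6 system.
rebuild_plan 10.6
```

Printing rather than executing keeps the interactive parts of the procedure (the dbadmin password prompt and checking the ps output) in the operator's hands.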