Monitor Data Repository node processes for node outage
Article ID: 217325



DX NetOps CA Performance Management - Usage and Administration


When the DX NetOps Performance Management (PM) Data Repository database node(s) go down, how can we monitor the Vertica processes?

Which Vertica processes stop on a Data Repository cluster node when that node leaves the cluster?

Monitoring Vertica processes for a down node.


All supported DX NetOps Performance Management releases


Administrators need to be alerted when a database node goes down.


The following samples were taken from node0001 in a three-node cluster. Default install paths are shown.

Process list for a running node.

[root@node0001_HostName ~]# ps -ef | grep vertica
dradmin    7618      1  0 Apr30 ?        00:00:00 /bin/bash /opt/vertica/agent/ /opt/vertica/config/users/dradmin/agent.conf
dradmin    7637   7618  0 Apr30 ?        09:24:20 /opt/vertica/oss/python/bin/python ./
dradmin   37070      1  0 Apr30 ?        02:37:42 /opt/vertica/spread/sbin/spread -c /loddisk/data/drdata/v_drdata_node0001_catalog/spread.conf -D /opt/vertica/spread/tmp
dradmin   37072      1  6 Apr30 ?        2-22:38:35 /opt/vertica/bin/vertica -D /loddisk/data/drdata/v_drdata_node0001_catalog -C drdata -n v_drdata_node0001 -h <node0001_IP_Address> -p 5433 -P 4803 -Y ipv4 -S 10263370
dradmin   37107  37072  0 Apr30 ?        00:19:45 /opt/vertica/bin/vertica-udx-zygote 13 3 37072 debug-log-off /loddisk/data/drdata/v_drdata_node0001_catalog/UDxLogs 60 14 0

Process list for the same node when the database is down on it.

[root@node0001_HostName ~]# ps -ef | grep vertica
dradmin    7618      1  0 Apr30 ?        00:00:00 /bin/bash /opt/vertica/agent/ /opt/vertica/config/users/dradmin/agent.conf
dradmin    7637   7618  0 Apr30 ?        09:24:21 /opt/vertica/oss/python/bin/python ./

The missing spread, vertica, and vertica-udx-zygote processes indicate that the database is down on the node.

If you use a process monitoring tool, the absence of these processes can serve as the trigger that indicates the database is down on that node.
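For sites without a dedicated process monitoring tool, the check can be sketched as a small shell function. This is a minimal illustration, not a supported utility: the function name missing_vertica_procs is hypothetical, the binary paths are the default install paths from the samples above, and it assumes all three processes are expected on a healthy node, as in the running-node sample.

```shell
#!/bin/sh
# Hypothetical helper: reads a process listing on stdin (for example the
# output of `ps -ef`) and prints the names of the expected Data Repository
# processes that are NOT present. Empty output means all three were found.
missing_vertica_procs() {
    listing=$(cat)
    missing=""
    # Default install paths, as shown in the sample process lists.
    for bin in /opt/vertica/spread/sbin/spread \
               /opt/vertica/bin/vertica \
               /opt/vertica/bin/vertica-udx-zygote; do
        # Require whitespace (or line start/end) around the path so that
        # ".../bin/vertica" does not also match ".../bin/vertica-udx-zygote".
        if ! printf '%s\n' "$listing" |
             grep -Eq "(^|[[:space:]])${bin}([[:space:]]|\$)"; then
            missing="$missing ${bin##*/}"
        fi
    done
    # Strip the leading space left by the concatenation above.
    printf '%s\n' "${missing# }"
}
```

A cron job or monitoring agent could call it as `down=$(ps -ef | missing_vertica_procs)` and raise an alert whenever `$down` is not empty. How the alert is delivered (mail, trap, ticket) is left to the local tooling.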

Additional Information

An alternative to monitoring the processes is to use the Events that are available in PM by default. Notification Rules in PM can be set up to email users when these Events are raised.

The Data Aggregator Data Source currently provides system-based Data Repository State Events.

Create Notification "Data Repository State" 

Select Next twice.

In Data Source select:
CA Performance Center
Data Aggregator@...



Select Event Type

Select Next

You can enable Email, Send Trap, or both.


Note that when a single-node database goes down, the Data Aggregator may already be shutting down due to loss of the database by the time its Event engine recognizes the problem and tries to raise the default Events. In that case, no Events are raised.