CA PM Vertica single node down in 3.5 or later

Article ID: 71705

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration

Issue/Introduction

After the Data Repository (DR) is restarted, a Vertica node may remain 'DOWN', even after confirming that the firewall has all relevant ports open and that the network connections are good.

Environment

CAPM 3.5
Red Hat 7.x

Cause

Any of the following problems may cause this:

  1. The following error message is found in Db.log:

    <DATE> <TIME> SP_connect: unable to connect via UNIX socket to /opt/vertica/spread/tmp/4803 (pid=7165): Error: No such file or directory

    For example, the following file was missing from the folder /opt/vertica/spread/tmp:

    srw-rw-rw- 1 dradmin verticadba 0 Jan 25 15:37 4803

  2. Check that the Vertica ports are not blocked by the firewall:

    • Port 22 (TCP protocol)
    • Port 4803 (TCP and UDP protocol)
    • Port 4804 (UDP protocol)
    • Port 5433 (TCP protocol); remote access to this port is required
    • Port 5434 (TCP protocol)
    • Port 6543 (UDP protocol)
  3. Check that Vertica processes are running:

    # ps -ef | grep -i vertica
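The three checks above can be run together from a shell on the affected node. The sketch below is a minimal, hypothetical set of helpers written for Bash; check_spread_socket, check_vertica_ports and find_procs are illustrative names, not part of Vertica or CA PM. It assumes the default socket path from the Db.log error, and for the port check it assumes firewalld (the default firewall service on RHEL 7) and root privileges. Adjust these to your installation.

```shell
# Hypothetical diagnostic helpers for the three checks above
# (illustrative only; not part of Vertica or CA PM).

# Check 1: is the spread socket file present, and is it really a socket?
# Defaults to the path from the Db.log error; pass another path to override.
check_spread_socket() {
    local sock="${1:-/opt/vertica/spread/tmp/4803}"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    elif [ -e "$sock" ]; then
        # Something exists at the path, but it is not a socket.
        echo "not a socket: $sock"
        return 1
    else
        echo "missing: $sock"
        return 1
    fi
}

# Check 2: the ports from the list above, in firewalld PORT/PROTOCOL form.
VERTICA_PORTS="22/tcp 4803/tcp 4803/udp 4804/udp 5433/tcp 5434/tcp 6543/udp"

# Queries firewalld's default zone; run as root. --query-port only sees ports
# opened by number, so a port allowed through a named service (for example
# 22/tcp via the "ssh" service) is reported as "not open" here.
check_vertica_ports() {
    if ! command -v firewall-cmd >/dev/null 2>&1 ||
       ! firewall-cmd --state >/dev/null 2>&1; then
        echo "firewalld is not available or not running" >&2
        return 2
    fi
    local spec status=0
    for spec in $VERTICA_PORTS; do
        if firewall-cmd --query-port="$spec" >/dev/null 2>&1; then
            echo "open:     $spec"
        else
            echo "not open: $spec"
            status=1
        fi
    done
    return "$status"
}

# Check 3: list processes whose command line matches a case-insensitive
# pattern (like the ps | grep above), leaving out the grep commands themselves.
# Returns non-zero when nothing matches.
find_procs() {
    ps -ef | grep -i -- "$1" | grep -v -- grep
}
```

For example, run check_spread_socket, check_vertica_ports and find_procs vertica in turn. A clean firewalld result only covers the local host; it does not rule out a network firewall between the Vertica nodes.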

Resolution

If the issue matches Cause 1 above, a COLD REBOOT of the Vertica server may resolve it.

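A reboot is needed, rather than simply recreating the missing file, because of the 's' at the start of the listing shown under Cause 1: it indicates a UNIX/Linux file type of 'socket'. A socket file is not a regular file; it acts more like an address (comparable to an IP address) at which a program can be reached. The system creates a socket file when a program binds a UNIX domain socket to a path name (by calling bind() on an AF_UNIX socket, not on a TCP socket).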

This type of socket is internal to one computer and is used primarily for inter-process communication. The system then associates this special file with the socket file descriptor that the program bound or, more specifically, with the inode to which that file descriptor refers.

An inode is a data structure that describes objects (such as files or directories) in UNIX-type filesystems. After the socket file has been created, the program that created it does not interact with the socket through the filename. Instead, it communicates through the socket file descriptor, which refers to that inode; the filename is only a name that points at the inode.

So you cannot create a working socket file manually: a file created by hand is a regular file with no program listening behind it. Changing the socket file after the system has created it does not achieve anything either, because renaming it or adding a link only changes the set of names that point at that inode. A client can then reach the listening program under the new name, but no new socket is created.
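These points can be observed with GNU stat (part of coreutils on RHEL 7). The sketch below is illustrative only: it works in a scratch directory created with mktemp, never in /opt/vertica/spread/tmp, and file_kind and inode_of are hypothetical helper names. It shows that a file created by hand with touch is a regular file rather than a socket, and that adding a name with ln only points a second name at the same inode.

```shell
# Hypothetical helper: print a path's file type as GNU stat reports it,
# e.g. "socket", "regular file", "regular empty file" or "directory".
file_kind() {
    stat -c '%F' -- "$1"
}

# Hypothetical helper: print a path's inode number. Two names that print
# the same number are hard links to the same object.
inode_of() {
    stat -c '%i' -- "$1"
}

# Demonstration in a scratch directory.
demo_dir=$(mktemp -d)

# A file created by hand is a regular file, not a socket,
# so no program is listening behind it.
touch "$demo_dir/4803"
file_kind "$demo_dir/4803"          # prints: regular empty file

# Adding a second name with ln only adds another name for the same inode.
ln "$demo_dir/4803" "$demo_dir/4803-alias"
if [ "$(inode_of "$demo_dir/4803")" = "$(inode_of "$demo_dir/4803-alias")" ]; then
    echo "same inode"
fi

rm -rf "$demo_dir"
```

On a healthy node, file_kind /opt/vertica/spread/tmp/4803 should print socket, matching the leading 's' in the listing shown under Cause 1.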

The error in Db.log, <DATE> <TIME> SP_connect: unable to connect via UNIX socket to /opt/vertica/spread/tmp/4803 (pid=7165): Error: No such file or directory, shows that when Vertica attempts to start, the OS does not allow it to create the 4803 socket file, because run-time data from a previously running session still holds it and has not been cleared. Run-time variable data is kept in the /var/run directory.

/var/run should be cleared out at each boot of the system, so the quickest way to clear this state is to reboot the OS on which the Vertica node is running.
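Once the server is back up and the database has been restarted, you can confirm that the node has rejoined by querying the Vertica NODES system table. The sketch below is a hypothetical helper, down_nodes, that only filters the output of vsql run with -A -t (unaligned, tuples only, with | between columns). The vsql path /opt/vertica/bin/vsql and the database user dradmin (the owner shown in the listing under Cause 1) are assumptions; substitute the values for your installation.

```shell
# Hypothetical helper: read "node_name|node_state" lines on stdin and print
# the names of nodes whose state is not UP. Returns non-zero if any such node
# is found, so it can be used in scripts.
down_nodes() {
    awk -F'|' 'NF >= 2 && $2 != "UP" { print $1; found = 1 } END { exit found }'
}

# Example, run on a node that is UP (assumed vsql path and user; vsql prompts
# for the password):
#   /opt/vertica/bin/vsql -U dradmin -A -t \
#       -c "SELECT node_name, node_state FROM nodes;" | down_nodes
```

An empty result with a zero exit status means every node reports UP.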