Excessive SSH Connections from Vertica Node Using pmadmin User in DX NetOps Performance Management


Article ID: 437200


Updated On:

Products

Network Observability CA Performance Management

Issue/Introduction

A Vertica node within a Data Repository cluster can generate an excessive volume of SSH connection attempts to other nodes using the pmadmin user. These connections are closed before authentication completes, leading to:

  • High volume of log entries in /var/log/secure (e.g., Connection closed by authenticating user pmadmin [preauth]).
  • Increased consumption of system resources.
  • Log noise that can mask other critical system events.

The source of these connections often uses sshpass to attempt password-based authentication when passwordless SSH is not configured for the pmadmin user.
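To gauge the volume and see which hosts the attempts come from, the log entries can be tallied per source address. The following is a minimal sketch, not a product-supplied tool. The function name count_pmadmin_closes is hypothetical, and the sketch assumes the standard OpenSSH message layout, in which the source IP and the word "port" follow the user name.

```shell
# Count "Connection closed by authenticating user pmadmin" entries per source IP.
# Reads sshd log lines on stdin. Assumed line layout (standard OpenSSH):
#   ... Connection closed by authenticating user pmadmin <ip> port <port> [preauth]
count_pmadmin_closes() {
    awk '/Connection closed by authenticating user pmadmin / {
             for (i = 1; i <= NF; i++)
                 if ($i == "pmadmin" && $(i + 2) == "port") { count[$(i + 1)]++; break }
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Usage on the receiving node (reading /var/log/secure usually requires root):
#   count_pmadmin_closes < /var/log/secure
```

Each output row is a count followed by a source IP, highest count first. A single address with a count far above the others points to the node to investigate.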

Environment

  • Product: DX NetOps Performance Management (Performance Center, Data Aggregator, Data Repository).
  • Data Repository: Vertica cluster (typically three nodes).
  • Configuration: Data Aggregator installed using the non-root pmadmin user.

Cause

The excessive SSH attempts are typically not caused by standard Performance Management monitoring. The Data Aggregator monitors the Data Repository via JDBC on port 5433, not through SSH loops.

In reported cases, the root cause was a hung instance of the dr_validate.sh script. This script is part of the Data Repository validation process and uses sshpass for automated connectivity checks. If the process becomes stuck in an infinite loop, it will continuously attempt to authenticate, causing the observed log volume.

Resolution

To resolve this issue, identify and terminate the hung validation process on the source Vertica node:

  1. Identify the Source Node: Check /var/log/secure on the receiving node to identify the IP address of the source node initiating the connections.
  2. Locate the Rogue Process: Log in to the source node and run the following command to identify the process:
    ps -ef | grep '[d]r_validate.sh'
    The [d] bracket keeps grep from matching its own command line. Look for a process whose start time (STIME column) is several days old or whose accumulated CPU time (TIME column) is unusually high, indicating it is stuck.
  3. Verify sshpass Activity: Use the following command to watch the repeated execution in real time:
    watch -n 1 "ps -ef | grep '[s]shpass'"
  4. Terminate the Process: Once the PID is identified (e.g., 339250), terminate the hung process, replacing [PID] with that process ID:
    kill -9 [PID]
  5. Verify Resolution: Monitor /var/log/secure to ensure the "Connection closed" entries for the pmadmin user have stopped.
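The check in step 2 can also be scripted so that only long-running dr_validate.sh instances are listed. This is a rough sketch under stated assumptions: the function name find_stuck_validators and the one-hour default threshold are illustrative choices, not values from the product, and the usage line assumes a procps-ng ps that supports the etimes (elapsed seconds) field.

```shell
# List dr_validate.sh processes that have been running at least a given time.
# Reads "pid etimes args" rows on stdin (etimes = elapsed seconds since start).
# The 3600-second default is an arbitrary assumption; pick a value that
# comfortably exceeds a normal validation run in your environment.
find_stuck_validators() {
    min_seconds="${1:-3600}"
    awk -v min="$min_seconds" '
        /dr_validate\.sh/ && $2 + 0 >= min + 0 { print }'
}

# Usage on the source node (lists instances older than one day):
#   ps -eo pid=,etimes=,args= | find_stuck_validators 86400
```

Any row that is printed is a candidate for step 4. Confirm that it is not a validation run that legitimately started recently before terminating it.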

Additional Information

If no hung dr_validate.sh process is found, log in to the source node and trace what is executing ssh against the target node:

  • crontab -l -u pmadmin (to check for custom automated tasks).

  • systemctl list-timers --all (to check for systemd timers).

  • watch -n 1 "ps -ef | grep '[s]shpass'" (to catch the process execution in real time, since the connection attempts recur every few seconds).
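Because each sshpass process is short-lived, it can be easier to capture a process snapshot and then walk up the parent chain to see what launched it. The sketch below is one possible approach. The function name trace_parents, the snapshot path, and the PID in the usage comment are all hypothetical.

```shell
# Print the ancestry of a PID, one "pid ppid args" row per line,
# starting at the given PID and following PPID links upward.
# Reads "pid ppid args" rows on stdin.
trace_parents() {
    awk -v start="$1" '
        { $1 = $1; ppid[$1] = $2; row[$1] = $0 }   # normalize spacing, index by PID
        END {
            pid = start
            # Stop when the PID is not in the snapshot or was already visited.
            while ((pid in row) && !(pid in seen)) {
                seen[pid] = 1
                print row[pid]
                pid = ppid[pid]
            }
        }'
}

# Usage on the source node (path and PID below are placeholders):
#   ps -eo pid=,ppid=,args= > /tmp/ps.snapshot
#   grep '[s]shpass' /tmp/ps.snapshot        # pick an sshpass PID from the snapshot
#   trace_parents 12345 < /tmp/ps.snapshot
```

The first row is the sshpass process itself and each following row is its parent, so a looping script, cron job, or systemd unit shows up a few rows down.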