Troubleshooting DSM managed PostgreSQL clusters which experience performance, connectivity, or hanging query issues
search cancel

Troubleshooting DSM managed PostgreSQL clusters which experience performance, connectivity, or hanging query issues

book

Article ID: 410984

calendar_today

Updated On:

Products

VMware Data Services Manager

Issue/Introduction

DSM-managed PostgreSQL clusters may experience various performance or connectivity issues that require in-depth system-level troubleshooting with direct visibility into PostgreSQL processes and their network namespace. One or more of the following symptoms may indicate the need for node-level diagnostics:

  • Clients cannot establish connections to PostgreSQL server, connection timeouts, or SSL handshake failures
  • Database connectivity and performance issue indicating network interface experiencing high load from potenitally few queries
  • PostgreSQL client processes consuming excessive CPU resources or causing high disk I/O load
  • Queries hang indefinitely without apparent database-level deadlocks or lock waits
  • Network Traffic Analysis Required: Need to monitor specific TCP sockets or network traffic patterns
  • Need real-time monitoring of system resources to identify performance bottlenecks

Environment

  • VMware Data Services Manager (DSM 9.0.1 onwards)
  • PostgreSQL clusters managed by DSM

Cause

DSM-managed PostgreSQL clusters may experience system-level issues that cannot be diagnosed through standard database monitoring tools.
These issues require
 access to low-level system debugging utilities to analyze network traffic, process behavior, system calls, and resource utilization patterns.

Resolution

Overview

DSM provides a debugging tool (dsm-debug) that enables administrators to access command-line certain diagnostic utilities on PostgreSQL cluster node for in-depth troubleshooting.
The tool creates ephemeral containers that share the same process and network namespace as the PostgreSQL nodes, providing full system-level visibility while maintaining security through command restrictions.

Prerequisites

  • Root SSH access to the DSM Appliance and DSM Admin privileges
  • Knowledge of cluster name and namespace

Important Notices

  • This is a feature intended for ad-hoc diagnostic purposes only
  • Not suitable for persistent monitoring and must not be used for automation
  • The CLI provides no backwards compatibility guarantees (CLI may change/be dropped/replaced)
  • The debug session share the same resources as the Postgres container - including memory, cpu, network. 
  • The debug session commands that generate write disk or network IO are restricted

Accessing Debug Tools

 

 

  1. SSH into the DSM Appliance VM.

    1. If you need to reset the password follow this guide
  2. Type dsm-debug --help  to see all options
  3. Start debugging session 
    1. Connect to a specific PostgresSQL Cluster - for example cluster with name "my-pg" in namespace "my-namespace" :   
      dsm-debug --data-service-name my-pg --namespace my-namespace 
    2. For multi node clusters. You can connect to a specific node by specifying --node-index (correspond to the index as seen in database nodes topology)
      dsm-debug --data-service-name my-pg --namespace my-namespace --node-index 1 
  4. Run any of the available diagnostic commands.
  5. Type exit to terminate debugging session

Session Management

  • Session Limits: Maximum 800 sessions can be started at total. After that the Database node need to be restart to reset the count.
  • Audit Logging: All executed commands are automatically logged to /var/log/tdm/provider/audit/dsm-debug-audit.log
  • Isolation: Each session runs in isolated and restricted environment which share name process and network namespace as the postgres node. 

Available Diagnostic Commands

The debug session provides access to a restricted set of pre-approved diagnostic commands:

Process Monitoring:

  • ps - Display running processes
  • top - Real-time process viewer
  • htop - Interactive process viewer
  • pidof - Find process IDs by name
  • pgrep - Search for processes by name
  • pmap - Display memory map of processes

System Resource Monitoring:

  • free - Display memory usage
  • df - Display filesystem usage
  • du - Display directory space usage
  • vmstat - Display virtual memory statistics
  • iostat - Display I/O statistics
  • lscpu - Display CPU information
  • lsblk - Display block devices
  • lsmem - Display memory information

Network Analysis:

  • tcpdump - Capture network packets
  • iftop- Interactive network traffic monitor
  • nload - Display network usage
  • hostname - Display system hostname
  • ss – Display socket statistics

System Tracing and Analysis:

  • strace - Trace system calls and signals
  • lsof - List open files and network connections
  • iotop-c - Display I/O usage by process

PostgreSQL-Specific Monitoring:

  • pg_activity - Real-time PostgreSQL activity monitor (to connect specify host and user pg_activity -h localhost -U postgres )

File Operations and Text Processing:

  • echo, ls, find, stat, basename, dirname, pwd - File and directory operations and information 
  • cat, head, tail, grep, sort, uniq, cut, wc, diff, tr - File content viewing and searching 

System Information:

  • uname, uptime, date, printenv - System information

Troubleshooting Examples

Connection Problems 

 
# Check network connectivity and listening ports ss -tulpn | grep 5432 lsof -i :5432 # Monitor real-time network traffic tcpdump -i any port 5432 # Analyze network interface load nload

Process analysis

 
# Find PostgreSQL processes pgrep postgres pidof postgres # Examine process memory usage pmap $(pidof postgres) # List open files and connections lsof -p $(pidof postgres)