The cell cluster's health status is degraded and appears as 'SSH PROBLEM' in the VMware Cloud Director VAMI interface

Article ID: 382406

Products

VMware Cloud Director

Issue/Introduction

  • The health status of the VCD cell cluster is reported as degraded with an 'SSH PROBLEM' status in VAMI.
  • Attempts to access the VCD provider portal result in an HTTPS 404 error.
  • /opt/vmware/vcloud-director/logs/cell-runtime.log shows an error similar to:

| ERROR | Thread-0 (ActiveMQ-scheduled-threads) | VCDBroadcastEndpoint | Error during broadcast for local cell: ########-####-####-####-############ | org.postgresql.util.PSQLException: FATAL: no pg_hba.conf entry for host "###.###.###.###", user "vcloud", database "vcloud", SSL encryption
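To see which host address the database is rejecting, the address can be extracted from the PSQLException message. The following is a minimal Python sketch, assuming the message format shown in the log line above. The function names and the sample address 192.0.2.11 are illustrative and do not come from VMware tooling.

```python
import re
from typing import Optional

# Matches the PostgreSQL "no pg_hba.conf entry" message as it appears
# in the cell-runtime.log line above.
PG_HBA_ERROR = re.compile(
    r'no pg_hba\.conf entry for host "(?P<host>[^"]+)", '
    r'user "(?P<user>[^"]+)", database "(?P<database>[^"]+)"'
)


def parse_pg_hba_error(line: str) -> Optional[dict]:
    """Return host, user and database from a matching log line, else None."""
    match = PG_HBA_ERROR.search(line)
    return match.groupdict() if match else None


def rejected_hosts(log_path: str) -> set:
    """Collect every distinct host address rejected in the given log file."""
    hosts = set()
    with open(log_path, errors="replace") as log:
        for line in log:
            parsed = parse_pg_hba_error(line)
            if parsed:
                hosts.add(parsed["host"])
    return hosts


if __name__ == "__main__":
    # Hypothetical sample line; the IP address is a documentation address.
    sample = (
        'org.postgresql.util.PSQLException: FATAL: no pg_hba.conf entry for '
        'host "192.0.2.11", user "vcloud", database "vcloud", SSL encryption'
    )
    print(parse_pg_hba_error(sample))
```

Running rejected_hosts against /opt/vmware/vcloud-director/logs/cell-runtime.log on a cell would list the addresses that need a pg_hba.conf entry.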

Environment

VMware Cloud Director 10.5
VMware Cloud Director 10.4.2

Cause

This issue occurs when communication between the database nodes fails because the pg_hba.conf file is missing an entry for the standby cells. VAMI reports this failure as 'SSH PROBLEM'.

Resolution

Verify Node Status

  • Use the VCD appliance API (port 5480) to check the status of the affected node, for example via Postman:

GET https://<unreachable_standby_host_IP>:5480/api/1.0.0/nodes

  • The response should indicate that the node with IP ###.###.###.### is untrusted or unreachable.
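As an alternative to Postman, the same request can be scripted. The sketch below uses only the Python standard library and rests on several assumptions: that the endpoint accepts HTTP Basic authentication with the appliance root account, and that the appliance may present a self-signed certificate (hence the optional verify_tls switch). The function names are hypothetical, and the response body is returned as raw text rather than parsed, since its schema is not described in this article.

```python
import base64
import ssl
import urllib.request


def build_nodes_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build the GET request for the nodes endpoint shown above."""
    url = f"https://{host}:5480/api/1.0.0/nodes"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    # Assumption: the appliance API accepts HTTP Basic authentication.
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def fetch_nodes(host: str, user: str, password: str,
                verify_tls: bool = True, timeout: float = 15.0) -> str:
    """Send the request and return the raw response body as text."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for appliances with self-signed certificates on a trusted network.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_nodes_request(host, user, password)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, print(fetch_nodes("<unreachable_standby_host_IP>", "root", "<password>", verify_tls=False)) would print the node list for manual inspection.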



Update pg_hba.conf File

NOTE: First verify that the NFS transfer directory is accessible from each node.

Sometimes the appliance-sync service is running (run systemctl status appliance-sync.service to verify) but cannot sync the nodes because the transfer directory is inaccessible from one or more nodes.
On each database node, try to navigate to /opt/vmware/vcloud-director/data/transfer. If the command hangs, the mount is not accessible on that node, and the networking and NFS teams need to be engaged to determine and fix the cause.
Once the transfer directory is available again, allow a few minutes for appliance-sync.service to sync the nodes. The issue may resolve once the pg_hba.conf files are synced; in that case, skip to Restart VCD Services below.
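Because an inaccessible NFS mount tends to hang rather than fail, the accessibility check is easier to run with a timeout. The following is a minimal Python sketch, not a VMware-provided tool: it lists the transfer directory in a child process and reports failure if the listing does not finish in time. The function name and the 10-second timeout are assumptions chosen for illustration.

```python
import subprocess

TRANSFER_DIR = "/opt/vmware/vcloud-director/data/transfer"


def directory_responds(path: str = TRANSFER_DIR, timeout: float = 10.0) -> bool:
    """Return True if listing `path` succeeds within `timeout` seconds."""
    # The listing runs in a child process so that a hung NFS mount
    # cannot block this script indefinitely.
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Do not wait after kill(): a process stuck on NFS I/O may not exit promptly.
        proc.kill()
        return False


if __name__ == "__main__":
    if directory_responds():
        print("transfer directory is accessible")
    else:
        print("transfer directory is NOT accessible or timed out")
```

Run it on each database node; a timeout on any node points to the mount problem described above.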

  • On the primary cell, add the missing entries for the standby cells in the pg_hba.conf file. The necessary entries should be in the following format:

#   TYPE   DATABASE     USER       ADDRESS      METHOD
    host   vcloud       vcloud    IP/SubnetMask    md5
    host   vcloud       vcloud    IP/SubnetMask    md5
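To avoid typos when preparing these lines, the entries can be generated from the standby cell addresses. This is a hedged sketch using Python's standard ipaddress module. The function name and the sample addresses are illustrative, and the column spacing is arbitrary because pg_hba.conf fields are whitespace-separated.

```python
import ipaddress


def pg_hba_entry(address: str, database: str = "vcloud",
                 user: str = "vcloud", method: str = "md5") -> str:
    """Return one pg_hba.conf line in the TYPE DATABASE USER ADDRESS METHOD format."""
    # strict=False accepts an address with host bits set (for example
    # 192.0.2.11/24) and normalizes it to the network (192.0.2.0/24).
    # An address without a prefix is treated as a single host (/32 for IPv4).
    # Invalid input raises ValueError.
    network = ipaddress.ip_network(address, strict=False)
    return "    ".join(["host", database, user, str(network), method])


if __name__ == "__main__":
    # Hypothetical standby cell addresses (documentation range).
    for standby in ["192.0.2.11/32", "192.0.2.12/32"]:
        print(pg_hba_entry(standby))
```

The printed lines can then be reviewed and added to pg_hba.conf on the primary cell as described above.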

Restart VCD Services

  • Once the pg_hba.conf entries are added, restart the VCD services on all cells.
  • Confirm that the services are running correctly and the cells are able to communicate with each other.

Verify Resolution

  • After restarting the services, verify that the VCD health status returns to HEALTHY in the VAMI interface.
  • Test accessing the VCD provider portal again to ensure the HTTPS 404 error is resolved.

Additional Information