Cluster Health reports 'SSH PROBLEM' and Failover mode is reported 'INDETERMINATE' in the VMware Cloud Director VAMI

Article ID: 415143

Products

VMware Cloud Director

Issue/Introduction

  • The cluster health status of the VCD (VMware Cloud Director) database is reported as 'SSH PROBLEM' and the failover mode as 'INDETERMINATE' in the VAMI (<https://VCDIP:5480>).

  • Running the command '/opt/vmware/vpostgres/current/bin/repmgr cluster show' on the primary VCD database cell (VCD-01) reports the following errors:

    root@VCD-01 [ ~ ]# sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
     ID  | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    -----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------------
     123 | VCD-01 | primary | * running |          | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     456 | VCD-02 | standby | ? running | ? VCD-01 | default  | 100      |          | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     789 | VCD-03 | standby | ? running | ? VCD-01 | default  | 100      |          | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2

    WARNING: following issues were detected
      - unable to connect to node "VCD-02" (ID: 456)
      - unable to connect to node "VCD-03" (ID: 789)

  • Running the same command on either standby cell (VCD-02 or VCD-03) reports no errors for the cluster:

    root@VCD-02 [ ~ ]# sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
     ID  | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    -----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------------
     123 | VCD-01 | primary | * running |          | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     456 | VCD-02 | standby |   running | VCD-01   | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     789 | VCD-03 | standby |   running | VCD-01   | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2

  • An entry for the NFS share is present in /etc/fstab on the primary cell; however, the SSH session hangs when running 'df -h' or when trying to navigate to or list the contents of '/opt/vmware/vcloud-director/data/transfer'.
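
Because 'df -h' and directory listings can hang the whole SSH session when the NFS mount is unresponsive, it may be safer to probe the path with a time limit. The following is a minimal sketch, not a VMware-provided tool: the function name check_mount_responsive and the 5-second default are illustrative assumptions, and it assumes the GNU coreutils 'timeout' and 'stat' commands are available on the cell.

```shell
#!/bin/bash
# Hypothetical helper (not part of VCD): probe a path with a time limit so
# that an unresponsive NFS mount cannot hang the interactive SSH session.
check_mount_responsive() {
    local path="$1"
    local limit="${2:-5}"   # seconds; arbitrary illustrative default
    local rc=0

    # SIGKILL is used because a process blocked on NFS I/O may ignore
    # milder signals. Output of stat is discarded; only the status matters.
    timeout -s KILL "$limit" stat "$path" >/dev/null 2>&1 || rc=$?

    case "$rc" in
        0)       echo "responsive: $path" ;;
        124|137) echo "timed out after ${limit}s: $path" ;;
        *)       echo "error (exit $rc): $path" ;;
    esac
    return "$rc"
}
```

On the primary cell this could be invoked as 'check_mount_responsive /opt/vmware/vcloud-director/data/transfer'. A "timed out" result would be consistent with the hang described above, while an "error" result more likely points to a missing path or a permissions problem.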

Environment

VMware Cloud Director 10.x

Cause

This issue occurs when a VCD database cell in the cluster cannot access the NFS share mounted at the /opt/vmware/vcloud-director/data/transfer directory.

Resolution

To resolve this issue, ensure that the VCD database cell has connectivity to its NFS share mounted at the /opt/vmware/vcloud-director/data/transfer directory.
If the NFS share is reachable over the network but the VCD cell still cannot access the directory, a reboot of the cell may be required to restore access.
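
As a starting point for the connectivity check, the NFS server configured for the transfer directory can be read from /etc/fstab without touching the mount itself. This is a minimal sketch under stated assumptions: the helper names fstab_source_for and nfs_server_of are hypothetical, and the source is assumed to be in the usual 'server:/export' form with a hostname or IPv4 address (bracketed IPv6 sources are not handled).

```shell
#!/bin/bash
# Hypothetical helper: print the source (server:/export) configured for a
# mount point in an fstab-style file. Comment lines are skipped.
fstab_source_for() {
    local mount_point="$1"
    local fstab="${2:-/etc/fstab}"
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $1 }' "$fstab"
}

# Hypothetical helper: extract the host part of a "server:/export" source.
# Assumes a hostname or IPv4 address before the first colon.
nfs_server_of() {
    local source="$1"
    echo "${source%%:*}"
}
```

For example, 'src=$(fstab_source_for /opt/vmware/vcloud-director/data/transfer)' followed by 'ping -c 3 "$(nfs_server_of "$src")"' would give a basic reachability check from the cell. A successful ping only shows that the host answers; it does not confirm that the NFS service or the export itself is healthy.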