Error: "SSH PROBLEM" Cluster Health status and "INDETERMINATE" Failover Mode reported

Article ID: 415143

Products

VMware Cloud Director

Issue/Introduction

  • The cluster health status of the VMware Cloud Director (VCD) database reports an "SSH PROBLEM" status, and the failover mode is "INDETERMINATE" in the VAMI (https://VCD_cell_IP:5480).



  • Running the command "repmgr cluster show" from the primary VCD database cell reports the following errors:

    root@VCD-01 [ ~ ]# repmgr cluster show
     ID  | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    -----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------------
     123 | VCD-01 | primary | * running |          | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     456 | VCD-02 | standby | ? running | ? VCD-01 | default  | 100      |          | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     789 | VCD-03 | standby | ? running | ? VCD-01 | default  | 100      |          | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2

    WARNING: following issues were detected
      - unable to connect to node "VCD-02" (ID: 456)
      - unable to connect to node "VCD-03" (ID: 789)

  • Running the same command on both standby cells (i.e., VCD-02 and VCD-03) returns no errors for the cluster:

    root@VCD-02 [ ~ ]# repmgr cluster show
     ID  | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    -----+--------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------------
     123 | VCD-01 | primary | * running |          | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     456 | VCD-02 | standby |   running | VCD-01   | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2
     789 | VCD-03 | standby |   running | VCD-01   | default  | 100      | 2        | host=#.#.#.# user=repmgr dbname=repmgr gssencmode=disable connect_timeout=2

  • An entry for the NFS share is present in /etc/fstab on the primary cell; however, the SSH session hangs when running the command "df -h" or when trying to navigate to or list the contents of "/opt/vmware/vcloud-director/data/transfer".
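
When comparing "repmgr cluster show" output across several cells, it can help to extract only the nodes flagged with the "?" marker, which in the output above accompanies the "unable to connect" warnings. The following is a minimal sketch, not part of the product; the function name unreachable_nodes is hypothetical, and it assumes the pipe-separated column layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "repmgr cluster show" output on stdin and print
# the name of every node whose Status column starts with "?".
# Assumes the nine-column, pipe-separated layout shown in this article.
unreachable_nodes() {
    awk -F'|' '
        NF >= 9 && $4 ~ /^[ \t]*[?]/ {
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
            print name
        }
    '
}

# Example, run on a database cell:
#   repmgr cluster show | unreachable_nodes
```

With the primary cell's output shown above, this would print VCD-02 and VCD-03; with the standby cells' output it would print nothing.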

Environment

VMware Cloud Director 10.6.x

Cause

This issue occurs when the VCD database cells in the cluster cannot access the NFS share mounted on the /opt/vmware/vcloud-director/data/transfer directory.
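
Because commands that touch an unreachable NFS mount can hang the SSH session, one way to test the directory is to wrap the check in a time limit. The following is a minimal sketch under stated assumptions: the function names run_with_limit and probe_dir are hypothetical, the 5-second default is arbitrary, and GNU coreutils timeout is assumed to be available on the cell. A process blocked on NFS I/O may not react to the default SIGTERM, so the sketch sends SIGKILL; even then, the kill may be delayed in some situations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a hard time limit and report
# "ok", "timed out", or "failed (exit N)".
run_with_limit() {
    local limit="$1"; shift
    timeout -s KILL "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    case "$rc" in
        0)       echo "ok" ;;
        124|137) echo "timed out" ;;          # 137 = 128 + SIGKILL
        *)       echo "failed (exit $rc)" ;;
    esac
}

# Hypothetical helper: check whether a directory answers a stat call in time.
# Usage: probe_dir <path> [seconds]
probe_dir() {
    run_with_limit "${2:-5}" stat -- "$1"
}

# Example:
#   probe_dir /opt/vmware/vcloud-director/data/transfer 5
```

A "timed out" result for the transfer directory is consistent with the cause described above, while "ok" suggests the share is currently responding.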

Resolution

To resolve this issue, ensure the VCD database cell has connectivity to its NFS share mounted on the /opt/vmware/vcloud-director/data/transfer directory.

  1. To verify the currently mounted filesystems, run these commands:

    mount

    df -h

    If the NFS share is mounted, it appears in the output with /opt/vmware/vcloud-director/data/transfer as its mount point.

  2. Check the content of /etc/fstab to confirm that it contains an entry for the NFS mount:

    cat /etc/fstab

  3. Attempt to mount all filesystems listed in /etc/fstab using this command:

    mount -a

  4. Confirm that the Cloud Director cell can connect to the NFS server on the configured port using curl as follows:

    curl -v telnet://<nfs_server_hostname>:<port>

    Note: the default NFS port is 2049
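
The checks in the steps above can also be sketched as small shell functions that avoid touching the share itself. This is an illustration under stated assumptions, not a supported tool: the function names are hypothetical, the mount table is read from /proc/mounts instead of running "df -h", and the port check relies on bash's /dev/tcp redirection (which some bash builds disable) as an alternative to the curl command in step 4.

```shell
#!/usr/bin/env bash
TRANSFER_DIR="/opt/vmware/vcloud-director/data/transfer"

# Succeeds if <path> is listed as a mount point in a mount table
# (default /proc/mounts). Reading the table does not access the share.
is_mounted() {
    local path="$1" table="${2:-/proc/mounts}"
    awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$table"
}

# Succeeds if an uncommented fstab line uses <path> as its mount point.
has_fstab_entry() {
    local path="$1" fstab="${2:-/etc/fstab}"
    awk -v p="$path" '$1 !~ /^#/ && $2 == p { found = 1 } END { exit !found }' "$fstab"
}

# Succeeds if a TCP connection to <host>:<port> opens within <seconds>.
# Assumption: bash was built with /dev/tcp support.
port_open() {
    local host="$1" port="${2:-2049}" limit="${3:-5}"
    timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (replace <nfs_server_hostname> with the real server name):
#   has_fstab_entry "$TRANSFER_DIR" && echo "fstab entry present"
#   is_mounted "$TRANSFER_DIR"      && echo "share is mounted"
#   port_open <nfs_server_hostname> 2049 && echo "NFS port reachable"
```

If the fstab entry is present and the port is reachable but the share is not listed as mounted, running "mount -a" as in step 3 is the natural next step.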

If the NFS share is accessible over the network and the VCD cell still cannot access the directory, a reboot of the VCD cell may be required to restore access. Refer to Perform an Orderly Shutdown and Startup of Your VMware Cloud Director Appliance Cluster.