Tasks are very slow

Article ID: 399428

Products

VMware Cloud Director

Issue/Introduction

  • Tasks are taking a long time to complete. 
  • The cell.log on the primary cell shows Java heap exhaustion, "Out of Memory" errors, or "Timed_Waiting".
  • Linux commands are failing on the primary cell. 
  • The cell.log on the standby cells appears healthy.
  • Rebooting the primary cell resolves the issue temporarily. 
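
To check a cell.log for the symptoms listed above, a minimal sketch along the following lines may help. The function name count_symptoms is hypothetical, and the search patterns are assumptions about how the messages are written in the log (for example, Java usually reports heap exhaustion as "java.lang.OutOfMemoryError: Java heap space"); adjust them to match what your log actually contains.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that match the symptoms.
# The patterns below are assumptions, not an official list of messages.
count_symptoms() {
    logfile="$1"
    # -c prints the number of matching lines, -i ignores case,
    # -E enables the alternation between the three patterns.
    grep -c -i -E 'OutOfMemoryError|Java heap space|TIMED_WAITING' "$logfile"
}
```

Call the function with the path to the cell.log on each cell and compare the counts; a high count on the primary cell and a count near zero on the standby cells would be consistent with the symptoms described here.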

Environment

  • VMware Cloud Director 10.x

Resolution

Redeploy the Primary Cell. 

  1. Reboot the primary cell to temporarily resolve the communication issues with the standby cells. This should allow the repmgr cluster to show Healthy status.

  2. Switch over to a Standby. Promote a healthy Standby cell to Primary through the Appliance Management UI.

    Switch the Roles of Your Primary and a Standby VMware Cloud Director Cell in a Database High Availability Cluster


  3. Remove the former Primary cell from the repmgr cluster and power off the associated VM.

    Recover from a VMware Cloud Director Appliance Standby Cell Failure in a High Availability Cluster


  4. Clean up residual configuration and data, including entries in postgresql.conf and the appliance-nodes and cells folders located in the transfer directory.

    • /var/vmware/vpostgres/current/pgdata/postgresql.conf: Back up this file, then remove the failed Primary cell from the line synchronous_standby_names = ''

    • /opt/vmware/vcloud-director/data/transfer/appliance-nodes: In the appliance-nodes folder you will see additional folders named with UUIDs. You can cat the repmgr-node-name to determine which UUID folder needs to be removed. Remove the failed Primary cell folder.

    • /opt/vmware/vcloud-director/data/transfer/cells: In the cells folder you will see additional folders named with UUIDs. You can connect to the database to determine which UUID folder needs to be removed.

      # sudo -i -u postgres psql vcloud
      # select name, uuid from cells;


  5. Execute a rolling reboot of the remaining cells to ensure stability and consistency across the environment.
    Reboot the Standby, perform a Standby Switchover, then reboot the next Standby cell. 


  6. Deploy a new Standby cell in place of the failed Primary. 

    VMware Cloud Director Appliance Deployments and Database High Availability Configuration
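
For the appliance-nodes cleanup in step 4, the lookup of the folder that belongs to the failed Primary cell can be sketched as follows. This is only an illustration: the function name find_node_folders is hypothetical, and the sketch assumes that each UUID folder contains a file named repmgr-node-name holding the node name on a line of its own. It only prints matching folders and does not delete anything.

```shell
#!/bin/sh
# Hypothetical helper: print every folder under the appliance-nodes directory
# whose repmgr-node-name file contains exactly the given node name.
# Assumption: each UUID folder holds a file named "repmgr-node-name".
find_node_folders() {
    nodes_dir="$1"
    node_name="$2"
    for f in "$nodes_dir"/*/repmgr-node-name; do
        # Skip the unexpanded glob when no folder contains the file.
        [ -f "$f" ] || continue
        # -x matches the whole line, -F treats the name as a literal string,
        # -q suppresses output so only the exit status is used.
        if grep -q -x -F -- "$node_name" "$f"; then
            dirname "$f"
        fi
    done
}
```

Call the function with the appliance-nodes path from step 4 and the repmgr node name of the former Primary cell, then review the printed folder before removing it manually.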