vRA 7.x APIs return error 500 when connecting to Postgres DB


Article ID: 327432

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

  • Various automated tasks within the vRealize Automation Appliance Management Interface (VAMI) time out with commands left in a QUEUED state because the Management Agent(s) do not pick up tasks from the virtual appliance. Symptoms include:
    • Log bundle generation fails with a connection refused error similar to
      Failed to start log bundle collection: ('Connection aborted.', error(111, 'Connection refused')
    • Applying a Cumulative Update fails with an error similar to the below:
      Exception occurred while applying selfpatch.
      <Date> 09:32:06,404 - __main__ - ERROR : 259 - ('Command execution of all commands did not finish within the defined timeout time.\nCommand execution result:\nCommand id: <ID_number>\n Type: upgrade-management-agent\n Node id: <id_Number>\n Node host: ComponentNodeFQDN\n Result: \n Result description: \n Status: QUEUED\n\n', 'Error executing command')
    • Unable to change an IaaS certificate from the VAMI Certificates tab.  When the Manager Service is selected the UI spins indefinitely, displaying a message similar to
      VAMI is loading Manager Service host...
    • /var/log/vmware/vcac/vcac-config.log contains messages reporting an expired x509 certificate
      <Date>T14:38:27.919762-04:00 vraFQDN [database-failover-agent][3876]: 
      <Date> 14:38:27 GetMasterStatus(): error querying database (localhost): x509: certificate has expired or is not yet valid
    • Management Agent's All.log contains errors similar to
      [UTC:YYYY-MM-DD 08:36:45 Local:2021-09-29 10:36:45] [Error]: [sub-thread-Id="8" context="" token=""] DynamicOps.Common.Client.HtmlResponseException: Internal Server Error (500)
      Request:
      GET https://<FQDN>:5480/config/nodes/<Node_Id>/commands/next-command
      Response:
      Failure: Internal server error.
    • Running the command vra-command list-nodes --components in an SSH session on any virtual appliance returns errors similar to
      Error running list-nodes: Get http://localhost/api/node?components=: dial unix /var/run/vra-command-agent.sock: connect: connection refused
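
To confirm the expired-certificate symptom quickly, the log named above can be searched for the x509 message. The following is a minimal sketch, not a VMware-provided tool; the function name count_expired_cert_errors is hypothetical, and the search string is taken from the sample log line in this article.

```shell
# Hypothetical helper (not part of vRA): print how many lines in a log file
# contain the expired-certificate message quoted in the symptoms above.
count_expired_cert_errors() {
    local log="$1"
    # Treat a missing or unreadable log as zero matches.
    [ -r "$log" ] || { echo 0; return 0; }
    # -F matches the fixed string; -c prints the number of matching lines.
    # grep exits non-zero when there are no matches, so normalize the status.
    grep -c -F 'x509: certificate has expired or is not yet valid' "$log" || true
}
```

For example, count_expired_cert_errors /var/log/vmware/vcac/vcac-config.log prints a non-zero count when the symptom is present in that log.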



Environment

VMware vRealize Automation 7.x
VMware vRealize Orchestrator 7.x

Cause

The Postgres certificate generated during the initial installation has expired. Its validity period is 5 years.

Resolution

VMware is aware of this issue. Use the workaround below.

Workaround:

Replace the default Postgres self-signed certificate

Note: Do not copy or store manually backed up configuration files within /storage/db or /storage/db/pgdata.
  1. To prevent an undesired failover, stop services on all nodes, starting with the replicas and ending with the primary, by running the following commands
    1. For vRealize Automation 7.x
      service psql-manager stop; service vcac-server stop; service horizon-workspace stop; service vco-server stop; service vco-configurator stop
    2. For external vRealize Orchestrator 7.x
      service psql-manager stop; service vco-server stop; service vco-configurator stop
  2. Connect to the primary appliance over SSH (for example, with PuTTY)
    1. Run the following command to source the certificate_generate command from certificates.inc
      . /usr/lib/vmware-bootstrap-vrva-base/certificates.inc
      
    2. Generate new certificates
      certificate_generate --key "/tmp/server.key" --cert "/tmp/server.crt"
    3. Back up the current Postgres certificate and private key
      cp /storage/db/pgdata/server.crt /tmp/server.crt.bak
      cp /storage/db/pgdata/server.key /tmp/server.key.bak
    4. Replace the current certificate /storage/db/pgdata/server.crt and private key /storage/db/pgdata/server.key with the generated ones.
      cp /tmp/server.crt /storage/db/pgdata/server.crt
      cp /tmp/server.key /storage/db/pgdata/server.key
    5. Run the following command to set the server.crt and server.key ownership to postgres:users
      chown postgres:users /storage/db/pgdata/server.key /storage/db/pgdata/server.crt
    6. Run the following command to set the permissions of server.key to 600
      chmod 600 /storage/db/pgdata/server.key
    7. Run the following command to set the permissions of server.crt to 644
      chmod 644 /storage/db/pgdata/server.crt
    8. Run the following commands to make /storage/db/root.crt a hard link to /storage/db/pgdata/server.crt and set its permissions to 644. A hard link shares the ownership (postgres:users) already set on server.crt in Step 2.5.
      ln -f /storage/db/pgdata/server.crt /storage/db/root.crt
      chmod 644 /storage/db/root.crt
    9. Restart the vPostgres service
      service vpostgres stop; service vpostgres start
  3. Connect to each replica node over SSH and do the following
    1. Copy the new server.crt and server.key from the primary using SCP or another file transfer utility such as WinSCP.
    2. Replace the current certificate /storage/db/pgdata/server.crt and private key /storage/db/pgdata/server.key with the copied certificates from the primary.
    3. Repeat Steps 2.5 through 2.8.
    4. Reset the replica DB
      rm -rf /tmp/psql-set-replica
      vcac-vami psql-set-replica -M PrimaryFQDN
  4. Start the psql-manager service first on the Primary node then on the replica nodes
    service psql-manager start
  5. On all appliance nodes (order is not relevant) start the rest of the services stopped in Step 1
    1. For vRA appliance(s) run
      service horizon-workspace start && \
      base64 -d <<< "IyEvYmluL2Jhc2gKZWNobyAnV2FpdGluZyBmb3IgaG9yaXpvbiBzZXJ2aWNlIHRvIHN0YXJ0Li4uJwpmb3IgaSBpbiB7MS4uMTIwfQpkbwogICBzdGF0dXNfY29kZT1gY3VybCAtLW1heC10aW1lIDMwIC1vIC9kZXYvbnVsbCAtcyAtdyAiJXtodHRwX2NvZGV9XG4iICdodHRwOi8vbG9jYWxob3N0OjgwODAvU0FBUy9BUEkvMS4wL1JFU1Qvc3lzdGVtL2hlYWx0aCdgCiAgIFtbICIke3N0YXR1c19jb2RlfSIgPT0gIjIiKiBdXSAmJiBicmVhawogICBlY2hvICJIb3Jpem9uIGlzIHN0aWxsIHN0YXJ0aW5nLi4uIgogICBzbGVlcCA1CmRvbmUKCmlmIFtbICIke3N0YXR1c19jb2RlfSIgPT0gIjIiKiBdXTsKdGhlbgogICBlY2hvICdIb3Jpem9uIHNlcnZpY2Ugc3RhcnRlZCBzdWNjZXNzZnVsbHkhJwplbHNlCiAgIGVjaG8gJ0hvcml6b24gc2VydmljZSBkaWQgbm90IHN0YXJ0IHdpdGhpbiB0aGUgZXhwZWN0ZWQgcGVyaW9kLiBDaGVjayBzdGF0dXMgb2YgaG9yaXpvbi13b3Jrc3BhY2Ugc2VydmljZSBhbmQgbG9ncyBpbiAvdmFyL2xvZy92bXdhcmUvaG9yaXpvbi8gZm9yIG1vcmUgZGV0YWlscy4nCiAgIGV4aXQgMQpmaQo=" | sh - && \
      service vco-server start && \
      service vco-configurator start && \
      service vcac-server start
    2. For vRO appliance(s) run
      service vco-server start; service vco-configurator start
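
The Base64 string in Step 5.1 can be inspected before running it by piping it through base64 -d without the trailing "| sh -". Decoded, it appears to be a short Bash script that polls http://localhost:8080/SAAS/API/1.0/REST/system/health with curl until it returns a 2xx status, retrying up to 120 times with a 5 second pause. The following is a readable sketch of that idea, not VMware's exact script; the function name wait_for_http_2xx and its parameters are hypothetical.

```shell
# Hypothetical helper (not VMware's script): poll a URL until it answers
# with an HTTP 2xx status, or give up after a fixed number of attempts.
# Arguments: URL [attempts=120] [delay_seconds=5] [curl_max_time=30]
wait_for_http_2xx() {
    local url="$1" attempts="${2:-120}" delay="${3:-5}" max_time="${4:-30}"
    local i=1 status_code
    while [ "$i" -le "$attempts" ]; do
        # -w '%{http_code}' prints only the status code; the body is discarded.
        status_code=$(curl --max-time "$max_time" -o /dev/null -s -w '%{http_code}' "$url")
        case "$status_code" in
            2*) echo "ready after $i attempt(s)"; return 0 ;;
        esac
        # Pause between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "not ready after $attempts attempt(s)"
    return 1
}
```

Called as wait_for_http_2xx 'http://localhost:8080/SAAS/API/1.0/REST/system/health', the defaults match the retry count and pause described above. Chaining the remaining service start commands with && after it keeps the same ordering as Step 5.1.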


Additional Information

The validity of the current certificate can be verified by running this command:

openssl x509 -in /storage/db/pgdata/server.crt -text -noout | more
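
After replacing the certificate, the expiry, file permissions, and hard link can also be checked with a few small functions. The following is a minimal sketch assuming GNU stat and the openssl command line are available on the appliance; the function names check_mode, check_hardlink, and check_expiry are hypothetical and not part of vRA.

```shell
# Hypothetical helpers (not part of vRA) for verifying the replaced files.

# Print "ok" if FILE has the octal mode EXPECTED (for example 600).
check_mode() {
    local file="$1" expected="$2"
    if [ "$(stat -c '%a' "$file")" = "$expected" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

# Print "ok" if A and B are the same inode on the same device,
# which is what a hard link created with "ln -f" produces.
check_hardlink() {
    local a="$1" b="$2"
    if [ "$(stat -c '%d:%i' "$a")" = "$(stat -c '%d:%i' "$b")" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

# Print "valid" if the certificate in FILE will not expire within SECONDS
# (default 0, meaning "not expired right now").
check_expiry() {
    local file="$1" seconds="${2:-0}"
    if openssl x509 -in "$file" -noout -checkend "$seconds" >/dev/null 2>&1; then
        echo "valid"
    else
        echo "expired-or-unreadable"
    fi
}
```

For example, check_mode /storage/db/pgdata/server.key 600, check_hardlink /storage/db/pgdata/server.crt /storage/db/root.crt, and check_expiry /storage/db/pgdata/server.crt 2592000 (30 days) cover the files touched by the workaround.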