vRealize Automation 8.x GA environments OLDER than 335 days from the first deployment, fail if redeployed or during upgrade due to an Identity service certificate issue

Article ID: 318921

Updated On: 02-23-2024

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • The deployment consistently fails at the "Populating initial identity-service data" step, where a cURL request to the Identity service returns a 500 response.
  • Deploy.log will contain errors similar to:
    curl: (22) The requested URL returned error: 500 Internal Server Error
  • The Identity service logs will also contain a NullPointerException error similar to the following:
    java.lang.NullPointerException: null
      at ...identity.service.impl.CryptoServiceImpl.getPublicKey
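
The two signatures above can be checked for with a small helper. This is a minimal sketch, not a supported tool: the function name has_signature is hypothetical, and the log file paths are placeholders because this article does not state where the logs are stored.

```shell
#!/usr/bin/env bash
# Sketch: scan a log file for one of the symptom signatures listed above.
# has_signature is a hypothetical helper name; log paths are supplied by the
# caller because their locations are not given in this article.

# Prints "match" and returns 0 if the file contains the fixed string,
# otherwise prints "no match" and returns 1.
has_signature() {
    local file="$1" pattern="$2"
    if grep -qF -- "$pattern" "$file"; then
        echo "match"
    else
        echo "no match"
        return 1
    fi
}

# Example calls (paths are placeholders):
#   has_signature /path/to/deploy.log "The requested URL returned error: 500"
#   has_signature /path/to/identity-service.log "CryptoServiceImpl.getPublicKey"
```

Seeing both signatures together is a stronger indication of this issue than either one alone, since a 500 response on its own can have other causes.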


Environment

VMware vRealize Automation 8.0.x
VMware vRealize Automation 8.1.x
VMware vRealize Automation 8.2.x
VMware vRealize Automation 8.3.x
VMware vRealize Automation 8.4.x

Cause

The Identity service certificate and its public key are rotated 30 days before expiration.  The public key is valid for 365 days, so a new key pair is generated on the 335th day.  However, the new key pair is only held in memory and is not written to the database.  As a result, the service fails the next time the Identity service is restarted, whether on its own, during an upgrade, or during a re-deployment of services.
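
The 335-day figure follows directly from the two numbers above (365 - 30). As a rough sketch, the date on which the rotation would have occurred can be estimated from the first deployment date. The function names are hypothetical, the deployment date in the example is made up, and the -d option assumes GNU date.

```shell
#!/usr/bin/env bash
# Sketch: estimate when the Identity service key pair would have rotated.
# Assumes GNU date (-d option). Function names are hypothetical.

KEY_LIFETIME_DAYS=365     # public key validity, as stated in this article
ROTATE_BEFORE_DAYS=30     # rotation happens this many days before expiry

# Day offset from first deployment on which the new key pair is generated.
rotation_offset_days() {
    echo $(( KEY_LIFETIME_DAYS - ROTATE_BEFORE_DAYS ))
}

# Calendar date of the rotation for a deployment date given as YYYY-MM-DD.
rotation_date() {
    local deployed="$1"
    date -u -d "${deployed} +$(rotation_offset_days) days" +%Y-%m-%d
}

# Example calls:
#   rotation_offset_days        # prints 335
#   rotation_date 2021-01-01    # hypothetical first deployment date
```

If the computed date is already in the past, the next restart of the Identity service is likely to hit the failure described above.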

Note:  The problem affects environments on releases prior to vRealize Automation 8.4.x that are older than 11 months (335 days) and in which the Identity service has recently been restarted.

Resolution

VMware is currently aware of this issue.  See the workaround below to mitigate this issue.

Workaround:
To work around this issue, delete the old certificate chain from the Identity service database so that a valid one is generated on service startup.
  1. Log in to one of the vRealize Automation appliances
  2. Stop the currently running application services by running:
    /opt/scripts/svc-stop.sh
  3. Back up only the identity service database data by dumping it into a file
    cd /root
    vracli db dump identity-db > identity-db-data.dump
  4. Log in to the identity service database by running vracli dev psql identity-db, and type yes when prompted about recording the session.
  5. Delete the data stored in the following two tables
    delete from identity_keystore_alias where 1=1;
    delete from identity_keystore where 1=1;
Note:  Ensure all PSQL statements are ended with a semi-colon.
  6. Make sure you don't have any data left in the two tables above
    select * from identity_keystore_alias;
    should return 0 records
    select * from identity_keystore;
    should return 0 records
  7. Once this is done, quit the psql console by typing
    \q
  8. Run
    /opt/scripts/deploy.sh
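
Because step 5 deletes data, it can be worth confirming that the dump created in step 3 exists and is not empty before running the delete statements. The following is a minimal sketch of such a check. The function name backup_is_usable is hypothetical, and the default path simply combines the directory and file name used in step 3. A non-empty file is only a basic sanity check, not proof that the dump is complete.

```shell
#!/usr/bin/env bash
# Sketch: refuse to continue if the backup from step 3 is missing or empty.
# backup_is_usable is a hypothetical helper name. The default path matches
# the directory (/root) and file name used in step 3.

backup_is_usable() {
    local dump_file="${1:-/root/identity-db-data.dump}"
    if [ -s "$dump_file" ]; then
        # -s is true only if the file exists and has a size greater than zero
        echo "ok: $dump_file"
    else
        echo "missing or empty: $dump_file" >&2
        return 1
    fi
}

# Example call, run before step 4:
#   backup_is_usable || echo "Do not continue; repeat step 3 first."
```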


Additional Information

Impact/Risks:
  • All valid tokens will be invalidated.
  • All stored tokens will become invalid, which causes long-running provisioning or approval requests that are currently in progress to fail.
  • The only mitigation is an 8-hour maintenance window, which ensures that all tokens have expired and causes the services to refresh them.
  • Provisioning requests and similar operations triggered after the workaround is applied will NOT be affected.