vRealize Automation 8.x GA environments older than 335 days from the first deployment fail on redeployment or upgrade due to an Identity service certificate issue
Article ID: 318921
Updated On: 02-23-2024
Products
VMware Aria Suite
Issue/Introduction
Symptoms:
The deployment consistently fails at the "Populating initial identity-service data" step, where a cURL request to the Identity service returns a 500 response.
Deploy.log will contain errors similar to:
curl: (22) The requested URL returned error: 500 Internal Server Error
The Identity service logs will also contain a NullPointerException error similar to the following:
java.lang.NullPointerException: null
at ...identity.service.impl.CryptoServiceImpl.getPublicKey
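The two symptoms above can be checked together with a small script. The following is a minimal sketch, not a supported tool: the function name `has_identity_cert_symptoms` is hypothetical, and the log file paths must be supplied by the caller because this article does not state where the logs are stored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 only when both symptom strings quoted in this
# article are present. Arguments: <deploy log file> <identity service log file>.
has_identity_cert_symptoms() {
    local deploy_log="$1" identity_log="$2"
    # -F treats the pattern as a fixed string, -q suppresses output.
    grep -qF 'The requested URL returned error: 500' "$deploy_log" &&
        grep -qF 'CryptoServiceImpl.getPublicKey' "$identity_log"
}
```

A non-zero return only means the strings were not found in the files given; it does not rule out the issue.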
The Identity service certificate and its public key are rotated 30 days before expiration. The public key expires after 365 days, so a new key pair is generated on the 335th day. However, the new key pair is updated only in memory and not in the database. This leads to a service failure whenever the Identity service is restarted, whether manually, during an upgrade, or during a redeployment of services.
Note: The problem occurs in all releases prior to vRealize Automation 8.4.x where the environments are older than 11 months (335 days) and have recently restarted the Identity service.
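The 335-day figure follows directly from the two values given above (365 - 30). As a rough illustration, the sketch below estimates the calendar date on which the rotation would occur for a given first-deployment date. The function names are hypothetical, GNU `date` is assumed, and the result is only an estimate because the exact timestamp the service uses is not stated in this article.

```shell
#!/usr/bin/env bash
# Values taken from this article.
KEY_LIFETIME_DAYS=365          # public key expiration time
ROTATE_BEFORE_EXPIRY_DAYS=30   # rotation happens this many days before expiry

# Day, counted from the first deployment, on which a new key pair is generated.
rotation_day() {
    echo $(( KEY_LIFETIME_DAYS - ROTATE_BEFORE_EXPIRY_DAYS ))
}

# Hypothetical helper: estimated rotation date for a first-deployment date
# given as YYYY-MM-DD. Works in UTC on epoch seconds to avoid DST effects.
rotation_date() {
    local first_epoch
    first_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$(( first_epoch + $(rotation_day) * 86400 ))" +%F
}
```

For example, `rotation_date 2021-01-01` prints `2021-12-02`. An environment first deployed on that date would be at risk on any Identity service restart after it.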
Resolution
VMware is currently aware of this issue. See the workaround below to mitigate this issue.
Workaround: To work around this issue, delete the old certificate chain from the Identity service database so that a valid one is generated on service startup.
Log in to one of the vRealize Automation appliances.
Stop the currently running application services by running:
/opt/scripts/svc-stop.sh
Back up only the Identity service database data by dumping it into a file:
cd /root
vracli db dump identity-db > identity-db-data.dump
Log in to the Identity service database by running vracli dev psql identity-db. Type yes when prompted about recording this session.
Delete the data stored in the following two tables:
delete from identity_keystore_alias where 1=1;
delete from identity_keystore where 1=1;
Note: Ensure all PSQL statements are ended with a semi-colon.
Make sure no data is left in the two tables above:
select * from identity_keystore_alias;
should return 0 records
select * from identity_keystore;
should return 0 records
Once this is done, quit the psql console by typing
\q
Redeploy the services by running:
/opt/scripts/deploy.sh
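Because the delete statements in the procedure above are irreversible without the backup, it may be worth confirming that the dump file was actually written before opening the psql session. The following is a minimal sketch under that assumption; the function name `dump_looks_usable` is hypothetical, and a non-empty file is only a weak check that does not prove the dump can be restored.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if the given file exists and is not empty.
dump_looks_usable() {
    [ -s "$1" ]
}

# Possible usage right after the backup step of the workaround:
#   vracli db dump identity-db > /root/identity-db-data.dump
#   dump_looks_usable /root/identity-db-data.dump \
#       || { echo "backup missing or empty, stopping" >&2; exit 1; }
```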
Additional Information
Impact/Risks:
All valid tokens will be invalidated.
All stored tokens will become invalid, which causes currently long-running provisioning or approval requests to fail.
The only mitigation of this problem is an 8-hour maintenance window to ensure that all tokens have expired, which will cause the services to refresh the tokens.
Newly triggered provisioning requests etc., directly after the workaround is applied, will NOT be affected.
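For planning purposes, the end of the 8-hour window can be computed from its start time. This sketch assumes the window is counted from a start time chosen by the operator, which this article does not specify. The function name is hypothetical and GNU `date` is assumed.

```shell
#!/usr/bin/env bash
# Length of the maintenance window named in this article.
WINDOW_HOURS=8

# Hypothetical helper: prints the end of the window for a start time given
# as "YYYY-MM-DD HH:MM", interpreted as UTC.
maintenance_window_end() {
    local start_epoch
    start_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$(( start_epoch + WINDOW_HOURS * 3600 ))" '+%Y-%m-%d %H:%M UTC'
}
```

For example, `maintenance_window_end '2024-02-23 10:00'` prints `2024-02-23 18:00 UTC`.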