Orchestrator 8.18.1 is unreachable and returns a 503 status code

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Orchestrator is configured with the vSphere authentication provider.
The Orchestrator 8.18.1 UI is unreachable and returns a 503 status code on login attempts.
The issue typically occurs approximately 7 days after a service restart.
The issue resolves itself after several minutes without any intervention.
Executing any workflow fails with the error:
Certificate is not in CA store or is invalid.Certificate is not in CA store or is invalid.No trusted certificate found
Trying to review the connected Orchestrator from the vSphere Client shows the error:

Problem in communication with one or more Automation Orchestrator servers
The vco-server-app.log contains "Invalid credentials" messages similar to the following:

2025-05-11T07:07:17.470Z INFO vco [host='vco-app-' thread='tokenLifetimeMonitorScheduler-1' user='' org='' trace=''] {} com.vmware.o11n.security.sso.support.SamlTokenLifetimeService - Renewing of security tokens activated for 4 tokens expiring between 2025-05-04 08:27:17.440 and 2025-05-18 06:02:17.440.
2025-05-11T07:07:17.470Z INFO vco [host='vco-app-' thread='tokenLifetimeMonitorScheduler-1' user='' org='' trace=''] {} com.vmware.o11n.security.sso.support.SamlTokenLifetimeService - Calling token renew for: delegatable id: <id>, saml id: _<samlid>, expiration date: 2025-05-12T19:51:43.001+0000, tenant vsphere.local
2025-05-11T07:07:17.476Z WARN vco [host='vco-app-' thread='tokenLifetimeMonitorScheduler-1' user='' org='' trace=''] {} javax.xml.bind - Using non-standard property: javax.xml.bind.context.factory. Property javax.xml.bind.JAXBContextFactory should be used instead.
2025-05-11T07:07:17.555Z ERROR vco [host='vco-app-' thread='tokenLifetimeMonitorScheduler-1' user='' org='' trace=''] {} com.vmware.vim.sso.client.impl.SoapBindingImpl - SOAP fault
com.sun.xml.ws.fault.ServerSOAPFaultException: Client received SOAP Fault from server: Invalid credentials Please see the server log to find more detail regarding exact cause of the failure.
    at com.sun.xml.ws.fault.SOAP11Fault.getProtocolException(SOAP11Fault.java:163) ~[jaxws-rt-2.3.5.jar:2.3.5]

2025-05-11T07:07:17.557Z WARN vco [host='vco-app-' thread='tokenLifetimeMonitorScheduler-1' user='' org='' trace=''] {} ch.dunes.util.StubbornRetrier - Retry skipped because of non-retryable error: Provided credentials are not valid.
2025-05-11T07:07:17.557Z WARN vco [host='vco-app-' thread='tokenLifetimeMonitorScheduler-1' user='' org='' trace=''] {} com.vmware.o11n.security.sso.support.SamlTokenLifetimeService - Unable to renew token: delegatable id: <id>, saml id: _<samlID>, expiration date: 2025-05-12T19:51:43.001+0000, reason: java.lang.RuntimeException: Retry skipped because of non-retryable error: Provided credentials are not valid.
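
To check whether an environment shows the same renewal failures, the log can be searched for the "Unable to renew token" message shown above. The following is a minimal sketch, not an official tool: the function name count_token_renew_failures is hypothetical, and the log file path is passed as an argument because this article does not state where vco-server-app.log is stored.

```shell
# Hypothetical helper: count failed SAML token renewals in a copy of vco-server-app.log.
# Usage: count_token_renew_failures <path-to-log>
count_token_renew_failures() {
    log_file="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function safe under "set -e".
    grep -c 'SamlTokenLifetimeService - Unable to renew token' "$log_file" || true
}
```

A non-zero count is consistent with the symptoms described in this article; a count of 0 suggests looking for a different cause.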

Environment

Aria Automation Orchestrator 8.x

Cause

The SAML Service Account Ephemeral certificate is not renewed in time.

Resolution

The issue is resolved in the upcoming releases Aria Automation 8.18.1 cumulative update #3 and VCF Automation 9.0.1.0.

To work around the issue on Aria Automation 8.x:

As a best-practice precaution, take a snapshot of the Aria Automation Orchestrator appliance(s) before proceeding.
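
Step 2 below decodes a Base64 string and pipes the result straight into bash, so the command that actually runs is not visible on screen. For review purposes, the same string can be decoded without executing it. The sketch below assumes only that the base64 utility is available; the function name preview_b64_command is hypothetical and not part of any VMware tooling.

```shell
# Hypothetical helper: print what a Base64-wrapped command would run,
# without executing it.
# Usage: preview_b64_command '<Base64 string copied from step 2>'
preview_b64_command() {
    # Decode the argument and write the result to stdout only.
    # There is deliberately no "| bash -" here.
    printf '%s' "$1" | base64 -d
}
```

Calling preview_b64_command with the quoted string from step 2 prints the decoded command for inspection; nothing is executed until the original command from step 2 is run.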

1. SSH into the Aria Automation Orchestrator appliance as the root user.

2. Run the following command to remove the property that forces database configuration:

base64 -d <<< "a3ViZWN0bCBleGVjIC1uIHByZWx1ZGUgIiQodnJhY2xpIHN0YXR1cyB8IGpxIC1yICcuZGF0YWJhc2VOb2Rlc1tdIHwgc2VsZWN0KC5Sb2xlPT0icHJpbWFyeSIpLiJOb2RlIG5hbWUiJyB8IGN1dCAtZCAnLicgLWYgMSkiIC0tIGNocHN0IC11IHBvc3RncmVzIHBzcWwgXAogICAgLWQgInZjby1kYiIgXAogICAgLWMgIgogICAgICBERUxFVEUgRlJPTSB2bW9fY29uZmlnaXRlbQogICAgICBXSEVSRSBuYW1lPSdjb20udm13YXJlLm8xMW4uZm9yY2UtZGF0YWJhc2UtY29uZmlndXJhdGlvbic7CiAgICAgICIgJiYgXAp2cmFjbGkgY2x1c3RlciBleGVjIC0tIHNlZCAtaSAnL2NvbS52bXdhcmUubzExbi5mb3JjZS1kYXRhYmFzZS1jb25maWd1cmF0aW9uL2QnIC9kYXRhL3Zjby91c3IvbGliL3Zjby9hcHAtc2VydmVyL2NvbmYvdm1vLnByb3BlcnRpZXM=" | bash -

3. Use the legacy configuration tool to re-register the authentication, as follows:

A) Log in to the Orchestrator server container:

kubectl exec -itn prelude $(kubectl get pod -n prelude -l app=vco-app -o jsonpath="{.items[0].metadata.name}") -c vco-server-app -- bash

B) Install the legacy configuration tool:

rpm -i --nodeps vco-cfg-cli.rpm && cp /usr/lib/vco/app-server/deploy/vco/WEB-INF/lib/* /usr/lib/vco-cli/lib

C) Reconfigure the authentication. Replace VC_URL, VC_USERNAME, VC_PASSWORD, VC_ADMIN_GROUP, VC_ADMIN_GROUP_DOMAIN, and VC_TENANT with the vCenter URL, administrator username, administrator password, Orchestrator admin group, Orchestrator admin group domain, and vCenter default tenant, respectively. Every value except the password can be found by running vracli vro authentication outside the container.

/usr/lib/vco-cli/bin/vro-configure-inner.sh authentication-vsphere --register --lsUrl "${VC_URL}" --username "${VC_USERNAME}" --password "${VC_PASSWORD}" --adminGroup "${VC_ADMIN_GROUP}" --adminGroupDomain "${VC_ADMIN_GROUP_DOMAIN}" --tenant "${VC_TENANT}"

4. Restart the Orchestrator server and exit the container:

kill 1

5. Wait for Orchestrator to start:

kubectl -n prelude get pods -w
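
As an alternative to watching the pod list in step 5, kubectl wait can block until the Orchestrator pod reports Ready. This is a sketch under assumptions: the function name wait_for_vco_ready and the 600s default timeout are illustrative, and the app=vco-app label is taken from the command in step 3A. If the command is issued before the container has actually stopped, the pod may still report Ready and the wait can return immediately, so the watch in step 5 remains the reference check.

```shell
# Hypothetical helper: block until the Orchestrator pod reports Ready.
# Usage: wait_for_vco_ready [timeout]    (for example: wait_for_vco_ready 900s)
wait_for_vco_ready() {
    # Assumed default of 600s; adjust to how long the environment needs to start.
    wait_timeout="${1:-600s}"
    # The prelude namespace and the app=vco-app label come from step 3A.
    kubectl -n prelude wait --for=condition=Ready pod -l app=vco-app --timeout="$wait_timeout"
}
```

The function returns a non-zero exit status if the pod is not Ready within the timeout, which makes it convenient to chain with further checks.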

To revert the workaround once cumulative update #3 is released, follow the steps in the documentation to reconfigure the authentication provider using the vracli vro authentication command.

Additional Information

To revert the workaround, follow the steps in the documentation to reconfigure the authentication provider using the vracli vro authentication command.