Logins to the UI intermittently fail with 502 Bad Gateway exceptions while acquiring access tokens from the Identity Service after upgrading to vRA 8.5.0

Article ID: 318377

Updated On:

Products

VMware Aria Suite

Issue/Introduction

This article applies to vRealize Automation 8.5.0 GA only.

Symptoms:
  • After upgrading to vRA 8.5 GA, logins to the UI intermittently fail, and the service logs contain 502 Bad Gateway exceptions raised while acquiring access tokens from the Identity Service.
  • The identity-service log in the identity-service-app-xxxxxxxxx-xxxxx pod contains errors similar to the following:
    2021-09-02T13:42:30.824Z ERROR identity-service [host='identity-service-app-9b97cb5f7-n9mdv' thread='reactor-http-epoll-1' user='' org='' trace='<UUID>'] reactor.netty.http.server.HttpServer.error:319 - [id:0xa55a54c7,L:/10.244.0.128:8080 - R:/10.244.0.126:45626] 
    java.io.UncheckedIOException: java.nio.file.FileSystemException: /tmp/synchronoss-file-upload-10801716997771936944: Too many links
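
To check whether a pod log shows this signature, a small filter along the following lines may help. It is only a sketch: the function name is hypothetical, and the namespace in the usage comment is an assumption that does not come from this article.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) if the log text on stdin contains the error
# signature quoted in the Symptoms section.
has_too_many_links_error() {
  grep -Eq 'FileSystemException: /tmp/synchronoss-file-upload-[0-9]+: Too many links'
}

# Hypothetical usage on an appliance (the "prelude" namespace is an assumption;
# substitute the real pod name for the xxxxxxxxx-xxxxx placeholder):
#   kubectl logs -n prelude identity-service-app-xxxxxxxxx-xxxxx | has_too_many_links_error \
#     && echo "identity-service log shows the Too many links error"
```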


Environment

VMware vRealize Automation 8.5.0

Cause

This issue is caused by a third-party framework used by the identity-service. For every access-token request posted to the identity-service, an empty subdirectory is created under /tmp in the identity pod, and these subdirectories accumulate over time.

The /tmp directory can hold at most 64999 subdirectories; this is a hard limit. Once the limit is reached, new subdirectories cannot be created under /tmp, the request fails with the "Too many links" error shown above, and the identity-service returns 502 error responses.
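
To see how close a directory is to that limit, the subdirectories can be counted with find. The sketch below is illustrative only: the function names are hypothetical, the limit is the figure quoted in this article, and the namespace in the usage comment is an assumption.

```shell
#!/usr/bin/env bash
# Limit quoted in this article.
SUBDIR_LIMIT=64999

# Count the immediate subdirectories of a directory (default: /tmp).
count_subdirs() {
  local dir="${1:-/tmp}"
  find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Print how many more subdirectories fit before the limit is reached.
subdirs_remaining() {
  local used
  used="$(count_subdirs "${1:-/tmp}")"
  echo $(( SUBDIR_LIMIT - used ))
}

# Hypothetical usage against a pod (the "prelude" namespace is an assumption):
#   kubectl exec -n prelude identity-service-app-xxxxxxxxx-xxxxx -- \
#     sh -c 'find /tmp -mindepth 1 -maxdepth 1 -type d | wc -l'
```

A count approaching 64999 would indicate that 502 responses are imminent.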

Resolution

This issue is resolved in vRealize Automation 8.5.1 and later.

Workaround:

This workaround consists of a script that cleans up the unused directories within the identity pods every 12 hours.
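
The idea behind such a cleanup can be sketched as follows. This is not the script shipped with this article. The function name, the age threshold, and the restriction to empty directories are assumptions chosen for illustration; the name pattern is taken from the path in the error message above.

```shell
#!/usr/bin/env bash
# Remove empty synchronoss-* subdirectories older than a given age.
#   $1: directory to clean (default: /tmp)
#   $2: minimum age in minutes (default: 120, an illustrative value)
cleanup_unused_dirs() {
  local dir="${1:-/tmp}"
  local min_age="${2:-120}"
  # -empty limits the match to empty directories; rmdir itself also
  # refuses to remove a directory that still has content.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'synchronoss-*' \
    -empty -mmin "+${min_age}" -exec rmdir {} +
}
```

Limiting the match to old, empty directories is meant to avoid touching a directory that a request in flight may still be using.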

Prerequisites

  • Take simultaneous snapshots, without memory, of every appliance in the cluster.

Procedure

  1. Log in as root over SSH (for example, with PuTTY) to one appliance in the cluster.
  2. Run the following command:
    echo 'IyEvYmluL2Jhc2gKCmlmIFsgIiQoY2F0IC9vcHQvc2NyaXB0cy9zdGF0ZV9lbmZvcmNlbWVudC5zaCB8IGdyZXAgImNsZWFudXBfaWRlbnRpdHlfcG9kc191bnVzZWRfZGlycyIgfCB3YyAtbCkiID09ICIxIiBdCnRoZW4KICAgIGVjaG8gIlRoZSBLQiBmb3IgY2xlYW51cCBvZiB1bnVzZWQgZGlycyBpbiBpZGVudGl0eSBwb2RzIGlzIGFscmVhZHkgYXBwbGllZC4iCiAgICBleGl0IDAKZmkKCnZyYWNsaSBjbHVzdGVyIGV4ZWMgLS0gYmFzaCAtYyAiZWNobyAnSXlFdlltbHVMMkpoYzJnS0NpTWdRMjl3ZVhKcFoyaDBJQ2hqS1NBeU1ESXhJRlpOZDJGeVpTd2dTVzVqTGlBZ1FXeHNJSEpwWjJoMGN5QnlaWE5sY25abFpDNEtJd29qSUZSb2FYTWdZMjlrWlNCcGN5Qm1iM0lnZFhObElHWnliMjBnWW5WcGJIUXRhVzRnWVhWMGIyMWhkR2xqSUhONWMzUmxiWE11SUVSdklHNXZkQ0JqWVd4c0lHbDBJR1p5YjIwS0l5QXpjbVFnY0dGeWRIa2djM2x6ZEdWdGN5d2djbVYxYzJVZ2IzSWdjbVZ3Y205a2RXTmxMZ29qQ2dvaklGUm9hWE1nYzJOeWFYQjBJR2x6SUc5dWJIa2dZWEJ3YkdsallXSnNaU0JtYjNJZ2RsSkJJRGd1TlNCSFFTQjJaWEp6YVc5dUlHRnVaQ0JwZENCemFHOTFiR1FnYm05MElHSmxJSFZ6WldRZ2IyNGdZVzU1SUc5MGFHVnlJSFpTUVNCMlpYSnphVzl1Y3k0S0NtWjFibU4wYVc5dUlHeHZaeWdwSUhzS0lDQnNiMk5oYkNCdGMyYzlJaVF4SWdvZ0lHeHZZMkZzSUd4bGRtVnNQU0lrTWlJS0lDQnNiMk5oYkNCa2REMGtLR1JoZEdVZ0p5c2xXUzBsYlMwbFpDQWxTRG9sVFRvbFV5Y3BDZ29nSUdWamFHOGdJbHNrYkdWMlpXeGRXeVJrZEYwZ0pHMXpaeUlLZlFvS1puVnVZM1JwYjI0Z2JHOW5YMmx1Wm04b0tTQjdDaUFnYkc5bklDSWtNU0lnSWtsT1JrOGlDbjBLQ25ObGRDQXJaUW9LYkc5blgybHVabThnSWtOc1pXRnVhVzVuSUhWd0lHbGtaVzUwYVhSNUlIQnZaSE1nZFc1MWMyVmtJR1JwY25NdUxpNGlDbWxtSUZzZ0xXWWdMM1poY2k5MmJYZGhjbVV2Y0hKbGJIVmtaUzlwWkdWdWRHbDBlUzF6ZG1NdmJHRnpkQzFqYkdWaGJuVndJRjBLZEdobGJnb2dJR2xtSUZzZ0lpUW9abWx1WkNBdmRtRnlMM1p0ZDJGeVpTOXdjbVZzZFdSbEwybGtaVzUwYVhSNUxYTjJZeThnTFc1aGJXVWdiR0Z6ZEMxamJHVmhiblZ3SUMxMGVYQmxJR1lnTFcxdGFXNGdMVGN5TUNCOElIZGpJQzFzS1NJZ1BUMGdJakVpSUYwS0lDQjBhR1Z1Q2lBZ0lDQnNiMmRmYVc1bWJ5QWlTV1JsYm5ScGRIa2djMlZ5ZG1salpTQnplVzVqYUhKdmJtOXpjeTBxSUhOMVltUnBjbVZqZEc5eWFXVnpJR2hoZG1VZ1lXeHlaV0ZrZVNCaVpXVnVJR05zWldGdVpXUWdkWEFnYVc0Z2RHaGxJSEJoYzNRZ01USWdhRzkxY25NdUlnb2dJQ0FnWlhocGRDQXdDaUFnWm1rS1pta0tDbWxrWlc1MGFYUjVYM05sY25acFkyVmZjRzlrYzE5c2FXNWxQU1FvYTNWaVpXTjBiQ0JuWlhRZ2NHOWtjeUF0YmlCd2NtVnNkV1JsSUMxc0lHRndjRDFwWkdWdWRHbDBlUzF6WlhKMmFXTmxMV0Z3Y0NBdExXOT
FkSEIxZEQxcWMyOXVjR0YwYUQxN0xtbDBaVzF6TGk1dFpYUmhaR0YwWVM1dVlXMWxmU2tLQ2tsR1V6MG5JQ2NnY21WaFpDQXRjaUF0WVNCd2IyUnpJRHc4UENBaUpHbGtaVzUwYVhSNVgzTmxjblpwWTJWZmNHOWtjMTlzYVc1bElnb0tabTl5SUhCdlpDQnBiaUFpSkh0d2IyUnpXMEJkZlNJS1pHOEtJQ0FnSUdsa1pXNTBhWFI1WDNCdlpGOXpkR0YwZFhNOUpDaHJkV0psWTNSc0lHZGxkQ0J3YjJSeklDMXVJSEJ5Wld4MVpHVWdJaVJ3YjJRaUlDMHRiM1YwY0hWMFBXcHpiMjV3WVhSb1BYc3VjM1JoZEhWekxuQm9ZWE5sZlNrS0NpQWdJQ0JwWmlCYklDSlNkVzV1YVc1bklpQTlQU0FpSkdsa1pXNTBhWFI1WDNCdlpGOXpkR0YwZFhNaUlGMEtJQ0FnSUhSb1pXNEtJQ0FnSUNBZ0lDQWpJRVJsYkdWMFpTQnpkV0lnWkdseWN5QnBiaUF2ZEcxd0lHOXNaR1Z5SUhSb1lXNGdNaUJvYjNWeWN3b2dJQ0FnSUNBZ0lHeHZaMTlwYm1adklDSkVaV3hsZEdsdVp5QjFiblZ6WldRZ1pHbHljeUJtY205dElDUndiMlFpQ2lBZ0lDQWdJQ0FnYTNWaVpXTjBiQ0JsZUdWaklDMXBkQ0F0YmlCd2NtVnNkV1JsSUNJa2NHOWtJaUF0TFNCbWFXNWtJQzkwYlhBZ0xXNWhiV1VnSjNONWJtTm9jbTl1YjNOektpY2dMVzF0YVc0Z0t6RXlNQ0F0WkdWc1pYUmxJREkrTDJSbGRpOXVkV3hzSUh4OElIUnlkV1VLSUNBZ0lHWnBDbVJ2Ym1VS0NuWnlZV05zYVNCamJIVnpkR1Z5SUdWNFpXTWdMUzBnWW1GemFDQXRZeUFpYld0a2FYSWdMWEFnTDNaaGNpOTJiWGRoY21VdmNISmxiSFZrWlM5cFpHVnVkR2wwZVMxemRtTTdJSFJ2ZFdOb0lDOTJZWEl2ZG0xM1lYSmxMM0J5Wld4MVpHVXZhV1JsYm5ScGRIa3RjM1pqTDJ4aGMzUXRZMnhsWVc1MWNDSUtDZz09JyB8IGJhc2U2NCAtZCA+IC9vcHQvc2NyaXB0cy8yODMzMTYxX2NsZWFudXBfaWRlbnRpdHlfcG9kc191bnVzZWRfZGlycy5zaCAmJiBjaG1vZCAreCAvb3B0L3NjcmlwdHMvMjgzMzE2MV9jbGVhbnVwX2lkZW50aXR5X3BvZHNfdW51c2VkX2RpcnMuc2g7IGVjaG8gJy9vcHQvc2NyaXB0cy8yODMzMTYxX2NsZWFudXBfaWRlbnRpdHlfcG9kc191bnVzZWRfZGlycy5zaCcgPj4gL29wdC9zY3JpcHRzL3N0YXRlX2VuZm9yY2VtZW50LnNoIgoK' | base64 -d > /root/kb-identity-pods-cleanup.sh && chmod +x /root/kb-identity-pods-cleanup.sh && /root/kb-identity-pods-cleanup.sh && rm /root/kb-identity-pods-cleanup.sh
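
The command in step 2 decodes a base64 payload into /root/kb-identity-pods-cleanup.sh, makes it executable, runs it, and then deletes it. To read the decoded script before running it, the payload can first be decoded into a file without executing it. The helper below is a hypothetical sketch, and the payload is left elided.

```shell
#!/usr/bin/env bash
# Decode base64 text from stdin into the file named by $1, without executing it.
decode_for_review() {
  local out="$1"
  base64 -d > "$out"
}

# Hypothetical usage: paste the quoted string from step 2 in place of <payload>.
#   echo '<payload>' | decode_for_review /root/kb-identity-pods-cleanup.sh
#   less /root/kb-identity-pods-cleanup.sh
```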

Validate the change

  1. Verify that the shell script /opt/scripts/2833161_cleanup_identity_pods_unused_dirs.sh (the file name written by the command above) exists on each appliance in the cluster.
  2. Verify that the last line of /opt/scripts/state_enforcement.sh references the cleanup script by running
    cat /opt/scripts/state_enforcement.sh
  3. Verify that the next two state-enforcement pods in the kube-system namespace complete successfully by running
    kubectl get pods -n kube-system
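
Checks 2 and 3 can also be scripted. The helpers below are a sketch with hypothetical function names. They assume the default kubectl table output, where STATUS is the third column and a finished job pod shows Completed.

```shell
#!/usr/bin/env bash
# Succeeds if the last line of the state-enforcement script references
# the cleanup script.
last_line_references_cleanup() {
  local file="${1:-/opt/scripts/state_enforcement.sh}"
  tail -n 1 "$file" | grep -q 'cleanup_identity_pods_unused_dirs'
}

# Reads `kubectl get pods` output on stdin and prints the name and STATUS
# column of every state-enforcement pod.
state_enforcement_status() {
  awk '/state-enforcement/ { print $1, $3 }'
}

# Usage on an appliance:
#   last_line_references_cleanup && echo "state_enforcement.sh is updated"
#   kubectl get pods -n kube-system | state_enforcement_status
```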