Alarm "Failed to update the TRUSTED ROOT certificates for the vSphere Autodeploy Service" occurred

Article ID: 409890

Products

VMware vCenter Server

Issue/Introduction

Symptoms

  • After a vCenter Server certificate replacement, or after trusted roots are imported into vCenter Server, the vSphere Autodeploy Service fails to load the new certificates.

  • You can see the following 503 errors in certificatemanagement-svcs.log:

    /var/log/vmware/certificatemanagement/certificatemanagement-svcs.log
    YYYY-MM-DDTHH:MM:SS [pool-20-thread-1 [] ERROR com.vmware.certificatemanagement.notifications.AsyncNotifier  opId=] Failed to notify AUTODEPLOY on http://localhost:1080/external-vecs/http1/localhost/6501/vmw/rbd/config/refresh-certificates, retrying again.
    YYYY-MM-DDTHH:MM:SS [pool-20-thread-1 [] ERROR com.vmware.certificatemanagement.notifications.AsyncNotifier  opId=] Failed to notify AUTODEPLOY on http://localhost:1080/external-vecs/http1/localhost/6501/vmw/rbd/config/refresh-certificates, retrying again.
    YYYY-MM-DDTHH:MM:SS [pool-20-thread-1 [] ERROR com.vmware.certificatemanagement.notifications.AsyncNotifier  opId=] Failed to notify AUTODEPLOY on http://localhost:1080/external-vecs/http1/localhost/6501/vmw/rbd/config/refresh-certificates, retrying again.
    YYYY-MM-DDTHH:MM:SS [pool-20-thread-1 [] ERROR com.vmware.certificatemanagement.notifications.AsyncNotifier  opId=] Final error while notifying AUTODEPLOY on http://localhost:1080/external-vecs/http1/localhost/6501/vmw/rbd/config/refresh-certificates
    java.lang.Exception: Failed to notify AUTODEPLOY on http://localhost:1080/external-vecs/http1/localhost/6501/vmw/rbd/config/refresh-certificates HTTP Error code: 503 Failed HTTP error message : Service Unavailable ErrorStream: no healthy upstream
  • The Autodeploy service (rbd) is not running:

    # service-control --status rbd
    Stopped:
     rbd
  • The Autodeploy service was previously stopped by the service-control --stop --all command, as recorded in service-control.log:

    /var/log/vmware/cloudvm/service-control.log

    YYYY-MM-DDTHH:MM:SS INFO service-control ********** Start ['--stop', '--all', '--ignore'] **********
  • The unsubscribe operation failed during the shutdown triggered by the service-control --stop --all command:

    /var/log/vmware/rbd/rbd-watchdog-linux.log

    YYYY-MM-DDTHH:MM:SS [251670:MainThread]INFO:rbd_watchdog_linux:Unsubsribing from NDC
    YYYY-MM-DDTHH:MM:SS [251670:MainThread]ERROR:rbd_watchdog_linux:Failed to unsubscribe from NDC
    Traceback (most recent call last):
      File "/var/lib/rbd/bin/rbd_watchdog_linux.py", line 486, in main
        refreshcertsutil.NonDisruptiveCerts.unsubscribe()
      File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/utils/refreshcertsutil.py", line 164, in unsubscribe
      File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/utils/vapiutil.py", line 179, in createVsphereClient
      File "bora/install/vmvisor/autodeploy/site-packages/vmware/rbd/utils/svcaccountutil.py", line 176, in getStsHokSamlAssertion
      File "/usr/lib/vmware/site-packages/pyVim/ssov2.py", line 72, in get_hok_saml_assertion_for_service_user
        hok_token = self.perform_request(soap_message, public_key, private_key,
      File "/usr/lib/vmware/site-packages/pyVim/sso.py", line 264, in perform_request
        webservice.endheaders()
      File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
        self.send(msg)
      File "/usr/lib/python3.10/http/client.py", line 976, in send
        self.connect()
      File "/usr/lib/python3.10/http/client.py", line 942, in connect
        self.sock = self._create_connection(
      File "/usr/lib/python3.10/socket.py", line 845, in create_connection
        raise err
      File "/usr/lib/python3.10/socket.py", line 833, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
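
The log checks above can be sketched as a small shell helper. This is a minimal sketch, not an official diagnostic: the function name count_matches is hypothetical, and the log paths and message strings are copied from the excerpts above, so they are assumed to match your vCenter Server version.

```shell
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# Prints 0 if the file is missing or unreadable.
count_matches() {
    pattern="$1"
    logfile="$2"
    if [ -r "$logfile" ]; then
        # grep -c prints the count but exits 1 on zero matches, so ignore its status
        grep -c -F -- "$pattern" "$logfile" || true
    else
        echo 0
    fi
}

# Paths and message strings taken from the log excerpts in this article.
CERT_LOG=/var/log/vmware/certificatemanagement/certificatemanagement-svcs.log
RBD_LOG=/var/log/vmware/rbd/rbd-watchdog-linux.log

echo "Final AUTODEPLOY notification errors: $(count_matches 'Final error while notifying AUTODEPLOY' "$CERT_LOG")"
echo "Failed NDC unsubscribe attempts: $(count_matches 'Failed to unsubscribe from NDC' "$RBD_LOG")"
```

Non-zero counts in both logs, together with rbd reported as Stopped by service-control --status rbd, are consistent with the symptoms described here.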

Environment

vCenter Server 7.x
vCenter Server 8.x

Cause

This issue occurs because the autodeploy service remains subscribed to the certificatemanagement service even after the autodeploy service has stopped.

Subscriptions to the certificatemanagement service are created when a service starts and removed when the service stops.
The certificatemanagement service notifies its subscribers whenever a certificate is renewed or imported.

However, when running the service-control --stop --all command, the certificatemanagement service stops before the rbd service.
This causes the unsubscribe operation to fail, leaving the subscription active.

As a result, the certificatemanagement service attempts to send certificate update notifications to the already stopped rbd service. The notifications fail with a 503 error, which triggers the alarm.

 

Resolution

Broadcom VCF engineering is aware of this issue and is working on a fix.


Workaround:
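To work around this issue, start and then stop the Autodeploy service manually. Starting the service re-creates the subscription, and stopping it while the certificatemanagement service is still running allows the unsubscribe operation to complete.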

service-control --start rbd && service-control --stop rbd
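
If Autodeploy may be in active use, the workaround can be guarded so that it runs only when rbd is reported as stopped. The following is a minimal sketch under stated assumptions: the function names rbd_is_stopped and apply_workaround and the SERVICE_CONTROL override variable are hypothetical, and the status parsing assumes the "Stopped:" output format shown in the Symptoms section.

```shell
# Assumption: SERVICE_CONTROL may be overridden for dry runs; it defaults to the real CLI.
SERVICE_CONTROL="${SERVICE_CONTROL:-service-control}"

# Returns 0 if rbd is listed directly under "Stopped:" in the status output.
rbd_is_stopped() {
    "$SERVICE_CONTROL" --status rbd 2>/dev/null | grep -A1 '^Stopped:' | grep -qw 'rbd'
}

# Runs the start/stop workaround only when rbd is currently stopped,
# so a running Autodeploy service is never shut down by accident.
apply_workaround() {
    if rbd_is_stopped; then
        "$SERVICE_CONTROL" --start rbd && "$SERVICE_CONTROL" --stop rbd
    else
        echo "rbd is not reported as stopped; workaround not applied" >&2
        return 1
    fi
}
```

Calling apply_workaround in a root shell on the vCenter Server appliance then performs the same start and stop sequence as the command above.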