NSX managers rollback from 4.2.1.3 to 4.1.0.2 failed due to bad sslv3 certificates

Article ID: 392615

Products

VMware NSX

Issue/Introduction

  • Rolling back the NSX Managers from 4.2.1.3 to 4.1.0.2 fails at Step 3 (step3_exit_rollback).


  • The following error message is logged in /var/log/syslog on the NSX Manager:
    <timestamp> <FQDN of NSX-Mgr> NSX 6827 SYSTEM [nsx@6876 comp="nsx-manager nsx-policy-manager nsx-controller" subcomp="rollback-main" level="ERROR"] Step step3_exit_rollback failed at trigger_search_resync.

Environment

4.2.1.3.0.24533884

Cause

This issue can occur when multiple services on the NSX Managers (for example, Proton and CBM) hit SSL exceptions while connecting to the Corfu server.
For example, the following entry from /var/log/proton/nsxapi.log shows an SSLv3 bad certificate alert:

####-##-##T##:##:##.###Z  WARN netty-# NettyClientRouter ###### userEventTriggered: unhandled event SslHandshakeCompletionEvent(javax.net.ssl.SSLHandshakeException: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate)


To check for such errors, search the log files under /var/log on the NSX Manager.
From the /var/log directory, run:

# find /var/log -name '*.log' -print0 | xargs -0 grep -l "alert bad certificate"

Then review each file listed (if any) and confirm whether the "alert bad certificate" errors occurred around the time of the rollback failure.
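
As a rough aid for that timestamp check, the sketch below narrows the matches to lines containing a given timestamp prefix. It assumes the log lines carry an ISO-8601 timestamp like the nsxapi.log entry shown earlier. The function name bad_cert_near and both arguments are illustrative and are not part of any NSX tooling.

```shell
# Hypothetical helper (not an NSX command): print "alert bad certificate"
# lines that also contain a given timestamp prefix.
#   $1 = directory to search, e.g. /var/log
#   $2 = timestamp prefix, e.g. the date and hour of the failed rollback
bad_cert_near() {
    log_dir="$1"
    ts_prefix="$2"
    # -H prefixes each hit with its file name; -r skips grep if no files match.
    find "$log_dir" -name '*.log' -print0 |
        xargs -0 -r grep -H "alert bad certificate" 2>/dev/null |
        grep -F "$ts_prefix"
}
```

For example, `bad_cert_near /var/log 2025-01-15T10` would list only the matching lines logged during that hour (the date here is a made-up value). No output means no bad certificate alerts were logged with that prefix.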

Resolution

If the logs show bad certificate alerts, the certificates on the NSX Managers must be fixed before the Corfu connections can be re-established.

Run the "carr-1.x" script on the current setup. The script attempts to fix the certificates and will likely perform a database recovery operation, which may take some time. Wait for the script to finish, then review the script log to see which operations were performed.

Refer: Using Certificate Analyzer, Results and Recovery (CARR) Script to fix certificate related issues in NSX

Run the step3_exit_rollback command on all three nodes. If the issue persists, restart the search service as root on each node with `/etc/init.d/search restart`, then run step3_exit_rollback again on all nodes.

If the issue still persists, collect the carr.log file, which is created in the folder containing the "start.sh" script, along with the support bundles from the NSX Managers, and open a Broadcom Support case.

Refer: Create Broadcom Support Case
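
One possible way to package carr.log for the support case is sketched below. The function name collect_carr_log, the archive naming, and the default output folder /tmp are assumptions made for illustration. The only detail taken from this article is that carr.log sits in the folder containing start.sh.

```shell
# Hypothetical helper (not part of the CARR script): archive carr.log for upload.
#   $1 = folder that contains start.sh and carr.log
#   $2 = output folder (defaults to /tmp)
collect_carr_log() {
    carr_dir="$1"
    out_dir="${2:-/tmp}"
    if [ ! -f "$carr_dir/carr.log" ]; then
        echo "carr.log not found in $carr_dir" >&2
        return 1
    fi
    # Include the host name and a UTC timestamp so archives from the
    # three managers do not overwrite each other.
    archive="$out_dir/carr-log-$(hostname)-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
    tar -czf "$archive" -C "$carr_dir" carr.log && echo "$archive"
}
```

Running `collect_carr_log <folder containing start.sh>` on each manager prints the path of the archive it created. Attach that archive to the case together with the support bundles.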