NSX managers rollback from 4.2.1.3 to 4.1.0.2 failed due to bad sslv3 certificates

Article ID: 392615

Products

VMware NSX

Issue/Introduction

Rollback of the NSX Managers from 4.2.1.3 to 4.1.0.2 fails at step 3. The following error is seen in the syslog of the NSX Managers:

####-##-##T##:##:##.###Z <FQDN of NSX-Mgr> NSX 6827 SYSTEM [nsx@6876 comp="nsx-manager nsx-policy-manager nsx-controller" subcomp="rollback-main" level="ERROR"] Step step3_exit_rollback failed at trigger_search_resync.
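To confirm that a manager is hitting this exact failure, its syslog can be checked for the step and trigger named in the error above. This is a minimal sketch, not an official tool: it assumes the syslog is at /var/log/syslog (rotated or compressed syslog files are not searched), and the function name has_step3_resync_failure is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed (exit status 0) if the given syslog file contains
# the step 3 rollback failure signature from this article.
# Assumption: /var/log/syslog is the default path; pass another file as $1.
has_step3_resync_failure() {
  local syslog="${1:-/var/log/syslog}"
  # -F matches the literal text, -q reports only via the exit status
  grep -qF 'Step step3_exit_rollback failed at trigger_search_resync' "$syslog"
}
```

For example, running `has_step3_resync_failure && echo "Rollback failure signature found"` on each manager shows which nodes logged the failure.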

Environment

VMware NSX 4.2.1.3.0.24533884

Cause

This issue can occur if multiple services on the NSX Managers (for example, Proton and CBM) encounter SSL exceptions while connecting to the Corfu server.

For example, the following nsxapi log entry indicates an SSLv3 bad certificate alert:

####-##-##T##:##:##.###Z  WARN netty-# NettyClientRouter ###### userEventTriggered: unhandled event SslHandshakeCompletionEvent(javax.net.ssl.SSLHandshakeException: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate)

To check for these errors, search the log files under /var/log on each NSX Manager. From the /var/log directory, run:

`find . -name '*.log' | xargs grep -l "alert bad certificate"`

Then check the listed log files (if any) to see whether "alert bad certificate" errors occur around the time the rollback failed.
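The search and the timestamp check can be combined into one step. The following is a minimal sketch, not an official Broadcom tool; the function names are hypothetical, and it assumes each log line starts with an ISO-8601 UTC timestamp, as in the nsxapi example above, so the first field can be compared against a time window.

```shell
#!/usr/bin/env bash
# Minimal sketch (not an official Broadcom tool) for locating
# "alert bad certificate" errors in NSX Manager logs.

# List *.log files under a directory (default /var/log) that contain
# the alert; same idea as the find | xargs grep -l command above.
find_bad_cert_logs() {
  local dir="${1:-/var/log}"
  # -print0 / -0 keep file names with spaces intact;
  # -r skips grep entirely when no *.log files exist;
  # -s hides "permission denied" noise for unreadable files.
  find "$dir" -type f -name '*.log' -print0 \
    | xargs -0 -r grep -ls "alert bad certificate"
}

# Print only the matching lines whose leading timestamp lies between
# $2 and $3 (ISO-8601 UTC, e.g. 2024-01-01T10:00:00).
# Assumption: the first whitespace-separated field of each line is the timestamp.
bad_cert_lines_in_window() {
  local dir="$1" start="$2" end="$3"
  find_bad_cert_logs "$dir" | while IFS= read -r f; do
    # Fixed-width ISO-8601 timestamps sort correctly as plain strings,
    # so a byte-wise (LC_ALL=C) string comparison acts as a time filter.
    grep -h "alert bad certificate" "$f" \
      | LC_ALL=C awk -v s="$start" -v e="$end" '$1 >= s && $1 <= e'
  done
}
```

After sourcing the file on a manager, `find_bad_cert_logs` lists the affected files, and `bad_cert_lines_in_window /var/log 2024-01-01T10:00:00 2024-01-01T10:30:00` narrows the output to a window around the failure (the timestamps here are placeholders). Because the bounds are compared as strings, an end bound without milliseconds excludes lines logged within its final second.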

Resolution

If bad certificate alerts are found in the logs, the certificates on the NSX Managers must be fixed before the Corfu connections can come up.

Run the carr-1.x script on the setup in its current state. The script attempts to fix the certificates and will most likely perform a database recovery operation, which can take some time. Wait for the script to finish, then review the script log to confirm the operations it performed.

For details, refer to KB 369034.

After the script completes, run the `step3_exit_rollback` command on all three nodes. If the issue persists, restart the search service as root on each node using `/etc/init.d/search restart`, then run the `step3_exit_rollback` command again on all nodes.

If issues are still observed, collect 'carr.log' (created in the folder where the start.sh script is located) along with the NSX Manager support bundles, and open a Broadcom Support case.

Refer to the KB: Create Broadcom Support Case