/var/log/syslog
<timestamp> <FQDN of NSX-Mgr> NSX 6827 SYSTEM [nsx@6876 comp="nsx-manager nsx-policy-manager nsx-controller" subcomp="rollback-main" level="ERROR"] Step step3_exit_rollback failed at trigger_search_resync.
4.2.1.3.0.24533884
This issue may occur if multiple services (e.g., Proton, CBM) on NSX managers experience SSL exceptions when connecting to the Corfu server.
For example, the following entry from /var/log/proton/nsxapi.log indicates an SSLv3 bad certificate alert:
####-##-##T##:##:##.###Z WARN netty-# NettyClientRouter ###### userEventTriggered: unhandled event SslHandshakeCompletionEvent(javax.net.ssl.SSLHandshakeException: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate)
To check for such errors, search the log files under /var/log on the NSX Manager.
From the /var/log directory, run:
# find . -name '*.log' | xargs grep -l "alert bad certificate"
Then check the log files listed (if any) to see whether "alert bad certificate" errors were logged around the time the issue occurred.
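To narrow the matches to the time of the failure, the search can be combined with a timestamp filter. The following is a minimal sketch, not part of the documented procedure: the function name `bad_cert_hits` and its two arguments are hypothetical, and it assumes log lines start with an ISO-8601 timestamp as in the nsxapi.log sample above.

```shell
# Hypothetical helper (illustration only, not an NSX command).
# Prints "file:line" for every "alert bad certificate" entry in *.log files
# under the given directory, keeping only lines that contain the timestamp prefix.
bad_cert_hits() {
    log_dir="$1"     # directory to search, e.g. /var/log
    ts_prefix="$2"   # assumed ISO-8601 prefix, e.g. a date plus hour

    # -H forces the file name to be printed even if only one file matches.
    find "$log_dir" -name '*.log' \
        -exec grep -H "alert bad certificate" {} + 2>/dev/null |
        grep -F -- "$ts_prefix"
}
```

For instance, `bad_cert_hits /var/log "<date>T<hour>"` would list only the bad certificate alerts logged during that hour, where the prefix is taken from the timestamp of the failed rollback step in /var/log/syslog.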
If the logs show bad certificate alerts, the certificates on the NSX managers must be fixed before the Corfu connections can be re-established.
Run the "carr-1.x" script on the current setup. This script will attempt to fix the certificates and will likely perform a database recovery operation, which may take some time. Please wait for the script to finish executing. Once it has completed, check the script log to review the operations and activities that were performed.
Run the step3_exit_rollback command on all three nodes. If the same issue persists, restart the search service as root using the command `/etc/init.d/search restart` on each node, then try running the "step3_exit_rollback" command again on all nodes.
If the issue still persists, collect the 'carr.log' file, which is created in the folder where the "start.sh" script is located, along with the NSX Manager support bundles, and raise a Broadcom Support case.
Refer to: Create Broadcom Support Case