Prechecks failed for the Active Global Manager upgrade

Article ID: 421798

Products

VMware NSX

Issue/Introduction

  • NSX-T Federation sites are being upgraded to NSX 4.2.2.x.
  • After the Local Manager (LM) site credentials are re-entered under Location Manager, the connection status remains disconnected.
  • The Standby Global Manager (GM) shows a sync status of "not available" and the Local Manager shows a sync status of "disconnected".
  • Upgrade of the standby GM fails with the error "Active GM cannot be upgraded first, please upgrade the standby GM then proceed further".

Environment

  • VMware NSX-T 4.1.2.x
  • VMware NSX-T 4.2.2.x

Cause

  • The connection between the LM and the GM went down and came back up after a few seconds. As a result, the Replicator-to-Replicator stub from the GM to the remote site went down and then came back up. The stub-down thread went idle before it finished its task. While it was idle, the connection was re-established and the stub-up thread completed. The stub-down thread then resumed after its sleep and marked the connection as down, even though the connection was already back up.
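
The ordering problem described above can be illustrated with a small simulation. This is only a sketch of the general race pattern, not NSX code: the class, function, and thread names are hypothetical, and a threading.Event stands in for the idle period of the stub-down thread.

```python
import threading


class ConnectionState:
    """Hypothetical holder for the status the GM records for a remote site."""

    def __init__(self):
        self.status = "up"
        self._lock = threading.Lock()

    def mark(self, status):
        with self._lock:
            self.status = status


def simulate_stale_stub_down():
    """Replay the sequence from the Cause section and return the final status."""
    state = ConnectionState()
    stub_up_finished = threading.Event()

    def stub_down_handler():
        # Stand-in for the idle period: the handler does not finish
        # until after the stub-up handler has already completed.
        stub_up_finished.wait()
        state.mark("down")  # stale write: the link is already back up

    def stub_up_handler():
        state.mark("up")
        stub_up_finished.set()

    down = threading.Thread(target=stub_down_handler)
    up = threading.Thread(target=stub_up_handler)
    down.start()
    up.start()
    down.join()
    up.join()
    return state.status


if __name__ == "__main__":
    print(simulate_stale_stub_down())
```

Because the stale stub-down write lands last, the recorded status ends as "down" while the underlying link is up, which matches the disconnected status reported in the symptoms.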

Resolution

  • Verify network connectivity between the remote site and the local site using ping.
  • Ensure port TCP/1236 traffic is allowed between the local and remote sites.
  • Ensure the async-replicator service is running on both local and remote sites.
    Invoke the GET /api/v1/node/services/async_replicator/status NSX API or the get service async_replicator NSX CLI command to determine if the service is running.
    If the service is not running, invoke the POST /api/v1/node/services/async_replicator?action=restart NSX API or the restart service async_replicator NSX CLI command to restart it.
  • Check /var/log/async-replicator/ar.log for any reported errors.
  • On the Active and Standby GM, check /var/log/gmanager/gmanager.log for entries similar to the following:
    
    ####.##.##.##  INFO http-nio-#########-exec-2 RemoteEdgeClusterServiceImpl 6462 - [nsx@6876 comp="global-manager" level="INFO" reqId="####.####.####.####" subcomp="global-manager" username=#######] Was not able to get data from remote site ##############. Error org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://#.#.#.#/policy/api/v1/ui-controller/overall-edge-clusters-rtep-status": PKIX path building failed: java.security.cert.CertPathBuilderException: Unable to find certificate chain.; nested exception is javax.net.ssl.SSLHandshakeException: PKIX path building failed: java.security.cert.CertPathBuilderException: Unable to find certificate chain..
    ####.####.#####  INFO http-nio-####.####.####-exec-2 NsxKeyStoreFile 6462 SYSTEM [nsx@6876 comp="global-manager" level="INFO" reqId="####.####.####.####" subcomp="global-manager" username=##########] Create Key Store with file /home/secureall/secureall/.store/.bluelane_truststore
  • On the Active GM, check appl-proxy-rpc.log for entries similar to the following:
    
    ####.####.#### ####.######## NSX 1488 - [nsx@6876 comp="global-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1517" level="INFO"] StreamSocket[963712 Open f:61 i:271469838 ? -> ssl://#.#.#.#:1236] async_connect
    ####.####.#### ####.######## NSX 1488 - [nsx@6876 comp="global-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1517" level="INFO"] StreamSocket[963712 Open f:61 i:271469838 ? -> ssl://#.#.#.#:1236] on_connect 336151576-tlsv1 alert unknown ca
  • On the Standby GM, check appl-proxy-npc.log for entries similar to the following:
    
    ####################### NSX 1962 - [nsx@6876 comp="global-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="2018" level="ERROR" errorCode="NET4"] NetTransport[1] Accept on endpoint 'ssl://#.#.#.#1236' failed with error 167772294-certificate verify failed (SSL routines) from remote endpoint 'ssl-tcp://10.173.1.4:45454'
    ####################### NSX 1962 - [nsx@6876 comp="global-manager" subcomp="appl-proxy" s2comp="nsx-rpc" tid="2018" level="WARNING"] RpcTransport[1] Accept on 'ssl://#.#.#.#:1236' failed with error 167772294-certificate verify failed (SSL routines)
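
The port and service checks from the first steps above could be scripted roughly as follows. This is a minimal sketch: the function names are hypothetical, and it assumes that the status API returns a JSON body with a "runtime_state" field set to "running" when the service is up. Fetching the body from the API (authentication, certificate handling) is deliberately left out.

```python
import json
import socket

REPLICATOR_PORT = 1236  # TCP port named in the resolution steps


def port_reachable(host, port=REPLICATOR_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def service_is_running(status_body):
    """Interpret the body of GET /api/v1/node/services/async_replicator/status.

    Assumption: the JSON body contains "runtime_state": "running"
    when the service is up.
    """
    try:
        payload = json.loads(status_body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("runtime_state") == "running"
```

For example, port_reachable("remote-gm.example.com") would be run from the local site against the remote manager (the hostname is a placeholder). A True result only shows that the TCP port accepts connections; it does not prove that the TLS handshake on that port succeeds, which is what the log entries above are about.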

 

Note: If the log entries on the Active and Standby GM match those shown above, run the latest CARR script to fix expired or unknown certificates and to update the thumbprints on the LMs.

CARR script: see "Using Certificate Analyzer, Results and Recovery (CARR) Script to fix certificate related issues in NSX".
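
Comparing logs against the excerpts above can be partly automated by searching for the three certificate error strings they contain. The helper below is a rough sketch with hypothetical names. It treats the logs as matching only when all three strings are found, which is one reading of the note above.

```python
# Substrings taken from the log excerpts in this article.
CERT_FAILURE_SIGNATURES = (
    "PKIX path building failed",   # gmanager.log
    "alert unknown ca",            # appl-proxy log on the Active GM
    "certificate verify failed",   # appl-proxy log on the Standby GM
)


def find_cert_failures(lines):
    """Return {signature: [matching lines]} for every signature that appears."""
    hits = {}
    for line in lines:
        for signature in CERT_FAILURE_SIGNATURES:
            if signature in line:
                hits.setdefault(signature, []).append(line.rstrip("\n"))
    return hits


def all_signatures_present(lines):
    """True only if every signature above occurs at least once."""
    return set(find_cert_failures(lines)) == set(CERT_FAILURE_SIGNATURES)
```

The lines from the relevant log files on both Global Managers would be collected into one list and passed to all_signatures_present. If only some of the strings appear, the certificate problem may still be present on one side, so a partial match is worth reviewing by hand.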

Additional Information