Edge VM MPA connectivity is down after certificate replacement


Article ID: 389595

Products

VMware NSX

Issue/Introduction

  1. After the certificate on the NSX Manager is replaced, the Edge node reports an "MPA Connect" error.

 

 

  2. On the manager node, the /var/log/proton/nsxapi.log file contains entries similar to the following:

 

XXXX-XX-XXTXX:XX:XX.XXXZ  INFO UfoIndexer-BatchExecutor-search_manager-2 EdgeTNValidationUtils 5296 FABRIC [nsx@XXXX comp="nsx-manager" level="INFO" subcomp="manager"] Set FN state error MPA disconnected TRANSPORT_NODE_SYNC_PENDING

 

XXXX-XX-XXTXX:XX:XX.XXXZ  INFO UfoIndexer-BatchExecutor-search_manager-2 EdgeTNValidationUtils 5296 FABRIC [nsx@XXXX comp="nsx-manager" level="INFO" subcomp="manager"] [entId=/infra/sites/default/enforcement-points/default/edge-transport-node/0000-0000-0000-00] Edge either in error state, not ready or mpa disconnected, failure code: 0,state:MPA_DISCONNECTED, mpa_connection: false

 

-----------------------------------------------

 

XXXX-XX-XXTXX:XX:XX.XXXZ ERROR WrapperStartStopAppMain TrustStoreServiceImpl XXXXXX SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="manager"] Failed to sync certificate between DB and disk for profile: profileName: Message Bus Client for K8S Platform, serviceType: K8S_MSG_CLIENT, preProcessor: com.vmware.nsx.management.cloudnative.pre_processor.KafkaMsgClientCertPreProcessor, postProcessor: null, uniqueUse: false, clusterCertificate: true, requiresPrivateKey: true, nodeTypes: [global-manager, nsx-manager, nsx-shared], alias: k8s-msg-client, keyStorePath: /home/secureall/secureall/.store/.bluelane_keystore, keyStorePasswordPath: /config/http/.http_cert_pw
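
To check quickly whether a manager log shows this symptom, the entries above can be filtered with a short script. The following is a minimal sketch, not a supported tool: the marker strings are copied from the sample entries, and the function name find_mpa_entries is a hypothetical helper introduced here for illustration.

```python
import re

# Marker strings copied from the sample nsxapi.log entries in this article.
MARKERS = ("MPA_DISCONNECTED", "MPA disconnected", "Failed to sync certificate")

# Captures the entity path from an "[entId=...]" field when one is present.
ENT_ID = re.compile(r"\[entId=([^\]]+)\]")


def find_mpa_entries(lines):
    """Return (entity_id_or_None, line) for every line containing a marker."""
    hits = []
    for line in lines:
        if any(marker in line for marker in MARKERS):
            match = ENT_ID.search(line)
            hits.append((match.group(1) if match else None, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of lines, so it can be fed an open file handle for /var/log/proton/nsxapi.log on the manager or for a copy of the log taken from a support bundle.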

 

 

  3. On the manager node, the files under /etc/vmware/nsx-appl-proxy/ have ownership and permissions similar to the following (output of ls -la /etc/vmware/nsx-appl-proxy/):

 

-rw-r-----  1 uproton    uproton    1.7K XXX XX 16:22 appl-proxy-privkey.pem

-rw-r-----  1 uproton    uproton    1.7K XXX XX 22:20 appl-proxy-privkey.pem.

-rw-r-----  1 uproton    uproton    1.7K XXX XX 22:15 appl-proxy-privkey.pem.

-rw-rw-r--  1 appl-proxy appl-proxy 1.3K XXX XX 22:15 appl-proxy-ar-cert.pem

-rw-r-----  1 uproton    uproton    1.3K XXX XX 22:15 appl-proxy-ar-cert.pem.

-rw-rw-r--  1 appl-proxy appl-proxy 1.7K XXX XX 22:15 appl-proxy-ar-privkey.pem

-rw-r-----  1 uproton    uproton    1.7K XXX XX 22:15 appl-proxy-ar-privkey.pem
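
When comparing several manager nodes, it can help to summarize a listing like the one above by owner. The sketch below parses ls -la style text and groups file names by owner and group. It only reports what the listing shows and makes no assumption about which ownership is correct; owners_from_listing is a hypothetical helper name.

```python
from collections import defaultdict


def owners_from_listing(listing):
    """Group file names by 'owner:group' from ls -la style output."""
    by_owner = defaultdict(list)
    for line in listing.splitlines():
        # Columns: mode, links, owner, group, size, month, day, time, name.
        fields = line.split(None, 8)
        # Keep regular-file rows only; skip totals, directories, and blank lines.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        by_owner[f"{fields[2]}:{fields[3]}"].append(fields[8])
    return dict(by_owner)
```

Running it against the saved output from each manager gives a compact view that is easy to compare side by side.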

 

 

  4. On the affected Edge node, the /var/log/syslog file contains messages similar to the following:

 

XXXX-XX-XXTXX:XX:XX.XXXZ  NSX XXXX - [nsx@xxx comp="nsx-edge" subcomp="nsx-proxy" s2comp="nsx-net" tid="XXXX" level="INFO"] StreamSocket[754 Open f:64 i:xxxxxxxxxxx? -> ssl://#.#.#.#:1234] on_connect xxxxxxxxxxx-certificate verify failed (SSL routines)

XXXX-XX-XXTXX:XX:XX.XXXZ  NSX XXXX - [nsx@xxx comp="nsx-edge" subcomp="nsx-proxy" s2comp="nsx-net" tid="XXXX" level="WARNING"] StreamConnection[754 Connecting to ssl://#.#.#.#:1234 sid:754] Couldn't connect to 'ssl://<ip_of_the_manager> (error: xxxxxxxxxxx-certificate verify failed (SSL routines))

XXXX-XX-XXTXX:XX:XX.XXXZ NSX XXXX - [nsx@xxxx comp="nsx-edge" subcomp="nsx-proxy" s2comp="nsx-net" tid="XXXX" level="WARNING"] StreamConnection[754 Error to ssl://#.#.#.#:1234 sid:-1] Error xxxxxxxxxxx-certificate verify failed (SSL routines)

XXXX-XX-XXTXX:XX:XX.XXXZ NSX XXXX- [nsx@xxx comp="nsx-edge" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="XXXX" level="WARNING"] RpcConnection[754 Connecting to ssl://#.#.#.#:1234 0] Couldn't connect to ssl://#.#.#.#:1234 (error: xxxxxxxxxx-certificate verify failed (SSL routines))
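
To see which manager endpoints the Edge is failing to verify, the syslog entries above can be counted per target. This is a minimal sketch that relies only on the "certificate verify failed" text and the ssl:// target shown in the samples; failed_tls_endpoints is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the target in entries such as "... ssl://192.0.2.10:1234] ...".
ENDPOINT = re.compile(r"ssl://([^\s\]']+)")


def failed_tls_endpoints(lines):
    """Count 'certificate verify failed' entries per ssl:// endpoint."""
    counts = Counter()
    for line in lines:
        if "certificate verify failed" not in line:
            continue
        match = ENDPOINT.search(line)
        # Use the first endpoint on the line; fall back to "unknown" if none is found.
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Each key in the returned counter identifies a manager address that the workaround needs to cover.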

 

Cause

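Replacing the certificates on the manager nodes can cause a mismatch between the manager certificate thumbprint that the Edge node trusts and the certificate that the manager now presents. The nsx-proxy service on the Edge then fails certificate verification when it connects to the manager, as shown in the syslog messages above.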
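
To illustrate what a thumbprint comparison involves, the sketch below fingerprints the certificate that a server currently presents. It assumes that the thumbprint is the lowercase hex SHA-256 digest of the DER-encoded certificate and that the API certificate is served on TCP port 443; both are assumptions, not statements from this article, so treat the value printed by get certificate api thumbprint on the manager as authoritative. The function names are hypothetical.

```python
import hashlib
import ssl


def sha256_thumbprint(der_bytes):
    """Return the lowercase hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def presented_thumbprint(host, port=443):
    """Fetch the certificate a server presents (without validating it) and fingerprint it.

    Assumption: port 443 serves the API certificate of the manager.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha256_thumbprint(ssl.PEM_cert_to_DER_cert(pem))
```

Calling presented_thumbprint with a manager address from a machine that can reach it returns a value that can be compared with the thumbprint reported on the manager CLI.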

Resolution

Workaround

  1. Get the API certificate thumbprint from each manager node:
    Manager> get certificate api thumbprint
  2. SSH to the faulty edge node as the admin user.
  3. Run the following commands:
    push host-certificate <manager-IP-FQDN> username <username> thumbprint <cert-api-thumbprint-of-manager> password <password>

    sync-aph-certificates <manager-IP-FQDN> username <username> thumbprint <cert-api-thumbprint-of-manager> password <password>
  4. Repeat step 3 for each manager node, using that node's thumbprint.
  5. Switch to the root user (run st en and enter the root password when prompted).
  6. Restart the nsx-proxy service:
    /etc/init.d/nsx-proxy restart
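
Because step 3 is repeated for every manager node, it can be convenient to generate the exact command lines in advance and paste them into the Edge CLI. The sketch below only formats the two commands from step 3 and does not connect to anything. The function name edge_cli_commands and the example address and thumbprint are hypothetical.

```python
def edge_cli_commands(managers, username):
    """Yield the two Edge CLI commands from step 3 for each (address, thumbprint) pair."""
    for address, thumbprint in managers:
        for verb in ("push host-certificate", "sync-aph-certificates"):
            # The password stays a placeholder so it is never stored in a script.
            yield (f"{verb} {address} username {username} "
                   f"thumbprint {thumbprint} password <password>")


# Example with placeholder values; substitute the real address and thumbprint.
for command in edge_cli_commands([("nsx-mgr-01.example.com", "abc123")], "admin"):
    print(command)
```

Keeping the password as a literal placeholder is deliberate: it is typed in at the Edge CLI rather than written into a file.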

     

Note: After performing the workaround above, sync the Edge configuration from the NSX UI if required. Go to System --> Nodes --> Edge Transport Nodes, select the Edge node, and then choose Actions --> Sync Edge Node Configuration. Verify that the "Configuration State" shows "Success" after the sync with NSX Manager.
