After replacing APH-TN or APH-AR certificates, connections between Manager nodes or between GM and LM are disconnected

Article ID: 373270

Products

VMware NSX Networking

Issue/Introduction

APH-TN certificate issue

  • You created new certificates and replaced the APH-TN certificates using the API POST api/v1/trust-management/certificates/<certificate_id>?action=apply_certificate&service_type=APH_TN&node_id=<manager_node_id>
  • The certificates were replaced successfully.
  • The get managers command shows that some connections to the other Manager nodes are not in the Connected state.
    nsx01> get managers
    Mon Jul 29 2024 UTC 08:52:06.357
    - <IP address of Manager 1>     Standby (NSX-RPC)
    - <IP address of Manager 2>     Connected (NSX-RPC) *
    - <IP address of Manager 3>     Standby (NSX-RPC)
    Note: Before NSX 4.2, the connection to the node itself is always shown as "Standby". Pay attention to the other nodes.
    When everything is healthy, the connections to the other two Manager nodes are expected to be "Connected".
  • You have trouble collecting support bundles from the other Manager nodes.
  • In /var/log/vmware/appl-proxy-rpc.log, you see that connections from other Manager nodes repeatedly fail certificate validation.
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="INFO"] StreamSocket[1276822 Init f:-1 i:-1 ssl://0.0.0.0:1234 <- ?] Created
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="WARNING"] Certificate validation: couldn't find SHA256 digest '<SHA256 digest of the APH-TN certificate of another Manager node>' in local trust store
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="ERROR" errorCode="NET1111"] Certificate validation failed: 18-self signed certificate#012Certificate:#012    Data:#012        Version: 3 (0x2)#012        Serial Number: <Serial number of the certificate> #012    Signature Algorithm: sha256WithRSAEncryption#012        Issuer: CN=VMware-NSX-ApplProxyHub; O=VMware Inc.; L=Palo Alto; ST=California; C=US#012        <snip>
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="ERROR" errorCode="NET4"] NetTransport[1] Accept on endpoint 'ssl://0.0.0.0:1234' failed with error 336105606-certificate verify failed
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-rpc" tid="1846" level="WARNING"] RpcTransport[1] Accept on 'ssl://0.0.0.0:1234' failed with error 336105606-certificate verify failed
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="INFO"] StreamConnection[1276821 Closing on ssl://0.0.0.0:1234 sid:1276821] Closing (reason: network error)
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="INFO"] StreamConnection[1276821 Closed on ssl://0.0.0.0:1234 sid:-1] Closed (reason: network error, error: 0-Success)
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="INFO"] StreamConnection[1276821 Deleted on ssl://0.0.0.0:1234 sid:-1] Pending callback count [0]
    <Timestamp> <Hostname> NSX 1812 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1846" level="INFO"] StreamSocket[1276821 Closing f:62 i:508046837 ssl://0.0.0.0:1234 <- <IP address of another Manager node>:37220] DoClose
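To see which peer certificates are being rejected, the WARNING lines above can be summarized by digest. The following is a minimal sketch, assuming the log keeps the exact wording shown in the excerpt; the function name untrusted_digests is hypothetical and not part of any NSX tooling.

```python
import re
from collections import Counter

# Pattern copied from the WARNING line in the log excerpt above.
# The digest is whatever appears between the single quotes.
DIGEST_RE = re.compile(
    r"couldn't find SHA256 digest '([^']+)' in local trust store"
)


def untrusted_digests(log_lines):
    """Count how often each rejected certificate digest is reported.

    log_lines is any iterable of strings, for example an open file object.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for line in log_lines:
        match = DIGEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

On a Manager node, passing an open handle to /var/log/vmware/appl-proxy-rpc.log as log_lines returns one counter entry per rejected digest. Each digest can then be compared with the SHA256 digest of the APH-TN certificate applied to the other Manager nodes.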

APH-AR certificate issue

  • You created new certificates and replaced the APH-AR certificates using the API POST api/v1/trust-management/certificates/<certificate_id>?action=apply_certificate&service_type=APH&node_id=<manager_node_id>
  • The certificates were replaced successfully.
  • State synchronization between the Global Manager (GM) and the Local Manager (LM) is disconnected.

Environment

VMware NSX 4.1.x

VMware NSX 4.2.x

Cause

The subject of each node's APH-TN and APH-AR certificates must be unique.

If multiple certificates have the same subject, only one certificate can be trusted and connections from other nodes are rejected due to certificate validation failure.
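The effect of a shared subject can be pictured with a toy model. This is only an illustration and not NSX's actual implementation: it assumes a trust store keyed by subject in which a later certificate with the same subject replaces the earlier one. The names build_trust_store and is_trusted are hypothetical, and raw bytes stand in for real certificates.

```python
import hashlib


def build_trust_store(certs):
    """Toy trust store keyed by subject (assumption, not NSX code).

    certs is a list of (subject, certificate_bytes) pairs. A later entry
    with the same subject overwrites the earlier one, so only one
    certificate per subject survives.
    """
    store = {}
    for subject, cert_bytes in certs:
        store[subject] = hashlib.sha256(cert_bytes).hexdigest()
    return store


def is_trusted(store, cert_bytes):
    """Return True if the certificate's SHA256 digest is in the store."""
    return hashlib.sha256(cert_bytes).hexdigest() in store.values()
```

In this model, two nodes that both present the subject CN=VMware-NSX-ApplProxyHub leave a single entry in the store, so one node's digest can no longer be found. Giving each node a distinct subject keeps both digests available.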

Resolution

Make sure the subjects of the APH-TN and APH-AR certificates are unique.

  1. Retrieve the APH UUID of each Manager node from the file /etc/vmware/nsx-appl-proxy/appl-proxy-public-cfg.json
    root@nsx01:~# cat /etc/vmware/nsx-appl-proxy/appl-proxy-public-cfg.json
    { "uuid" : "53d73525-aab1-4632-b87e-70c2f84abfb6" }
  2. Create self-signed certificates or CSRs to be signed, and include the UUID in the CN as shown below.
    VMware-NSX-ApplProxyHub/UID=<UUID obtained in step 1>
  3. If you had the CSRs signed, import the signed certificates.
  4. Apply the certificates.
    • APH-TN certificates: POST api/v1/trust-management/certificates/<certificate_id>?action=apply_certificate&service_type=APH_TN&node_id=<manager_node_id>
    • APH-AR certificates: POST api/v1/trust-management/certificates/<certificate_id>?action=apply_certificate&service_type=APH&node_id=<manager_node_id>
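The string handling in steps 1, 2 and 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names aph_uuid, unique_cn and apply_path are hypothetical, the code only builds strings, and it does not create certificates or call the NSX API.

```python
import json
from urllib.parse import urlencode

# CN prefix shown in step 2 of the article.
CN_PREFIX = "VMware-NSX-ApplProxyHub"


def aph_uuid(cfg_text):
    """Extract the APH UUID from the text of appl-proxy-public-cfg.json."""
    return json.loads(cfg_text)["uuid"]


def unique_cn(uuid):
    """Build the per-node CN in the form shown in step 2."""
    return f"{CN_PREFIX}/UID={uuid}"


def apply_path(certificate_id, service_type, node_id):
    """Build the request path from step 4 (to be sent with POST).

    service_type is APH_TN for APH-TN certificates and APH for APH-AR
    certificates, as listed in the article.
    """
    if service_type not in ("APH_TN", "APH"):
        raise ValueError("service_type must be APH_TN or APH")
    query = urlencode(
        {
            "action": "apply_certificate",
            "service_type": service_type,
            "node_id": node_id,
        }
    )
    return f"api/v1/trust-management/certificates/{certificate_id}?{query}"
```

For the sample file content in step 1, unique_cn(aph_uuid(text)) yields VMware-NSX-ApplProxyHub/UID=53d73525-aab1-4632-b87e-70c2f84abfb6. Because the UUID differs on every Manager node, the resulting subject differs as well.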