NSX 4.x reverse-proxy fails to load API certificate

Article ID: 314345

Products

VMware NSX

Issue/Introduction

Symptoms:
  • NSX UI/API may only work when directed to the VIP IP/FQDN.
  • TLS handshakes to the node on TCP/443 fail, even locally. In curl, you may see the following error when making an API call against the node:
    • curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to <FQDN>:443
  • Support bundle collection from the NSX UI may fail for every Manager node except the one that currently holds the VIP.
  • Running /etc/init.d/envoy status as root on the NSX manager node reveals log lines similar to the following reported by systemd:
/home/secureall/secureall/.store/.tomcat_cert.pem should start with -----BEGIN 
  • In /var/log/proxy/envoy.log you see log lines similar to the following:
https-node-v4-local: Failed to load certificate chain from <inline>
  • In /var/log/proton/nsxapi.log, after hitting the failure condition and attempting to update the API certificate, you see stack traces similar to the following. They indicate that the code was comparing certificate thumbprints to determine whether a different certificate was in use before overwriting the file, but it failed to read or parse the content of the existing file.
2023-07-18T18:22:02.587Z ERROR org.corfudb.runtime.collections.streaming.StreamPollingScheduler-worker-3 ResumeStreamListener 59976 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP4" level="ERROR" subcomp="manager"] Exception caught during streaming processing. Re-subscribe this listener to latest timestamp
java.lang.NullPointerException: null
 at com.vmware.nsx.management.common.trust.TrustUtil.getThumbprint(TrustUtil.java:58) ~[nsx-common-util.jar:?]
 at com.vmware.nsx.management.common.trust.TrustUtil.getThumbprintNoThrow(TrustUtil.java:69) ~[nsx-common-util.jar:?]
 at com.vmware.nsx.management.truststore.model.CertificateProfilePEM.store(CertificateProfilePEM.java:41) ~[?:?]
 at com.vmware.nsx.management.truststore.service.impl.TrustStoreServiceImpl.updateCertificateForProfile(TrustStoreServiceImpl.java:1627) ~[?:?]
 at com.vmware.nsx.management.truststore.service.impl.TrustStoreServiceImpl.onNext(TrustStoreServiceImpl.java:1578) ~[?:?]
 at com.vmware.nsx.persistence.ResumeStreamListener.onNextEntry(ResumeStreamListener.java:88) ~[?:?]
 at com.vmware.nsx.persistence.FullSyncStreamListener.onNextEntry(FullSyncStreamListener.java:20) ~[?:?]
 at com.vmware.nsx.management.truststore.service.impl.TrustStoreServiceImpl.onNextEntry(TrustStoreServiceImpl.java:129) ~[?:?]
 at org.corfudb.runtime.collections.streaming.StreamingTask.lambda$null$4(StreamingTask.java:163) ~[?:?]
 at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:79) ~[?:?]
 at org.corfudb.common.metrics.micrometer.MicroMeterUtils.time(MicroMeterUtils.java:113) ~[?:?]
 at org.corfudb.runtime.collections.streaming.StreamingTask.lambda$produce$5(StreamingTask.java:163) ~[?:?]
 at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_352]
 at org.corfudb.runtime.collections.streaming.StreamingTask.produce(StreamingTask.java:163) ~[?:?]
 at org.corfudb.runtime.collections.streaming.StreamingTask.run(StreamingTask.java:192) ~[?:?]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
 at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
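
The log symptoms above can be checked together with a short script. The following is a minimal Python sketch, not a supported tool: the function name `find_signatures` and the signature labels are hypothetical, and the search strings are taken from the example log lines in this article, so they may need adjusting if the wording differs in your release.

```python
# Substrings copied from the example log lines in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "envoy_cert_load_failure": "Failed to load certificate chain from",
    "pem_header_missing": ".tomcat_cert.pem should start with -----BEGIN",
    "thumbprint_npe": "TrustUtil.getThumbprint(",
}


def find_signatures(lines):
    """Return {label: [1-based line numbers]} for each known signature."""
    hits = {label: [] for label in SIGNATURES}
    for number, line in enumerate(lines, start=1):
        for label, needle in SIGNATURES.items():
            if needle in line:
                hits[label].append(number)
    return hits


if __name__ == "__main__":
    import sys

    # Usage (assumed): python3 scan.py /path/to/copied/envoy.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for label, numbers in find_signatures(handle).items():
                if numbers:
                    print(f"{path}: {label} on lines {numbers}")
```

Running it against copies of envoy.log and nsxapi.log taken from a support bundle avoids working directly on the Manager node.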
 
 


Environment

VMware NSX 4.x

Cause

  • When a certificate that contains extra information (extra attributes) outside the PEM encoding is imported and applied as the API certificate, the NSX Manager cannot correctly parse the new certificate.
  • If the Envoy service is then restarted, the UI and API endpoints stop accepting requests. Once the system is in this state, applying a different certificate does not resolve the issue: the API reports that the new certificate has been applied, but Envoy does not pick it up.
  • See workaround section for the recovery procedure.
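
To illustrate the first cause, a certificate file can be checked for text outside the PEM blocks before it is imported. The following is a minimal Python sketch under stated assumptions: the function name `lines_outside_pem` is hypothetical, and the "Bag Attributes" header in the sample is an assumption about what the extra attributes may look like (OpenSSL writes such headers when exporting from PKCS#12); the article does not specify the exact content.

```python
def lines_outside_pem(pem_text):
    """Return non-blank lines that sit outside any -----BEGIN/-----END block."""
    extra = []
    inside = False
    for line in pem_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-----BEGIN ") and stripped.endswith("-----"):
            inside = True
        elif stripped.startswith("-----END ") and stripped.endswith("-----"):
            inside = False
        elif not inside and stripped:
            # Anything here is text outside the PEM encoding.
            extra.append(stripped)
    return extra


if __name__ == "__main__":
    # Assumed example input; the Base64 body is a placeholder, not a real certificate.
    sample = (
        "Bag Attributes\n"
        "    localKeyID: 01 00\n"
        "-----BEGIN CERTIFICATE-----\n"
        "MIIB\n"
        "-----END CERTIFICATE-----\n"
    )
    for line in lines_outside_pem(sample):
        print("outside PEM block:", line)
```

An empty result means the file consists only of PEM blocks. It does not validate the certificate itself.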

Resolution

This issue currently affects all 4.x releases and will be fixed in a future release.

Workaround:
To implement the workaround steps, open a service request with VMware GSS NSX support and reference this article.

Additional Information

Impact/Risks:

  • NSX UI/API access to an individual Manager cluster node is unavailable if the certificate with the extra attributes is applied to the API service type of that node and the Envoy service is restarted on it. This is because the API certificate is used for TCP/443 communication to that specific Manager node.
    • If the same certificate is used for all 3 nodes in the NSX Manager cluster, the UI/API may be unavailable on the individual node IPs/FQDNs and may only accept requests via the VIP IP/FQDN.
  • Support bundle collection via the UI for any non-VIP manager nodes affected may fail.
  • During NSX upgrades repo_sync may fail to complete and the upgrade cannot proceed.
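
To see which addresses are affected, the TLS handshake on TCP/443 can be probed for each node and for the VIP. The following is a minimal Python sketch using only the standard library: the function name `tls_handshake_ok` and the host names in the usage example are hypothetical. Certificate verification is deliberately disabled because the probe only tests whether a handshake completes, not whether the certificate is trusted.

```python
import socket
import ssl


def tls_handshake_ok(host, port=443, timeout=5.0):
    """Return True if a TLS handshake with host:port completes, else False."""
    context = ssl.create_default_context()
    # Diagnostic probe only: do not validate the certificate or host name.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, resets, and ssl.SSLError.
        return False


if __name__ == "__main__":
    # Hypothetical names; replace with your node and VIP FQDNs.
    for name in ("nsx-mgr-01.example.com", "nsx-vip.example.com"):
        print(name, "handshake ok" if tls_handshake_ok(name) else "handshake failed")
```

With this issue, the expectation is that affected node addresses fail while the VIP succeeds. A failed probe can also have unrelated causes, such as a firewall or DNS problem.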

 

If you are contacting Broadcom support about this issue, please provide the following:

  • NSX Manager log bundles
  • Text of any error messages seen in NSX GUI or command lines pertinent to the investigation

Handling Log Bundles for offline review with Broadcom support