Symptoms:
- The NSX UI/API may work only when accessed through the VIP IP address or FQDN.
- TLS handshakes to the node on TCP/443 fail, even when attempted locally on the node. With curl, you may see the following error when making an API call against the node:
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to <FQDN>:443
- Support bundle collection from the NSX UI may fail for every manager node except the one that currently holds the VIP.
- Running /etc/init.d/envoy status as root on the NSX manager node shows log lines similar to the following, reported by systemd:
/home/secureall/secureall/.store/.tomcat_cert.pem should start with -----BEGIN
- In /var/log/proxy/envoy.log, you see log lines similar to the following:
https-node-v4-local: Failed to load certificate chain from <inline>
- In /var/log/proton/nsxapi.log, after the failure condition has occurred and an update of the API certificate is attempted, you see stack traces similar to the one below. They indicate that the code was comparing certificate thumbprints to determine whether a different certificate was in use before overwriting the file, but it failed to read or parse the content of the existing file.
2023-07-18T18:22:02.587Z ERROR org.corfudb.runtime.collections.streaming.StreamPollingScheduler-worker-3 ResumeStreamListener 59976 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP4" level="ERROR" subcomp="manager"] Exception caught during streaming processing. Re-subscribe this listener to latest timestamp
java.lang.NullPointerException: null
at com.vmware.nsx.management.common.trust.TrustUtil.getThumbprint(TrustUtil.java:58) ~[nsx-common-util.jar:?]
at com.vmware.nsx.management.common.trust.TrustUtil.getThumbprintNoThrow(TrustUtil.java:69) ~[nsx-common-util.jar:?]
at com.vmware.nsx.management.truststore.model.CertificateProfilePEM.store(CertificateProfilePEM.java:41) ~[?:?]
at com.vmware.nsx.management.truststore.service.impl.TrustStoreServiceImpl.updateCertificateForProfile(TrustStoreServiceImpl.java:1627) ~[?:?]
at com.vmware.nsx.management.truststore.service.impl.TrustStoreServiceImpl.onNext(TrustStoreServiceImpl.java:1578) ~[?:?]
at com.vmware.nsx.persistence.ResumeStreamListener.onNextEntry(ResumeStreamListener.java:88) ~[?:?]
at com.vmware.nsx.persistence.FullSyncStreamListener.onNextEntry(FullSyncStreamListener.java:20) ~[?:?]
at com.vmware.nsx.management.truststore.service.impl.TrustStoreServiceImpl.onNextEntry(TrustStoreServiceImpl.java:129) ~[?:?]
at org.corfudb.runtime.collections.streaming.StreamingTask.lambda$null$4(StreamingTask.java:163) ~[?:?]
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:79) ~[?:?]
at org.corfudb.common.metrics.micrometer.MicroMeterUtils.time(MicroMeterUtils.java:113) ~[?:?]
at org.corfudb.runtime.collections.streaming.StreamingTask.lambda$produce$5(StreamingTask.java:163) ~[?:?]
at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_352]
at org.corfudb.runtime.collections.streaming.StreamingTask.produce(StreamingTask.java:163) ~[?:?]
at org.corfudb.runtime.collections.streaming.StreamingTask.run(StreamingTask.java:192) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
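The envoy message above says the certificate file "should start with -----BEGIN". As a minimal sketch for confirming that symptom, the following Python snippet classifies a certificate file as missing, empty, malformed, or ok. The function names and the four status strings are illustrative assumptions, not part of NSX, and the check only looks at the leading bytes; it does not validate the certificate itself.

```python
from pathlib import Path

# PEM files begin with a "-----BEGIN <LABEL>-----" line; this mirrors the
# condition named in the envoy error message.
PEM_PREFIX = b"-----BEGIN"


def starts_with_pem_header(data: bytes) -> bool:
    """Return True if the raw file content begins with a PEM header."""
    return data.startswith(PEM_PREFIX)


def check_cert_file(path: str) -> str:
    """Classify a certificate file (hypothetical helper, not an NSX tool).

    Returns one of: "missing", "empty", "malformed", "ok".
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    data = p.read_bytes()
    if not data:
        return "empty"
    if not starts_with_pem_header(data):
        return "malformed"
    return "ok"
```

For example, calling check_cert_file("/home/secureall/secureall/.store/.tomcat_cert.pem") as root on the affected manager node would be expected to return "empty" or "malformed" when this symptom is present, assuming the file is readable by the account running the script.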