The Cloud Director 10.x add-on "TMC Self-Managed" running in a CSE environment might return the error "API error: Unavailable: Please try again later".
The Kafka and Prometheus pods in the tmc-local namespace restart frequently.
Kafka logs show:
[2159-09-09 01:08:58,596] ERROR [broker-0-to-controller-heartbeat-channel-manager]: Request BrokerHeartbeatRequestData(brokerId=0, brokerEpoch=4, currentMetadataOffset=45928312, wantFence=false, wantShutDown=false) failed due to authentication error with controller (kafka.server.BrokerToControllerRequestThread)
> org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
> Caused by: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
> at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378)
> at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
> at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316)
> at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357)
> at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232)
> at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175)
> at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
> at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
> at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277)
> at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264)
> at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
> at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209)
> at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435)
> at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523)
> at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:373)
> at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:293)
> at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:178)
> at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:543)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:571)
> at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:78)
> at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:418)
> at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:127)
Cloud Director 10.5.x
TMC 1.0
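The internal certificates used for Kafka have expired. These certificates typically have a 6-month validity, and the "validity check failed" message in the Kafka log is the symptom of that expiry.
This is a known issue in the TMC Self-Managed add-on 1.0 for Cloud Director.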
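To confirm the expiry before opening a support request, the certificate end date can be compared with the current time. The following is a minimal sketch, not an official procedure: it assumes you have already extracted the Kafka certificate to a PEM file (this article does not name the secret that holds it) and run `openssl x509 -noout -enddate` on it, which prints a line such as `notAfter=Jan  5 09:34:43 2018 GMT`. The function names are hypothetical helpers introduced for illustration.

```python
import ssl
import time
from typing import Optional


def parse_not_after(openssl_line: str) -> float:
    """Convert a 'notAfter=...' line from `openssl x509 -noout -enddate`
    into seconds since the epoch (UTC)."""
    # Keep only the text after the first '=' (or the whole line if there is none).
    value = openssl_line.strip().split("=", 1)[-1]
    # ssl.cert_time_to_seconds parses the 'Mon DD HH:MM:SS YYYY GMT' format.
    return float(ssl.cert_time_to_seconds(value))


def days_remaining(openssl_line: str, now: Optional[float] = None) -> float:
    """Return the number of days until expiry; negative if already expired."""
    if now is None:
        now = time.time()
    return (parse_not_after(openssl_line) - now) / 86400.0


def is_expired(openssl_line: str, now: Optional[float] = None) -> bool:
    """Return True when the certificate end date is in the past."""
    return days_remaining(openssl_line, now) < 0


if __name__ == "__main__":
    # Sample value only; replace it with the line printed for your certificate.
    line = "notAfter=Jan  5 09:34:43 2018 GMT"
    print(f"expired: {is_expired(line)}, days remaining: {days_remaining(line):.1f}")
```

A negative `days_remaining` value is consistent with the PKIX validity failure shown in the Kafka log. Given the 6-month validity, the same check could be scheduled to warn ahead of the next expiry.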
Contact technical support and note this Knowledge Article ID (398520) in the problem description. For more information, see How to Submit a Support Request.