After upgrading a TCA 3.x system (previously migrated from TCA 2.x) to TCA 3.4, all VIMs are stuck in the Unknown state and the Event Mesh status shows as Disconnected.
VMware Telco Cloud Automation 3.4
As part of security enhancements in VMware Telco Cloud Automation 3.4, all internal components were switched to strict hostname checks, which require the system's certificate to explicitly contain the FQDN, either as the CN or in the SAN field.
Kafka Event Mesh is one such service. It relies on the externalAddress field in the OVF properties of the TCA appliance to decide which hostname or IP to use when reaching the corresponding TCA-CPs. Providing the system FQDN here is recommended.
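To see which value an appliance is actually using, the OVF environment can be inspected from inside the guest. The following is a minimal sketch, not a documented TCA procedure: ovf_property is a hypothetical helper, and it assumes that vmtoolsd is available on the appliance and that the property appears in the OVF environment XML with a key named exactly externalAddress.

```shell
# Hypothetical helper: read OVF environment XML on stdin and print the
# oe:value of the Property whose oe:key equals $1.
ovf_property() {
  # Put each XML tag on its own line, keep the one with the wanted key,
  # then extract the text between the quotes of oe:value.
  tr '>' '\n' | grep -F "oe:key=\"$1\"" | sed -n 's/.*oe:value="\([^"]*\)".*/\1/p'
}

# On the appliance (assumption: VMware Tools is installed and the key name matches):
#   vmtoolsd --cmd "info-get guestinfo.ovfEnv" | ovf_property externalAddress
```

If the command prints an IP address rather than an FQDN, the appliance is most likely in the defaulted state described in this article.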
If this value is unset or left blank during deployment, the system defaults the field to the IP address. During migration from TCA 2.x to 3.x, customers typically left this value blank (it was optional at the time), so many production systems may have it defaulted to the IP address. This causes Kafka Event Mesh to communicate over IPs. With the stricter checks and validations in TCA 3.4, a prerequisite in that case is for the TCA certificate to contain both the IP address and the FQDN in the SAN (Subject Alternative Name) field.
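Whether a certificate already lists both the FQDN and the IP address can be checked with openssl. The sketch below is illustrative only: fetch_san and san_contains are hypothetical helper names, and port 443 is an assumption about where the TCA-CP certificate is served.

```shell
# Hypothetical helper: print the SAN section of the certificate served by host $1.
# Assumption: the certificate of interest is presented on port 443.
fetch_san() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name'
}

# Hypothetical helper: succeed if the SAN text on stdin contains $1
# as an exact DNS or IP Address entry.
san_contains() {
  # openssl prints entries as "DNS:name, IP Address:1.2.3.4"; split on commas,
  # trim surrounding whitespace, then match whole lines literally.
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qxF -e "DNS:$1" -e "IP Address:$1"
}

# Example usage (placeholder values):
#   fetch_san tca-cp.example.com | san_contains tca-cp.example.com && echo "FQDN present"
#   fetch_san tca-cp.example.com | san_contains 10.0.0.5 && echo "IP present"
```

Matching whole entries rather than substrings avoids a false positive where, for example, 10.0.0.5 would otherwise match an entry for 10.0.0.50.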
Workaround
systemctl status tca-deploy
kubectl patch kafka edge -n tca-cp-cn --type='json' -p='[{"op": "replace", "path": "/spec/kafka/listeners/1/configuration/brokers/0/advertisedHost", "value":"<TCA_CP_FQDN_TOBE_REPLACED>"}]'
kubectl rollout restart deployment event-mesh-connect-connect -n tca-mgr
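The placeholder in the patch command above has to be replaced with the real TCA-CP FQDN. One way to avoid hand-editing the JSON is a small helper that builds the patch body from a variable. This is a sketch: build_patch is a hypothetical name, tca-cp.example.com is a placeholder, and the verification commands in the comments assume the same resource names and namespaces as the commands above.

```shell
# Hypothetical helper: print the JSON patch from the workaround with $1
# substituted as the advertisedHost value.
build_patch() {
  printf '[{"op": "replace", "path": "/spec/kafka/listeners/1/configuration/brokers/0/advertisedHost", "value":"%s"}]\n' "$1"
}

# Example usage (placeholder FQDN):
#   kubectl patch kafka edge -n tca-cp-cn --type='json' -p="$(build_patch tca-cp.example.com)"
#
# Optional checks afterwards:
#   kubectl get kafka edge -n tca-cp-cn \
#     -o jsonpath='{.spec.kafka.listeners[1].configuration.brokers[0].advertisedHost}'
#   kubectl rollout status deployment event-mesh-connect-connect -n tca-mgr
```

The jsonpath query should print the FQDN that was patched in, and rollout status waits until the restarted deployment reports ready.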