VIMs are stuck in Unknown state after upgrade

Article ID: 409778

Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

After a TCA 3.x system (previously migrated from TCA 2.x) is upgraded to TCA 3.4, all VIMs are stuck in the Unknown state and the Event Mesh status shows Disconnected.

Environment

3.4

Cause

As part of the security enhancements in VMware Telco Cloud Automation 3.4, all internal components were switched to strict hostname checks. These checks require the system's certificate to explicitly contain the FQDN, either as the CN or in the SAN field.

Kafka Event Mesh is one such service. It relies on the externalAddress field in the OVF properties of the TCA appliance to decide which hostname or IP address to use when reaching the corresponding TCA-CPs. It is recommended to set this field to the system FQDN.

If this value is left blank during deployment, the system defaults the field to the IP address. During migration from TCA 2.x to 3.x, customers typically left this value blank (it was optional at the time), so many production systems may have it defaulted to the IP address. This causes Kafka Event Mesh to communicate over IP addresses. With the stricter checks and validations in TCA 3.4, this only works if the TCA certificate contains both the IP address and the FQDN in the SAN (Subject Alternative Names) field.
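To see whether a given certificate already lists both names, the SAN entries can be inspected with OpenSSL. The following is a minimal sketch, not a VMware-provided tool: the function name san_covers, the file name tca-cp.pem, and the sample hostname and IP address are all hypothetical. It reads the text output of openssl x509 on stdin and succeeds only if both the FQDN and the IP address appear as SAN entries.

```shell
# Hypothetical helper (not part of TCA): succeeds if the certificate text on
# stdin lists both the given FQDN and the given IP address as SAN entries.
# Expects the output of `openssl x509 -noout -text`, where SAN entries look
# like "DNS:<name>, IP Address:<addr>".
san_covers() (
  fqdn="$1"
  ip="$2"
  # Put each comma-separated entry on its own line and trim whitespace.
  entries="$(tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  # -F fixed string, -x whole-line match, -i case-insensitive, -q quiet.
  printf '%s\n' "$entries" | grep -Fxiq "DNS:${fqdn}" &&
    printf '%s\n' "$entries" | grep -Fxiq "IP Address:${ip}"
)

# Example usage (file name, FQDN, and IP are placeholders):
#   openssl x509 -in tca-cp.pem -noout -text \
#     | san_covers tca-cp.example.com 10.0.0.5 \
#     && echo "SAN covers FQDN and IP" || echo "SAN is missing an entry"
```

The whole-line match avoids false positives such as 10.0.0.5 matching 10.0.0.50. Wildcard DNS entries are not interpreted by this sketch.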

Resolution

Workaround

  1. Add IP Address to SAN
    • Add the IP address to the SAN fields within the TCA certificate.
    • Apply this certificate on the TCA-CP and import it in TCA.
  2. Change externalAddress to FQDN
    • After the system has been upgraded to TCA 3.4, ensure that the tca-deploy service has completed successfully and is not in an activating state:
      systemctl status tca-deploy
    • Change the Kafka Event Mesh to communicate over FQDN, replacing <TCA_CP_FQDN_TOBE_REPLACED> with the FQDN of the TCA-CP:
      kubectl patch kafka edge -n tca-cp-cn --type='json' -p='[{"op": "replace", "path": "/spec/kafka/listeners/1/configuration/brokers/0/advertisedHost", "value":"<TCA_CP_FQDN_TOBE_REPLACED>"}]'
    • Perform the above steps for all the TCA-CPs.
    • Restart the 'eventmesh' pod in the TCA-M appliance:
      kubectl rollout restart deployment event-mesh-connect-connect -n tca-mgr
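Because the patch has to be repeated on every TCA-CP, it can help to wrap it in a small function and read the value back afterwards. This is a sketch under assumptions rather than an official procedure: the function names advertised_host_patch and set_advertised_host and the sample FQDN are hypothetical, while the resource name, namespace, and JSON path are taken from the commands above.

```shell
# Hypothetical helper: prints the JSON patch from this article with the
# given TCA-CP FQDN filled in as the advertisedHost value.
advertised_host_patch() {
  printf '[{"op": "replace", "path": "/spec/kafka/listeners/1/configuration/brokers/0/advertisedHost", "value":"%s"}]' "$1"
}

# Hypothetical helper: applies the patch on the local TCA-CP, then prints the
# advertisedHost now stored on the resource so it can be checked by eye.
set_advertised_host() {
  kubectl patch kafka edge -n tca-cp-cn --type='json' \
    -p="$(advertised_host_patch "$1")" &&
    kubectl get kafka edge -n tca-cp-cn \
      -o jsonpath='{.spec.kafka.listeners[1].configuration.brokers[0].advertisedHost}'
}

# Example usage on a TCA-CP appliance (FQDN is a placeholder):
#   set_advertised_host tca-cp.example.com
#
# After restarting the deployment on the TCA-M appliance, the rollout can be
# watched until it completes:
#   kubectl rollout status deployment event-mesh-connect-connect -n tca-mgr
```

Nothing runs when the block is sourced; the functions only act when called, so the patch text can be reviewed with advertised_host_patch before it is applied.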