VIMs are stuck in Unknown state after upgrade

Article ID: 409778

Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

After upgrading a TCA 3.x system (previously migrated from TCA 2.x) to TCA 3.4, all VIMs are stuck in the Unknown state and the Event Mesh status shows Disconnected.

Environment

VMware Telco Cloud Automation 3.4

Cause

As part of the security enhancements in VMware Telco Cloud Automation 3.4, all internal components were switched to strict hostname checks. These checks require the system certificate to contain the FQDN either as the CN or in the SAN field.

Kafka Event Mesh is one such service. It relies on the externalAddress field in the OVF properties of the TCA appliance to decide which hostname or IP address to use to reach the corresponding TCA-CPs. Providing the system FQDN in this field is recommended.

If this value is left blank during deployment, the system defaults the field to the IP address. During migration from TCA 2.x to 3.x, customers typically left this value blank (it was optional at the time), so many production systems may have it defaulted to the IP address. This causes Kafka Event Mesh to communicate over IP addresses. With the stricter checks and validations in TCA 3.4, the TCA certificate must therefore contain both the IP address and the FQDN in the SAN (Subject Alternative Name) field.
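To see which case an appliance falls into, one can inspect the SAN entries of the certificate it currently presents. The following is a minimal sketch, assuming OpenSSL 1.1.1 or later (for the `-ext` option) is available; the helper name `san_contains`, the example hostname, the IP address, and port 443 are illustrative assumptions and do not come from this article.

```shell
# san_contains SAN_TEXT ENTRY
# Succeeds if ENTRY (for example "DNS:tca-cp.example.com" or
# "IP Address:10.0.0.5") is one of the comma-separated SAN entries
# in the text printed by `openssl x509 -ext subjectAltName`.
san_contains() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq -- "$2"
}

# Example usage (hypothetical host, IP, and port):
#   san=$(openssl s_client -connect tca-cp.example.com:443 </dev/null 2>/dev/null \
#         | openssl x509 -noout -ext subjectAltName)
#   san_contains "$san" "IP Address:10.0.0.5" && echo "IP address is in the SAN"
#   san_contains "$san" "DNS:tca-cp.example.com" && echo "FQDN is in the SAN"
```

If the IP address entry is missing while externalAddress has defaulted to the IP address, the appliance matches the situation described above.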

Resolution

Workaround

Add IP Address to SAN
- Add the IP address to the SAN field of the TCA certificate.
- Apply this certificate on the TCA-CP and import it into TCA.
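As an illustration of what a certificate request carrying both names might look like, the sketch below builds a SAN value and shows, in comments, how it could be passed to `openssl req`. It assumes OpenSSL 1.1.1 or later (for `-addext`); the helper name `build_san`, the file names, the hostname, and the IP address are hypothetical. How the request is signed and how the resulting certificate is applied depends on your CA and on the TCA appliance procedure, which this article does not detail.

```shell
# build_san FQDN IP
# Prints a subjectAltName extension value that lists both the FQDN
# and the IP address, in the syntax OpenSSL expects.
build_san() {
  printf 'subjectAltName=DNS:%s,IP:%s\n' "$1" "$2"
}

# Example usage (hypothetical names and address):
#   openssl req -new -newkey rsa:2048 -nodes \
#     -keyout tca-cp.key -out tca-cp.csr \
#     -subj "/CN=tca-cp.example.com" \
#     -addext "$(build_san tca-cp.example.com 10.0.0.5)"
#
#   # Inspect the request before submitting it to the CA:
#   openssl req -in tca-cp.csr -noout -text | grep -A1 "Subject Alternative Name"
```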
Change externalAddress to FQDN
Execute the following after the system has been upgraded to TCA 3.4:
- Ensure that the tca-deploy service has completed successfully and is not in an activating state by executing the following command:
```
systemctl status tca-deploy
```
- Execute the following command to change the Kafka Event Mesh to communicate over FQDN:
```
kubectl patch kafka edge -n tca-cp-cn --type='json' -p='[{"op": "replace", "path": "/spec/kafka/listeners/1/configuration/brokers/0/advertisedHost", "value":"<TCA_CP_FQDN_TOBE_REPLACED>"}]'
```
Perform the above steps on every TCA-CP.
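Because the patch has to be repeated on each TCA-CP with that appliance's own FQDN, it can help to wrap the command in a small function and read the value back afterwards. This is a sketch only: the function names `build_patch` and `patch_advertised_host` are hypothetical, and the read-back `jsonpath` expression is derived from the patch path above rather than taken from this article.

```shell
# build_patch FQDN
# Prints the JSON patch from the step above with the placeholder
# replaced by the given FQDN.
build_patch() {
  printf '[{"op": "replace", "path": "/spec/kafka/listeners/1/configuration/brokers/0/advertisedHost", "value":"%s"}]\n' "$1"
}

# patch_advertised_host FQDN
# Applies the patch on the current TCA-CP, then prints the
# advertisedHost value that is now stored in the resource.
patch_advertised_host() {
  kubectl patch kafka edge -n tca-cp-cn --type='json' -p="$(build_patch "$1")" &&
  kubectl get kafka edge -n tca-cp-cn \
    -o jsonpath='{.spec.kafka.listeners[1].configuration.brokers[0].advertisedHost}'
}

# Example usage on one TCA-CP (hypothetical FQDN):
#   patch_advertised_host tca-cp.example.com
```

If the patch was accepted, the read-back should print the FQDN that was passed in.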
Restart the 'eventmesh' pod in the TCA-M appliance by executing the following command:
```
kubectl rollout restart deployment event-mesh-connect-connect -n tca-mgr
```
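To confirm that the restart has finished before re-checking the VIMs, the restart can be followed by `kubectl rollout status`. The sketch below is an optional convenience, not part of the documented workaround; the function name `restart_event_mesh` and the 300-second default timeout are assumptions.

```shell
# restart_event_mesh [TIMEOUT]
# Restarts the deployment named in the step above and waits until the
# rollout completes or the timeout (default 300s, an assumed value) expires.
restart_event_mesh() {
  kubectl rollout restart deployment event-mesh-connect-connect -n tca-mgr &&
  kubectl rollout status deployment event-mesh-connect-connect -n tca-mgr \
    --timeout="${1:-300s}"
}

# Example usage on the TCA-M appliance:
#   restart_event_mesh        # wait up to 300s
#   restart_event_mesh 600s   # wait up to 10 minutes
```

A non-zero exit status means the rollout did not complete within the timeout and the pod should be inspected before checking the VIM state again.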