TCA was migrated from TCA 2.3 to 3.2.
TCA 2.3 had Airgap and Harbor partner system with FQDN in upper case. No issues were observed with this configuration on TCA 2.3.
Below Error observed while creating/updating the cluster:
Error is due to Endpoint monitor reconciling failure, requeuing" tcakuberneterscluster="test-tkg-harbor-01" namespace="test-tkg-harbor-01" error="failed to apply endpoint monitor for airgap appliance: failed to create the endpoint: Endpoint.monitoring.telco.vmware.com \"TESTAIRGAP.example.com.443\" is invalid: metadata.name: Invalid value: \"TESTAIRGAP.example.com.443\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9][a-z0-9])?(
.[a-z0-9]([-a-z0-9][a-z0-9])?)*')"
3.2
The cause of this issue is that when doing workload cluster lifecycle operations like creation/updating/ etc, in TCA 3.2 the management cluster will try to create custom resources known as "endpoints" on the target workload cluster. The endpoint entry to be created uses the address/FQDN as part of its name. If the address contain uppercase letters the name of the endpoint would be invalid as a custom resource name. So the endpoint creation fails and the cluster stuck in "processing" status.
Also if the vCenter FQDN has uppercase letters, same issue will be encountered.
For clarification:
Script is applied as a workaround for this issue. Below is the high level explanation on what the workaround script does:
So for existing workload cluster stuck in "processing" status, management cluster will now see monitor operator is disabled so it won't try to create any "endpoints" for it. Similarly, for future newly created workload clusters, management cluster will also skip creating any "endpoints."
Please follow the below steps for applying the script:
When the --restore option is provided, the script will just change the TBR to use the old addon configuration.
The impact of the script is the "health" column in the "connected endpoint" section in the cluster's detail page will display the value as "Unknown". When clicking the "View Health Details" link next to the "Unknown" health status, an error message would be shown saying the endpoint is not found. It is expected since we have disabled the creation of the endpoint.
The "status" column won't be impacted. And the "connected endpoint" page in the administration page won't be impacted neither.