This article outlines a situation where deployment of databases in an HA topology (Single-vSphere cluster HA/Cross-vSphere cluster HA) fails, while non-HA (single-server) databases deploy without issue.
The dsm-tsql-provisioner-service.log file (located at /var/log/tdm/provider/containers/dsm-tsql-provisioner-service.log on the DSM provider appliance) may show repeated messages similar to:
"Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist"
Example:
YYYY-MM-DDThh:mm:ssZ","reason":"KCPMachinesReady","message":"ReadyUnknown - * Machine <machine name>:\n * NodeHealthy: Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist\n * Control plane components: Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist\n * EtcdMemberHealthy: Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist"},
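To confirm that these messages keep repeating for the same machines, the log can be scanned for them. The following Python sketch does this. It is only an illustration: find_waiting_provider_ids is a hypothetical helper and is not part of DSM. It assumes the message text matches the example above and counts how often each provider ID appears.

```python
import re

# Matches the provider ID in messages such as:
#   Waiting for a Node with spec.providerID vsphere://<uuid> to exist
# (assumes the wording shown in the example log line above)
PROVIDER_ID_RE = re.compile(
    r"Waiting for a Node with spec\.providerID (vsphere://[0-9A-Za-z-]+) to exist"
)


def find_waiting_provider_ids(lines):
    """Count occurrences of each provider ID in 'Waiting for a Node' messages.

    `lines` is any iterable of strings, such as an open log file.
    A single log line may contain the message several times
    (NodeHealthy, control plane components, EtcdMemberHealthy),
    and every occurrence is counted.
    """
    counts = {}
    for line in lines:
        for provider_id in PROVIDER_ID_RE.findall(line):
            counts[provider_id] = counts.get(provider_id, 0) + 1
    return counts
```

For example, open the log file at the path above and pass the file object to find_waiting_provider_ids(). A provider ID whose count keeps growing points at a machine for which no matching Node has appeared yet.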
Additionally, after logging in to a control plane node via SSH (see 'Step 1' in the KB article Data Services Manager Workload Cluster Certificate Expiration and Renewal), DNS resolution of the VCSA FQDN and/or pings to the VCSA IP address fail. This failure is observed only from containers spawned in an HA topology.
user@<machine name> [ ~ ]$ ping fqdn.of.VCSA
ping: fqdn.of.VCSA: Temporary failure in name resolution
user@<machine name> [ ~ ]$ ping <IP addr of VCSA>
PING <IP addr of VCSA> (<IP addr of VCSA>) 56(84) bytes of data.
From 192.168.1.1 icmp_seq=1 Destination Host Unreachable
The above messages indicate that the containers cannot reach vCenter Server, from which the node spec is obtained.
DSM 2.x
DSM 9.x
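This issue is caused by the Antrea network range overlapping with networks used by core infrastructure services (DNS, AD, etc.). Traffic destined for addresses inside the overlapping range is routed into the cluster network instead of to the actual services, so those services become unreachable from the affected containers.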
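A simplified routing model shows why an overlapping range breaks reachability. The sketch below is a hypothetical illustration and does not describe how Antrea or DSM is implemented. It assumes a node routing table that contains only two entries:

- a default route out of eth0
- a route for an example Antrea range (192.168.0.0/16) via antrea-gw0, Antrea's gateway interface

It then applies longest-prefix matching, which is how Linux chooses between routes. Metrics and policy rules are ignored.

```python
import ipaddress

# Hypothetical, simplified routing table: (destination network, interface).
# 192.168.0.0/16 stands in for an Antrea range that overlaps core services.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),             # default route to the infrastructure
    (ipaddress.ip_network("192.168.0.0/16"), "antrea-gw0"),  # example Antrea (cluster) range
]


def select_interface(destination, routes=ROUTES):
    """Return the interface of the most specific route matching destination.

    Longest-prefix match: among all routes whose network contains the
    destination address, the one with the largest prefix length wins.
    """
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in routes if addr in net]
    _, iface = max(matches, key=lambda route: route[0].prefixlen)
    return iface
```

With this table, select_interface("192.168.10.53") returns "antrea-gw0". A DNS server at that (hypothetical) address would therefore be looked for inside the cluster network. An address outside the range, such as "10.30.0.5", returns "eth0" and leaves through the default route as intended.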
To resolve this issue, follow the KB article How to change the Workload CIDR in Data Services Manager to change the workloadNetworkCidr parameter used by Antrea to a range that does not overlap with any networks used by core infrastructure services.
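Before applying a new value, a candidate range can be checked for overlap with the networks and addresses of core services. The following is a minimal sketch using Python's standard ipaddress module. It makes these assumptions:

- find_conflicts is a hypothetical helper.
- The addresses in the usage example are placeholders for the DNS, AD, vCenter and other service networks in your environment.

```python
import ipaddress


def find_conflicts(candidate_cidr, core_networks):
    """Return the core service networks that overlap candidate_cidr.

    candidate_cidr must be a proper network address, e.g. "172.30.0.0/16"
    (strict=True raises ValueError for values like "172.30.0.1/16").
    core_networks may contain CIDR ranges ("10.0.0.0/24") or single
    addresses ("10.0.0.53"); single addresses become /32 (or /128) networks.
    """
    candidate = ipaddress.ip_network(candidate_cidr, strict=True)
    conflicts = []
    for entry in core_networks:
        network = ipaddress.ip_network(entry, strict=False)
        # Only compare networks of the same IP version (IPv4 vs IPv6).
        if network.version == candidate.version and candidate.overlaps(network):
            conflicts.append(str(network))
    return conflicts
```

For example, find_conflicts("192.168.0.0/16", ["192.168.1.10", "10.10.0.0/24"]) returns ["192.168.1.10/32"], so that candidate would conflict. An empty list means there is no overlap with the networks listed. The check is only as complete as the list of core service networks you supply.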
NOTE: This change only affects clusters provisioned after the change is made; existing deployments are not affected.