DSM fails to deploy Postgres/MySQL workloads in HA topologies (Single/Cross vSphere cluster HA)


Article ID: 428629


Products

VMware Data Services Manager

Issue/Introduction

This article outlines a situation where deployment of databases in an HA topology (Single vSphere cluster HA or Cross vSphere cluster HA) fails, while non-HA (single-server) databases deploy without issue.

The dsm-tsql-provisioner-service.log file (located at /var/log/tdm/provider/containers/dsm-tsql-provisioner-service.log on the DSM provider appliance) may display repeated messages similar to:

"Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist" 

Example:

YYYY-MM-DDThh:mm:ssZ","reason":"KCPMachinesReady","message":"ReadyUnknown - * Machine <machine name>:\n  * NodeHealthy: Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist\n  * Control plane components: Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist\n  * EtcdMemberHealthy: Waiting for a Node with spec.providerID vsphere://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx to exist"},
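To confirm that the message is recurring rather than transient, the log can be scanned for it. The following is a minimal sketch, not an official DSM tool: the function names are hypothetical, the default path is the one given in this article, and the regular expression assumes the providerID is a hexadecimal UUID as in the example above.

```python
import re
from collections import Counter

# Pattern based on the message quoted in this article.
# Assumption: the providerID is a hexadecimal UUID.
PROVIDER_ID_WAIT = re.compile(
    r"Waiting for a Node with spec\.providerID (vsphere://[0-9a-fA-F-]+) to exist"
)


def count_waiting_provider_ids(lines):
    """Count, per providerID, how many log lines report waiting for that Node."""
    counts = Counter()
    for line in lines:
        # One log line can repeat the same ID for several health checks,
        # so each ID is counted at most once per line.
        for provider_id in set(PROVIDER_ID_WAIT.findall(line)):
            counts[provider_id] += 1
    return counts


def scan_log(path="/var/log/tdm/provider/containers/dsm-tsql-provisioner-service.log"):
    """Hypothetical helper: scan the provisioner log at the path from this article."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_waiting_provider_ids(handle)
```

A providerID whose count keeps growing between runs suggests that the corresponding Node is never registered, which matches the symptom described here.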

Additionally, after logging in to a Control Plane node via SSH (see 'Step 1' in the article Data Services Manager Workload Cluster Certificate Expiration and Renewal), DNS resolution and/or ping failures to or from the VCSA's FQDN/IP address can be observed, but only when the commands are issued from containers spawned in an HA topology.

user@<machine name> [ ~ ]$ ping fqdn.of.VCSA
ping: fqdn.of.VCSA: Temporary failure in name resolution

user@<machine name> [ ~ ]$ ping <IP addr of VCSA>
PING <IP addr of VCSA> (<IP addr of VCSA>) 56(84) bytes of data.
From 192.168.1.1 icmp_seq=1 Destination Host Unreachable

The above messages indicate that the containers are unable to reach vCenter to obtain the spec.

Environment

DSM 2.x

DSM 9.x

Cause

The issue is caused by the Antrea network ranges overlapping with networks used by core services (DNS, AD, etc.) within the infrastructure.
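The overlap described above can be checked before or after changing the range. The following is a minimal sketch using Python's standard ipaddress module. The function name, the service names, and all addresses are hypothetical placeholders; substitute the range configured for Antrea and the addresses of the core services in the affected environment.

```python
import ipaddress


def find_overlaps(workload_cidr, core_services):
    """Return the names of core services whose address or network overlaps workload_cidr."""
    workload = ipaddress.ip_network(workload_cidr, strict=False)
    overlapping = []
    for name, target in core_services.items():
        # A bare IP address is accepted and treated as a single-host network.
        network = ipaddress.ip_network(target, strict=False)
        # Only compare networks of the same IP version (IPv4 vs IPv6).
        if network.version == workload.version and workload.overlaps(network):
            overlapping.append(name)
    return overlapping


# Hypothetical values for illustration only.
core = {"dns": "192.168.1.10", "vcenter": "10.20.30.40", "ad": "192.168.0.0/24"}
print(find_overlaps("192.168.0.0/16", core))  # ['dns', 'ad']
```

An empty result for a candidate range means that none of the listed services fall inside it. It does not rule out overlaps with networks that were not listed.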

Resolution

To resolve this issue, follow the KB article How to change the Workload CIDR in Data Services Manager and change the workloadNetworkCidr parameter used by Antrea to a range that does not overlap with any core services within the infrastructure.

NOTE: This change affects only clusters provisioned after it is made. Existing deployments are not affected.

Additional Information

How to change the Workload CIDR in Data Services Manager

Data Services Manager Workload Cluster Certificate Expiration and Renewal