VCF 9 Automation install failing with LCMVSPHERECONFIG1000095 - Failed to create services platform cluster.
search cancel

VCF 9 Automation install failing with LCMVSPHERECONFIG1000095 - Failed to create services platform cluster.

book

Article ID: 418755

calendar_today

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • New VCFA 9 install fails with error "Failed to create services platform cluster."
  • Issue may also occur when upgrading from 8.18.1
  • Upgrade stuck on Step 6 for more than 2 hours.
  • Fleet Management -> Lifecycle logs (vmware_vrlcm.log) show the below errors:

retrying after error: deployment failed
release: vmsp-platform/vmsp-global-config failed: StateError:Could not determine release state: unable to determine cluster state: [Certificate/vmsp-platform/seaweedfs-client-cert dry-run failed (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.vmsp-platform.svc:443/validate?timeout=30s": dial tcp ###.###.###.###:443: connect: connection refused, Certificate/vmsp-platform/seaweedfs-filer-cert dry-run failed (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.vmsp-platform.svc:443/validate?timeout=30s": dial tcp ###.###.###.###:443: connect: connection refused, Certificate/vmsp-platform/seaweedfs-master-cert dry-run failed (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.vmsp-platform.svc:443/validate?timeout=30s": dial tcp ###.###.###.###:443: connect: connection refused, Certificate/vmsp-platform/seaweedfs-volume-cert dry-run failed (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.vmsp-platform.svc:443/validate?timeout=30s": dial tcp ###.###.###.###:443: connect: connection refused]
retrying after error: deployment timed out: context deadline exceeded
Error: failed to apply the PackageDeployment: context deadline exceeded

 

Environment

VCF Automation 9.0.

Cause

This issue can occur when there is no connectivity on port 443 between the Automation nodes (appliances) and the vCenter they are being deployed to.

To verify the cause:

  1. SSH to the new VCFA appliance as user vmware-system-user.
  2. Export kubeconfig details to use kubectl commands:
    • export KUBECONFIG=/etc/kubernetes/admin.conf
  3. Find the full name of the "vsphere-cpi-" pod:
    • kubectl get pods -n kube-system | grep -i "vsphere-cpi"
  4. View the logs from the vsphere-cpi-<####> pod:
    • kubectl logs -n kube-system vsphere-cpi-<####>
  5. Search for error:
    • Cannot connect to vCenter with err: Post "https://<VCENTER_FQDN>:443/sdk": dial tcp <VCENTER_IP>:443: connect: connection timed out

Resolution

Enable access on port 443 between the Automation nodes (appliances) and the vCenter they are being deployed to.

Additional Information

VMware Ports and Protocols - Automation
https://ports.broadcom.com/home/VMware-Cloud-Foundation-Automation