VCF Management Day 2 operations that create or replace a node fail because the template is corrupted or could not be found

Article ID: 434922

Updated On:

Products

VMware Cloud Foundation
VCF Automation

Issue/Introduction

The VCF Services Runtime utilizes the vcf-services-runtime-template-<version>.<ob-number> VM template in vCenter to provision both control plane and worker nodes.

If this VM template is deleted, moved, or corrupted, any downstream provisioning task that requires new node creation, such as a scale-up operation, fails until the template is restored.

Impact:

  • No impact on existing nodes: Already-running control plane and worker nodes will continue to operate normally.
  • New node creation fails: Cluster API (CAPI) is unable to clone from a missing or invalid template.
  • Day 2 operations are blocked: Operations such as scale-up, node rollout (e.g., disk size change, machine type change), and replacement of failed nodes will fail.

Symptoms

Day 2 Action failures:

  • A Day 2 operation that creates or replaces a node (e.g., scaling up components, installing certain Day 2 components, or adding replicas for components) may fail immediately or remain in a "running" or "pending" state until it times out.
  • In the VCF Operations UI, the Day 2 operation fails with an error message like the following:

Error Code: LCMVSPHERECONFIG1000095
LCMVSPHERECONFIG1000095
Failed to create services platform cluster. Refer to /var/log/vricm/vmsp_bootstrap_xxxxx.log for more details.
<datetime> role VCF Services Platform exists
<datetime> role VCF Services Platform Admin exists 
successfully added global permissions for user <service_user> 
successfully added global permissions for user <admin_service_user>
govc:/<datacenter>/<host>/<cluster> not found 
<datetime> ERROR : Not all ESXi Hosts in the cluster /<datacenter>/<host>/<cluster> are connected to the datastore /<datastore_path>. 
ERR:INIT0001 -Validating configuration

vCenter Events:

  • Recent Tasks: Look for a failed "Clone virtual machine" task indicating reasons such as "source template not found," "virtual machine not found," or "invalid state" during a platform node creation attempt.
  • Events: Check the cluster, resource pool, or the folder housing the VCF Services Runtime VMs and templates for events detailing failed clone operations or a missing/invalid source VM/template.
  • VMs and Templates View: In the designated VCF Services Runtime deployment folder, the template vcf-services-runtime-template-<version>.<ob-number> may be entirely missing or show a status of (orphaned) or (inaccessible).
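When many VMs share the deployment folder, the name check described above can be scripted against an exported list of inventory names. The following is a minimal sketch, not part of the attached script: it assumes the names were copied or exported from vCenter by some other means, that <version> is a dotted numeric string and <ob-number> is numeric, and that a status such as (orphaned) or (inaccessible) may trail the name as it does in the UI. The function name find_runtime_templates is hypothetical.

```python
import re

# Pattern based on the naming convention
# vcf-services-runtime-template-<version>.<ob-number>.
# Assumption: <version> is dotted digits and <ob-number> is digits only.
TEMPLATE_NAME = re.compile(
    r"^vcf-services-runtime-template-"
    r"(?P<version>\d+(?:\.\d+)*)\.(?P<build>\d+)"
    r"(?:\s+\((?P<status>orphaned|inaccessible)\))?$"
)


def find_runtime_templates(vm_names):
    """Return one dict per inventory name that looks like a runtime template.

    An empty result suggests the template is missing from the list;
    a non-empty 'status' suggests it is present but unusable.
    """
    found = []
    for raw_name in vm_names:
        name = raw_name.strip()
        match = TEMPLATE_NAME.match(name)
        if match:
            found.append(
                {
                    "name": name,
                    "version": match.group("version"),
                    "build": match.group("build"),
                    "status": match.group("status"),  # None when no status suffix
                }
            )
    return found
```

An empty list, or an entry whose status is not None, points to the condition this article describes.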

Environment

  • VCF Management Services Runtime 9.1.0.0
  • VCF Automation 9.1.0.0
  • VMware Cloud Foundation 9.1

Cause

VCF Services Runtime Day 2 operations fail because the requisite vCenter VM runtime template is missing, moved, or corrupted. This prevents the Cluster API (CAPI) from cloning the template to provision new nodes.

Resolution

Repopulate the VCF Services Runtime VM template in vCenter utilizing the Staging API workflow. This procedure triggers a fresh synchronization of the component from the repository to the vCenter environment.

Prerequisites

  • Access: Network connectivity to the VCF Services Runtime Platform Gateway (Management API).
  • Repository: The depot manifest URL must be accessible from the VCF Services Runtime.
  • Authentication: The administrative credentials utilized during cluster bring-up in VCF Installer / VCF Ops.
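The repository prerequisite can be given a rough first check before running the procedure. The sketch below is an illustration only: it tests reachability from the machine where it runs, which is not the same as reachability from the VCF Services Runtime, and it relies on the local trust store for HTTPS, so a False result may also indicate an untrusted certificate. The function name manifest_is_reachable is hypothetical.

```python
import http.client
import urllib.request


def manifest_is_reachable(url, timeout=10.0):
    """Return True if an HTTP(S) GET of the URL answers with a 2xx status.

    Any connection, TLS, timeout, HTTP error, or malformed-URL problem
    is reported as False rather than raised.
    """
    try:
        request = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, timeouts, and SSL errors are all OSError subclasses.
        return False
```

A True result only shows that the URL answers from the local machine; the runtime's own network path still has to allow the same request.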

Procedure

Step 1: Download the script and the requirements file

  • Download the following files from the Attachments section of this KB article:
    • vcf_template_remediation.py
    • requirements.txt

Step 2: Find the required information to execute the script

vcenter_ip_or_fqdn - IP address or FQDN for vCenter
vcenter_username - vCenter username
vcenter_password - vCenter password
vcf_services_runtime_fqdn - Can be found in VCF Operations → Build → Lifecycle → VCF Management → Components → VCF Services Runtime → VCF services runtime FQDN
vcf_services_runtime_password - Password used when installing VCF services runtime
depot_manifest_url - URL of the depot manifest file
  • The depot manifest is a YAML file that defines the VCF Services Runtime platform package (including the VM template). It must be hosted on an accessible HTTP(S) repository (URL typically provided via the customer's release process) so the cluster can retrieve configuration data during its lifecycle operations.
  • It typically follows this naming convention:

<base-url>/<path>/depot-manifest-vmsp-platform-<version>.<ob-number>.yaml

  • base-url: Typically the value configured as "Fleet gateway FQDN" or "Fleet Depot Service Endpoint" (or equivalent) in VCF Installer / VCF Ops.
  • path: The path to the depot manifest (e.g. /depot-service/content-gateway/PROD/COMP/VSP/).
  • version: The specific release version of the platform (e.g. 9.1.0.0).
  • ob-number: The official build number associated with the release (e.g. 25370367).

Example: https://<fleet-fqdn>/depot-service/content-gateway/PROD/COMP/VSP/depot-manifest-vmsp-platform-9.1.0.0.25370367.yaml
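The naming convention above can be assembled programmatically to avoid slash or typing mistakes. This is a minimal sketch under the stated convention; the helper name build_depot_manifest_url and the host fleet.example.com are placeholders, not values from any real environment.

```python
def build_depot_manifest_url(base_url, path, version, ob_number):
    """Assemble <base-url>/<path>/depot-manifest-vmsp-platform-<version>.<ob-number>.yaml.

    Leading and trailing slashes on base_url and path are normalized so the
    parts are always joined by exactly one slash.
    """
    base = base_url.rstrip("/")
    middle = path.strip("/")
    filename = f"depot-manifest-vmsp-platform-{version}.{ob_number}.yaml"
    return "/".join(part for part in (base, middle, filename) if part)


if __name__ == "__main__":
    # Placeholder host; substitute the Fleet gateway FQDN of the environment.
    print(
        build_depot_manifest_url(
            "https://fleet.example.com",
            "/depot-service/content-gateway/PROD/COMP/VSP/",
            "9.1.0.0",
            "25370367",
        )
    )
```

The resulting string is the value to supply as depot_manifest_url in Step 3.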

Step 3: Execute the script

1. Run these commands sequentially to create an isolated Python virtual environment, activate it, and install the script's required dependencies.
   python3 -m venv venv
   source venv/bin/activate
   pip3 install -r requirements.txt


2. Run the Management Template Remediation Script
   python3 vcf_template_remediation.py \
        --vcenter-host <vcenter_ip_or_fqdn> \
        --vcenter-user <vcenter_username> \
        --vcenter-password <vcenter_password> \
        --platform-fqdn <vcf_services_runtime_fqdn> \
        --platform-admin-password <vcf_services_runtime_password> \
        --depot-manifest-url <depot_manifest_url> \
        [--force]
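If the passwords contain characters that the shell would interpret, or should not end up in shell history, the same invocation can be launched from a small Python wrapper. The sketch below is an assumption-laden illustration: it uses only the flags shown above, makes no claim about what --force does, and the helper names build_remediation_command and run_remediation are hypothetical. The passwords are still passed to the script as command-line arguments, because that is the interface shown in this article.

```python
import getpass
import subprocess
import sys


def build_remediation_command(vcenter_host, vcenter_user, vcenter_password,
                              platform_fqdn, platform_admin_password,
                              depot_manifest_url, force=False,
                              script="vcf_template_remediation.py"):
    """Return the argument list for the remediation script.

    A list (rather than one shell string) means no shell quoting is needed.
    sys.executable is used so the active virtual environment's interpreter runs.
    """
    command = [
        sys.executable, script,
        "--vcenter-host", vcenter_host,
        "--vcenter-user", vcenter_user,
        "--vcenter-password", vcenter_password,
        "--platform-fqdn", platform_fqdn,
        "--platform-admin-password", platform_admin_password,
        "--depot-manifest-url", depot_manifest_url,
    ]
    if force:
        command.append("--force")  # optional flag, as listed in the command above
    return command


def run_remediation(vcenter_host, vcenter_user, platform_fqdn, depot_manifest_url):
    """Prompt for both passwords without echo, then run the script."""
    vcenter_password = getpass.getpass("vCenter password: ")
    platform_password = getpass.getpass("VCF services runtime password: ")
    command = build_remediation_command(
        vcenter_host, vcenter_user, vcenter_password,
        platform_fqdn, platform_password, depot_manifest_url,
    )
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(command, check=True)
```

Calling run_remediation(...) from within the activated virtual environment, in the directory that contains vcf_template_remediation.py, is equivalent to the command shown above without --force.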

Note: Once the template is restored, a previously failed Day 2 operation (e.g., a scale-up) may complete without further action. If it does not, retry the operation.

Attachments

requirements.txt
vcf_template_remediation.py