Detect Auto-Repair on Errors for all Container Service Extension Clusters
search cancel

Detect Auto-Repair on Errors for all Container Service Extension Clusters


Article ID: 325530


Updated On:


VMware Cloud Director


It is possible that a provisioned cluster can go into an error state due to a known issue.

If the Auto Repair on Errors feature is activated on the cluster, that cluster can get deleted and recreated, which causes disruption of workloads on that cluster.
This article will help providers identify the scope of clusters that may be affected by this issue and update the cluster definitions to avoid it.

The auto-repair flag was added to Container Service Extension (CSE) to retry cluster creation when temporary errors (e.g.; timeouts) occur. The functionality is not disabled when the cluster reaches the Available state.


VMware Cloud Director 10.x


This issue is resolved in Container Service Extension 4.1.1 found here at VMware Downloads.

If you are unable to upgrade, use the script to identify which clusters have the auto-repair flag enabled. After identifying the affected clusters, visit the settings page for each cluster to disable this setting.

Environment Variables

export VCD_URL=           #
export VCD_USER=          # administrator

export https_proxy=       #

CLI Options

<org_name>             # Print usage for this organization
-A,--all-orgs          # Iterate over all organizations and print usage

-k,--insecure          #
--cacert path          #
--capath path          #

--debug                # Print all commands to the console. Warning: this will expose passwords and API tokens.


Note: Add --cacert /path/to/ca-certificates.pem if you are using self-signed certificates for VCD. You may alternatively use -k if you want to skip certificate validation.


Execute ./ -A to print a report of all clusters and a PASS/WARN/FAIL result based on the auto-repair flag.

  • PASS = entity.spec.vcdKe.autoRepairOnErrors is not set on the Cluster.
  • WARN = entity.spec.vcdKe.autoRepairOnErrors is set on a Cluster that is in provisioning or error state.
  • FAIL = entity.spec.vcdKe.autoRepairOnErrors is set on a provisioned Cluster.

Example Output
<Flag State>  . . . <Org Name>/<Cluster Name> - <Reason>
PASS ... solutions/harbor
PASS ... solutions/tmc01
WARN ... alpha/development - The auto-repair flag is enabled and the cluster is in error state.
PASS ... alpha/banking
FAIL ... alpha/services
FAIL ... bravo/development - The auto-repair flag is enabled and the cluster is provisioned.
PASS ... bravo/banking
PASS ... bravo/services


Clusters that return a FAIL result should be updated immediately to disable the auto-repair flag. Clusters that return a WARN result should be evaluated to determine if changes are necessary.

These steps may be taken by the cluster author or a system user with appropriate privileges.
  1. Open the VMware Cloud Director UI.
  2. Browse to More -> Kubernetes Container Clusters.
  3. Click on the name of the affected Cluster.
  4. Click on Settings.
  5. Disable the Auto Repair on Errors option.
  6. Click Save.
  7. Repeat these steps for all affected Clusters.
  8. Execute the script again to confirm the changes are reflected in the results.


detect-cluster-autorepair get_app
detect-cluster-autorepair get_app