Commands such as 'tkgi update-cluster <CLUSTER_NAME> --num-nodes #' fail with errors like:

Instance update failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: ########-####-####-####-025b71431500, broker-request-id: ########-####-####-####-fc770262cb8a, task-id: 184660776, operation: update, error-message: Network 'pks-########-####-####-####-9c0952c88ace' refers to an unknown availability zone 'AZ2'

When running 'sudo monit summary', you see the metrics-server service is not in Running state.

The log /var/vcap/sys/log/director/metrics-server.stderr.log reports errors like:

`check_validity_of_subnet_availability_zone': Network 'pks-########-####-####-####-9c0952c88ace' refers to an unknown availability zone 'AZ2' (Bosh::Director::NetworkSubnetUnknownAvailabilityZone)

TKGI version is not relevant to this failure condition.
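The AZ name that needs to be restored can be read directly from the error text. The following is a minimal sketch, assuming the message keeps the "Network '...' refers to an unknown availability zone '...'" wording shown above. The function name extract_unknown_az is hypothetical and is not part of the bosh or tkgi CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read error text on stdin and print "<network> <az>"
# for each line that matches the wording of the BOSH validation error.
extract_unknown_az() {
  sed -n "s/.*Network '\([^']*\)' refers to an unknown availability zone '\([^']*\)'.*/\1 \2/p"
}
```

It could be fed the log file named above, for example: extract_unknown_az < /var/vcap/sys/log/director/metrics-server.stderr.log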
This issue occurs due to a synchronization mismatch between the Global BOSH Cloud Config and the Brokered Cluster Configurations managed by TKGI.
In a TKGI environment, BOSH uses a layered configuration model. When an Availability Zone (AZ) is removed from the BOSH Director tile in Opsman, it is deleted from the Global Cloud Config. However, TKGI manages individual Kubernetes clusters as independent BOSH deployments, each possessing its own Named Cloud Config and Deployment Manifest.
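If an existing cluster (specifically the apply-addons errand or the worker node pools) still references the deleted AZ, the BOSH Director will fail validation. This creates a "deadlock" state: the stale manifests cannot be updated or redeployed while the Director no longer recognizes the AZ they name.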
To resolve this, you must temporarily restore the deleted AZs to the BOSH Director to allow the stale manifests to be updated and redeployed.
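Before making changes, it can help to see exactly which lines of a saved deployment manifest still name the removed AZ. This is a minimal sketch, assuming the manifests have been saved to local files (for example with 'bosh -d service-instance_<ID> manifest'). The function name find_az_refs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line in the given files that mentions an AZ name.
# Usage: find_az_refs <AZ_NAME> <FILE>...
find_az_refs() {
  local az="$1"
  shift
  # -n line numbers, -H file names, -w whole-word match (AZ2 does not match AZ21),
  # -F treat the AZ name as a literal string rather than a regex.
  grep -nHwF -- "$az" "$@"
}
```

Each match is printed as file:line:text, and the function returns a non-zero status when no file references the AZ.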
The BOSH Director must recognize the missing AZ names to process any manifest updates.
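Save two copies of the current global cloud-config, one to edit and one to keep as the original:

bosh -e <env> cloud-config > bosh_cc_new.yml
bosh -e <env> cloud-config > bosh_cc_orig.yml

Edit bosh_cc_new.yml to restore the missing AZs in the azs section as well as the network section in the cloud-config. Applying the cleaned cluster manifests in Step 2.3 will report where in the global cloud-config these values need to be edited. This can be used for reference if needed.

Apply the edited cloud-config:

bosh -e <env> update-cloud-config bosh_cc_new.yml

Download the manifest of each affected cluster deployment:

bosh -d service-instance_<ID> manifest > service-instance_<ID>_manifest.yml

Remove the references to the deleted AZ from the saved manifest, then deploy it:

bosh -d service-instance_<ID> deploy service-instance_<ID>_manifest.yml

Updating the BOSH manifest manually does not update the TKGI database. You must trigger a sync to update the Named Cloud Configs managed by the TKGI broker.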
For a standard cluster, rerun the update with the node count unchanged:

tkgi update-cluster <CLUSTER_NAME> --num-nodes <unchanged_current_count>

For a cluster that uses a compute profile:

tkgi update-cluster <CLUSTER_NAME> --compute-profile <PROFILE_NAME> --node-pool-instances "<POOL_NAME>:<COUNT>"

Once the clusters are updated, restore the original cloud-config:

bosh -e <env> update-cloud-config bosh_cc_orig.yml

In TKGI, the BOSH Director doesn't just manage one big system; it manages a fleet of independent Kubernetes clusters. Each layer must be valid for the layer above it to function.
When you execute tkgi update-cluster, you are triggering a synchronized orchestration between the TKGI API (the Broker) and the BOSH Director. This process ensures that the "Blueprint" (Manifest) and the "Scaffold" (Named Cloud Config) are updated simultaneously.
bosh configs | egrep "pivotal-container-service|pks"
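GNU grep 3.8 and later print an obsolescence warning for egrep, so the same filter can also be written with grep -E. This is a small sketch that wraps it as a function reading 'bosh configs' output on stdin. The function name filter_tkgi_configs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the rows whose text matches the
# TKGI-related names used in the filter above.
filter_tkgi_configs() {
  grep -E "pivotal-container-service|pks"
}

# Usage (requires a logged-in bosh CLI):
#   bosh configs | filter_tkgi_configs
```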