Supervisor is stuck in "Configuration Status: Configuring" with an error indicating the configured Control Plane VMs' NSX Edge Cluster is invalid. Checking the Supervisor from the Supervisor management menu shows:

Initialized vSphere resources
Deployed Control Plane VMs
Configured Control Plane VMs
  • NSX Edge Cluster <cluster name> is invalid.
Configured Core Supervisor Services
In VCF Operations, the Pre-check for an upgrade of the workload vCenter fails with "Error: Cluster domain-c<id> is not ready yet."
VCF 9.0
vSphere Kubernetes Service
If the NSX Edge cluster was created via the Policy API, the edge cluster's policy ID and its NSX (Management Plane) API ID differ. The log /var/log/vmware/wcp/wcpsvc.log shows:

debug wcp [nsxt/validator.go:77] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Processing policy edge cluster edge-wrk-cl-01 with nsx_id <policy id>...
error wcp [kubelifecycle/kubenodeconfig.go:277] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Edge Cluster '<edge cluster name>' does not have corresponding Policy ID. Err: Unknown edge cluster <edge cluster name>
error wcp [kubelifecycle/controller.go:2084] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Error creating master node config. Err NSX Edge Cluster <edge cluster name> is invalid.
error wcp [kubelifecycle/controller.go:2381] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Error configuring API server on cluster <id> NSX Edge Cluster <edge cluster name> is invalid.
warning wcp [kubelifecycle/controller.go:1103] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Unable to configure agent in cluster domain-c10. Err NSX Edge Cluster <edge cluster name> is invalid.
Checking network_provider_settings in the wcp.cluster_db_configs table shows an incorrect value: EdgeClusterID contains the policy ID (the edge cluster name) instead of the Management Plane ID.

select network_provider_settings from wcp.cluster_db_configs;

{"PodCidrs": [{"IP": "10.96.208.0", "Mask": "///+AA=="}], "RoutedMode": false, "EgressCidrs": [{"IP": "###.###.##.##", "Mask": "////AA=="}], "IngressCidrs": [{"IP": "###.###.##.##", "Mask": "////AA=="}], "Tier0Gateway": "t0-wrk-01", "TransitCidrs": [{"IP": "###.###.##.##", "Mask": "//8AAA=="}, {"IP": "###.###.##.##", "Mask": "////8A=="}], "EdgeClusterID": "<edge cluster name>", "CrdMacAddressSpec": {"VMMacAddressMappings": {}}, "NamespaceSubnetPrefix": 28, "ClusterDistributedSwitchID": "## ## ## ## ## ## ##-## ## ## ## ##"}
(1 row)
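The "Mask" fields in network_provider_settings are netmask byte slices that have been JSON-marshaled as base64 (the behavior of Go's encoding/json for byte slices, which is how wcpsvc stores them). A minimal sketch to decode them into dotted-quad netmasks, using the sample values from the output above:

```python
import base64

def decode_mask(b64_mask: str) -> str:
    """Decode a base64-encoded netmask (Go net.IPMask bytes) to dotted-quad form."""
    return ".".join(str(b) for b in base64.b64decode(b64_mask))

# Mask values as they appear in network_provider_settings above
for mask in ("///+AA==", "////AA==", "//8AAA==", "////8A=="):
    print(mask, "->", decode_mask(mask))
# ///+AA== -> 255.255.254.0
# ////AA== -> 255.255.255.0
# //8AAA== -> 255.255.0.0
# ////8A== -> 255.255.255.240
```

This is only a reading aid for interpreting the stored CIDR masks; it is not needed for the fix itself.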
This issue is resolved in VCF 9.0.2.
As a workaround, update the table to use the Management Plane ID (Realized ID) for the edge cluster.
The Management Plane ID can be confirmed in the NSX UI under System > Configuration > Fabric > Nodes > Edge Clusters.
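The same ID is also visible in the NSX Management Plane API (GET /api/v1/edge-clusters), where each result's "id" field is the realized (Management Plane) UUID and "display_name" is the edge cluster name. A sketch that picks the MP ID out of such a response by display name; the response body below is a hypothetical sample and the UUID is a placeholder:

```python
import json

# Hypothetical sample of a GET /api/v1/edge-clusters response body.
sample_response = json.loads("""
{
  "results": [
    {"id": "5b5bd783-12ab-4c56-9d7e-0123456789ab",
     "display_name": "edge-wrk-cl-01"}
  ]
}
""")

def mp_id_for(display_name: str, response: dict) -> str:
    """Return the Management Plane ID of the edge cluster with this display name."""
    for ec in response["results"]:
        if ec["display_name"] == display_name:
            return ec["id"]
    raise KeyError(f"Unknown edge cluster {display_name!r}")

print(mp_id_for("edge-wrk-cl-01", sample_response))
# 5b5bd783-12ab-4c56-9d7e-0123456789ab
```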
SSH to the workload vCenter.
Connect to the VCDB:
psql -d VCDB -U postgres
Run the following query, setting the value of EdgeClusterID to the MP ID of the cluster instead of the policy ID. Note that as written the statement has no WHERE clause and updates every row in the table; if more than one Supervisor is enabled, add a WHERE clause to target only the affected cluster.
UPDATE wcp.cluster_db_configs SET network_provider_settings = '{"PodCidrs": [{"IP": "####", "Mask": "///+AA=="}], "RoutedMode": false, "EgressCidrs": [{"IP": "####", "Mask": "////AA=="}], "IngressCidrs": [{"IP": "####", "Mask": "////AA=="}], "Tier0Gateway": "t0-wrk-01", "TransitCidrs": [{"IP": "####", "Mask": "//8AAA=="}, {"IP": "####", "Mask": "////8A=="}], "EdgeClusterID": "<cluster id>", "CrdMacAddressSpec": {"VMMacAddressMappings": {}}, "NamespaceSubnetPrefix": 28, "ClusterDistributedSwitchID": "<cluster id>"}';
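Before restarting the service, it can be worth sanity-checking the new value: Management Plane IDs are UUIDs, whereas policy IDs are typically human-readable names like edge-wrk-cl-01. A minimal sketch of such a check (the regex test is an assumption for illustration, not a VMware-supplied validation, and both EdgeClusterID values below are hypothetical):

```python
import json
import re

# Standard 8-4-4-4-12 hex UUID shape, as used for NSX Management Plane IDs.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)

def edge_cluster_id_is_mp_id(network_provider_settings: str) -> bool:
    """True if EdgeClusterID in the settings JSON looks like an MP UUID."""
    settings = json.loads(network_provider_settings)
    return bool(UUID_RE.match(settings.get("EdgeClusterID", "")))

# Hypothetical values: a policy-style name fails, a UUID passes.
print(edge_cluster_id_is_mp_id('{"EdgeClusterID": "edge-wrk-cl-01"}'))
# False
print(edge_cluster_id_is_mp_id('{"EdgeClusterID": "5b5bd783-12ab-4c56-9d7e-0123456789ab"}'))
# True
```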
Restart the wcp service:
service-control --restart wcp
The Supervisor should return to a Running state.