vSphere Supervisor stuck in configuring status with error "NSX Edge Cluster <cluster name> is invalid."

Article ID: 432398

Updated On:

Products

VMware vSphere Kubernetes Service
VCF Operations

Issue/Introduction

The Supervisor is stuck in "Configuration Status: Configuring", with an error under the "Configured Control Plane VMs" step indicating that the NSX Edge Cluster is invalid. In the Supervisor management view, the Supervisor shows the following status:
Initialized vSphere resources
Deployed Control Plane VMs
Configured Control Plane VMs
 • NSX Edge Cluster <cluster name> is invalid.
Configured Core Supervisor Services

In VCF Operations, when upgrading the workload vCenter, the pre-check reports "Error: Cluster domain-c<id> is not ready yet."

Environment

VCF 9.0
vSphere Kubernetes Service

Cause

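If the NSX Edge Cluster was created through the Policy API, its Policy ID and its Management Plane (NSX API) ID differ. The Supervisor configuration then holds the Policy ID where the Management Plane ID is expected, and validation of the edge cluster fails. The following entries appear in /var/log/vmware/wcp/wcpsvc.log: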

debug wcp [nsxt/validator.go:77] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Processing policy edge cluster edge-wrk-cl-01 with nsx_id <policy id>
...
error wcp [kubelifecycle/kubenodeconfig.go:277] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Edge Cluster '<edge cluster name>' does not have corresponding Policy ID. Err: Unknown edge cluster <edge cluster name>
error wcp [kubelifecycle/controller.go:2084] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Error creating master node config. Err NSX Edge Cluster <edge cluster name> is invalid.
error wcp [kubelifecycle/controller.go:2381] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Error configuring API server on cluster <id> NSX Edge Cluster <edge cluster name> is invalid.
warning wcp [kubelifecycle/controller.go:1103] [opID=69ac9240-85726ca4-14e1-4f15-85f9-ca7a06168874] Unable to configure agent in cluster domain-c10. Err NSX Edge Cluster <edge cluster name> is invalid.
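
The log signature above can also be searched for programmatically. The following is a minimal Python sketch, not a VMware-provided tool. It assumes a readable copy of wcpsvc.log and that the message wording matches the excerpt above (it may differ between builds). The function name find_invalid_edge_cluster_errors is hypothetical.

```python
import re
from collections import defaultdict

# Patterns taken from the log excerpt in this article; wording may vary by build.
OPID_RE = re.compile(r"\[opID=([^\]]+)\]")
SIGNATURES = (
    "does not have corresponding Policy ID",
    "is invalid.",
)


def find_invalid_edge_cluster_errors(lines):
    """Group log lines carrying the edge cluster error signatures by opID."""
    hits = defaultdict(list)
    for line in lines:
        # Only consider lines that mention the edge cluster (case-sensitive,
        # as in the error and warning messages shown in the excerpt).
        if "Edge Cluster" not in line:
            continue
        if not any(signature in line for signature in SIGNATURES):
            continue
        match = OPID_RE.search(line)
        op_id = match.group(1) if match else "unknown"
        hits[op_id].append(line.strip())
    return dict(hits)
```

To use it, pass an open file handle (or any iterable of lines) for a copy of /var/log/vmware/wcp/wcpsvc.log. Each key in the result is an opID and each value is the list of matching lines for that operation.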

Querying network_provider_settings in the wcp.cluster_db_configs table shows that EdgeClusterID holds the Policy ID (the edge cluster name) instead of the Management Plane ID:
select network_provider_settings from wcp.cluster_db_configs;
 network_provider_settings
---------------------------
{"PodCidrs": [{"IP": "10.96.208.0", "Mask": "///+AA=="}], "RoutedMode": false, "EgressCidrs": [{"IP": "###.###.##.##", "Mask": "////AA=="}], "IngressCidrs": [{"IP": "###.###.##.##", "Mask": "////AA=="}], "Tier0Gateway": "t0-wrk-01", "TransitCidrs": [{"IP": "###.###.##.##", "Mask": "//8AAA=="}, {"IP": "###.###.##.##", "Mask": "////8A=="}], "EdgeClusterID": "<edge cluster name>", "CrdMacAddressSpec": {"VMMacAddressMappings": {}}, "NamespaceSubnetPrefix": 28, "ClusterDistributedSwitchID": "## ## ## ## ## ## ##-## ## ## ## ##"}
(1 row)
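
As a quick sanity check on the value returned by the query, the EdgeClusterID field can be inspected with a short script. This is a minimal Python sketch under one stated assumption: that the Management Plane ID of an edge cluster is a UUID, while a Policy ID created through the Policy API can be an arbitrary string such as the cluster name. The function name edge_cluster_id_looks_like_mp_id is hypothetical.

```python
import json
import uuid


def edge_cluster_id_looks_like_mp_id(settings_json):
    """Return (edge_cluster_id, is_uuid) for a network_provider_settings value.

    Assumption: a Management Plane ID is a UUID, so a non-UUID value
    suggests that a Policy ID was stored instead.
    """
    settings = json.loads(settings_json)
    edge_cluster_id = settings["EdgeClusterID"]
    try:
        uuid.UUID(edge_cluster_id)
        is_uuid = True
    except (ValueError, AttributeError, TypeError):
        # Not parseable as a UUID (or not a string at all).
        is_uuid = False
    return edge_cluster_id, is_uuid
```

Paste the JSON value from the query output as the argument. A result whose second element is False indicates that the stored ID does not have UUID form and is worth comparing against the ID shown in NSX.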

Resolution

This issue is resolved in VCF 9.0.2.

On affected versions, update the wcp.cluster_db_configs table so that EdgeClusterID holds the Management Plane ID (Realized ID) of the edge cluster.
The Management Plane ID is shown in the NSX UI under System > Configuration > Fabric > Nodes > Edge Clusters.

SSH to the workload vCenter.

Connect to VCDB

psql -d VCDB -U postgres

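Run the following query, setting EdgeClusterID to the Management Plane ID of the edge cluster instead of the Policy ID. Keep all other values exactly as returned by the select query on wcp.cluster_db_configs.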

UPDATE wcp.cluster_db_configs SET network_provider_settings = '{"PodCidrs": [{"IP": "####", "Mask": "///+AA=="}], "RoutedMode": false, "EgressCidrs": [{"IP": "####", "Mask": "////AA=="}], "IngressCidrs": [{"IP": "####", "Mask": "////AA=="}], "Tier0Gateway": "t0-wrk-01", "TransitCidrs": [{"IP": "####", "Mask": "//8AAA=="}, {"IP": "####", "Mask": "////8A=="}], "EdgeClusterID": "<edge cluster Management Plane ID>", "CrdMacAddressSpec": {"VMMacAddressMappings": {}}, "NamespaceSubnetPrefix": 28, "ClusterDistributedSwitchID": "<distributed switch ID>"}';
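
Retyping the full JSON value by hand risks changing a field other than EdgeClusterID. One way to avoid that is to generate the statement from the value returned by the select query. The following is a minimal Python sketch, not part of the official procedure; the function name build_update_statement is hypothetical, and the output should be reviewed before it is run in psql.

```python
import json


def build_update_statement(current_settings_json, mp_edge_cluster_id):
    """Build the UPDATE statement, changing only the EdgeClusterID field.

    current_settings_json: the network_provider_settings value exactly as
    returned by the select query (joined into a single line).
    mp_edge_cluster_id: the Management Plane ID confirmed in the NSX UI.
    """
    settings = json.loads(current_settings_json)
    settings["EdgeClusterID"] = mp_edge_cluster_id
    # json.dumps keeps the original key order and the ", " / ": " separators.
    new_json = json.dumps(settings)
    # Double any single quotes so the JSON is a valid SQL string literal.
    escaped = new_json.replace("'", "''")
    return (
        "UPDATE wcp.cluster_db_configs "
        f"SET network_provider_settings = '{escaped}';"
    )
```

Like the statement in this article, the generated statement has no WHERE clause, so it applies to every row in the table. The example output in this article shows a single row.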

 

Restart the wcp service

service-control --restart wcp

 

The Supervisor should return to a running state.