Symptoms:
Disabling vSphere with Tanzu from the Workload Management -> Clusters page hangs indefinitely with the message "info: NSX Resource cleanup is in progress."
Example log entries for this error, from /var/log/vmware/wcp/wcpsvc.log on the vCenter Server:
2020-10-18T13:59:42.929Z debug wcp [opID=5f7eb0ac-domain-cxxxx] Looking for agencies matching 'vmware-vsc-apiserver' (prefixMatch: true) in set of 0
2020-10-18T13:59:43.493Z debug wcp [opID=5f7eb0ac-domain-cxxxx] Lock found for nsx_policy_cleanup script for cluster domain-cxxxx:f647da2a-6fc2-484c-9987-0ff174199303. Cleanup in progress, retry later. stdout: ERROR: Failed to create sessionID with endpoint https://vcenter.example.com:443/rest/com/vmware/cis/session: Status 401: Error: {"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"error_type":"UNAUTHENTICATED","messages":[{"args":[],"default_message":"Authentication required.","id":"com.vmware.vapi.endpoint.method.authentication.required"}],"challenge":"Basic realm=\"VAPI endpoint\",SIGN realm=5aa05f6e05307b24526237f0639137a2b6ea9d6f,service=\"VAPI endpoint\",sts=\"https://vcenter.example.com/sts/STSService/vsphere.local\""}}
. stderr: Traceback (most recent call last):
File "/usr/lib/vmware-wcp/nsx_policy_cleanup.py", line 1588, in <module>
all_res=options.all_res)
File "/usr/lib/vmware-wcp/nsx_policy_cleanup.py", line 155, in __init__
self.header.update(provider.get_header_value())
File "/usr/lib/vmware-wcp/jwt_session.py", line 425, in get_header_value
token_value = self.get_token()
File "/usr/lib/vmware-wcp/jwt_session.py", line 408, in get_token
jwt_resp, use_old_audience = self._tes_session. \
File "/usr/lib/vmware-wcp/jwt_session.py", line 249, in exchange_for_jwt
session_id = self._retrieve_vapi_session(saml_hok)
File "/usr/lib/vmware-wcp/jwt_session.py", line 241, in _retrieve_vapi_session
raise Exception
Exception
2020-10-18T13:59:43.493Z warning wcp [opID=5f7eb0ac-domain-cxxxx] NSX resource removal did not fully complete for cluster domain-cxxxx. Retrying. Err: NSX cleanup in progress. This operation is part of NSX cleanup and will be retried.
Environment:
VMware vSphere 7.0.x
Cause:
This happens when the vCenter Server hostname contains an uppercase letter, which causes JWT (JSON Web Token) authentication to NSX to fail.
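To see whether a given vCenter Server hostname matches this condition, a quick check along the following lines may help. This is only a sketch: the function name and the sample hostnames are hypothetical, and it tests nothing beyond the presence of an uppercase character in the string you pass in.

```python
def has_uppercase(hostname: str) -> bool:
    """Return True if the hostname contains at least one uppercase letter."""
    return any(ch.isupper() for ch in hostname)


# Hypothetical hostnames, for illustration only.
for name in ("VCSA01.example.com", "vcsa01.example.com"):
    status = "contains uppercase" if has_uppercase(name) else "all lowercase"
    print(f"{name}: {status}")
```

Pass in the hostname exactly as it is configured on the vCenter Server, because DNS lookups are case-insensitive and may hide the original casing.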
Resolution:
This issue is fixed in the NSX Container Plugin (NCP) 3.0.1 release.
Workaround:
First, create a backup of the script that will be edited:
cp /usr/lib/vmware-wcp/jwt_session.py /usr/lib/vmware-wcp/jwt_session.py.backup
Then edit /usr/lib/vmware-wcp/jwt_session.py on the VCSA and find the line:
norm_req += "\n" + self._vc_endpoint
(It was line 156 in the build this was written against, but the line number may differ in yours. There is only one such line.)
Change it to:
norm_req += "\n" + self._vc_endpoint.lower()
No restart is required. Either wait for the components to retry the cleanup, or disable the cluster again.
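The manual edit described in this workaround could also be scripted. The following is a minimal sketch, not a supported tool: the helper names `patch_source` and `patch_file` are hypothetical, and it assumes the target line appears exactly as quoted above and only once in the file. It refuses to change anything if that assumption does not hold, and it writes a `.backup` copy before modifying the file.

```python
import shutil
from pathlib import Path

# The line quoted in the workaround, and its replacement.
OLD = r'norm_req += "\n" + self._vc_endpoint'
NEW = OLD + ".lower()"


def patch_source(text: str) -> str:
    """Return text with the single target line lowercasing the endpoint."""
    lines = text.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines) if line.strip() == OLD]
    if not hits:
        if any(line.strip() == NEW for line in lines):
            return text  # already patched, nothing to do
        raise ValueError("target line not found; file layout may differ")
    if len(hits) > 1:
        raise ValueError("expected exactly one target line")
    index = hits[0]
    # Replace within the line so indentation and the newline are preserved.
    lines[index] = lines[index].replace(OLD, NEW, 1)
    return "".join(lines)


def patch_file(path: str) -> bool:
    """Back up and patch the file in place. Return True if it was changed."""
    target = Path(path)
    original = target.read_text()
    patched = patch_source(original)
    if patched == original:
        return False
    # Same backup name as the cp command in the workaround.
    shutil.copy2(target, target.with_name(target.name + ".backup"))
    target.write_text(patched)
    return True
```

On the VCSA this would be called as `patch_file("/usr/lib/vmware-wcp/jwt_session.py")`. Running it a second time leaves the file unchanged.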