Removing vSphere with Tanzu Supervisor Cluster is stuck at "NSX Resource cleanup is in progress"

Article ID: 400680

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When removing or deactivating a Supervisor Cluster, the process is stuck at the following step: "NSX Resource cleanup is in progress".

  • The wcp service shows the following log entries on the vCenter Server:

    Log Location - /var/log/vmware/wcp/wcpsvc.log 

    YYYY-MM-DDTHH:MM:SS info wcp [kubelifecycle/cluster_network.go:198] [opID=#######-#######-#######-#######-#######] NSX cleanup for cluster domain-c####:#####-#####-#####-#####-##### stdout ERROR: Failed to create sessionID with endpoint http://localhost:1080/rest/com/vmware/cis/session: Status 431: Error: <h1>Bad Message 431</h1><pre>reason: Request Header Fields Too Large</pre>
    :
    stderr: Traceback (most recent call last):
      File "/usr/lib/vmware-wcp/nsx_policy_cleanup.py", line 2051, in <module>
        nsx_client = NSXClient(host=options.mgr_ip,
      File "/usr/lib/vmware-wcp/nsx_policy_cleanup.py", line 204, in __init__
        self.header.update(provider.get_header_value())
      File "/usr/lib/vmware-wcp/jwt_session.py", line 429, in get_header_value
        token_value = self.get_token()
      File "/usr/lib/vmware-wcp/jwt_session.py", line 413, in get_token
        exchange_for_jwt()
      File "/usr/lib/vmware-wcp/jwt_session.py", line 250, in exchange_for_jwt
        session_id = self._retrieve_vapi_session(saml_hok)
      File "/usr/lib/vmware-wcp/jwt_session.py", line 242, in _retrieve_vapi_session
        raise Exception
    Exception
    
    YYYY-MM-DDTHH:MM:SS error wcp [kubelifecycle/cluster_network.go:229] [opID=#######-#######-#######-#######-#######] Received error cleaning NCP-created resources for cluster domain-c####:#####-#####-#####-#####-##### on NSX Managers: [IP Addresses of NSX Managers]:443. Err: exit status 1
    nsx_policy_cleanup stdout: ERROR: Failed to create sessionID with endpoint http://localhost:1080/rest/com/vmware/cis/session: Status 431: Error: <h1>Bad Message 431</h1><pre>reason: Request Header Fields Too Large</pre>
    
    nsx_policy_cleanup stderr: Traceback (most recent call last):
      File "/usr/lib/vmware-wcp/nsx_policy_cleanup.py", line 2051, in <module>
        nsx_client = NSXClient(host=options.mgr_ip,
      File "/usr/lib/vmware-wcp/nsx_policy_cleanup.py", line 204, in __init__
        self.header.update(provider.get_header_value())
      File "/usr/lib/vmware-wcp/jwt_session.py", line 429, in get_header_value
        token_value = self.get_token()
      File "/usr/lib/vmware-wcp/jwt_session.py", line 413, in get_token
        exchange_for_jwt()
      File "/usr/lib/vmware-wcp/jwt_session.py", line 250, in exchange_for_jwt
        session_id = self._retrieve_vapi_session(saml_hok)
      File "/usr/lib/vmware-wcp/jwt_session.py", line 242, in _retrieve_vapi_session
        raise Exception
    Exception
    
    YYYY-MM-DDTHH:MM:SS warning wcp [kubelifecycle/controller.go:2436] [opID=#######-#######-#######-#######-#######] NSX resource removal did not fully complete for zone=domain-c#### in cluster=domain-c#### NSX cleanup failed: exit status 1. Retrying. Err: %!v(MISSING)
    YYYY-MM-DDTHH:MM:SS warning wcp [kubelifecycle/controller.go:442] [opID=#######-#######-#######-#######-#######] Unable to disable cluster domain-c####. Err NSX cleanup failed: NSX cleanup failed: exit status 1
    

Environment

VMware vSphere Kubernetes Service

Cause

The SAML token is added twice to the Authorization header. The duplicated token pushes the request headers past the size limit on the local vAPI endpoint, which rejects the session request with HTTP 431 (Request Header Fields Too Large), so the NSX cleanup script cannot authenticate and the Supervisor Cluster removal stalls.
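
The snippet below is a minimal, standalone illustration of the failure mode and is not code from the wcp service: it assumes a hypothetical token value and a typical 8 KB request-header limit, and shows that sending the same token twice exceeds the limit the endpoint reports as HTTP 431.

    # Minimal sketch (hypothetical values, not the wcp implementation):
    # duplicating a large token in the auth header exceeds a typical
    # 8 KB request-header limit, reported by the endpoint as HTTP 431.
    HEADER_LIMIT = 8 * 1024            # assumed server-side header limit
    compressed_token = "x" * 6000      # hypothetical compressed SAML token

    once = len("Authorization: token " + compressed_token)
    twice = len("Authorization: token " + compressed_token * 2)

    print(f"token once:  {once} bytes, over limit: {once > HEADER_LIMIT}")
    print(f"token twice: {twice} bytes, over limit: {twice > HEADER_LIMIT}")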

Resolution

  • Open an SSH session to the vCenter Server and log in as the root user.
  • Create a backup of the following file:

    • cp /usr/lib/vmware-wcp/jwt_session.py /tmp/jwt_session.py

  • Modify the file and comment out the following two lines (a scripted sketch of this edit appears after these steps):
     vi /usr/lib/vmware-wcp/jwt_session.py

    • auth_hdr_values.append(self._append_header_value(
    • "token", compressed_token[start_index:end_index]))

  • Before modification:

    start_index = index + single_auth_hdr_len
    end_index = len(compressed_token)
    auth_hdr_values.append(self._append_header_value(
        "token", compressed_token[start_index:end_index]))
    nonce = self._get_nonce()
    auth_hdr_values.append(self._append_header_value("nonce", nonce))
  • After modification, it should look like:

    start_index = index + single_auth_hdr_len
    end_index = len(compressed_token)
    #auth_hdr_values.append(self._append_header_value(
        #"token", compressed_token[start_index:end_index]))
    nonce = self._get_nonce()
    auth_hdr_values.append(self._append_header_value("nonce", nonce))
  • Restart the wcp service on the vCenter Server 
    • service-control --restart wcp
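
As referenced in the modification step above, the following is a sketch of a scripted way to make the same edit. This helper is hypothetical and not shipped with vCenter; it assumes the two lines appear exactly as in the "Before modification" snippet, writes the patched result to /tmp for review, syntax-checks it, and leaves the original file untouched.

    # Hypothetical helper (not part of vCenter/wcp): comment out the two
    # "token" append lines shown above in a copy of jwt_session.py.
    import py_compile
    from pathlib import Path

    SRC = Path("/usr/lib/vmware-wcp/jwt_session.py")
    DST = Path("/tmp/jwt_session.py.patched")

    first = "auth_hdr_values.append(self._append_header_value("
    second = '"token", compressed_token[start_index:end_index]))'

    lines = SRC.read_text().splitlines(keepends=True)
    for i in range(len(lines) - 1):
        # Only comment the consecutive pair from the snippet above, so the
        # other auth_hdr_values.append(...) calls stay untouched.
        if lines[i].strip() == first and lines[i + 1].strip() == second:
            for j in (i, i + 1):
                indent = lines[j][: len(lines[j]) - len(lines[j].lstrip())]
                lines[j] = indent + "#" + lines[j].lstrip()
            break

    DST.write_text("".join(lines))
    py_compile.compile(str(DST), doraise=True)  # fails loudly if the edit broke the file
    print(f"Patched copy written to {DST}; review it, then copy it over {SRC}.")

After reviewing the patched copy and copying it over the original file, restart the wcp service as shown in the last step above.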

 

Wait for the Supervisor Cluster removal process to complete automatically.