Creating Supervisor and Guest cluster through NSX Application Platform (NAPP); Guest cluster creation hangs in "Creating" state with VMOP error "400 Bad Request:" and CLS error "An error occurred: future must be done"

Article ID: 319335

Updated On:

Products

VMware vSphere ESXi, VMware vSphere Kubernetes Service

Issue/Introduction

Symptoms:

When a Supervisor and Guest cluster are created through NSX Application Platform (NAPP), Guest cluster creation hangs in the "Creating" state with the VMOP error "400 Bad Request:" and the CLS error "An error occurred: future must be done".

Environment Information:

  • vCenter version and build: 7.0.3 2099077
  • ESXi version and build: 7.0.3 20842708
  • Type of load balancer in use: HAProxy
  • NSX Application Platform (NAPP)
  • NSX-T version: 3.2.2.0.0.20737185
  • Supervisor Version: v1.22.6+vmware.1-vsc0.0.20-20696196
  • Guest Cluster Version: v1.21.6---vmware.1-tkg.1.b3d708a
  • Deployment Type: HAProxy

Errors:

The specific problem signature shows up as follows: 

   1. TKC shows “Creating” in vSphere WCP UI

   2. vSphere UI -> WCP namespace -> Kubernetes events show:

The machinehealthcheck for the worker and control plane (CP) nodes shows this reconcile error message in the vSphere UI:

“error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster”


   3. The vSphere UI shows a WCPMachine reconcile error message and a VM create failure message for the CP node:

“vm is not yet created vmware-system-capw-controller-manager/WCPMachine/”


   4. You may also see errors when deploying from the content library:

“deploy from content library failed for image “xxxxx”: 400 Bad Request: {"type":"com.vmware.vapi.std.errors.invalid_argument","value":{"error_type":"INVALID_ARGUMENT","messages":[{"args":["future must be done"],"default_message":"An error occurred: future must be done","id":"com.vmware.vdcs.util.unhandled_error"}]}}”


   5. VMOP logs on all SV CP VMs show "400 Bad Request", "future must be done" errors:

./master-vm-848127.tgz_extracted/wcp-agent-422e39967b1144172b78c4120e8e2b20-2023-03-23--20.49-80574/var/log/pods/vmware-system-vmop_vmware-system-vmop-controller-manager-7c556d5f9f-2lvv5_da00a708-de95-43ca-b65e-b42e37f8f772/manager/3.log:2023-03-23T20:49:35.074339517Z stderr F E0323 20:49:35.074232    1 virtualmachine_controller.go:263] VirtualMachine "msg"="Failed to reconcile VirtualMachine" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": 400 Bad Request: {\"type\":\"com.vmware.vapi.std.errors.invalid_argument\",\"value\":{\"error_type\":\"INVALID_ARGUMENT\",\"messages\":[{\"args\":[\"future must be done\"],\"default_message\":\"An error occurred: future must be done\",\"id\":\"com.vmware.vdcs.util.unhandled_error\"}]}}" "name"="nsx-01/napp-cluster-01-control-plane-zcwjw"


   6. CLS shows "vim.fault.NoPermission":

2023-03-23T21:06:19.965Z | ERROR | 4104bad3-9fb5-4e96-99cc-a1bbf17d56e8-da-40 | cls-simple-activity-20 | EnsureTaskRegisteredActivity | Cannot change state for ManagedObjectReference: type = Task, value = task-2674878, serverGuid = 5d8c93ad-dbda-42e9-a24c-0cb7fcc7aef2 from queued to running. Runtime error reported for task.setState (vim.fault.NoPermission) {
  faultCause = null,
  faultMessage = null,
  object = ManagedObjectReference: type = ResourcePool, value = resgroup-848141, serverGuid = 5d8c93ad-dbda-42e9-a24c-0cb7fcc7aef2,
  privilegeId = Task.Update,
  missingPrivileges = null
}. retrying...

   7. CLS then shows: 

2023-03-23T21:00:08.350Z | DEBUG  | b1e2dc39-880e-45e3-8f6a-6ae44d4b1a21-1b | cls-simple-activity-9   | ApiMethodSkeleton       | Method implementation threw a VMODL2 error
com.vmware.vapi.std.errors.InvalidArgument: InvalidArgument (com.vmware.vapi.std.errors.invalid_argument) => {
    messages = [LocalizableMessage (com.vmware.vapi.std.localizable_message) => {
    id = com.vmware.vdcs.util.unhandled_error,
    defaultMessage = An error occurred: future must be done,
    args = [future must be done],
    params = <null>,
    localized = <null>
}],
    data = <null>,
    errorType = INVALID_ARGUMENT
}
        at sun.reflect.GeneratedConstructorAccessor423.newInstance(Unknown Source) ~[?:?]
<SNIP>

 

   8. The NSX Application Platform (NAPP) appliance shows these errors:

E0323 19:46:24.246564    1 vmprovider.go:214] vsphere "msg"="Clone VirtualMachine failed" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": 400 Bad Request: {\"type\":\"com.vmware.vapi.std.errors.invalid_argument\",\"value\":{\"error_type\":\"INVALID_ARGUMENT\",\"messages\":[{\"args\":[\"future must be done\"],\"default_message\":\"An error occurred: future must be done\",\"id\":\"com.vmware.vdcs.util.unhandled_error\"}]}}" "vmName"="nsx-01/napp-cluster-01-control-plane-zcwjw"
E0323 19:46:24.246588    1 virtualmachine_controller.go:748] VirtualMachine "msg"="Provider failed to create VirtualMachine" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": 400 Bad Request: {\"type\":\"com.vmware.vapi.std.errors.invalid_argument\",\"value\":{\"error_type\":\"INVALID_ARGUMENT\",\"messages\":[{\"args\":[\"future must be done\"],\"default_message\":\"An error occurred: future must be done\",\"id\":\"com.vmware.vdcs.util.unhandled_error\"}]}}" "name"="nsx-01/napp-cluster-01-control-plane-zcwjw"
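
When working through a large log bundle, it can help to check individual log lines against the signatures listed above. The following is a minimal sketch, not a supported tool: the function names and the SIGNATURES table are hypothetical, and the only strings it relies on are those that appear in the log excerpts in this article. It flags matching lines and decodes the JSON payload that follows "400 Bad Request:" in the VMOP messages.

```python
import json

# Signature strings copied from the log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "cls_no_permission": "vim.fault.NoPermission",
    "content_library_deploy_failed": "deploy from content library failed",
    "future_must_be_done": "future must be done",
}


def match_signatures(line):
    """Return the sorted labels of every known signature found in one log line."""
    return sorted(label for label, needle in SIGNATURES.items() if needle in line)


def extract_vapi_error(line):
    """Decode the JSON payload that follows '400 Bad Request:' in a log line.

    Returns the decoded dictionary, or None if the line has no such payload.
    """
    marker = line.find("400 Bad Request:")
    if marker == -1:
        return None
    brace = line.find("{", marker)
    if brace == -1:
        return None
    # The VMOP log escapes the inner quotes as \" so undo that before decoding.
    text = line[brace:].replace('\\"', '"')
    try:
        # raw_decode parses the first JSON value and ignores the trailing text.
        payload, _ = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return None
    return payload
```

Applied to the VMOP line in item 5, match_signatures would report the content library and "future must be done" labels, and extract_vapi_error would return a dictionary whose "value" entry carries the INVALID_ARGUMENT error type.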

Environment

VMware vSphere 7.0 with Tanzu

Cause

The vpxd-extension user(s) have incorrect group membership.

The vpxd-extension user(s) may need to be removed from the Administrators group.

Example of what these users might look like:

['vpxd-extension-########-####-####-####-############', 'vpxd-extension-########-####-####-####-############']


These vpxd-extension-#### user(s) should be added to the ServiceProviderUsers group.
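
The membership condition described above can be expressed as a small read-only check. This is only an illustrative sketch and is not the fixAdministratorsGroup.py script: the function name is hypothetical, and it assumes you have already obtained the member lists of both groups as plain account names by whatever means your environment provides. It changes nothing and only reports which accounts look misplaced.

```python
def misplaced_extension_users(administrators, service_provider_users,
                              prefix="vpxd-extension-"):
    """Report vpxd-extension accounts whose group membership looks wrong.

    Both arguments are lists of account names (an assumption of this sketch).
    Returns a dict with the accounts found in Administrators and the subset
    of those accounts that is missing from ServiceProviderUsers.
    """
    in_admins = {name for name in administrators if name.startswith(prefix)}
    in_spu = {name for name in service_provider_users if name.startswith(prefix)}
    return {
        "remove_from_administrators": sorted(in_admins),
        "add_to_service_provider_users": sorted(in_admins - in_spu),
    }
```

An empty result for both keys would suggest that the membership already matches the expected layout. Any actual change should still go through VMware Support as described in the Resolution section.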

Resolution

Contact VMware Support for assistance with running the fixAdministratorsGroup.py script.

The script corrects the group membership of the vpxd-extension users.