Error "HttpCode 409" returned by pks-nsx-t-prepare-master-vm script during TKGI cluster deploy operation

Article ID: 401024


Products

VMware Tanzu Kubernetes Grid Integrated Edition
VMware Tanzu Kubernetes Grid Integrated Edition (Core)

Issue/Introduction

  • During TKGI deploy operations (such as upgrade-cluster or master node recreation), the bosh deploy fails in the pks-nsx-t-prepare-master-vm pre-start script.
  • The pks-nsx-t-prepare-master-vm log shows the following error:

    WARN[2025-03-07T01:37:19Z] NSX-T communication config: client tls files not set
    WARN[2025-03-07T01:37:19Z] NSX-T communication config: server tls authentication is disabled
    Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
    Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
    Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
    Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}

 

Cause

NSX Managers return this 409 (Conflict) error when the tls-nsx-t cluster cert was rotated and the new tls-nsx-t cert was uploaded to NSX-T, but the Principal Identity (PI) user for the cluster was not created due to the Superuser PI cert issue introduced by the CARR script.

Resolution

Please note: The cluster PI user is different from the Global Superuser PI. Each cluster has its own PI user for cluster-specific operations. These cluster PI users are created by the Global Superuser PI user when the pks-nsx-t-prepare-master-vm pre-start script runs.
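As a sketch of how to see this relationship (assuming the standard NSX-T trust-management API and an admin account; <NSX_MGR_FQDN> and <PASSWORD> are placeholders), you can list every PI user known to NSX-T — the per-cluster users appear as pks-<CLUSTER_UUID> alongside the Global Superuser PI:

```shell
# List all Principal Identity user names known to NSX-T.
# Cluster PI users are named pks-<CLUSTER_UUID>; the Global
# Superuser PI appears under its own name.
curl -s -X GET -u 'admin:<PASSWORD>' -k \
  "https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities" \
  | jq -r '.results[].name'
```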

 

  • Run the following command to confirm that the PI user for the cluster that uses the tls-nsx-t cert was deleted. If the user was deleted as part of the tls-nsx-t cert rotation, the command returns no output:

curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities | jq -r '.results[] | select(.name == "pks-<CLUSTER_UUID>")'


Example using a fake NSX Manager (nsx-manager.domain.com), admin password (AdminPassword123), and cluster (service-instance_7c87b2d4-####-####-####-d8b1c9202801):

curl -X GET -u 'admin:AdminPassword123' -k https://nsx-manager.domain.com/api/v1/trust-management/principal-identities | jq -r '.results[] | select(.name == "pks-7c87b2d4-####-####-####-d8b1c9202801")'


  • Confirm that the new tls-nsx-t cert was uploaded to the master node that is failing in the pre-start script.
  • If the master node contains the new cert, then:
    • Confirm whether the cluster tls-nsx-t cert exists in NSX-T. (It should, as this is what is causing the 409 "Conflict" error.)
      • The cert can be found in the NSX web client by searching for the cluster ID and then showing Certificates. The tls-nsx-t certificate is the one labeled pks-<CLUSTER_UUID>.

    • Delete the Principal Identity user for the cluster (this is the owner of the tls-nsx-t cert) if it still exists:
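A sketch of that deletion, assuming the standard NSX-T trust-management API (<NSX_MGR_FQDN>, <PASSWORD>, and <CLUSTER_UUID> are placeholders): look up the PI user's id first, then delete it. The X-Allow-Overwrite header is needed because the PI object is protected (_protection: REQUIRE_OVERRIDE):

```shell
# 1. Find the id of the cluster PI user.
PI_ID=$(curl -s -X GET -u 'admin:<PASSWORD>' -k \
  "https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities" \
  | jq -r '.results[] | select(.name == "pks-<CLUSTER_UUID>") | .id')

# 2. Delete the PI user; the protected object requires the
#    X-Allow-Overwrite header.
curl -s -X DELETE -u 'admin:<PASSWORD>' -k \
  "https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities/${PI_ID}" \
  --header "X-Allow-Overwrite: true"
```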

 

  • Delete the new tls-nsx-t cert using the following curl command from the NSX Manager or the master node:

# curl -X DELETE -sku 'admin' "https://<NSX_MGR_FQDN>/api/v1/trust-management/certificates/2332836c-####-####-####-859213a8cc17" --header "X-Allow-Overwrite: true"
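The certificate UUID used in the DELETE above can also be looked up via the API instead of the web client; a sketch, assuming the standard NSX-T certificates endpoint (<NSX_MGR_FQDN>, <PASSWORD>, and <CLUSTER_UUID> are placeholders):

```shell
# Print the id of the certificate labeled pks-<CLUSTER_UUID>,
# which is the cluster's tls-nsx-t cert in NSX-T.
curl -s -X GET -u 'admin:<PASSWORD>' -k \
  "https://<NSX_MGR_FQDN>/api/v1/trust-management/certificates" \
  | jq -r '.results[] | select(.display_name == "pks-<CLUSTER_UUID>") | .id'
```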


  • Then create the PI user using the following command. This command creates the user, imports the new tls-nsx-t cert into NSX-T, and binds them together:

# pksnsxcli create principal --instance-id 7c87b2d4-####-####-####-d8b1c9202801 --nsx-manager-host <NSX_MGR_IP/FQDN> --username admin --password '<PASSWORD>' --insecure -C /var/vcap/jobs/pks-nsx-t-prepare-master-vm/config/nsx_t_client.crt


  • Confirm that the cluster PI user was created and the tls-nsx-t cert was uploaded to NSX-T using the following command:

    # curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities | jq -r '.results[] | select(.name == "pks-7c87b2d4-####-####-####-d8b1c9202801")'

     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 55320    0 55320    0     0  1378k      0 --:--:-- --:--:-- --:--:-- 1385k
    {
      "name": "pks-7c87b2d4-####-####-####-d8b1c9202801",
      "node_id": "7c87b2d4-####-####-####-d8b1c9202801",
      "role": "enterprise_admin",
      "certificate_id": "0d99911f-####-####-####-e86ce073d8a6",
      "roles_for_paths": [
        {
          "path": "/",
          "roles": [
            {
              "role": "enterprise_admin"
            }
          ],
          "delete_path": false
        }
      ],
      "is_protected": true,
      "resource_type": "PrincipalIdentity",
      "id": "ff6308e7-####-####-####-2cf752454f7a",
      "display_name": "pks-7c87b2d4-####-####-####-d8b1c9202801",
      "tags": [
        {
          "scope": "pks/cluster",
          "tag": "7c87b2d4-####-####-####-d8b1c9202801"
        }
      ],
      "_create_time": 1742603019296,
      "_create_user": "new-lab-superuser",
      "_last_modified_time": 1742603019296,
      "_last_modified_user": "new-lab-superuser",
      "_system_owned": false,
      "_protection": "REQUIRE_OVERRIDE",
      "_revision": 0
    }
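Optionally, the certificate_id in the output above can be cross-checked against NSX-T to confirm the new tls-nsx-t cert was imported and bound to the PI user; a sketch, assuming the standard certificates endpoint (<NSX_MGR_FQDN>, <PASSWORD>, and <CERTIFICATE_ID> are placeholders):

```shell
# Fetch the certificate bound to the PI user and show its label;
# it should be pks-<CLUSTER_UUID>.
curl -s -X GET -u 'admin:<PASSWORD>' -k \
  "https://<NSX_MGR_FQDN>/api/v1/trust-management/certificates/<CERTIFICATE_ID>" \
  | jq -r '.display_name'
```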


  • Upgrade the cluster using tkgi upgrade-cluster <CLUSTER_NAME>, or issue commands similar to the following to recreate the cluster from the BOSH manifest. This registers the new certificate with NSX-T and pushes it to the Kubernetes VMs:

    # bosh manifest -d service-instance_<CLUSTER_UUID> > service-instance_<CLUSTER_UUID>.yml

    # bosh deploy -d service-instance_<CLUSTER_UUID> service-instance_<CLUSTER_UUID>.yml
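If the deploy fails again on the same script, the pre-start log on the failing master can be inspected over BOSH SSH. This is a sketch; the log path follows the standard BOSH job log layout and the instance name (master/0) may differ per deployment:

```shell
# Tail the pre-start log of the pks-nsx-t-prepare-master-vm job
# on the first master instance of the cluster deployment.
bosh -d service-instance_<CLUSTER_UUID> ssh master/0 \
  -c 'sudo tail -n 50 /var/vcap/sys/log/pks-nsx-t-prepare-master-vm/pre-start.stderr.log'
```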