upgrade-cluster, or master node recreation), you see the bosh deploy fail on the pks-nsx-t-prepare-master-vm pre-start script.
The pks-nsx-t-prepare-master-vm logs show the following error:
WARN[2025-03-07T01:37:19Z] NSX-T communication config: client tls files not set
WARN[2025-03-07T01:37:19Z] NSX-T communication config: server tls authentication is disabled
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
This 409 conflict error is returned by the NSX Managers when the tls-nsx-t cluster cert was rotated and the new tls-nsx-t cert was uploaded to NSX-T, but the Principal Identity (PI) user for the cluster was not created because of the Superuser PI cert issue introduced by the CARR script.
Please note: The cluster PI user is different from the Global Superuser PI. Each cluster has its own PI user for cluster-specific operations. These cluster PI users are created by the Global Superuser PI user when the pks-nsx-t-prepare-master-vm pre-start script runs.
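To confirm that a master node is hitting this symptom, the 409 retries can be counted in the job's pre-start log. The following is a minimal sketch, not a supported tool: the function name count_409_retries is hypothetical, and the log file is passed in as an argument because its exact name is an assumption (BOSH job logs normally live under /var/vcap/sys/log/<job-name>/).

```shell
# Hypothetical helper: count "HttpCode 409" retry lines in a log file.
# $1 is the path to the pre-start log (assumption: supplied by the caller).
count_409_retries() {
  # grep -c prints the number of matching lines but exits non-zero when
  # that number is 0; "|| true" keeps a zero count from looking like a failure.
  grep -c -- 'HttpCode 409' "$1" || true
}
```

A count that keeps growing while the pre-start script is retrying is consistent with the conflict described above.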
tls-nsx-t cert was deleted, you shouldn't get any output if the user was deleted as part of the tls-nsx-t cert rotation:
# curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities | jq -r '.results[] | select(.name == "pks-<CLUSTER_UUID>")'
Example using a fake NSX manager (nsx-manager.domain.com), admin password (AdminPassword123), and cluster (service-instance_7c87b2d4-####-####-####-d8b1c9202801):
# curl -X GET -u 'admin:AdminPassword123' -k https://nsx-manager.domain.com/api/v1/trust-management/principal-identities | jq -r '.results[] | select(.name == "pks-7c87b2d4-####-####-####-d8b1c9202801")'
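The example above relies on the cluster PI user being named pks-<CLUSTER_UUID>, where the UUID is the part of the BOSH deployment name after "service-instance_". A minimal sketch of that mapping and of the existence check follows. The function names and the NSX_MGR_FQDN and NSX_ADMIN_PASSWORD environment variables are hypothetical names chosen for this illustration, and curl and jq are assumed to be installed.

```shell
# Hypothetical helper: map a BOSH deployment name to the cluster PI user name,
# e.g. service-instance_<CLUSTER_UUID> -> pks-<CLUSTER_UUID>.
pi_name_for_deployment() {
  case "$1" in
    service-instance_*) printf 'pks-%s\n' "${1#service-instance_}" ;;
    *) echo "unexpected deployment name: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: exit 0 if a PI user with that name is returned by NSX-T.
# Assumption: NSX_MGR_FQDN and NSX_ADMIN_PASSWORD are set by the caller.
pi_user_exists() {
  name=$(pi_name_for_deployment "$1") || return 2
  # jq -e sets a non-zero exit status when the result is false or missing.
  curl -sk -u "admin:${NSX_ADMIN_PASSWORD}" \
    "https://${NSX_MGR_FQDN}/api/v1/trust-management/principal-identities" |
    jq -e --arg n "$name" 'any(.results[]; .name == $n)' >/dev/null
}
```

A non-zero status from pi_user_exists can also mean that the request itself failed, so it is worth re-running the plain curl command when the result is unexpected.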
tls-nsx-t cert was uploaded to the master node that is failing on the pre-start script:
/var/vcap/jobs/pks-nsx-t-prepare-master-vm/config/nsx_t_client.crt
tls-nsx-t cert exists in NSX-T or not (it should, as this is what is causing the 409 "conflict" error).
tls-nsx-t certificate is the one labeled: pks-<CLUSTER_UUID>
tls-nsx-t cert) if it still exists:
# alias pksnsxcli=/var/vcap/packages/pks-nsx-t-cli/bin/pksnsxcli
# pksnsxcli delete principal --instance-id <CLUSTER_INSTANCE_ID> --nsx-manager-host <NSX_MGR_HOSTNAME> --username <USERNAME> --password '<PASSWORD>' --insecure
tls-nsx-t cert using the curl command from the NSX Manager or master node:
# curl -X DELETE -sku 'admin' "https://<NSX_MGR_FQDN>/api/v1/trust-management/certificates/2332836c-####-####-####-859213a8cc17" --header "X-Allow-Overwrite: true"
tls-nsx-t cert to NSX-T >> then bind them together.
# pksnsxcli create principal --instance-id 7c87b2d4-####-####-####-d8b1c9202801 --nsx-manager-host <NSX_MGR_IP/FQDN> --username admin --password '<PASSWORD>' --insecure -C /var/vcap/jobs/pks-nsx-t-prepare-master-vm/config/nsx_t_client.crt
# curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities | jq -r '.results[] | select(.name == "pks-7c87b2d4-####-####-####-d8b1c9202801")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55320    0 55320    0     0  1378k      0 --:--:-- --:--:-- --:--:-- 1385k
{
  "name": "pks-7c87b2d4-####-####-####-d8b1c9202801",
  "node_id": "7c87b2d4-####-####-####-d8b1c9202801",
  "role": "enterprise_admin",
  "certificate_id": "0d99911f-####-####-####-e86ce073d8a6",
  "roles_for_paths": [
    {
      "path": "/",
      "roles": [
        {
          "role": "enterprise_admin"
        }
      ],
      "delete_path": false
    }
  ],
  "is_protected": true,
  "resource_type": "PrincipalIdentity",
  "id": "ff6308e7-####-####-####-2cf752454f7a",
  "display_name": "pks-7c87b2d4-####-####-####-d8b1c9202801",
  "tags": [
    {
      "scope": "pks/cluster",
      "tag": "7c87b2d4-####-####-####-d8b1c9202801"
    }
  ],
  "_create_time": 1742603019296,
  "_create_user": "new-lab-superuser",
  "_last_modified_time": 1742603019296,
  "_last_modified_user": "new-lab-superuser",
  "_system_owned": false,
  "_protection": "REQUIRE_OVERRIDE",
  "_revision": 0
}
tkgi upgrade-cluster <CLUSTER_NAME> or issue commands similar to the following to recreate the cluster from the BOSH manifest. This will register the new certificate with NSX-T and push it to the Kubernetes VMs:
# bosh manifest -d service-instance_<CLUSTER_UUID> > service-instance_<CLUSTER_UUID>.yml
# bosh deploy -d service-instance_<CLUSTER_UUID> service-instance_<CLUSTER_UUID>.yml
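The two bosh commands above can be wrapped so that the deploy only runs when the manifest was actually downloaded. This is a minimal sketch under stated assumptions: the function name redeploy_cluster and the BOSH override variable are hypothetical and exist only for this illustration, and the bosh CLI is assumed to be already targeted at the correct director.

```shell
# Hypothetical wrapper around "bosh manifest" and "bosh deploy".
# $1 is the cluster UUID. BOSH can be overridden (for example BOSH=echo)
# to dry-run the sketch without touching the director.
redeploy_cluster() {
  dep="service-instance_$1"
  manifest="${dep}.yml"
  # Save the current manifest; stop if the download fails.
  ${BOSH:-bosh} manifest -d "$dep" > "$manifest" || return 1
  # Refuse to deploy from an empty manifest file.
  [ -s "$manifest" ] || { echo "empty manifest: $manifest" >&2; return 1; }
  ${BOSH:-bosh} deploy -d "$dep" "$manifest"
}
```

Running it with BOSH=echo first prints the deploy command that would be executed, which makes it easy to check the deployment name before the real run.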