maestro regenerate leaf --all or the maestro regenerate --all commands, which are explicitly discouraged in the TKGI certificate rotation documentation. When you run the following command to show the BOSH task output, you see messages similar to:
# bosh task <TASK_ID>
{"time":1742508521,"stage":"Updating instance","tags":["master"],"total":3,"task":"master/3ec5cd8b-xxxx-xxxx-xxxx-ed6119b4c8e1 (0) (canary)","index":1,"state":"failed","progress":100,"data":{"error":"Action Failed get_task: Task bd055f81-4253-493d-7fbd-9406aa30d45d result: 1 of 8 pre-start scripts failed. Failed Jobs: pks-nsx-t-prepare-master-vm. Successful Jobs: kube-apiserver, etcd, bpm,
Binding the new NSX-T superuser certificate to the Superuser Principal Identity (also referred to as PI) using "Step 8" from the KB "How to renew the nsx-t-superuser-certificate used by Principal Identity user" fails with an error similar to:
# curl -X POST -u 'admin' -k "https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities?action=update_certificate" -H "Content-Type: application/json" -H "X-Allow-Overwrite: true" -d @bind.json
Enter host password for user 'admin':
{ "httpStatus" : "NOT_FOUND", "error_code" : 600, "module_name" : "common-services", "error_message" : "The requested object : Certificate/f9fd8b6d-####-####-####-989cecb532f5 could not be found. Object identifiers are case sensitive." }
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
This issue has been observed on TKGI versions 1.18 and 1.19, but it is not limited to these versions.
f9fd8b6d-####-####-####-989cecb532f5" by running an older version of the CARR script detailed in the Using Certificate Analyzer Results and Recovery script KB.
If needed, the pks-nsx-t-prepare-master-vm pre-start script can be edited to enable verbose logging (change set -e to set -ex) and then re-run to gather more detailed output:
File: /var/vcap/jobs/pks-nsx-t-prepare-master-vm/bin/pre-start
Change: set -e
To:     set -ex
Re-run the script:
# /var/vcap/jobs/pks-nsx-t-prepare-master-vm/bin/pre-start
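If you prefer not to edit the file by hand, the change can be scripted. The following is a minimal bash sketch, assuming GNU or BSD sed (both accept -i.bak) and that the script contains a line that is exactly set -e. The function names enable_xtrace and restore_script are hypothetical helpers, not part of TKGI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: switch a script from "set -e" to "set -ex" so bash
# traces each command it executes. sed keeps the original as <script>.bak.
enable_xtrace() {
  local script="$1"
  # Only a line that is exactly "set -e" is rewritten; other lines are untouched.
  sed -i.bak 's/^set -e$/set -ex/' "$script"
}

# Hypothetical helper: put the original script back from the .bak copy
# created by enable_xtrace.
restore_script() {
  local script="$1"
  mv "${script}.bak" "$script"
}
```

Example usage as root on the master node, after pasting the functions into the shell:
# enable_xtrace /var/vcap/jobs/pks-nsx-t-prepare-master-vm/bin/pre-start
# /var/vcap/jobs/pks-nsx-t-prepare-master-vm/bin/pre-start
# restore_script /var/vcap/jobs/pks-nsx-t-prepare-master-vm/bin/pre-start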
2025-03-04 23:53:21,413 - carr.validations.ver32.stale_certs_validator - MainThread - INFO - stale_certs_validator.py:173 - Found stale Appliance Certificate with id : f9fd8b6d-####-####-####-989cecb532f5
2025-03-05 00:05:22,599 - carr.interface.cli.cert_hidden_cmd_intf - MainThread - INFO - cert_hidden_cmd_intf.py:45 - Running curl command : curl -k -s -S -X POST -H "Content-Type:application/json" -H "X-NSX-Username:admin" -d '{ "node_id":"{name: '\''f9fd8b6d-####-####-####-989cecb532f5'\'',node_id: '\''a2988b1b-####-####-####-e92bc8dab67e'\'',certificate_id: '\''f9fd8b6d-####-####-####-989cecb532f5'\''}","service_type":"CLIENT_AUTH" }' http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates/f9fd8b6d-####-####-####-989cecb532f5?action=release
2025-03-05 00:05:22,676 - carr.interface.rest.base_api - MainThread - INFO - base_api.py:92 - path being executed is: DELETE https://<NSX_MGR_FQDN>:443/api/v1/trust-management/certificates/f9fd8b6d-####-####-####-989cecb532f5

These steps can be run from an SSH session to the TKGI master node or to the NSX Manager.
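To confirm which certificates the CARR script removed, the DELETE lines in its log (as in the excerpt above) can be filtered. This is a minimal bash sketch; the CARR log location is not given in this article, so log_file is a placeholder, and carr_deleted_cert_ids is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the unique certificate IDs that appear in
# "DELETE .../api/v1/trust-management/certificates/<id>" lines of a CARR log.
carr_deleted_cert_ids() {
  local log_file="$1"
  # grep -o keeps only the DELETE URL; sed strips everything up to the ID.
  grep -o 'DELETE https://[^ ]*/api/v1/trust-management/certificates/[^ ]*' "$log_file" \
    | sed 's#.*/certificates/##' \
    | sort -u
}
```

Any ID it prints that is still referenced as a certificate_id by a Principal Identity is a likely cause of the "Certificate/<ID> could not be found" error.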
# curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities | jq -r '.results[]| select(._create_user =="admin")' | grep -E '"name"|"id"|"certificate_id"'
Example:
# curl -X GET -u 'admin:AdminPassword123' -k https://nsx-manager.domain.com/api/v1/trust-management/principal-identities | jq -r '.results[]| select(._create_user =="admin")' | grep -E '"name"|"id"|"certificate_id"'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 53422    0 53422    0     0  1028k      0 --:--:-- --:--:-- --:--:-- 1043k
  "name": "new-lab-superuser",
  "certificate_id": "91abd838-####-####-####-66c2c49a548d",
  "id": "d5cf6f11-####-####-####-fa6cd69dbb50",
  "name": "da862e78-####-####-####-48ce93c12648",
  "certificate_id": "da862e78-###-###-###-48ce93c12648",
  "id": "bdca6884-####-####-####-a0b3004a43e0",
  "name": "9949f21d-####-####-####-5623f7fa3b46",
  "certificate_id": "9949f21d-####-####-####-5623f7fa3b46",
  "id": "36024c92-####-####-####-b7d0af7db033",
certificate.out that contains all of the certificates stored in NSX-T:
# curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/certificates > certificate.out
certificate.out file using the "certificate_id" of each PI user listed by the principal-identities command above, to see whether its certificate exists; make a note of the PI users whose certificate is missing.
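The manual comparison above can also be scripted. This is a minimal bash sketch, assuming jq is available (the commands above already use it), that certificate.out holds the unfiltered JSON response of the certificates call, and that the full principal-identities response has been saved to a file (principal-identities.json is a hypothetical name). Unlike the jq filter above, it checks every PI, not only those created by admin. find_pis_with_missing_certs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every Principal Identity whose certificate_id
# does not match the "id" of any certificate in the certificates JSON.
# Output: name<TAB>PI id<TAB>certificate_id, one PI per line.
find_pis_with_missing_certs() {
  local pi_file="$1" cert_file="$2"
  jq -r --slurpfile certs "$cert_file" '
    # IDs of all certificates currently stored in NSX-T
    ($certs[0].results | map(.id)) as $ids
    | .results[]
    # keep PIs whose certificate_id is absent (or null) in that list
    | select(.certificate_id as $c | ($ids | any(. == $c)) | not)
    | "\(.name)\t\(.id)\t\(.certificate_id)"
  ' "$pi_file"
}
```

Example usage:
# curl -s -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/trust-management/principal-identities > principal-identities.json
# find_pis_with_missing_certs principal-identities.json certificate.out

In a large environment the certificates list may be returned in pages, so certificate.out might not hold every certificate; treat each hit as a candidate and confirm it with a direct GET of that certificate ID.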
In this example, the following PI user has a missing certificate:
  "name": "da862e78-####-####-####-48ce93c12648",
  "certificate_id": "da862e78-###-###-###-48ce93c12648",
  "id": "bdca6884-####-####-####-a0b3004a43e0",
To see which PI user created a cluster's NSX-T objects, query the cluster's logical switch and check its _create_user field:
# curl -X GET -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/logical-switches | jq -r '.results[]| select(.display_name == "pks-<CLUSTER_UUID>")' | grep -E 'display_name|_create_user'
Example:
# curl -X GET -u 'admin:AdminPassword123' -k https://nsx-manager.domain.com/api/v1/logical-switches | jq -r '.results[]| select(.display_name == "pks-7c87b2d4-####-####-####-d8b1c9202801")' | grep -E 'display_name|_create_user'
  "display_name": "pks-7c87b2d4-####-####-####-d8b1c9202801",
  "_create_user": "new-lab-superuser",
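To repeat this check for all clusters at once, the following bash sketch lists every logical switch whose display_name starts with pks- together with its _create_user. It assumes jq and that the full logical-switches response has been saved to a file (logical-switches.json is a hypothetical name). The prefix match may also pick up other pks- prefixed switches besides the per-cluster ones; list_pks_switch_creators is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print display_name<TAB>_create_user for every logical
# switch whose display_name starts with "pks-".
list_pks_switch_creators() {
  local switches_file="$1"
  jq -r '.results[]
    # switches without a display_name are treated as an empty string
    | select((.display_name // "") | startswith("pks-"))
    | "\(.display_name)\t\(._create_user)"' "$switches_file"
}
```

Example usage:
# curl -s -u 'admin:<PASSWORD>' -k https://<NSX_MGR_FQDN>/api/v1/logical-switches > logical-switches.json
# list_pks_switch_creators logical-switches.json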
corfu_tool_runner.py. A high-level summary of the required steps follows:
name", "node_id", and "cert-id" that was in use previously.
corfu_tool_runner.py to create a file named pi.out that displays all Principal Identities. This provides the "right" and "left" values from the database.
corfu_tool_runner.py to delete the old PI using the "right" and "left" values obtained above (this may take up to 5 minutes).
.client_truststore in NSX Manager.
If, after resolving the issue with the Superuser PI certificate, tkgi upgrade-cluster or bosh deploy still fails on the pks-nsx-t-prepare-master-vm pre-start script, and the pks-nsx-t-prepare-master-vm logs show the following error:
WARN[2025-03-07T01:37:19Z] NSX-T communication config: client tls files not set
WARN[2025-03-07T01:37:19Z] NSX-T communication config: server tls authentication is disabled
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
Submit error, HttpCode 409, retry &{0xc00003edc0 import 30000000000 <nil> <nil>}
This HTTP 409 (Conflict) error occurs when the tls-nsx-t cluster certificate was rotated and the new tls-nsx-t certificate was uploaded to NSX-T, but the PI user for the cluster was not created because of the Superuser PI certificate issue introduced by the CARR script. Note: The cluster PI user is different from the global Superuser PI user addressed in the earlier steps. Each cluster has its own PI user for cluster-specific operations. These cluster PI users are created by the global Superuser PI user when the pks-nsx-t-prepare-master-vm pre-start script runs.
To resolve this conflict, use this KB
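The mismatch described above (cluster certificate present in NSX-T, cluster PI user missing) can be roughly checked from saved API output. This is a bash sketch, assuming jq, the full JSON response of GET /api/v1/trust-management/principal-identities saved to a file, and certificate.out; it also assumes, as a hypothetical naming convention, that both the cluster certificate's display_name and the cluster PI's name contain the cluster UUID. Verify the actual object names in your environment before relying on the result. check_cluster_pi is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report "conflict-likely" when at least one certificate
# name contains the cluster UUID but no PI name does; otherwise report "ok".
check_cluster_pi() {
  local cluster_uuid="$1" pi_file="$2" cert_file="$3"
  local certs pis
  # Count certificates whose display_name contains the cluster UUID.
  certs=$(jq --arg u "$cluster_uuid" \
    '[.results[] | select((.display_name // "") | contains($u))] | length' "$cert_file")
  # Count Principal Identities whose name contains the cluster UUID.
  pis=$(jq --arg u "$cluster_uuid" \
    '[.results[] | select((.name // "") | contains($u))] | length' "$pi_file")
  if [ "$certs" -gt 0 ] && [ "$pis" -eq 0 ]; then
    echo "conflict-likely"
  else
    echo "ok"
  fi
}
```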