An alarm is raised in the NSX Manager GUI indicating upcoming certificate expiration for transport nodes.
Alarm: Transport Node Certificate Expiration Approaching
During certificate renewal using the CARR script, the process partially succeeds:
The CARR script reports the following error during execution:
This indicates that while the certificate rotation task works for some Edge nodes, it can fail with a 409 Conflict error for others.
From the CARR logs, the following error is observed:
25-07-29 13:25:43,424 - carr.recovery.base_recovery_task - MainThread - ERROR - base_recovery_task.py:86 - Error in recover task: <class 'carr.recovery.edge_node_cert_rotation_task.EdgeNodeCertRotationTask'> cert_name: EDGE : error: Error in replacing cert for the client : XXXXXX-XXXX-XXXX-XXXXX-XXXXXX: 409 Client Error: Conflict for url: https://10.XXX.XX.XX:443/api/v1/trust-management/certificates/action/replace-host-certificate/XXXXX-XXX-XXX-XXX-XXXXXXX
Additionally, no GET API request for the affected Edge node UUID is logged, indicating that the CARR script could not fetch the required certificate ownership details.
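When several Edge nodes are involved, it can help to list exactly which node IDs hit the 409 Conflict before starting the manual workaround. The following is a minimal sketch, assuming the CARR output has been saved to a file; the function name failed_cert_nodes and the log file path are hypothetical and not part of CARR itself.

```shell
#!/bin/sh
# Print the unique node IDs whose replace-host-certificate call returned
# "409 Client Error: Conflict" in a CARR log file.
# $1 = path to the CARR log file (hypothetical; use wherever your log was written)
failed_cert_nodes() {
    grep '409 Client Error: Conflict' "$1" \
        | sed -n 's#.*/replace-host-certificate/\([^[:space:]]*\).*#\1#p' \
        | sort -u
}
```

For example, `failed_cert_nodes ./carr.log` prints one node ID per line, which can then be handled one at a time.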
From the Proton logs (/var/log/proton/nsxapi.log), when the certificate replacement is attempted, the system throws an InvalidOwnerException:
NSX version: 4.2.1
As a workaround, the certificate for the affected Edge nodes can be manually replaced using API calls with the overwrite option enabled.
Manually replace the host certificate using the API
Use the -H "x-allow-overwrite:true" header in the curl command to allow overwriting the existing certificate.
Test the procedure on one Edge node first to confirm success before applying it to the second node.
Refer to the Replacing certificates documentation for steps:
Example:
curl -k -X POST \
-H "Content-Type: application/json" \
-H "x-allow-overwrite:true" \
-u admin:<password> \
https://<nsx-mgr>/api/v1/trust-management/certificates/action/replace-host-certificate/<edge-node-id>
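Because the procedure should be tried on one Edge node before the rest, a small wrapper that prints the command for review before running it may be useful. This is a sketch only: the function names replace_cert_url and replace_host_cert and the DRY_RUN variable are hypothetical, and the call mirrors the example above exactly, so any request body your NSX version requires is not shown here. Passing -u admin without a password makes curl prompt for it, which keeps the password out of the shell history.

```shell
#!/bin/sh
# Build the replace-host-certificate URL for one Edge node.
# $1 = NSX Manager address, $2 = Edge transport node ID
replace_cert_url() {
    printf 'https://%s/api/v1/trust-management/certificates/action/replace-host-certificate/%s\n' "$1" "$2"
}

# Print the curl command (DRY_RUN=1, the default) or run it (DRY_RUN=0).
replace_host_cert() {
    url=$(replace_cert_url "$1" "$2")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed, change nothing.
        printf 'curl -k -X POST -H "Content-Type: application/json" -H "x-allow-overwrite:true" -u admin %s\n' "$url"
    else
        # Live run: curl prompts for the admin password.
        curl -k -X POST \
            -H "Content-Type: application/json" \
            -H "x-allow-overwrite:true" \
            -u admin \
            "$url"
    fi
}
```

Running `replace_host_cert <nsx-mgr> <edge-node-id>` prints the command; once it looks right for the first node, `DRY_RUN=0 replace_host_cert <nsx-mgr> <edge-node-id>` executes it.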
Obtain the private key of the certificate
Run the following command from any NSX Manager (with root credentials):
curl -k -X GET \
-H "Content-Type: application/json" \
-H "X-NSX-Username:admin" \
-H "X-NSX-Groups:superuser" \
"http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates/<cert-id>?action=get_private"
Steps in sequence
Step 1:
Create a self-signed certificate by following the Create-self-signed-certificates documentation.
(Note: This will not be a service certificate.)
Step 2: Once the certificate is created, a cert-id is generated. Retrieve the private key and certificate details using the following command:
curl -k -X GET -H "Content-Type: application/json" -H 'X-NSX-Username:admin' -H 'X-NSX-Groups:superuser' "http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates/<cert-id>?action=get_private"
Step 3: Replace the certificate on the Edge node via API or Postman, following the steps in the Importing-certificates documentation.
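The response retrieved in Step 2 contains a private key, so it is worth keeping it out of the terminal scrollback and in a file that only the current user can read. A minimal sketch, assuming a POSIX shell; the helper name save_private and the output path are hypothetical.

```shell
#!/bin/sh
# Write stdin to a file readable and writable only by the current user.
# $1 = output path (hypothetical; choose a location outside shared directories)
save_private() {
    # Create the file under a restrictive umask, force mode 600 in case it
    # already existed with looser permissions, then write the content.
    ( umask 077; : > "$1" ) && chmod 600 "$1" && cat > "$1"
}
```

The Step 2 command can then be piped into it, for example by appending `| save_private ./cert-with-key.json` to the curl command. Delete the file once the certificate has been replaced.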
The issue is likely due to changes in RBAC (Role-Based Access Control) between the time the Transport Nodes were created and when the certificates were replaced.