Transport Node Certificate Renewal Failure on NSX Edges When Using CARR Script

Article ID: 408004

Updated On:

Products

VMware NSX

Issue/Introduction

An alarm is raised in the NSX Manager GUI indicating upcoming certificate expiration for transport nodes.

Alarm: Transport Node Certificate Expiration Approaching

During certificate renewal using the CARR script, the process only partially succeeds. The CARR script reports the following error during execution:

 
carr.recovery.base_recovery_task - MainThread - ERROR - base_recovery_task.py:86 - Error in recover task: <class 'carr.recovery.edge_node_cert_rotation_task.EdgeNodeCertRotationTask'> cert_name: EDGE : error: Error in replacing cert for the client : ###-###-##-##-####<clientid>: 409 Client Error: Conflict for url: https://10.###.###.###:443/api/v1/trust-management/certificates/action/replace-host-certificate/###-###-##-##-####<clientid>

This indicates that while the certificate rotation task works for some Edge nodes, it can fail with a 409 Conflict error for others.

 

From the CARR logs, the following error is observed:

 
- DEBUG - connectionpool.py:546 - https://10.XXX.XX.XX:443 "POST /api/v1/trust-management/certificates/action/replace-host-certificate/XXXXX-XXXX-XXXX-XXXX-XXXXXXXXX HTTP/11" 409 None
2025-07-29 13:25:43,423 - carr.interface.rest.base_api - MainThread - ERROR - base_api.py:145 - Response : {
  "httpStatus" : "CONFLICT",
  "error_code" : 223,
  "module_name" : "common-services",
  "error_message" : "Update already in progress."
}

Additionally, no GET API request for the affected Edge node UUID is logged, indicating that the CARR script could not fetch the required certificate ownership details.

2025-07-29 13:25:43,424 - carr.recovery.base_recovery_task - MainThread - ERROR - base_recovery_task.py:86 - Error in recover task: <class 'carr.recovery.edge_node_cert_rotation_task.EdgeNodeCertRotationTask'> cert_name: EDGE : error: Error in replacing cert for the client : XXXXXX-XXXX-XXXX-XXXXX-XXXXXX: 409 Client Error: Conflict for url: https://10.XXX.XX.XX:443/api/v1/trust-management/certificates/action/replace-host-certificate/XXXXX-XXX-XXX-XXX-XXXXXXX
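
To see quickly which Edge nodes hit this conflict, the CARR log can be scanned for the patterns shown above. The following is a minimal sketch and not part of the CARR tooling. The function names carr_conflicts and carr_get_count are hypothetical, the log file is passed in by the caller because this article does not state where CARR writes its log, and the matching assumes the connectionpool.py line format seen in the excerpts.

```shell
# Hypothetical helpers; they only read a log file and make no API calls.

# carr_conflicts LOGFILE
# Print the unique node IDs whose replace-host-certificate POST returned 409.
carr_conflicts() {
  local logfile="$1"
  grep -E '"POST [^"]*/replace-host-certificate/[^" ]+ HTTP/[0-9.]+" 409' "$logfile" \
    | sed -E 's#.*/replace-host-certificate/([^" ]+) HTTP.*#\1#' \
    | sort -u
}

# carr_get_count LOGFILE NODE_ID
# Count logged GET requests that mention the given node ID.
carr_get_count() {
  local logfile="$1" node_id="$2"
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  grep -F '"GET ' "$logfile" | grep -cF -- "$node_id" || true
}
```

Running carr_conflicts against the CARR log prints one node ID per line. A carr_get_count result of 0 for one of those IDs matches the missing GET request described above.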

From the Proton logs (/var/log/proton/nsxapi.log), when the certificate replacement is attempted, the system throws an InvalidOwnerException:

 
2025-08-11T15:18:48.235Z ERROR http-nio-127.0.0.1-7440-exec-1 OwnershipValidatorImpl 5026 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP289" level="ERROR" reqId="XXXXX-XXX-XXX-XXX" subcomp="manager" username="admin"] Principal 'admin' with role '[enterprise_admin]' attempts to delete or modify an object of type nsx$Client it doesn't own. (createUser=nsx_policy, allowOverwrite=null)
2025-08-11T15:18:48.235Z ERROR http-nio-127.0.0.1-7440-exec-1 TxnContext 5026 TX Abort merge: nsx$Client
com.vmware.nsx.management.container.exceptions.InvalidOwnerException: null
  at com.vmware.nsx.management.protection.OwnershipValidatorImpl.checkCallerIsOwner(OwnershipValidatorImpl.java:62) ~[?:?]
  at com.vmware.nsx.persistence.UfoTxn.checkOwnership(UfoTxn.java:885) ~[?:?]
  at com.vmware.nsx.persistence.UfoTxn$MergeCallbackImpl.doMerge(UfoTxn.java:641) ~[?:?]
  at org.corfudb.runtime.collections.TxnContext.merge(TxnContext.java:273) ~[?:?]
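
Because this ownership error decides whether the workaround applies, it can help to confirm it in the Proton log before proceeding. The sketch below is an illustration only: the function name owner_conflicts is hypothetical, and the pattern assumes the exact message wording of the OwnershipValidatorImpl line shown above.

```shell
# Hypothetical helper; it only reads the log file.

# owner_conflicts [LOGFILE]
# Print "<createUser> <object type>" once for each distinct ownership rejection.
# LOGFILE defaults to the Proton log path named in this article.
owner_conflicts() {
  local logfile="${1:-/var/log/proton/nsxapi.log}"
  grep -F 'OwnershipValidatorImpl' "$logfile" \
    | sed -nE 's/.*object of type ([^ ]+) it doesn.t own\. \(createUser=([^,]+),.*/\2 \1/p' \
    | sort -u
}
```

For the excerpt above, the expected result is a line pairing nsx_policy with nsx$Client, which is the combination this article addresses.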
 

Environment

NSX version: 4.2.1

Cause

The certificate replacement process can fail for multiple reasons. The workaround in this article applies only when the failure is associated with the InvalidOwnerException shown in the Proton logs, where the admin principal attempts to modify an nsx$Client object that was created by nsx_policy.

 

Resolution

As a workaround, the certificate for the affected Edge nodes can be manually replaced using API calls with the overwrite option enabled.

  1. Manually replace the host certificate using API

    • Use the -H "x-allow-overwrite:true" header in the curl command to allow overwriting the existing certificate.

    • Test the procedure on one Edge node first to confirm success before applying it to the remaining affected Edge nodes.

    Refer to the Replacing certificates documentation for steps:

    Example:

    curl -k -X POST \
      -H "Content-Type: application/json" \
      -H "x-allow-overwrite:true" \
      -u admin:<password> \
      https://<nsx-mgr>/api/v1/trust-management/certificates/action/replace-host-certificate/<edge-node-id>

  2. Obtain the private key of the certificate

    Run the following command from any NSX Manager (with root credentials):

    curl -k -X GET \
      -H "Content-Type: application/json" \
      -H "X-NSX-Username:admin" \
      -H "X-NSX-Groups:superuser" \
      "http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates/<cert-id>?action=get_private"
     
  3. Steps in sequence

    • Step 1: Create a self-signed certificate by following the Create-self-signed-certificates documentation.

      (Note: This will not be a service certificate.)

    • Step 2: Once the certificate is created, a cert-id is generated. Use it to retrieve the private key and certificate details with the following command:

      curl -k -X GET -H "Content-Type: application/json" -H 'X-NSX-Username:admin' -H 'X-NSX-Groups:superuser' "http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates/<cert-id>?action=get_private"

    • Step 3: Replace the certificate on the Edge node via API or Postman, following the steps in the Importing-certificates documentation.
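
When several Edge nodes are affected, it may be convenient to generate the overwrite-enabled replace commands up front and run them one at a time, starting with a single pilot node. The sketch below only prints the commands and does not execute them. The function names replace_cert_cmd and plan_replacements are hypothetical. The printed command mirrors the example in this article, except that it passes only the user name to -u so that curl prompts for the password. Like the example, it does not show any request body that the Replacing certificates documentation may require.

```shell
# Hypothetical helpers; they print commands for review and never call the API.

# replace_cert_cmd NSX_MGR NODE_ID
# Print the replace-host-certificate command with the overwrite header set.
replace_cert_cmd() {
  local nsx_mgr="$1" node_id="$2"
  printf 'curl -k -X POST -H "Content-Type: application/json" -H "x-allow-overwrite:true" -u admin "https://%s/api/v1/trust-management/certificates/action/replace-host-certificate/%s"\n' \
    "$nsx_mgr" "$node_id"
}

# plan_replacements NSX_MGR NODE_ID...
# Print one command per node, marking the first node as the pilot.
plan_replacements() {
  local nsx_mgr="$1"
  shift
  local node_id pilot=1
  for node_id in "$@"; do
    if [ "$pilot" -eq 1 ]; then
      echo "# Pilot node: run this first and confirm the certificate was replaced."
      pilot=0
    else
      echo "# Run only after the pilot node succeeded."
    fi
    replace_cert_cmd "$nsx_mgr" "$node_id"
  done
}
```

For example, plan_replacements <nsx-mgr> <edge-node-id-1> <edge-node-id-2> prints two commented commands that can be reviewed and then run manually in order.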

Additional Information

The issue is likely due to changes in RBAC (Role-Based Access Control) between the time the Transport Nodes were created and when the certificates were replaced.