Error: "Failed to attach VIF: Failed to send VIF RPC request" and Tanzu provisioning failures due to expired NSX Transport Node certificate

Article ID: 431024

Products

VMware NSX

Issue/Introduction

Tanzu workload provisioning fails because virtual network interfaces (VIFs) cannot be attached on the ESXi hosts. Symptoms observed in vCenter/Tanzu and NSX include:

  • "An error occurred during host configuration: Failed to attach VIF: Failed to send VIF RPC request. Please check if the connectivity between Host and NSX Manager is up."

  • "Failed to attach virtual network interface Unknown to logical switch <LOGICAL_SWITCH_UUID>, message from NSX: Failed to send VIF RPC request."

  • ESXi Transport Nodes report an alarm for "Control Channel to Transport Node Down" in the NSX Manager UI.

Environment

VMware NSX

Cause

The host certificate (/etc/vmware/nsx/host-cert.pem) on the ESXi Transport Node has expired. This prevents the local NSX agents on the host from successfully authenticating and establishing the control channel connection with the NSX Manager Central Control Plane. As a result, network configurations cannot be pushed to the host, leading to VIF attachment failures when Tanzu attempts to provision new workloads.
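To confirm this cause before applying the resolution, the certificate's expiry can be checked from the ESXi shell with openssl, the same tool the resolution uses. The following is a minimal sketch, not a VMware tool: the function name tn_cert_expires_within is hypothetical, and it relies on the openssl x509 -checkend option, which exits 0 if the certificate is still valid after the given number of seconds.

```shell
#!/bin/sh
# Sketch: check whether an NSX Transport Node certificate has expired or
# expires within a look-ahead window. tn_cert_expires_within is a
# hypothetical helper, not an NSX command.
#
# Return codes: 0 = expired or expiring within the window,
#               1 = still valid after the window,
#               2 = file missing, empty, or not a readable certificate.
tn_cert_expires_within() {
    local cert="$1"          # PEM certificate, e.g. /etc/vmware/nsx/host-cert.pem
    local seconds="${2:-0}"  # look-ahead in seconds; 0 checks "already expired"

    if [ ! -s "$cert" ]; then
        echo "no certificate found at $cert" >&2
        return 2
    fi

    # Print the expiry date (notAfter) and confirm the file parses as a certificate.
    openssl x509 -in "$cert" -noout -enddate || return 2

    # -checkend exits 0 if the certificate is still valid after $seconds seconds.
    if openssl x509 -in "$cert" -noout -checkend "$seconds" >/dev/null; then
        return 1
    fi
    return 0
}
```

For example, on an affected host, tn_cert_expires_within /etc/vmware/nsx/host-cert.pem 0 prints the notAfter date and returns 0, because the certificate has already expired.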

Resolution

Clear the expired Transport Node certificate and private key, then regenerate them to restore control channel connectivity.

Use the following steps when the Transport Node certificate has expired and the Transport Node (TN) is in a disconnected state in NSX:

    1. SSH to the Transport Node as the root user.
    2. Empty the Transport Node certificate and private key files:

      cat /dev/null > /etc/vmware/nsx/host-cert.pem
      cat /dev/null > /etc/vmware/nsx/host-privkey.pem
    3. Generate a new self-signed TN certificate and key.

      For NSX 4.1.x versions prior to 4.1.2.5:

      a) Create a temporary OpenSSL config file from the existing OpenSSL config:

      cat /etc/vmware/nsx/openssl-proxy.cnf > /tmp/tmp-openssl-proxy.cnf

      b) Extract the Transport Node UUID from host-cfg.xml and add it to the temporary OpenSSL config:

      echo "UID = $(grep -o '<uuid>[^<]*' /etc/vmware/nsx/host-cfg.xml | sed 's/<uuid>//')" >> /tmp/tmp-openssl-proxy.cnf

      c) Add the certificate extensions section (req_ext) to the temporary OpenSSL config:

      echo -e "[ req_ext ]\nbasicConstraints     = CA:FALSE\nextendedKeyUsage     = clientAuth\nsubjectKeyIdentifier = hash\nauthorityKeyIdentifier = keyid,issuer" >> /tmp/tmp-openssl-proxy.cnf

      d) Generate the replacement certificate and key. The -days parameter sets a validity period of 3650 days (10 years):

      openssl req -new -newkey rsa:2048 -days 3650 -nodes -x509 -keyout /etc/vmware/nsx/host-privkey.pem -out /etc/vmware/nsx/host-cert.pem -config /tmp/tmp-openssl-proxy.cnf -extensions req_ext

      For NSX 4.1.2.5 and higher, restarting nsx-proxy creates the new cert-key pair:

      /etc/init.d/nsx-proxy restart

    4. Identify the NSX Manager API certificate thumbprint: SSH to the NSX Manager as the admin user and run:

      get certificate api thumbprint

    5. Push the new certificate to the NSX Manager by running the following as root on the host or Edge (any NSX Manager hostname or IP can be used):

      Edge
      su admin -c push host-certificate <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>
      su admin -c sync-aph-certificates <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>

      Host
      nsxcli -c push host-certificate <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>
      nsxcli -c sync-aph-certificates <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>

    6. In the NSX UI, navigate back to the host under System-->Fabric-->Hosts and resolve the alarm by selecting the host's disconnected link.

    7. If the Transport Node status still shows an error, it may be necessary to restart the nsx-proxy and nsx-opsagent services on the Transport Node to restore the connection:

      Edge
      /etc/init.d/nsx-proxy restart
      /etc/init.d/nsx-opsagent-appliance restart

      Host
      /etc/init.d/nsx-proxy restart
      /etc/init.d/nsx-opsagent restart
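The certificate regeneration in steps 2 and 3 can be sketched as a single script. This is a hedged sketch, not VMware tooling, and several details are assumptions: the function names version_lt, regen_tn_cert_manual and regen_tn_cert are hypothetical; the NSX version is passed in by the operator rather than detected; the directory argument defaults to /etc/vmware/nsx but can point at a copy for a dry run; and printf replaces echo -e because some /bin/sh implementations print "-e" literally. The article names only 4.1.x for the manual OpenSSL path, so sending every version below 4.1.2.5 down that path is also an assumption. Steps 4 to 7 are not covered.

```shell
#!/bin/sh
# Sketch of resolution steps 2 and 3. Hypothetical helpers, not VMware tooling.

# Return 0 if dotted numeric version $1 is lower than $2 (e.g. 4.1.2 < 4.1.2.5).
version_lt() {
    local a="$1" b="$2" x y
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        x=${x:-0}; y=${y:-0}          # missing components count as 0
        [ "$x" -lt "$y" ] && return 0
        [ "$x" -gt "$y" ] && return 1
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1                          # versions are equal
}

# Steps 2 and 3a-3d (the manual path for versions prior to 4.1.2.5).
regen_tn_cert_manual() {
    local dir="${1:-/etc/vmware/nsx}"
    local tmpcnf="${TMPDIR:-/tmp}/tmp-openssl-proxy.cnf"
    local uuid

    # Step 3b input, read first so a missing UUID stops before files are changed.
    uuid=$(grep -o '<uuid>[^<]*' "$dir/host-cfg.xml" | sed 's/<uuid>//')
    if [ -z "$uuid" ]; then
        echo "no <uuid> found in $dir/host-cfg.xml" >&2
        return 1
    fi

    # Step 2: empty the expired certificate and private key.
    : > "$dir/host-cert.pem"
    : > "$dir/host-privkey.pem"

    # Steps 3a-3c: copy the existing config, append the UID and the req_ext section.
    cat "$dir/openssl-proxy.cnf" > "$tmpcnf"
    echo "UID = $uuid" >> "$tmpcnf"
    printf '%s\n' '[ req_ext ]' \
        'basicConstraints     = CA:FALSE' \
        'extendedKeyUsage     = clientAuth' \
        'subjectKeyIdentifier = hash' \
        'authorityKeyIdentifier = keyid,issuer' >> "$tmpcnf"

    # Step 3d: new self-signed certificate and key, valid for 3650 days.
    openssl req -new -newkey rsa:2048 -days 3650 -nodes -x509 \
        -keyout "$dir/host-privkey.pem" -out "$dir/host-cert.pem" \
        -config "$tmpcnf" -extensions req_ext
}

# Choose the regeneration path for an operator-supplied NSX version.
regen_tn_cert() {
    local version="$1" dir="${2:-/etc/vmware/nsx}"
    if version_lt "$version" 4.1.2.5; then
        regen_tn_cert_manual "$dir"
    else
        # 4.1.2.5 and higher: after step 2, restarting nsx-proxy creates the pair.
        : > "$dir/host-cert.pem"
        : > "$dir/host-privkey.pem"
        /etc/init.d/nsx-proxy restart
    fi
}
```

For example, regen_tn_cert 4.1.1 runs the manual OpenSSL path against /etc/vmware/nsx. Reading the UUID before emptying the files is a small reordering of steps 2 and 3b; it does not change the result, since the UUID comes from host-cfg.xml rather than from the certificate.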

Additional Information

Reference KB: Alarm for "Transport Node Certificate has Expired"