Ingress TLS certificate is removed after TKGI upgrade

Article ID: 327289

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • After a TKGI upgrade that upgrades NCP from 3.0.2 to 3.1.x, Ingress resources in the cluster lose their TLS certificate.
  • You see messages similar to the following in the ncp/ncp.stdout.log file on the master node:
2021-09-20T21:56:06.980Z eab8ec94-0c51-4e70-88fa-002a5f0895f2 NSX 7933 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING"] vmware_nsxlib.v3.client The HTTP request returned error code 409, whereas 201/200 response codes were expected. Response body {'httpStatus': 'CONFLICT', 'error_code': 2038, 'module_name': 'internal-framework', 'error_message': 'Certificate already exists.'}
2021-09-20T21:56:06.980Z eab8ec94-0c51-4e70-88fa-002a5f0895f2 NSX 7933 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="INFO"] nsx_ujo.ncp.nsx.manager.nsxapi Attempted to import a certificate which has already been imported
2021-09-20T21:56:06.981Z eab8ec94-0c51-4e70-88fa-002a5f0895f2 NSX 7933 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING"] nsx_ujo.ncp.nsx.lb_l7_service Secret grafana-enterprise-tls-secret-prod with the same PEM data has been imported, use a different secret instead
2021-09-20T21:56:06.981Z eab8ec94-0c51-4e70-88fa-002a5f0895f2 NSX 7933 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="INFO"] nsx_ujo.ncp.nsx.lb_l7_service Removing stale cert grafana-enterprise-tls-secret-prod in namespace grafana-enterprise with version number None
2021-09-20T21:56:07.142Z eab8ec94-0c51-4e70-88fa-002a5f0895f2 NSX 7933 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="INFO"] nsx_ujo.ncp.k8s.service_lb_controller Successfully updated Loadbalancer resources for service ('grafana-enterprise', 'grafana-enterprise-service-prod')

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
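To check whether a cluster has hit this issue, you can scan the NCP log for the "Removing stale cert" message shown in the excerpt. The following Python sketch is a hypothetical helper, not part of NCP or TKGI tooling. It assumes that the message format in the excerpt is the same across NCP 3.1.x builds, so adjust the regular expression if your build logs it differently.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Matches the NCP INFO message from the log excerpt, e.g.
# "Removing stale cert <secret> in namespace <namespace> with version number None".
# The message format is assumed, not guaranteed, to be stable across NCP builds.
STALE_CERT_RE = re.compile(
    r"Removing stale cert (?P<secret>\S+) in namespace (?P<namespace>\S+) "
    r"with version number (?P<version>\S+)"
)


class StaleCertRemoval(NamedTuple):
    timestamp: str
    secret: str
    namespace: str
    version: str


def find_stale_cert_removals(lines: Iterable[str]) -> List[StaleCertRemoval]:
    """Collect every 'Removing stale cert' event from NCP log lines."""
    events = []
    for line in lines:
        match = STALE_CERT_RE.search(line)
        if match:
            # The first whitespace-separated field of an NCP log line is its timestamp.
            timestamp = line.split(maxsplit=1)[0]
            events.append(
                StaleCertRemoval(timestamp, match["secret"], match["namespace"], match["version"])
            )
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for event in find_stale_cert_removals(log):
            # "None" means the removed cert carried no secret version tag (a pre-upgrade cert).
            flag = "  <-- untagged, matches this issue" if event.version == "None" else ""
            print(f"{event.timestamp} {event.namespace}/{event.secret} version={event.version}{flag}")
```

Pass the log file path as the only argument. Entries that report version `None` match the signature described in this article.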

Environment

VMware NSX-T Data Center

Cause

This issue is caused by a new feature added to NCP. To support updates to Kubernetes TLS secrets, NCP started tracking each secret's version number and attaching it to the corresponding NSX certificate as a tag.

Certificates imported before the upgrade carry no secret version tag. After the upgrade, NCP compares the current secret version against the missing tag (None). Because that check fails, NCP treats the certificate as stale and removes it.

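Under normal circumstances, when a secret has actually been updated, NCP imports the new certificate and removes the stale one. In this case the secret had not changed: the import attempt fails with HTTP 409 ("Certificate already exists.") because the same PEM data is already imported, and NCP then removes the existing certificate as stale, leaving the Ingress without a certificate. In Policy mode, this issue has no impact.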
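The interaction described in this section can be reproduced with a small simulation. The following Python sketch is not NCP code. `FakeNsxCertStore`, `sync_secret`, the certificate IDs, and version strings such as "101" are all hypothetical. The sketch encodes only what this article states: certificates are tagged with the secret version, a missing tag reads as None, NSX rejects a duplicate PEM import with 409, and the old certificate is then removed as stale.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class CertificateConflict(Exception):
    """Stands in for the NSX HTTP 409 'Certificate already exists.' response."""


@dataclass
class NsxCert:
    secret: str
    pem: str
    # None models a certificate imported before the upgrade (no version tag).
    secret_version: Optional[str]


@dataclass
class FakeNsxCertStore:
    """Hypothetical in-memory stand-in for the NSX certificate inventory."""
    certs: Dict[str, NsxCert] = field(default_factory=dict)  # cert id -> cert

    def import_cert(self, cert_id: str, cert: NsxCert) -> None:
        # Like NSX, refuse a second copy of identical PEM data.
        if any(existing.pem == cert.pem for existing in self.certs.values()):
            raise CertificateConflict("Certificate already exists.")
        self.certs[cert_id] = cert

    def delete_cert(self, cert_id: str) -> None:
        self.certs.pop(cert_id, None)


def sync_secret(store: FakeNsxCertStore, secret: str, pem: str, version: str) -> None:
    """Simplified pre-fix reconcile flow, as described in the Cause section."""
    old_id = next((cid for cid, c in store.certs.items() if c.secret == secret), None)
    if old_id is not None and store.certs[old_id].secret_version == version:
        return  # Tag matches the current secret version: nothing to do.

    # Tag differs or is missing (None): try to import the "updated" certificate.
    try:
        store.import_cert(f"{secret}@{version}", NsxCert(secret, pem, version))
    except CertificateConflict:
        pass  # Same PEM is already imported, so nothing new is added.

    # Either way, the old certificate is now treated as stale and removed.
    if old_id is not None:
        store.delete_cert(old_id)


if __name__ == "__main__":
    # Pre-upgrade cert (no version tag) for a secret whose content has not changed.
    store = FakeNsxCertStore({"pre-upgrade": NsxCert("tls-secret", "PEM-A", None)})
    sync_secret(store, "tls-secret", "PEM-A", "101")
    print(store.certs)  # {} : the certificate is gone and nothing replaced it
```

With a genuinely updated secret (different PEM), the import succeeds before the old certificate is deleted, so the Ingress keeps a valid certificate. Only the unchanged-secret case ends with an empty store.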

Resolution

This issue is resolved in NCP 3.1.2.4, which is bundled in TKGI 1.11.6.
For details of the fix, refer to the TKGI 1.11.6 Release Notes ("Issue 2871314").
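If you audit several clusters, you can compare each cluster's NCP version against the fixed release. The following minimal Python sketch makes two assumptions: NCP reports a purely numeric dotted version string, and any build number appears as an extra numeric field. It encodes only the facts stated in this article, that NCP 3.1.x is affected and that 3.1.2.4 contains the fix. It does not assess releases outside the 3.1 line, and the function names are hypothetical.

```python
from typing import Tuple

# NCP release containing the fix, per this article (bundled in TKGI 1.11.6).
FIXED_IN: Tuple[int, ...] = (3, 1, 2, 4)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '3.1.2.4' into a comparable tuple."""
    return tuple(int(part) for part in version.strip().split("."))


def is_unfixed_ncp_31(version: str) -> bool:
    """True for NCP 3.1.x releases older than 3.1.2.4.

    Versions outside the 3.1 line are not assessed and return False.
    Tuple comparison treats a shorter prefix as smaller, so "3.1.2" < (3, 1, 2, 4).
    """
    parsed = parse_version(version)
    return parsed[:2] == (3, 1) and parsed < FIXED_IN
```

For example, `is_unfixed_ncp_31("3.1.2.3")` is True and `is_unfixed_ncp_31("3.1.2.4")` is False.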

Workaround:

For a workaround, please refer to the NCP 3.1.2 Release Notes ("Issue 2871314").