TKGI NCP Job Fails with Configure NSX Resources

Article ID: 316957

Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

If the NSX-T Manager certificate is malformed, TKGI installations and upgrades fail.

 

  • When running tkgi create-cluster, the ncp job on the master canary fails while running /usr/local/bin/configure_nsx_resources, which tries to create a switching profile for the TKGI cluster.
  • The log below shows that an NSX certificate was loaded, a function was executed, and an error was raised asking you to check the NSX configuration. Because loading the certificate was the last action before the function ran, the certificate is the most likely culprit.
  • This can be seen in the ncp_pre-start.stdout.log below:

Creating switching profile election-profile-tkgi-########-####-####-####-ada361ae8156
Successfully created certificate file /tmp/nsx_cert.pem for NSX client connection.

Authenticating with NSX using client certificate loaded at /etc/nsx-ujo/certs/nsx/client.crt and private key loaded at /etc/nsx-ujo/certs/nsx/client.key

WARNING: Failed to execute function create_switchingprofile: Service cluster: 'https://mgr.dev.nsxt.CompanyName.int' is unavailable. Please, check NSX setup and/or configuration, will retry after 5 seconds
Creating switching profile election-profile-tkgi-########-####-####-####-ada361ae8156

 

  • The certificate verification errors can be seen in two sections of the openssl_cmd.txt log:

openssl s_client -connect mgr.dev.nsxt.CompanyName.int:443
CONNECTED(00000003)

depth=0 C = US, ST = State, L = City, OU = server, O = CompanyName Worldwide, CN = mgr.dev.nsxt.CompanyName.int
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 C = US, ST = State, L = City, OU = server, O = CompanyName Worldwide, CN = mgr.dev.nsxt.CompanyName.int
verify error:num=27:certificate not trusted
verify return:1
depth=0 C = US, ST = State, L = City, OU = server, O = CompanyName Worldwide, CN = mgr.dev.nsxt.CompanyName.int
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/C=US/ST=State/L=City/OU=server/O=CompanyName Worldwide/CN=mgr.dev.nsxt.CompanyName.int
   i:/C=BE/O=CompanyName Worldwide/OU=Corporate Security/CN=CompanyName DEV Generic Sub CA1 G2
---

 

SSL-Session:

    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: ############################################################
    Session-ID-ctx:
    Master-Key: ########################################################################
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1538676671
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---
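When output like the above has been saved to a file, the final verification result can be extracted programmatically rather than read by eye. The following is a minimal Python sketch, not part of the TKGI or NSX tooling; the function name verify_return_code is hypothetical, and it assumes the openssl s_client output is available as plain text.

```python
import re


def verify_return_code(s_client_output: str):
    """Return (code, message) from the 'Verify return code' line, or None.

    Hypothetical helper: it only parses saved `openssl s_client` text and
    does not open any network connection itself.
    """
    match = re.search(r"Verify return code:\s*(\d+)\s*\((.*?)\)", s_client_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Excerpt taken from the SSL-Session section shown in this article.
sample = """\
    Start Time: 1538676671
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
"""

result = verify_return_code(sample)
if result is not None and result[0] != 0:
    print(f"certificate verification failed: {result[0]} ({result[1]})")
```

A return code of 0 (ok) means the chain verified; any other value, such as 21 here, means the certificate chain presented by the NSX Manager could not be verified.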

 

  • This can also be seen in the ncp_pre-start.stderr.log:

 

No handlers could be found for logger "vmware_nsxlib.v3.cluster"
Traceback (most recent call last):
  File "/usr/local/bin/configure_nsx_resources", line 285, in <module>
    if not args.func(args):
  File "/usr/local/bin/configure_nsx_resources", line 40, in wrapper
    raise e
vmware_nsxlib.v3.exceptions.ServiceClusterUnavailable: Service cluster: 'https://mgr.dev.nsxt.CompanyName.int' is unavailable. Please, check NSX setup and/or configuration

 

  • Download the NSX Manager CA certificate and view the certificate using openssl or crypto shell. You will see CA:FALSE, pathlen:0 in the X509v3 Basic Constraints: critical section.

Environment

VMware Tanzu Kubernetes Grid Integrated 1.x

Cause

  • An error in certificate creation for NSX will cause failures in TKGI installations and upgrades.
  • The root cause of this issue is the following Basic Constraints extension in the certificate:

    X509v3 Basic Constraints: critical
    CA:FALSE, pathlen:0

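  • You will need to configure your NSX Manager certificate correctly. The certificate must not set pathlen while CA is FALSE: the path length constraint is only meaningful for CA certificates (RFC 5280, section 4.2.1.9), so the combination CA:FALSE, pathlen:0 is invalid.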
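To check a certificate for this combination, its text dump (for example, the output of openssl x509 -in <certificate file> -noout -text) can be scanned for a Basic Constraints value that sets pathlen without CA:TRUE. The following is a minimal Python sketch under that assumption; the function name has_invalid_basic_constraints is hypothetical, and the check covers only this one condition, not full certificate validation.

```python
import re


def has_invalid_basic_constraints(cert_text: str) -> bool:
    """Return True if Basic Constraints sets pathlen without CA:TRUE.

    Hypothetical helper: cert_text is assumed to be the human-readable
    dump of a certificate, such as `openssl x509 -noout -text` output.
    """
    # The extension value is printed on the line after the extension name.
    match = re.search(r"X509v3 Basic Constraints:[^\n]*\n\s*([^\n]+)", cert_text)
    if match is None:
        return False  # extension absent: nothing for this check to flag
    value = match.group(1).upper()
    return "PATHLEN" in value and "CA:TRUE" not in value


# The problematic lines quoted in this article.
bad = """\
    X509v3 Basic Constraints: critical
    CA:FALSE, pathlen:0
"""

# A server (end-entity) certificate without a path length constraint.
good = """\
    X509v3 Basic Constraints: critical
    CA:FALSE
"""

print(has_invalid_basic_constraints(bad))
print(has_invalid_basic_constraints(good))
```

Running the check against the dump of the certificate served by the NSX Manager before retrying the TKGI installation or upgrade gives a quick indication of whether the certificate still needs to be reissued.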

Resolution

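Reissue the NSX Manager certificate with valid Basic Constraints: either omit pathlen when CA is FALSE, or set CA:TRUE if the certificate is genuinely intended to be a CA certificate. Properly configuring the certificate per industry standards permanently resolves the issue.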

Additional Information

https://docs.openssl.org/3.3/man5/x509v3_config/#name 

 

Certificate authority - OpenSSL Basic Constraints - Information Security Stack Exchange

https://security.stackexchange.com/questions/153310/openssl-basic-constraints


Impact/Risks:

Configuring the certificate in accordance with industry standards has no negative impact on the environment. In fact, it is the only way to make the software work.