NSX-T Load-Balancers and Virtual Servers are not working and show an "Unknown" state

Article ID: 318329

Products

VMware NSX

Issue/Introduction

  • Traffic reaches the Virtual Server IP but is not forwarded to any of the pool members.
  • All the Virtual Servers of a given Load-Balancer are affected.
  • In the NSX-T Edge CLI (logged in as admin), the following command fails:
NSXTEdge01> get load-balancer ########-####-####-################ status
Internal Error: Query LB Engine Failed.
  • In the NSX-T Edge log (/var/log/syslog) the following errors are seen:
When querying the Load-Balancer status:
2021-07-13T09:03:42.461Z NSXTEdge01 NSX 859 - [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3263" level="ERROR" errorCode="MPA13822"] [GetVServerStats] Failed to parse json: Missing required key uuidMissing required key virtual_servers
2021-07-13T09:03:42.461Z NSXTEdge01 - [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3263" level="ERROR" errorCode="MPA13820"] [VServerStatsHandler] Cannot get stats for vserver with LBS: ########-####-####-################ VServer: ########-####-####-################
and:
2021-07-13T08:02:17.351996+02:00 NSXTEdge01 NSX 31687 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="ERROR"] "query nginx stats encountered an error: 7 b''"
  • The load-balancer is reporting an encoding issue with one or more certificates:
2021-07-13T06:32:30.915325+02:00 NSXTEdge01 NSX 18326 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [########-####-####-################] PEM_read_bio_X509("/config/vmware/edge/lb/etc/########-####-####-################/certs/client_ssl_########-####-####-################_########-####-####-################.crt") failed (SSL: error:0906D066:PEM routines:PEM_read_bio:bad end line)
  • Rebooting or replacing the Edge doesn't fix this issue.
  • This issue manifests after an NSX-T Edge failover or reboot.

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

The same certificate, under different names, is applied more than once to the same Virtual Server. After the load-balancer service restarts (NSX-T Edge failover or reboot), the configuration fails to load.

Resolution

Currently, there is no resolution.

Workaround:
The following workarounds use this NSX-T Edge log error (from /var/log/syslog) as a reference:

2021-07-13T06:32:30.915325+02:00 NSXTEdge01 NSX 18326 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [########-####-####-################] PEM_read_bio_X509("/config/vmware/edge/lb/etc/########-####-####-################/certs/client_ssl_########-####-####-################_########-####-####-################.crt") failed (SSL: error:0906D066:PEM routines:PEM_read_bio:bad end line)
Identify the certificates that the NSX-T Edge cannot read, that is, the files logged with: failed (SSL: error:0906D066:PEM routines:PEM_read_bio:bad end line)
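
When the syslog is large, the affected file paths can be pulled out with a short script. The following is a minimal sketch only, assuming Python 3 is available on a machine where a copy of /var/log/syslog can be read; the function name `unreadable_certificates` is hypothetical and not part of NSX-T. It matches the PEM_read_bio_X509("...") failed pattern shown in the log line above and accepts both straight and typographic quotes.

```python
import re

# Matches: PEM_read_bio_X509("<path>") failed
# Straight quotes and typographic quotes (U+201C / U+201D) are both accepted.
PEM_FAILURE = re.compile(
    r'PEM_read_bio_X509\(["\u201c]([^"\u201d]+)["\u201d]\)\s+failed'
)


def unreadable_certificates(log_lines):
    """Return the distinct certificate paths reported as unreadable, in log order."""
    paths = []
    for line in log_lines:
        match = PEM_FAILURE.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths
```

For example, `unreadable_certificates(open("syslog", errors="replace"))` returns one entry per certificate file that the load-balancer failed to parse; the file name "syslog" here stands for a local copy of the Edge log.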


There are two ways to check the above:

  • From NSX-T Edge Root access:

If this Load-Balancer has a large number of Virtual Servers and certificates, this method is preferred.

  1. SSH into the NSX-T Edge (where the T1 Load-Balancer service is active).
  2. Navigate to the Load-Balancer certificates folder: cd /config/vmware/edge/lb/etc/<Load-Balancer ID>/certs/
  3. Identify the certificate file(s) referenced in the error.
  4. Run the command "ls -l" and identify any certificate file that is noticeably larger than the others (> 10k). These certificate files are likely to cause the issue.
  5. Review the content of the certificate file. In this example, use the command: "less client_ssl_########-####-####-################_########-####-####-################.crt"

A good certificate will have the following format:

-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----

The certificate format causing this issue will be:

-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE----------BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----END CERTIFICATE-----

The above indicates that the same certificate has been applied under a different name to the same Virtual Server.
The next steps are:

  1. Identify the certificate name from the above file.
  2. Identify the Virtual Server(s) on which the certificate is applied.
  3. Edit the Virtual Server(s) and remove the duplicate certificate (in the SSL configuration).
  4. Repeat the above with the other certificates (if any).
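
The manual file review in this method can also be scripted. The following is a minimal sketch only, assuming Python 3 is available wherever the certificate files (or copies of them) can be read; the function names `pem_problems` and `scan_cert_dir` are hypothetical. It flags the two symptoms visible in the faulty format shown above: an END marker fused with the next BEGIN marker on one line, and an END marker with no matching BEGIN.

```python
from pathlib import Path

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def pem_problems(text):
    """Return a list of human-readable problems found in PEM bundle text."""
    problems = []
    inside = False  # True while between a BEGIN line and its END line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line == BEGIN:
            if inside:
                problems.append(f"line {number}: BEGIN inside an open block")
            inside = True
        elif line == END:
            if not inside:
                problems.append(f"line {number}: END without a matching BEGIN")
            inside = False
        elif "-----" in line:
            # Base64 payload never contains dashes, so this is a damaged marker.
            problems.append(f"line {number}: malformed marker line: {line}")
    if inside:
        problems.append("end of file: last block is not closed")
    return problems


def scan_cert_dir(directory):
    """Map each *.crt file name in the directory to its problems (clean files omitted)."""
    results = {}
    for path in sorted(Path(directory).glob("*.crt")):
        found = pem_problems(path.read_text(errors="replace"))
        if found:
            results[path.name] = found
    return results
```

Calling `scan_cert_dir` on the certs folder from step 2 lists only the files that need attention. The script reads files and changes nothing; the duplicate certificate still has to be removed from the Virtual Server as described in the steps above.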

  • From NSX-T REST API:

If the Virtual Server and certificates causing this issue can be identified easily, this method is preferred.

  1. Run the following Policy API and gather the output: GET https://<policy-mgr>/policy/api/v1/infra/lb-virtual-servers/<Virtual Server ID>
  2. Run the following Manager API and gather the output: GET https://<nsx-mgr>/api/v1/loadbalancer/virtual-servers/########-####-####-################
  3. Compare the two outputs: the number of certificates will differ.
  4. In the Manager API output, the same certificate ID will be present more than once.
  5. Using the NSX-T UI, navigate to the Virtual Server(s), edit them, and remove the duplicate certificate (in the SSL configuration).
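
Spotting the repeated certificate ID in step 4 by eye can be error-prone on a long response. The following is a minimal sketch only, assuming the Manager API output has been saved as JSON; the function name `duplicate_certificate_refs` is hypothetical. It makes no assumption about the exact field names in the response: as a heuristic, it looks at every string value stored under a key whose name contains "certificate" and reports values that occur more than once, so the result should be reviewed before anything is changed.

```python
from collections import defaultdict


def duplicate_certificate_refs(doc):
    """Return {value: [paths]} for strings that appear more than once
    under keys whose name contains 'certificate' (heuristic, not an NSX-T API)."""
    seen = defaultdict(list)

    def walk(node, path, under_cert_key):
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}" if path else key
                walk(value, child, under_cert_key or "certificate" in key.lower())
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]", under_cert_key)
        elif isinstance(node, str) and under_cert_key:
            seen[node].append(path)

    walk(doc, "", False)
    return {value: paths for value, paths in seen.items() if len(paths) > 1}
```

For example, after loading the saved response with `json.load`, an empty result means no certificate reference is repeated; otherwise each key is a certificate ID and each value lists where in the response it appears.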



Additional Information

Impact/Risks:
The Load-Balancer is not working, hence none of its Virtual Servers are working.