Newly deployed Layer 7 Virtual Server (L7 VS) shows a status of Down. These load balancers are created as part of TKGi deployments.

Article ID: 402181

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

In Tanzu Kubernetes Grid Integrated Edition (TKGi) deployments using NSX-T, a Layer 7 (L7) Virtual Server may show a status of Down when a TLS ingress is created with a fully qualified domain name (FQDN) longer than 110 characters.

This issue arises due to internal constraints in the NSX-T load balancer, specifically related to handling long Server Name Indication (SNI) values.

Reproduction steps

Create a certificate whose SAN DNS name is longer than 110 characters:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: secret-tls-mydomain
  namespace: default
spec:
  # Secret names are always required.
  secretName: example-com-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
    - rws
  dnsNames:
  - subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com
  issuerRef:
    name: ca-issuer
    kind: ClusterIssuer

Create an Ingress that uses the same hostname and TLS secret:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  name: ingress-test-app
  namespace: default
spec:
  rules:
  - host: subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com
    http:
      paths:
      - backend:
          service:
            name: nginx-cluster-service
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com
    secretName: example-com-tls

Requesting the certificate from the virtual server with the long hostname as the SNI value returns the default NSX LB certificate:

openssl s_client -connect 10.xxx.xxx.169:443 -servername subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com </dev/null 2>/dev/null |   openssl x509 -noout -subject -ext subjectAltName
subject=CN = nsx-lb
No extensions in certificate

This indicates that the default NSX LB certificate was served instead of the correct TLS certificate.

In contrast, using a shorter hostname results in the expected certificate:

openssl s_client -connect 10.xxx.xxx.169:443 -servername subdomain-very-long-host-example.example.fake.example.com </dev/null 2>/dev/null |   openssl x509 -noout -subject -ext subjectAltName
subject=O = rws
X509v3 Subject Alternative Name:
    DNS:subdomain-very-long-host-example.example.fake.example.com
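The two hostnames above differ only in length. The following is a minimal sketch for checking a hostname against the 110-character limit described in this article before creating the certificate and Ingress. The helper name host_too_long is hypothetical and is not part of TKGi or NSX.

```shell
#!/bin/sh
# host_too_long HOST [LIMIT]
# Hypothetical helper: succeeds (exit 0) when HOST is longer than LIMIT
# characters. LIMIT defaults to 110, the threshold named in this article.
host_too_long() {
    host=$1
    limit=${2:-110}
    [ "${#host}" -gt "$limit" ]
}

# The two hostnames used in the reproduction steps.
long_host="subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com"
short_host="subdomain-very-long-host-example.example.fake.example.com"

for h in "$long_host" "$short_host"; do
    if host_too_long "$h"; then
        echo "too long (${#h} characters): $h"
    else
        echo "ok (${#h} characters): $h"
    fi
done
```

The optional second argument allows a different limit to be tested, for example host_too_long "$h" 100.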

 

 

Environment

TKGi 1.2x

NSX versions earlier than 4.2.2.3

Cause

The NCP services report no errors during ingress creation, and the certificate is uploaded successfully and appears on the NSX Certificates page. However, the NSX load balancer fails to start.

The failure is caused by the length of the SNI hostname (Common Name) used in the certificate, which exceeds an internal limit. When the hostname is longer than 110 characters, nginx (used internally by the NSX-T load balancer) cannot build its server_names_hash because the configured server_names_hash_bucket_size is too small.

Error seen in the NSX Edge logs:

/var/log/syslog:
NSX 2023043 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] 
[UUID] could not build server_names_hash, you should increase server_names_hash_bucket_size: 128
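To confirm this cause on an affected Edge node, the log can be searched for the message shown above. This is a minimal sketch, assuming shell access to the Edge node and the log location /var/log/syslog given above. The helper name find_hash_error is hypothetical.

```shell
#!/bin/sh
# find_hash_error [LOGFILE]
# Hypothetical helper: prints every line of LOGFILE (default /var/log/syslog)
# that contains the nginx server_names_hash error quoted in this article.
find_hash_error() {
    logfile=${1:-/var/log/syslog}
    grep -F "could not build server_names_hash" "$logfile"
}

# Only search when the default log file is readable on this machine.
if [ -r /var/log/syslog ]; then
    find_hash_error || echo "no server_names_hash error found"
fi
```

A different file, such as a copy of the log taken from a support bundle, can be passed as the first argument.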

Reference KB:
https://knowledge.broadcom.com/external/article/306205/layer-7-virtual-server-status-is-down.html

 

Resolution

 

Starting with TKGi 1.21, you can prevent Ingress objects with overly long hostnames from being created by applying a ValidatingAdmissionPolicy and a ValidatingAdmissionPolicyBinding:
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: demo-policy.example.com
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
  validations:
    - message: "Each host in spec.rules must be less than 110 characters."
      expression: >
        !has(object.spec.rules) || object.spec.rules.all(r, !has(r.host) || size(r.host) < 110)

---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: demo-policy-binding
spec:
  policyName: demo-policy.example.com
  matchResources:
    namespaceSelector: {}  # applies to all namespaces
  validationActions: ["Deny"]

Once the ValidatingAdmissionPolicy and the ValidatingAdmissionPolicyBinding are applied, Kubernetes denies any attempt to create or update an Ingress object that has a host name of 110 characters or more.

Attempting to create the Ingress from the reproduction steps then fails with the following message:

kubectl apply -f ing.yaml
The ingresses "ingress-test-app" is invalid: : ValidatingAdmissionPolicy 'demo-policy.example.com' with binding 'demo-policy-binding' denied request: Each host in spec.rules must be less than 110 characters.
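An admission policy is evaluated only when an object is created or updated, so Ingress objects that already exist in the cluster are not checked. The sketch below is one way to list existing Ingress hosts that the policy above would reject. It assumes kubectl access to the cluster. The helper name filter_long_hosts is hypothetical, and the default limit of 110 mirrors the policy expression.

```shell
#!/bin/sh
# filter_long_hosts [LIMIT]
# Hypothetical helper: reads lines of the form
#   NAMESPACE NAME HOST[,HOST...]
# from standard input and prints every host whose length is LIMIT characters
# or more (default 110, matching the "size(r.host) < 110" policy check).
filter_long_hosts() {
    limit=${1:-110}
    awk -v limit="$limit" '{
        n = split($3, hosts, ",")
        for (i = 1; i <= n; i++)
            if (length(hosts[i]) >= limit)
                print $1 "/" $2 ": " hosts[i] " (" length(hosts[i]) " chars)"
    }'
}

# Usage against a live cluster (kubectl prints multiple hosts comma-separated):
#   kubectl get ingress -A --no-headers \
#     -o 'custom-columns=NS:.metadata.namespace,NAME:.metadata.name,HOSTS:.spec.rules[*].host' \
#     | filter_long_hosts 110
```

Any Ingress reported this way was admitted before the policy existed and may still leave its L7 Virtual Server Down until the hostname is shortened.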

 

Additional Information

You can adjust the length threshold by changing the value 110 in the policy's expression and in its message.

To change enforcement, delete and re-apply the updated policy and binding.