In Tanzu Kubernetes Grid Integrated Edition (TKGi) deployments using NSX-T, a Layer 7 (L7) Virtual Server may show a status of Down when a TLS ingress is created with a fully qualified domain name (FQDN) longer than 110 characters.
The cause is an internal limit in the NSX-T load balancer on the length of Server Name Indication (SNI) hostnames.
Reproduction steps
Create a certificate whose SAN DNS name is longer than 110 characters (the hostname below is 124 characters):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: secret-tls-mydomain
  namespace: default
spec:
  # Secret names are always required.
  secretName: example-com-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
      - rws
  dnsNames:
    - subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com
  issuerRef:
    name: ca-issuer
    kind: ClusterIssuer
Create an Ingress that uses the same hostname in its rule and its TLS section:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  name: ingress-test-app
  namespace: default
spec:
  rules:
    - host: subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com
      http:
        paths:
          - backend:
              service:
                name: nginx-cluster-service
                port:
                  number: 80
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com
      secretName: example-com-tls
Checking the certificate the load balancer presents for the long hostname returns the default NSX LB certificate:
openssl s_client -connect 10.xxx.xxx.169:443 -servername subdomain-very-long-host-name-part-example-12345678901234567890123456789012345678901234567890123456.example.fake.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName
subject=CN = nsx-lb
No extensions in certificate
The subject CN = nsx-lb and the absence of a subjectAltName extension confirm that the load balancer served its default certificate instead of the certificate from the example-com-tls secret.
By contrast, a shorter hostname (57 characters) returns the expected certificate:
openssl s_client -connect 10.xxx.xxx.169:443 -servername subdomain-very-long-host-example.example.fake.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName
subject=O = rws
X509v3 Subject Alternative Name:
DNS:subdomain-very-long-host-example.example.fake.example.com
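The two openssl checks above can be wrapped in a small helper when testing several hostnames. This is a minimal sketch: the function names check_sni_cert and is_default_nsx_cert are hypothetical, and detecting the fallback by matching CN = nsx-lb assumes the default certificate subject shown above. Both the OpenSSL 1.1.x (CN = nsx-lb) and OpenSSL 3.x (CN=nsx-lb) subject formats are accepted.

```shell
#!/usr/bin/env bash
# Sketch only: check_sni_cert and is_default_nsx_cert are hypothetical helper names.

# Succeeds (exit 0) if the `openssl x509 -subject` output on stdin names the
# default NSX LB certificate. OpenSSL 1.1.x prints "subject=CN = nsx-lb",
# OpenSSL 3.x prints "subject=CN=nsx-lb"; the optional spaces cover both.
is_default_nsx_cert() {
  grep -Eq '^subject=.*CN ?= ?nsx-lb([ ,]|$)'
}

# Usage: check_sni_cert <virtual-server-ip> <hostname>
# Prints the subject and SAN of the certificate served for the SNI name and
# returns 1 if the load balancer fell back to its default certificate.
check_sni_cert() {
  local vip="$1" host="$2" out
  out=$(openssl s_client -connect "${vip}:443" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -subject -ext subjectAltName 2>/dev/null)
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | is_default_nsx_cert; then
    echo "default NSX LB certificate served for ${host} (${#host} characters)" >&2
    return 1
  fi
  return 0
}
```

For example, check_sni_cert 10.xxx.xxx.169 <hostname> (with the real virtual server IP) prints the same subject and SAN lines as the manual commands above and returns a non-zero status for the long hostname.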
Environment
TKGi 1.2x
NSX versions earlier than 4.2.2.3
NCP reports no errors during ingress creation, and the certificate is uploaded successfully and listed on the NSX Certificates page; however, the NSX load balancer fails.
The NSX-T load balancer fails to start because the SNI hostname (Common Name) used in the certificate exceeds an internal limit. When that name is longer than 110 characters, nginx, which the NSX-T load balancer uses internally, cannot build its server_names_hash because server_names_hash_bucket_size is too small, and the L7 virtual server stays Down.
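Because the failure depends only on hostname length, a name can be checked before the certificate and Ingress are created. A minimal sketch, assuming the 110-character threshold described above; sni_host_too_long is a hypothetical helper name. Bash's ${#var} counts characters, which equals bytes for ASCII hostnames.

```shell
#!/usr/bin/env bash
# Sketch only: sni_host_too_long is a hypothetical helper name.
# Threshold taken from this article: SNI hostnames longer than 110 characters
# keep the NSX-T L7 virtual server Down on affected NSX versions.
SNI_MAX_LEN=110

# Usage: sni_host_too_long <hostname>
# Prints the hostname length and succeeds (exit 0) if it exceeds SNI_MAX_LEN.
sni_host_too_long() {
  local host="$1"
  echo "${host}: ${#host} characters"
  (( ${#host} > SNI_MAX_LEN ))
}
```

For the reproduction hostname the helper reports 124 characters and succeeds; for the 57-character hostname it returns a non-zero status.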
Error seen in the NSX Edge logs (/var/log/syslog):
NSX 2023043 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"]
[UUID] could not build server_names_hash, you should increase server_names_hash_bucket_size: 128
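To confirm this cause on an Edge node, the syslog can be searched for the nginx message shown above. A minimal sketch, assuming shell access to the log file on the Edge node and that the message text matches the line above; find_lb_hash_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: find_lb_hash_errors is a hypothetical helper name.
# Usage: find_lb_hash_errors [logfile]   (defaults to /var/log/syslog)
# Prints matching lines and succeeds (exit 0) if the nginx hash-bucket error is present.
find_lb_hash_errors() {
  local log="${1:-/var/log/syslog}"
  # Fixed-string match on the message text, independent of the UUID and PID fields.
  grep -F 'could not build server_names_hash' "$log"
}
```

A non-zero exit status means the message was not found in that file; rotated logs would need to be searched separately.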
Reference KB:
https://knowledge.broadcom.com/external/article/306205/layer-7-virtual-server-status-is-down.html
Starting with TKGi 1.21, you can block Ingress objects with overly long hostnames by applying a ValidatingAdmissionPolicy and a ValidatingAdmissionPolicyBinding:
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: demo-policy.example.com
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
  validations:
    # Ingresses without spec.rules (default backend only) are allowed.
    - message: "Each host in spec.rules must be less than 110 characters."
      expression: >
        !has(object.spec.rules) || object.spec.rules.all(r, !has(r.host) || size(r.host) < 110)
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: demo-policy-binding
spec:
  policyName: demo-policy.example.com
  matchResources:
    namespaceSelector: {} # applies to all namespaces
  validationActions: ["Deny"]
Once the ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding are applied, Kubernetes denies any request to create or update an Ingress whose rule host is 110 characters or longer. Ingress objects that already exist are not re-evaluated until their next update.
Applying the Ingress from the reproduction example now fails with the following message:
kubectl apply -f ing.yaml
The ingresses "ingress-test-app" is invalid: : ValidatingAdmissionPolicy 'demo-policy.example.com' with binding 'demo-policy-binding' denied request: Each host in spec.rules must be less than 110 characters.
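Admission policies only evaluate create and update requests, so Ingress objects created before the policy are not checked. A minimal sketch for listing existing hosts that the policy would reject, assuming kubectl access to the cluster; list_ingress_hosts and find_long_ingress_hosts are hypothetical helper names. Hosts of 110 characters or more are reported, matching size(r.host) < 110 in the policy.

```shell
#!/usr/bin/env bash
# Sketch only: list_ingress_hosts and find_long_ingress_hosts are hypothetical helper names.

# Emit one line per Ingress: "<namespace>/<name>" followed by its rule hosts.
list_ingress_hosts() {
  kubectl get ingress --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{range .spec.rules[*]}{" "}{.host}{end}{"\n"}{end}'
}

# Reads lines of "<namespace>/<name> <host> [<host> ...]" on stdin and prints
# "<namespace>/<name> <length> <host>" for every host whose length is >= limit
# (the policy only allows size(host) < limit).
find_long_ingress_hosts() {
  local limit="${1:-110}"
  awk -v limit="$limit" '{
    for (i = 2; i <= NF; i++)
      if (length($i) >= limit) print $1, length($i), $i
  }'
}
```

Usage: list_ingress_hosts | find_long_ingress_hosts 110. Splitting the kubectl query from the length filter keeps the filter testable without a cluster.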
To use a different length threshold, change the value 110 in the expression, and update the message to match.
To change enforcement, for example to switch validationActions from Deny to Warn or Audit, delete and re-apply the updated policy and binding.
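Because the threshold appears in both the expression and the message, it can help to generate the policy from a single value. A minimal sketch, assuming the policy and binding names used in this article; render_host_length_policy is a hypothetical helper name. It prints the same policy and binding with the threshold and the validationActions value substituted, so the output can be applied with kubectl apply -f - after deleting the old objects as described above.

```shell
#!/usr/bin/env bash
# Sketch only: render_host_length_policy is a hypothetical helper name.
# Keep this in a script file: typed interactively, history expansion would mangle '!has'.
# Usage: render_host_length_policy [max_len] [action]
#   max_len: hosts must be shorter than this (default 110)
#   action:  validationActions value: Deny, Warn or Audit (default Deny)
render_host_length_policy() {
  local max="${1:-110}" action="${2:-Deny}"
  cat <<EOF
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: demo-policy.example.com
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
  validations:
    # Ingresses without spec.rules (default backend only) are allowed.
    - message: "Each host in spec.rules must be less than ${max} characters."
      expression: >
        !has(object.spec.rules) || object.spec.rules.all(r, !has(r.host) || size(r.host) < ${max})
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: demo-policy-binding
spec:
  policyName: demo-policy.example.com
  matchResources:
    namespaceSelector: {} # applies to all namespaces
  validationActions: ["${action}"]
EOF
}
```

For example, render_host_length_policy 100 Warn | kubectl apply -f - would install a 100-character limit that only warns instead of denying.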