Kubernetes cluster master IP is not accessible through load balancer VIP when virtual server limit is reached in VMware Enterprise PKS

Article ID: 298596

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Users are unable to access the master load balancer (LB) IP from a cluster with kubectl.

When running kubectl cluster-info, the error message below is observed:
Unable to connect to the server: dial tcp <clusterVIP>:8443: i/o timeout

However, from the master node, the kubectl command works fine:
bosh -d service-instance_ID ssh master/0
sudo -i 
find / -iname kubectl
/var/vcap/data/packages/kubernetes/18e8bdc60532374c5d318386791cecf1e5587eb9/bin/kubectl
alias kubectl=/var/vcap/data/packages/kubernetes/18e8bdc60532374c5d318386791cecf1e5587eb9/bin/kubectl
kubectl cluster-info
Kubernetes master is running at http://localhost:8080
CoreDNS is running at http://localhost:8080/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
The issue presents itself when a relatively high number of services has been created on the cluster, above 15 per namespace.

The Edge syslog indicates that the maximum number of virtual servers has been reached:
2019-12-03T11:04:56.884920+00:00 nsxedge2-pks NSX 8740 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb_log" level="ERROR" errorCode="EDG9999999"] [cfe7930d-9d71-4274-a981-d66414e6da32] [error] 8740#0: l4lb failed to allocate vs - max services 
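
As a rough check (not a step from this article), the number of LoadBalancer-type services currently consuming NSX-T virtual servers can be counted from any workstation with cluster access:

# Count services of type LoadBalancer across all namespaces
kubectl get services --all-namespaces | grep -c LoadBalancer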

Running Platform

  • TKG-i Version: 1.5.0-build.32 
  • NSX-T Version: 2.4.2
  • Network Details: Load Balancer Size (lb_size): "small"

Affected Versions

  • NSX-T 2.4.0 to 2.4.2 
  • Fixed in NSX-T 2.4.3

Fixed Issue 2416081: Adds the capability to increase the number of Virtual Servers (from 10 to 20) in a small form factor NSX-T load balancer.


Environment

Product Version: 1.5

Resolution

For NSX-T 2.4.2, the following steps need to be completed. This procedure is for a "small" Edge; for more information, refer to VMware KB article https://kb.vmware.com/s/article/76464.

1. On the Edge, confirm the contents of the following file:

cat /config/vmware/edge/lb/etc/lbs_small.conf

It is expected to contain:
l4_worker_processes 1;
l4_virtual_servers 10;
l4_sessions     100000;


However, the corresponding values have to be as follows:

cat /opt/vmware/nsx-edge/lb/etc/lbs_small.conf
l4_worker_processes 1;
l4_virtual_servers 20;
l4_sessions     105000;

Note: The path differs because this output was taken from NSX-T 2.4.3.


For the change to take effect, detach the LB from the Tier-1 Logical Router and then reattach it.

Note: In an Enterprise PKS environment it will not be possible to make this change from the UI due to the protected principal identity objects.

To detach the LB from the Tier-1 Gateway, remove the following section from the payload returned by the GET and perform a PUT operation with the modified body.

GET https://<NSX_MGR>/api/v1/loadbalancer/services/<UUID>

  "attachment": {
    "target_id": "a3e9ec60-ad7b-45d3-82c6-cca685ffc7f3",
    "target_display_name": "lb-pks-a6857d10-0c4a-42cb-a9d4-288f052c9f88-rxqnx",
    "target_type": "LogicalRouter",
    "is_valid": true
  },

PUT https://<NSX_MGR>/api/v1/loadbalancer/services/<UUID> using the header X-Allow-Overwrite: True

To reattach the LB to the Tier-1 Gateway, perform the GET again, add the attachment section back into the payload, and update the configuration again with the PUT:
GET https://<NSX_MGR>/api/v1/loadbalancer/services/<UUID>
PUT https://<NSX_MGR>/api/v1/loadbalancer/services/<UUID> using the header X-Allow-Overwrite: True
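
As an illustration only, the detach/reattach cycle could be driven with curl along the lines of the sketch below; the NSX Manager address, admin credentials, and LB service UUID are placeholders, and the JSON payload file is edited by hand between the calls.

# Placeholders for this environment; curl -u admin prompts for the admin password.
NSX_MGR=<NSX_MGR>
LB_UUID=<UUID>

# 1. Save the current LB service payload and keep a copy of its "attachment" block for later.
curl -k -u admin "https://${NSX_MGR}/api/v1/loadbalancer/services/${LB_UUID}" -o lb-service.json

# 2. Edit lb-service.json, delete the whole "attachment": { ... } section, then push it back.
#    X-Allow-Overwrite is required because the object is owned by a protected principal identity.
curl -k -u admin -X PUT "https://${NSX_MGR}/api/v1/loadbalancer/services/${LB_UUID}" \
  -H "Content-Type: application/json" -H "X-Allow-Overwrite: True" -d @lb-service.json

# 3. To reattach: GET the service again (so the current _revision is used), re-insert the saved
#    "attachment" section into the fresh payload, and PUT it with the same header.
curl -k -u admin "https://${NSX_MGR}/api/v1/loadbalancer/services/${LB_UUID}" -o lb-service-detached.json
curl -k -u admin -X PUT "https://${NSX_MGR}/api/v1/loadbalancer/services/${LB_UUID}" \
  -H "Content-Type: application/json" -H "X-Allow-Overwrite: True" -d @lb-service-detached.json
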
If a large number of services are created, each consuming one virtual server, the expected behavior after the upgrade is that NSX-T will spin up a new virtual server for each one until the load balancer is again filled to its maximum capacity, which is 20 for a small Edge.


Workaround

From the master node, create a service that consumes an NSX-T LB virtual server and then delete it; this operation temporarily fixes the problem.

Create a service as per the following documentation: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
    - port: 8765
      targetPort: 9376
  type: LoadBalancer
Save the file, then run:
kubectl apply -f <filename>
kubectl delete -f <filename>
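
After the apply/delete cycle, access through the master VIP should be restored. As a quick check (an assumption based on the symptom above, not a step from the original workaround), the command that was timing out can be re-run from the workstation:

kubectl cluster-info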