kubectl access is not working from TCA due to "Concurrent connection limit reached for IP"

Article ID: 345741


Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

Symptoms:

This issue can occur when using kubectl to access a workload cluster with a kubeconfig downloaded from the TCA UI under Virtual Infrastructure. kubectl commands fail intermittently with TooManyRequests errors.
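
An intermittent failure can look like the following when running kubectl against the workload cluster (illustrative example; the exact server message may vary):

kubectl get pods -A
Error from server (TooManyRequests): the server has received too many requests and has asked us to try again later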

While deploying a CNF, the following errors are seen in the TCA Manager kubectl proxy log:

2023-02-10 20:28:40.005 UTC [Connection Acceptor '*:8500', , , TxId: ]WARN  c.p.m.c.t.http.HttpEndpointListener- Concurrent connection limit reached for IP: /192.168.100.44
2023-02-15 16:35:45.771 UTC [Connection Acceptor '*:8500', , , TxId: ]
                ERROR c.p.m.c.t.http.HttpEndpointListener-
java.net.SocketException: Broken pipe (Write failed)
        at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
        at java.base/java.net.SocketOutputStream.write(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketOutputRecord.flush(Unknown Source)
        at java.base/sun.security.ssl.HandshakeOutStream.flush(Unknown Source)
        at java.base/sun.security.ssl.CertificateVerify$T13CertificateVerifyProducer.onProduceCertificateVerify(Unknown Source)
        at java.base/sun.security.ssl.CertificateVerify$T13CertificateVerifyProducer.produce(Unknown Source)
        at java.base/sun.security.ssl.SSLHandshake.produce(Unknown Source)
        at java.base/sun.security.ssl.ClientHello$T13ClientHelloConsumer.goServerHello(Unknown Source)
        at java.base/sun.security.ssl.ClientHello$T13ClientHelloConsumer.consume(Unknown Source)
        at java.base/sun.security.ssl.ClientHello$ClientHelloConsumer.onClientHello(Unknown Source)
        at java.base/sun.security.ssl.ClientHello$ClientHelloConsumer.consume(Unknown Source)
        at java.base/sun.security.ssl.SSLHandshake.consume(Unknown Source)
        at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
        at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
        at java.base/sun.security.ssl.TransportContext.dispatch(Unknown Source)
        at java.base/sun.security.ssl.SSLTransport.decode(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl.decode(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(Unknown Source)
        at com.predic8.membrane.core.transport.http.HttpEndpointListener.writeRateLimitReachedToSource(HttpEndpointListener.java:236)
        at com.predic8.membrane.core.transport.http.HttpEndpointListener.run(HttpEndpointListener.java:134)
2023-02-15 16:35:45.772 UTC [Connection Acceptor '*:8500', , , TxId: ]
                WARN  c.p.m.c.t.http.HttpEndpointListener- Concurrent connection limit reached for IP: /192.168.100.44
Received the following content 
===END===
2023-01-24 17:16:24.841 UTC [Connection Acceptor '*:8500', , , TxId: ]
                WARN  c.p.m.c.t.http.HttpEndpointListener- Concurrent connection limit reached for IP: /192.168.187.100
Received the following content
 


Environment

VMware Telco Cloud Automation 2.1
VMware Telco Cloud Automation 2.1.1
VMware Telco Cloud Automation 2.2

Cause

This issue occurs because the TCP connections established between TCA-M and TCA-CP are not terminated promptly when the corresponding connections from the source to TCA-M are closed. In TCA 2.1, stale TCP connections are cleaned up only every 8 seconds, and the maximum number of concurrent connections allowed from a single source is 60. If more than 60 connections are initiated from the same source within an 8-second window, the additional connections are rejected.
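
As an illustration, a script that fans out many kubectl calls in parallel from one host opens that many concurrent connections to the TCA-M proxy listener on port 8500; anything beyond 60 within the 8-second cleanup window is rejected. The kubeconfig file name and the count of 80 below are illustrative:

# Illustrative only: ~80 parallel kubectl calls from a single source IP
# exceed the 60-connection limit before the 8-second cleanup runs.
for i in $(seq 1 80); do
  kubectl --kubeconfig workload.kubeconfig get pods -A &
done
wait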

Resolution

This issue is fixed in the TCA 2.2 release.

In TCA 2.2, the maximum concurrent connection limit between TCA-M and TCA-CP is increased to 2048.

Workaround:

As a workaround, try either of the following two methods:

1) Restart the proxy service via the TCA-CP appliance management UI. The appliance management UI can be accessed at https://<tca-cp-ip-address>:9443 by logging in with the admin username and the corresponding password. On the Appliance Summary page, click STOP and then START for the Proxy Service. This clears the stale connections, and restarting the proxy service does not otherwise impact TCA-CP functionality.

2) Get the kubeconfig for the workload cluster directly from the management cluster. This can be done by logging in via SSH either to a management cluster control plane node or to TCA-CP. If you log in to TCA-CP, confirm that the current context is set to the management cluster by running 'kubectl config get-contexts'. The management cluster context is usually available on TCA-CP for the root user, so SSH in as admin and then switch to the root user. After this, get the kubeconfig of the corresponding workload cluster using this command:
 

kubectl get secret -n <cluster_name> <cluster_name>-kubeconfig -o jsonpath='{.data.value}' | base64 -d > <cluster_name>.kubeconfig
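
The retrieved kubeconfig can then be used directly against the workload cluster, for example:

kubectl --kubeconfig <cluster_name>.kubeconfig get nodes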