Symptoms:
Logs
- In proxy-tomcat-wrapper.log, located in the /var/log/proxy/ directory on the NSX Manager, a large number of threads are in the java.lang.Thread.State: WAITING state with identical stack traces.
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000068ec71b3bde8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
- locked <0x000068ec7d525510> (a org.apache.http.pool.AbstractConnPool$2)
- /var/log/proxy/envoy_access_log reports HTTP 503 Service Unavailable responses:
1#.##.##.#4 1#.###.##.#8 "GET" "/api/v1/node" "HTTP/1.1" 503 UAEX 0 0 60003 - "1#.##.##.#4" "vAPI/2.14.0 Java/11.0.22 (Linux; 5.10.216-1.ph4; amd64)" "9#####-####-####-####-########ca7f" "1#.###.##.#8" "-"
1#.##.##.#4 1#.###.##.#8 "GET" "/api/v1/node" "HTTP/1.1" 503 UAEX 0 0 60000 - "1#.##.##.#4" "vAPI/2.14.0 Java/11.0.22 (Linux; 5.10.216-1.ph4; amd64)" "d#####-####-####-####-########b78c" "1#.###.##.#8" "-"
1#.##.##.#4 1#.###.##.#8 "GET" "/api/v1/node" "HTTP/1.1" 503 UAEX 0 0 60001 - "1#.##.##.#4" "vAPI/2.14.0 Java/11.0.22 (Linux; 5.10.216-1.ph4; amd64)" "a#####-####-####-####-########bb67" "1#.###.##.#8" "-"
In the lines above, UAEX indicates that the external authentication call failed, and the values 60000 to 60003 are the request durations in milliseconds, meaning each request hit the 60-second timeout.
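The two symptoms above can be checked with a small triage script. The sketch below is a hypothetical helper, not a VMware tool: `count_pool_waiters` counts stack frames for `AbstractConnPool.getPoolEntryBlocking` (one per parked thread in a thread dump), and `uaex_timeouts` picks out 503 UAEX entries that ran for about 60 seconds. The regular expression assumes the field order seen in the three sample access-log lines (method, path, protocol, status, flags, bytes in, bytes out, duration in ms); a differently configured Envoy log format would need a different pattern.

```python
import re

# Frame that marks a thread parked while waiting for a pooled HTTP connection,
# taken from the stack trace in the Symptoms section.
POOL_WAIT_FRAME = "org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking"

# ASSUMPTION: field order inferred from the sample envoy_access_log lines:
# "METHOD" "PATH" "PROTOCOL" STATUS FLAGS BYTES_IN BYTES_OUT DURATION_MS ...
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+)" "(?P<path>[^"]*)" "(?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<flags>\S+) \d+ \d+ (?P<duration>\d+)'
)


def count_pool_waiters(lines) -> int:
    """Count threads blocked in getPoolEntryBlocking (one frame per thread)."""
    return sum(1 for line in lines if POOL_WAIT_FRAME in line)


def uaex_timeouts(lines, min_ms: int = 60000):
    """Return (path, duration_ms) for 503 UAEX entries lasting at least min_ms."""
    hits = []
    for line in lines:
        match = ACCESS_RE.search(line)
        if not match:
            continue  # not an access-log entry in the assumed format
        duration = int(match.group("duration"))
        if (match.group("status") == "503"
                and match.group("flags") == "UAEX"
                and duration >= min_ms):
            hits.append((match.group("path"), duration))
    return hits
```

On the NSX Manager, either helper could be fed a log file line by line, for example `uaex_timeouts(open("/var/log/proxy/envoy_access_log"))`. A high waiter count together with a steady stream of roughly 60000 ms UAEX entries matches the pattern described here.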
VMware NSX-T Data Center
VMware NSX
The issue occurs when the authentication server cannot establish a connection to the LDAPS server. Because LDAPS does not support timeout settings, threads that were created but never served can remain open indefinitely, which can cause the connection to the LDAP server to hang.
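Why a missing timeout leads to a pile-up of waiting threads can be illustrated with a generic sketch. This is not NSX code: `TinyPool` is a hypothetical fixed-size pool, loosely analogous to the connection pool in which the threads above are parked. One slot is held forever (standing in for a hung LDAPS call); a waiter with no timeout then blocks indefinitely, while a waiter with a finite timeout gives up and returns.

```python
import threading
import time


class TinyPool:
    """Hypothetical fixed-size resource pool, for illustration only."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def lease(self, timeout=None) -> bool:
        # timeout=None waits until another caller releases a slot. If every slot
        # is held by a request stuck on an unresponsive server, that never happens.
        # A finite timeout returns False once it expires instead of waiting forever.
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        self._slots.release()


def demo() -> tuple:
    pool = TinyPool(size=1)
    pool.lease()  # the only slot is taken and, like a hung call, never released

    # A waiter with no timeout: it blocks for as long as the slot stays held.
    waiter = threading.Thread(target=pool.lease, daemon=True)
    waiter.start()

    bounded = pool.lease(timeout=0.2)  # gives up after 200 ms
    time.sleep(0.1)
    return waiter.is_alive(), bounded


if __name__ == "__main__":
    print(demo())  # (True, False): the unbounded waiter is still stuck
```

Each unbounded waiter stays parked for as long as the slot is held, so under steady load the number of waiting threads keeps growing, which is consistent with the repeated WAITING stack traces in proxy-tomcat-wrapper.log.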
This is a known issue that is resolved in NSX 4.2.0 and later.
The same underlying issue can also cause the following behaviour: