Intermittent HTTP 503 error response when authenticating NSX-T manager via LDAPS

Article ID: 375741

Products

VMware NSX, VMware NSX-T Data Center

Issue/Introduction

Symptoms:

  • The customer is using LDAPS for authentication.
  • The NSX-T Manager UI may intermittently become unreachable.
  • NSX Manager returns an HTTP 503 response when accessed via the API or the admin UI.
  • Rebooting the NSX Manager temporarily resolves the problem, but it eventually recurs.
  • The NSX version is earlier than 4.2.
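To confirm the intermittent 503 responses from the client side, you can repeatedly poll the API endpoint that appears in the envoy access log entries (/api/v1/node) and record the status and latency of each attempt. The following is a minimal sketch that uses only the Python standard library. `probe` and `count_503` are hypothetical helpers, not NSX tools. The sketch omits NSX authentication and certificate handling, so against a real manager you would need to add credentials and an SSL context. URL errors such as certificate failures are not caught and will propagate.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 70.0) -> tuple[int, float]:
    """Send one GET to url and return (HTTP status, elapsed seconds).

    Error statuses such as 503 are returned instead of raised. The default
    timeout is set above the 60 s seen in the envoy log, so the proxy's 503
    is observed rather than a client-side timeout.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
        err.close()
    return status, time.monotonic() - start


def count_503(url: str, attempts: int = 10, interval: float = 5.0) -> int:
    """Poll url `attempts` times and return how many responses were 503."""
    hits = 0
    for _ in range(attempts):
        status, elapsed = probe(url)
        if status == 503:
            hits += 1
            print(f"503 after {elapsed:.1f} s")
        time.sleep(interval)
    return hits
```

If you see 503 responses that each take close to 60 seconds, the pattern matches the envoy log entries described below.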

Logs

- In proxy-tomcat-wrapper.log, located in the /var/log/proxy/ directory of the NSX Manager, a large number of threads have identical stack traces in the state java.lang.Thread.State: WAITING:


stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000068ec71b3bde8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
- locked <0x000068ec7d525510> (a org.apache.http.pool.AbstractConnPool$2)

- /var/log/proxy/envoy_access_log reports HTTP 503 Service Unavailable responses:

1#.##.##.#4 1#.###.##.#8 "GET" "/api/v1/node" "HTTP/1.1" 503 UAEX 0 0 60003 - "1#.##.##.#4" "vAPI/2.14.0 Java/11.0.22 (Linux; 5.10.216-1.ph4; amd64)" "9#####-####-####-####-########ca7f" "1#.###.##.#8" "-"
1#.##.##.#4  1#.###.##.#8 "GET" "/api/v1/node" "HTTP/1.1" 503 UAEX 0 0 60000 - "1#.##.##.#4" "vAPI/2.14.0 Java/11.0.22 (Linux; 5.10.216-1.ph4; amd64)" "d#####-####-####-####-########b78c" "1#.###.##.#8" "-"
1#.##.##.#4  1#.###.##.#8 "GET" "/api/v1/node" "HTTP/1.1" 503 UAEX 0 0 60001 - "1#.##.##.#4" "vAPI/2.14.0 Java/11.0.22 (Linux; 5.10.216-1.ph4; amd64)" "a#####-####-####-####-########bb67" "1#.###.##.#8" "-"

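The UAEX response flag in the lines above indicates that Envoy's external authorization check rejected the request; here, the external authentication RPC call failed. The 6000x value (60000 to 60003) is the request duration in milliseconds, meaning each request ran into the 60-second timeout.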
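Both log checks above can be scripted. The following is a minimal sketch under stated assumptions. The field layout of the envoy regular expression is inferred only from the sample lines shown here (method, path, protocol, status, response flags, bytes received, bytes sent, duration). The thread-dump parser assumes that stacks in proxy-tomcat-wrapper.log are separated by blank lines. `ext_auth_timeouts` and `count_pool_waiters` are hypothetical names.

```python
import re

# Matches the quoted method/path/protocol, status, response flags,
# bytes received, bytes sent and duration fields of the sample
# envoy_access_log lines. Layout inferred from those samples only.
ENVOY_LINE = re.compile(
    r'"(?P<method>[A-Z]+)" "(?P<path>[^"]*)" "(?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<flags>\S+) (?P<rx>\d+) (?P<tx>\d+) (?P<duration_ms>\d+)'
)


def ext_auth_timeouts(lines, min_ms=60000):
    """Return (path, duration_ms) for 503 UAEX entries lasting >= min_ms."""
    hits = []
    for line in lines:
        m = ENVOY_LINE.search(line)
        # Envoy separates multiple response flags with commas.
        if m and m["status"] == "503" and "UAEX" in m["flags"].split(","):
            duration = int(m["duration_ms"])
            if duration >= min_ms:
                hits.append((m["path"], duration))
    return hits


def count_pool_waiters(dump_text):
    """Count WAITING stacks parked in AbstractConnPool.getPoolEntryBlocking."""
    stacks = re.split(r"\n\s*\n", dump_text)  # assumes blank-line separators
    return sum(
        1
        for stack in stacks
        if "java.lang.Thread.State: WAITING" in stack
        and "AbstractConnPool.getPoolEntryBlocking" in stack
    )
```

For example, `with open("/var/log/proxy/envoy_access_log") as f: print(ext_auth_timeouts(f))` lists the affected requests. A large and growing result from `count_pool_waiters` on successive thread dumps points to the same condition.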

Environment

VMware NSX-T Data Center

VMware NSX

Cause

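The issue occurs because the authentication server is unable to establish a connection with the LDAPS server. In affected NSX versions, LDAPS connections do not support timeout settings. As a result, threads that were created but never served can remain open indefinitely, which can cause the connection to the LDAP server to hang. This matches the thread dump above, in which many threads are parked in AbstractConnPool.getPoolEntryBlocking waiting for a pooled connection.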
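The effect of a missing timeout can be illustrated with plain sockets. This is not NSX code. It models a hypothetical LDAPS server that accepts TCP connections but never answers. A read with a timeout fails after a bounded wait, whereas the same read without one (a timeout of None) would block indefinitely, like the threads described above. `start_silent_server` and `read_with_timeout` are illustrative names.

```python
import socket
import threading


def start_silent_server():
    """Hypothetical stand-in for an unresponsive LDAPS server: it accepts
    TCP connections but never sends a byte. Returns (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    held = []  # keep accepted sockets open so clients never see EOF

    def accept_forever():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # server socket was closed
                return
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return srv, srv.getsockname()[1]


def read_with_timeout(port, timeout_s):
    """Connect and wait for one byte, giving up after timeout_s seconds.
    Passing timeout_s=None would make recv() block forever here."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as s:
        try:
            data = s.recv(1)
            return "reply" if data else "closed"
        except socket.timeout:
            return "timed out"
```

Calling `read_with_timeout(port, 0.5)` against the silent server returns "timed out" after about half a second. In Java, for comparison, the JNDI LDAP provider exposes `com.sun.jndi.ldap.connect.timeout` and `com.sun.jndi.ldap.read.timeout` for the same purpose. This article does not state whether NSX uses them.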

Resolution

This is a known issue and is resolved in NSX 4.2.0 and later.
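When checking many managers, you can automate the version test with a small helper like the one below. This is a sketch that assumes dotted numeric version strings and compares only the first three components against 4.2.0. `is_affected` is a hypothetical name, and the long build string in the docstring is an invented example.

```python
def is_affected(version: str) -> bool:
    """Return True if an NSX version string predates the 4.2.0 fix.

    Only the first three numeric components are compared, so a build
    string such as "4.1.2.1.0.12345678" (invented example) reads as 4.1.2.
    """
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat "4.2" as 4.2.0
    return tuple(parts) < (4, 2, 0)
```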

Additional Information

The same underlying issue can also cause the following behavior:

  • NSX has three LDAP servers associated with a domain.
  • The first LDAP server in the list is down for any reason.
  • Authentication requests from users are not forwarded to the other LDAP servers in the list.
  • Manually changing the order (moving the impacted server down the list) resolves the issue.
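The expected failover behavior works like this: try each configured server in order, and bound each attempt with a timeout so that one dead server cannot stall the whole list. The following is a hypothetical TCP-level sketch (`first_reachable` is an illustrative name, not an NSX function). A real LDAPS health check would also complete the TLS handshake and an LDAP bind, typically on port 636.

```python
import socket


def first_reachable(servers, timeout_s=5.0):
    """Return the first (host, port) that accepts a TCP connection within
    timeout_s seconds, skipping servers that are down or unresponsive.
    Returns None if no server is reachable."""
    for host, port in servers:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return host, port
        except OSError:  # refused, unreachable, or timed out
            continue
    return None
```

For example, `first_reachable([("ldap1.example.com", 636), ("ldap2.example.com", 636)])` (hypothetical hosts) skips a down first server instead of hanging on it.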