2024-10-31T04:57:11.333Z #####.###.### NSX 4005709 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="ERROR" errorCode="EDG9999999"] [########-####-####-####-########] upstream prematurely closed connection while reading response header from upstream, client: ##.##.##.##, server: , request: "POST ########## HTTP/1.1", upstream: "http://##.##.##.##:##/####/", host: "#######.##.##"
VMware NSX
This issue can occur when HTTP keepalive is used between the LB and the backend servers.
If the keepalive timeout is exceeded on the backend server, the TCP connection will be closed by the server.
If an additional HTTP request from the LB arrives immediately after the backend server closes the TCP connection, the backend server cannot respond to it, and the LB returns a 502 Bad Gateway to the client.
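The failure window can be illustrated with a small, self-contained Python sketch. This is not NSX code, just a toy backend and client that reproduce the timing; the port, timeout value, and request bytes are arbitrary placeholders. The backend answers one request on a kept-alive connection, waits out its keepalive timeout, and closes; a second request reusing the same connection then fails, which is what the LB logs as "upstream prematurely closed connection".

    import socket
    import threading
    import time

    BACKEND_KEEPALIVE = 1.0   # seconds; hypothetical backend keepalive timeout

    def backend():
        # Toy backend: answers one request on a kept-alive connection, then
        # closes the socket once its keepalive timeout expires.
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", 8081))
        srv.listen(1)
        conn, _ = srv.accept()
        conn.recv(4096)
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
        time.sleep(BACKEND_KEEPALIVE)   # connection sits idle
        conn.close()                    # backend closes the connection first
        srv.close()

    threading.Thread(target=backend, daemon=True).start()
    time.sleep(0.2)

    # "LB" side: send a request, keep the connection open, then reuse it
    # just after the backend's keepalive timeout has expired.
    lb = socket.socket()
    lb.connect(("127.0.0.1", 8081))
    lb.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
    print(lb.recv(4096))                    # first request succeeds
    time.sleep(BACKEND_KEEPALIVE + 0.5)     # backend has already closed
    try:
        lb.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
        reply = lb.recv(4096)
        print(reply or b"<upstream prematurely closed connection>")
    except OSError as exc:
        print("connection reset by backend:", exc)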
From the NSX native LB's perspective, HTTP keepalive is used when "server keepalive" is enabled on the application profile or when "TCP multiplexing" is enabled on the server pool.
When server keepalive is enabled, the LB disconnects both the TCP connection to the backend server and the TCP connection to the client when the idle timeout of the application profile expires.
When TCP multiplexing is enabled, the LB does not disconnect the TCP connection to the backend server.
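Whether these two settings are in effect can be confirmed in the UI or through the NSX Policy API. The sketch below, in Python with the requests library, reads an HTTP application profile and a server pool. The manager address, credentials, and object IDs are placeholders, and the field names (server_keep_alive, idle_timeout, tcp_multiplexing_enabled) are assumed from the NSX-T Policy API; verify them against the API reference for your NSX version.

    import requests

    NSX = "https://nsx-mgr.example.com"   # placeholder NSX Manager address
    AUTH = ("admin", "password")          # placeholder credentials
    PROFILE_ID = "my-http-profile"        # hypothetical application profile ID
    POOL_ID = "my-server-pool"            # hypothetical server pool ID

    profile = requests.get(
        f"{NSX}/policy/api/v1/infra/lb-app-profiles/{PROFILE_ID}",
        auth=AUTH, verify=False).json()
    pool = requests.get(
        f"{NSX}/policy/api/v1/infra/lb-pools/{POOL_ID}",
        auth=AUTH, verify=False).json()

    # Settings that control HTTP keepalive toward the backend servers.
    print("server_keep_alive:        ", profile.get("server_keep_alive"))
    print("idle_timeout (s):         ", profile.get("idle_timeout"))
    print("tcp_multiplexing_enabled: ", pool.get("tcp_multiplexing_enabled"))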
If server keepalive is enabled on the application profile associated with the VIP, set the keepalive timeout on the backend servers to a value larger than the idle-timeout of the application profile.
This ensures that idle TCP connections are always closed by the LB first, preventing the corner case described above.
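The relationship between the two timeouts can be expressed as a quick sanity check; the numbers below are placeholders for your actual application profile idle timeout and the backend web server's keepalive timeout (for example, nginx's keepalive_timeout or Apache's KeepAliveTimeout).

    # Placeholder values; substitute the real settings from the application
    # profile and from the backend web server configuration.
    lb_idle_timeout = 60              # seconds, idle timeout on the application profile
    backend_keepalive_timeout = 75    # seconds, keepalive timeout on the backend server

    # The backend's keepalive timeout should exceed the LB's idle timeout so
    # that idle connections are always torn down by the LB first.
    if backend_keepalive_timeout <= lb_idle_timeout:
        print("WARNING: backend keepalive timeout (%ss) should be larger than "
              "the application profile idle timeout (%ss)"
              % (backend_keepalive_timeout, lb_idle_timeout))
    else:
        print("OK: the LB will time out idle connections before the backend does")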
If TCP multiplexing is enabled on the server pool associated with the VIP, this issue cannot be completely prevented due to the current design of the NSX LB.
However, setting the keepalive timeout on the backend servers to a sufficiently large value makes this issue very unlikely to occur.