502 Bad Gateway happens if the HTTP request arrives immediately after the backend server closes the TCP connection.



Article ID: 383374


Updated On:

Products

VMware NSX

Issue/Introduction

  • Clients intermittently receive a 502 Bad Gateway from the HTTP L7 LB.
  • The HTTP keepalive timeout is set to a small value on the backend server.
  • The "upstream prematurely closed connection while reading response header from upstream" message appears in syslog on the active Edge, as shown below:
2024-10-31T04:57:11.333Z #####.###.### NSX 4005709 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="ERROR" errorCode="EDG9999999"] [########-####-####-####-########] upstream prematurely closed connection while reading response header from upstream, client: ##.##.##.##, server: , request: "POST ########## HTTP/1.1", upstream: "http://##.##.##.##:##/####/", host: "#######.##.##"
  • In a packet capture taken on the backend server side, an HTTP request arrives from the LB immediately after the server sends a FIN to the LB.

Environment

VMware NSX 

Cause

This issue can occur when HTTP keepalive is used between the LB and the backend servers.

If the keepalive timeout is exceeded on the backend server, the TCP connection will be closed by the server.

If an additional HTTP request from the LB arrives immediately after the backend server closes the TCP connection, the backend server cannot respond to it, and the LB returns a 502 Bad Gateway to the client.
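The race above can be sketched with plain sockets. This is a minimal illustration, not NSX code: the timeout value is hypothetical, a raw TCP client stands in for the LB, and a toy one-request server stands in for the backend.

```python
import socket
import threading
import time

KEEPALIVE_TIMEOUT = 0.5  # hypothetical backend HTTP keepalive timeout (seconds)

def backend(server_sock):
    """Toy backend: serve one request, then close the idle connection (FIN)."""
    conn, _ = server_sock.accept()
    conn.recv(1024)                                    # first request
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    time.sleep(KEEPALIVE_TIMEOUT)                      # keepalive timer expires
    conn.close()                                       # server sends FIN

def request_hits_closed_connection():
    """Reuse the idle connection just after the backend's FIN; returns True
    when the second request gets no response (where an LB would emit a 502)."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    t = threading.Thread(target=backend, args=(srv,))
    t.start()

    lb = socket.create_connection(srv.getsockname())   # the "LB" side
    lb.sendall(b"GET / HTTP/1.1\r\nHost: a\r\n\r\n")
    lb.recv(1024)                                      # first response is fine
    time.sleep(KEEPALIVE_TIMEOUT + 0.2)                # idle past the timeout
    try:
        # Second request arrives immediately after the server's FIN.
        lb.sendall(b"GET / HTTP/1.1\r\nHost: a\r\n\r\n")
        return lb.recv(1024) == b""                    # FIN: orderly empty read
    except (ConnectionResetError, BrokenPipeError):
        return True                                    # RST: data hit a closed socket
    finally:
        lb.close()
        t.join()
        srv.close()
```

Either outcome (an empty read after the FIN, or a reset once the late request hits the closed socket) leaves the client of the toy "LB" with no response for the second request.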

From the NSX native LB's perspective, HTTP keepalive is used when "server keepalive" is enabled on the application profile or "TCP multiplexing" is enabled on the server pool.

When server keepalive is enabled, the LB disconnects both the TCP connection to the backend server and the TCP connection to the client when the idle timeout of the application profile expires.

When TCP multiplexing is enabled, the LB does not disconnect the TCP connection to the backend server.

Resolution

If server keepalive is enabled on the application profile associated with the VIP, set the keepalive timeout on the backend servers to a value larger than the idle timeout of the application profile.

This ensures that timed-out TCP connections are always closed by the LB, preventing the race described in the section above.
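The required ordering can be captured as a quick sanity check. The helper and the timeout values below are hypothetical examples; read the real values from your application profile and your backend server configuration.

```python
def lb_closes_idle_connections_first(app_profile_idle_timeout_s: float,
                                     backend_keepalive_timeout_s: float) -> bool:
    """True when the LB's idle timeout expires strictly before the backend's
    HTTP keepalive timeout, so idle connections are always torn down by the
    LB rather than by the backend server."""
    return app_profile_idle_timeout_s < backend_keepalive_timeout_s

# Hypothetical values: application profile idle timeout vs. backend keepalive.
print(lb_closes_idle_connections_first(30, 75))   # safe ordering  -> True
print(lb_closes_idle_connections_first(75, 30))   # risky ordering -> False
```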

If TCP multiplexing is enabled on the server pool associated with the VIP, this issue cannot be completely prevented due to the current design of the NSX LB.

However, setting the keepalive timeout on the backend servers to a sufficiently large value makes this issue very unlikely to occur.