New connections to a specific virtual server fail when heavy traffic is spread across multiple virtual servers that use source IP persistence profiles with long timeouts



Article ID: 435865


Updated On:

Products

VMware NSX

Issue/Introduction

  • Source IP persistence is enabled, and a long timeout value is configured in the Persistence profile.
  • The number of live connections has not reached the maximum capacity of the persistence table.
  • New connections to a specific virtual server fail even though "Purge Entries When Full" is enabled.
  • The issue clears after the active Edge is placed in NSX maintenance mode and the Tier-1 Gateway fails over.
  • Pool members are responding normally to health checks and are in a healthy state.
  • The following entries are recorded in the Load Balancer access logs:
2026-03-14T23:24:37.279Z ####.####.#### NSX 214964 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="access" level="INFO"] [########-####-####-####-############][########-####-####-####-############] Operation.Category: 'LbAccessLog', Operation.Type: 'TCP', Lb.UUID: '########-####-####-####-################', Lb.Name: '####', Vs.UUID: '########-####-####-####-############', Vs.Name: '####', Vs.Ip: '##.##.##.##', Vs.Port: '####', Pool.UUID: '########-####-####-####-############', Pool.Name: '####', PoolMember.IP: '-', PoolMember.Port: '-', Client.Ip: '##.##.##.##', Client.Port: '####', Snat.Ip: '-', Snat.Port: '-', Error.Reason: 'Bad gateway:Failed to get upstream config of sorry server'
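
To check quickly whether collected access logs contain this entry, the quoted fields can be extracted with a short script. The following is a minimal sketch, not an official tool. It assumes the log lines follow the `Key: 'value'` layout shown above. The function names and the sample values (`vs-example`, `pool-example`, the IP addresses) are hypothetical. A match only indicates that the line carries the same error string with no pool member selected; it does not by itself confirm the cause described in this article.

```python
import re

# Matches Key: 'value' pairs such as  Vs.Name: 'vs-example'  in an access log line.
FIELD_RE = re.compile(r"([A-Za-z][\w.]*): '([^']*)'")

# Error text taken from the log entry shown in this article.
SORRY_SERVER_ERROR = "Failed to get upstream config of sorry server"

# Hypothetical, shortened sample line; all names and addresses are placeholders.
SAMPLE = (
    "Operation.Category: 'LbAccessLog', Operation.Type: 'TCP', "
    "Vs.Name: 'vs-example', Vs.Ip: '192.0.2.10', Vs.Port: '443', "
    "Pool.Name: 'pool-example', PoolMember.IP: '-', PoolMember.Port: '-', "
    "Client.Ip: '198.51.100.7', Client.Port: '50123', "
    "Error.Reason: 'Bad gateway:Failed to get upstream config of sorry server'"
)


def parse_access_log(line: str) -> dict:
    """Return the quoted key/value fields of one access log line as a dict."""
    return dict(FIELD_RE.findall(line))


def matches_article_signature(line: str) -> bool:
    """True when the line has the error above and no pool member was selected."""
    fields = parse_access_log(line)
    return (
        SORRY_SERVER_ERROR in fields.get("Error.Reason", "")
        and fields.get("PoolMember.IP") == "-"
    )


if __name__ == "__main__":
    fields = parse_access_log(SAMPLE)
    print(fields["Vs.Name"], matches_article_signature(SAMPLE))
```

Grouping the matching lines by `Vs.Name` shows which virtual servers are affected.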

Environment

VMware NSX

Cause

This issue results from the current design of the NSX native Load Balancer.

  • Each Load Balancer instance has one global table that holds persistence entries.
  • In addition, each virtual server has its own source IP persistence table and aging table.
  • The source IP persistence table holds entries for live connections. The aging table holds entries whose connections have closed and whose persistence timeout countdown has started. Every entry in both tables is allocated from the global persistence table.
  • When "Purge Entries When Full" is enabled and a new connection arrives while the global persistence table is full, an entry is reclaimed from the aging table and the connection succeeds.
  • If the persistence timeout is long and heavy traffic arrives across multiple virtual servers, both of the following conditions can hold at the same time:
    • The combined number of entries in the source IP persistence tables and aging tables of all virtual servers has reached the maximum capacity of the global persistence table.
    • The aging table of a specific virtual server contains 0 entries.
  • In this situation, a new connection to that virtual server has no aging-table entry to reclaim, so the connection fails.
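
The behavior described above can be illustrated with a small simulation. This is a simplified sketch based only on the description in this article, not the actual Load Balancer implementation. The class names, the tiny capacity of 4, and the virtual server names `vs-a` and `vs-b` are hypothetical. The sketch assumes that purging reclaims entries only from the aging table of the virtual server that receives the new connection, and it does not model timeout expiry (the timeout is treated as long enough that no entry expires during the run).

```python
from collections import OrderedDict


class VirtualServer:
    def __init__(self, name):
        self.name = name
        self.live = set()           # clients with open connections
        self.aging = OrderedDict()  # closed clients, oldest first, timeout counting down


class LoadBalancer:
    def __init__(self, capacity, purge_when_full=True):
        self.capacity = capacity    # size of the global persistence table
        self.purge_when_full = purge_when_full
        self.servers = {}

    def add_virtual_server(self, name):
        self.servers[name] = VirtualServer(name)

    def used(self):
        """Entries consumed from the global table by all virtual servers."""
        return sum(len(vs.live) + len(vs.aging) for vs in self.servers.values())

    def connect(self, vs_name, client_ip):
        """Return True if a persistence entry is available for the connection."""
        vs = self.servers[vs_name]
        if client_ip in vs.live:
            return True
        if client_ip in vs.aging:
            # A returning client reuses its existing entry.
            del vs.aging[client_ip]
            vs.live.add(client_ip)
            return True
        if self.used() >= self.capacity:
            if not (self.purge_when_full and vs.aging):
                return False        # global table full, nothing to reclaim here
            vs.aging.popitem(last=False)  # reclaim this server's oldest aging entry
        vs.live.add(client_ip)
        return True

    def close(self, vs_name, client_ip):
        """Move a live entry to the aging table when its connection closes."""
        vs = self.servers[vs_name]
        if client_ip in vs.live:
            vs.live.remove(client_ip)
            vs.aging[client_ip] = True


if __name__ == "__main__":
    lb = LoadBalancer(capacity=4)
    lb.add_virtual_server("vs-a")
    lb.add_virtual_server("vs-b")
    for i in range(4):
        ip = f"10.0.0.{i}"
        lb.connect("vs-a", ip)
        lb.close("vs-a", ip)
    print(lb.connect("vs-b", "10.0.1.1"))  # False: vs-b has no aging entries
    print(lb.connect("vs-a", "10.0.1.1"))  # True: vs-a reclaims an aging entry
```

In this run there are no live connections at all, yet `vs-b` cannot accept a new client because every global entry is held in the aging table of `vs-a`.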

Resolution

This is expected behavior under the current Load Balancer design.

Depending on the number of clients in your environment and the characteristics of your application, consider shortening the persistence timeout or increasing the size of the Load Balancer.
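
As a rough aid for choosing a timeout, the number of persistence entries in use can be estimated from the rate of new clients. The sketch below is a back-of-the-envelope estimate, not a documented sizing formula. It assumes steady traffic in which each new client holds one entry for its connection duration plus the persistence timeout, and it ignores returning clients and purging. The function names and the example figures are hypothetical; take the actual persistence table capacity for your Load Balancer size from the product documentation.

```python
def estimated_entries(new_clients_per_sec, avg_connection_sec, persistence_timeout_sec):
    """Steady-state entries: arrival rate times the time each entry is held."""
    return new_clients_per_sec * (avg_connection_sec + persistence_timeout_sec)


def max_timeout_for_capacity(capacity, new_clients_per_sec, avg_connection_sec):
    """Longest timeout (seconds) that keeps the estimate within the given capacity."""
    return max(0.0, capacity / new_clients_per_sec - avg_connection_sec)


if __name__ == "__main__":
    # Hypothetical workload: 5 new clients per second, 30-second connections.
    print(estimated_entries(5, 30, 3600))
    print(max_timeout_for_capacity(10000, 5, 30))
```

Sum the estimate over all virtual servers on the same Load Balancer instance, because they draw from the same global persistence table.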

By using the same source IP persistence profile across multiple virtual servers and enabling "Share Persistence" in the profile, you can reduce the likelihood of this issue occurring.