BGP peering goes down randomly every hour/couple of hours after vSphere upgrade.

Article ID: 413959

Updated On:

Products

VMware NSX

Issue/Introduction

  • The datapath crashes because of a port range configuration in a Virtual Server/LB.
  • The issue appeared right after upgrading vSphere to 7.0.

Environment

VMware NSX Datacenter.

Cause

The load balancer (LB) relies on the firewall (FW) engine for a variety of tasks, from identifying the flows that LB needs to process to tracking and managing the flows themselves. To do this, LB translates the user's LB configuration into FW rules behind the scenes and relies on the FW rule data structures for some of the flow processing.

FW rules allow individual ports (80, 443), port ranges (400-500, 600-700), or a combination of both (80, 443, 400-500) to be specified on a single rule. Individual ports and port ranges are stored differently in the FW data structure. However, when a rule specifies exactly one port range (400-500), that range is stored like individual ports, not like port ranges. The LB code incorrectly assumed that a single port range is stored the same way as multiple port ranges, so it misinterprets the information in the FW data structure when it does a lookup.

How this manifests depends on several factors, including the build, the configured port range, and the features in use. It may or may not crash the datapath processes, and it may have gone unnoticed.
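The mismatch described above can be illustrated with a toy model. This is only a sketch: the actual NSX data structures are not described in this article, so `ToyRule`, `single_slot`, `range_slot`, `store`, `buggy_match`, and `correct_match` are all hypothetical names invented for illustration. In this model the misread produces a wrong lookup result; in the real datapath the consequences may differ and can include a crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRule:
    """Toy stand-in for a rule's port storage (hypothetical layout, not NSX's)."""
    # Individual ports as ints, or a lone range as a (lo, hi) tuple.
    single_slot: list = field(default_factory=list)
    # (lo, hi) tuples, used only when the rule is not a lone range.
    range_slot: list = field(default_factory=list)


def store(spec: str) -> ToyRule:
    """Writer side: a lone range goes into the same slot as individual ports."""
    items = [s.strip() for s in spec.split(",")]
    ranges = [tuple(int(p) for p in s.split("-")) for s in items if "-" in s]
    ports = [int(s) for s in items if "-" not in s]
    rule = ToyRule()
    if len(ranges) == 1 and not ports:
        rule.single_slot.append(ranges[0])
    else:
        rule.single_slot.extend(ports)
        rule.range_slot.extend(ranges)
    return rule


def buggy_match(rule: ToyRule, port: int) -> bool:
    """Reader that wrongly assumes every range lives in range_slot."""
    if any(lo <= port <= hi for lo, hi in rule.range_slot):
        return True
    # A (lo, hi) tuple in single_slot never compares equal to an int.
    return port in rule.single_slot


def correct_match(rule: ToyRule, port: int) -> bool:
    """Reader that also handles a lone range kept in single_slot."""
    for entry in rule.single_slot:
        if isinstance(entry, tuple):
            if entry[0] <= port <= entry[1]:
                return True
        elif entry == port:
            return True
    return any(lo <= port <= hi for lo, hi in rule.range_slot)
```

With this model, `buggy_match(store("400-500"), 450)` misses the port, while the same reader works once the range is expressed as two ranges, as in `store("400-450,451-500")`.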

Resolution

Workaround:

Replace the single port range with multiple port ranges that together cover exactly the same ports.

Example:
 
If the single port range 30200~30300 is configured (which triggers the issue), configure the two port ranges 30200~30250,30251~30300 instead.
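The split used in the workaround can be computed with a small helper. This is a minimal sketch: `split_port_range` is a hypothetical function, not part of any NSX tool or API, and the `~` separator simply mirrors the notation of the example above. Adjust it to whatever notation your configuration interface expects.

```python
def split_port_range(lo: int, hi: int, sep: str = "~") -> str:
    """Split one port range into two contiguous ranges covering the same ports."""
    if not (0 <= lo < hi <= 65535):
        raise ValueError("expected 0 <= lo < hi <= 65535")
    mid = (lo + hi) // 2  # last port of the first half
    return f"{lo}{sep}{mid},{mid + 1}{sep}{hi}"


if __name__ == "__main__":
    print(split_port_range(30200, 30300))
```

For the range in the example, the helper returns `30200~30250,30251~30300`.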

Resolution:

This issue is resolved in NSX release 3.1.1 and later.