Latency observed between VMs when DFW in use with NSX for vSphere

Article ID: 339169


Products

VMware NSX

Issue/Introduction

Symptoms:

  • Ping between VMs shows high latency
  • Issue persists when the VMs are on the same host and the same port group/logical switch
  • A vsfwd world is at or close to its CPU cap, i.e. consuming a full pCPU
  • vsfwd.log shows a large number of container updates, for example:
2019-10-29T10:32:00Z vsfwd: [INFO] Received vsa message of ContainerSet, length 139405
2019-10-29T10:32:05Z vsfwd: [INFO] Received vsa message of ContainerSet, length 48262
2019-10-29T10:32:39Z vsfwd: [INFO] Received vsa message of ContainerSet, length 58429
2019-10-29T10:33:05Z vsfwd: [INFO] Received vsa message of ContainerSet, length 58429
2019-10-29T10:33:44Z vsfwd: [INFO] Received vsa message of ContainerSet, length 48262
2019-10-29T10:33:46Z vsfwd: [INFO] Received vsa message of ContainerSet, length 48262
2019-10-29T10:33:54Z vsfwd: [INFO] Received vsa message of ContainerSet, length 48262
2019-10-29T10:34:01Z vsfwd: [INFO] Received vsa message of ContainerSet, length 139405

 

  • Placing the VMs into the DFW exclusion list resolves the issue

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values will vary depending on your environment.

Environment

VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

The high volume of container updates causes locking of the complete data path, resulting in high latency. This is typically seen in VDI environments, where VMs are constantly added, removed, and power cycled, leading to frequent container updates.

Resolution

This issue is resolved in VMware NSX Data Center for vSphere 6.4.7. Upgrade to the latest version of NSX for vSphere following the document Download Broadcom products and software.


Workaround:
To work around this issue, enable Global Containers via an API call.

First, obtain the current configuration:

GET https://<NSX-Manager-IP>/api/4.0/firewall/config/globalconfiguration

<?xml version="1.0" encoding="UTF-8"?>
<globalConfiguration>
    <layer3RuleOptimize>false</layer3RuleOptimize>
    <layer2RuleOptimize>false</layer2RuleOptimize>
    <tcpStrictOption>false</tcpStrictOption>
    <enableGlobalContainers>false</enableGlobalContainers>
    <autoDraftDisabled>true</autoDraftDisabled>
</globalConfiguration>
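
The request can be issued from any machine with API access to NSX Manager, for example with curl. This is a minimal sketch, assuming basic authentication with the NSX Manager admin account; replace the manager address and credentials with values from your environment:

# Retrieve the current DFW global configuration
curl -k -u 'admin:<password>' -X GET \
  "https://<NSX-Manager-IP>/api/4.0/firewall/config/globalconfiguration"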



Next, update <enableGlobalContainers> to true:

PUT https://<NSX-Manager-IP>/api/4.0/firewall/config/globalconfiguration

<?xml version="1.0" encoding="UTF-8"?>
<globalConfiguration>
    <layer3RuleOptimize>false</layer3RuleOptimize>
    <layer2RuleOptimize>false</layer2RuleOptimize>
    <tcpStrictOption>false</tcpStrictOption>
    <enableGlobalContainers>true</enableGlobalContainers>
    <autoDraftDisabled>true</autoDraftDisabled>
</globalConfiguration>
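
As with the GET call, the update can be pushed with curl. This is a minimal sketch, assuming the modified XML has been saved locally as globalconfig.xml (an example file name) and that basic authentication is used:

# Push the updated configuration with enableGlobalContainers set to true
curl -k -u 'admin:<password>' -X PUT \
  -H "Content-Type: application/xml" \
  -d @globalconfig.xml \
  "https://<NSX-Manager-IP>/api/4.0/firewall/config/globalconfiguration"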


Confirm the change is in place by running the GET command again:
GET https://<NSX-Manager-IP>/api/4.0/firewall/config/globalconfiguration

Impact/Risks:
Do not enable Global Containers if SpoofGuard is in use. This can cause vNIC disconnects during vMotion, resulting in failed migrations.

Additional Information

DFW rules have two components: the rule itself, which specifies the 5-tuple and action, and the address sets, which are specified as part of the source/destination (SRC/DST).

A rule takes the form: from SRC A to DST B, Service C, Action, Applied To D, where SRC A and DST B can be address sets of the form

SRC A {
Address A1
CIDR A2/Mask
Address A3
...
}


and AppliedTo D is the set of vNIC UUIDs to which this rule needs to be applied, in the form

appliedTo {
vNIC1 UUID
vNIC2 UUID
..
}


The Applied To field is optional; when it is not set, the rule is applied to every filter (vNIC) in the cluster. Rules are programmed per filter based on the Applied To field, so each filter can end up with a unique set of rules. Along with the rules, the corresponding address sets (SRC A, DST B, etc.) consumed by each rule are also programmed. The rules themselves are typically limited per filter (a few hundred to low thousands), and because each filter can have unique rules, they consume a finite amount of memory on the system (capped at a maximum of 3 GB).

The consolidation ratio (number of VMs per host) typically stands at around 30-60 but can reach a few hundred in cases such as VDI. This leads to the rules and address sets being replicated across each filter. Address sets often contain a large number of addresses, which leads to memory bloat as well as configuration churn. Rules typically do not change very often in a datacenter, but dynamic address sets, which are populated as VMs are added, removed, or powered on or off, cause constant configuration churn and an address set of unbounded size.

Because the address sets are also configured per filter, every time an address set changes a configuration cycle has to be performed on each filter where it is consumed, which interrupts the datapath. Unlike the per-filter rules, the address set information is actually the same across all filters. As an optimization, enabling Global/Shared Address Sets keeps only one copy of each address set in the DFW engine, so the configuration is applied once per update rather than once per filter. And because there is only one copy, memory consumption is reduced substantially: instead of holding its own copy, each filter simply points to the shared address set.

With Global Containers enabled, the following is shown on the host:

vsipioctl getaddrsets -f <filtername>
 global addrset  <<===
addrset ip-spoofguard-sfw {
# generation number: 0
# realization time : 2020-01-23T06:54:09
ip 175.20.0.194,
ip 2001:3002::250:56ff:febb:b430,
}
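
The <filtername> argument can be obtained by listing the DFW filters present on the host. This is a minimal sketch, assuming shell access to the ESXi host:

# List the DFW filters (one per protected vNIC) on this host
vsipioctl getfilters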