vMotion of Kubernetes worker nodes fails while using NSX-T DFW

Article ID: 398661

Updated On:

Products

VMware vDefend Firewall

Issue/Introduction

The purpose of this KB is to provide information about using Kubernetes with the NSX-T Distributed Firewall (VMware vDefend Firewall).

  • The maximum number of Distributed Firewall (DFW) rules that can be configured on a VM is 4K, which also applies to a Kubernetes worker node. See the max config documentation for reference.
    • If sub interfaces with a slot 1 dvfilter are configured on the VM, the DFW rules on each slot 1 dvfilter contribute to the total number of rules configured for that VM (see the sketch after this list).
      • For example, if there are 50 sub interfaces/pods (each interface has a slot 1 dvfilter) configured with 1K rules each, the total number of rules for that VM would be 50K, which exceeds the limit.
      • To check the DFW rules configured on a sub interface/pod, please see the guide.
  • If the number of rules configured on the worker node exceeds the limit, this may impact vMotion timing and even cause vMotion failure.
    • During a vMotion, each dvfilter (slot 1 or slot 2) needs to be exported along with the DFW rules configured and the address sets referenced by those rules.
      • If the export of a dvfilter takes longer than 500 ms, the vMotion will fail.
    • Note that vMotioning a worker node with 50 sub interfaces/pods is the equivalent of vMotioning 50 VMs at one time.
      • The maximum number of vNICs that can be configured on a VM is 10.
        • Anything over this number will fail to be configured for a non-Kubernetes VM.
  • The number of supported sub interfaces (pods) per worker node (TKGI) is 100.
    • Other technologies such as OpenShift may not be documented by VMware, but the supported number for TKGI can be used as a reference.
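
A minimal arithmetic sketch of the sizing described above (Python, illustrative only; the pod and rule counts are assumptions mirroring the example, not values read from a live environment):

# Estimate the total number of DFW rules a vMotion of a Kubernetes worker
# node would need to export. Values below are illustrative placeholders.
DFW_RULES_PER_VM_LIMIT = 4000   # documented max DFW rules per VM/worker node
MAX_PODS_PER_WORKER_TKGI = 100  # supported sub interfaces (pods) per TKGI worker

def estimated_total_rules(num_sub_interfaces: int, rules_per_slot1: int) -> int:
    # Each sub interface/pod has its own slot 1 dvfilter, and the rules on
    # every slot 1 dvfilter count toward the worker node's total.
    return num_sub_interfaces * rules_per_slot1

pods = 50             # sub interfaces/pods on the worker node (example value)
rules_per_pod = 1000  # DFW rules applied to each slot 1 dvfilter (example value)

total = estimated_total_rules(pods, rules_per_pod)
print(f"Estimated rules exported during vMotion: {total}")
if total > DFW_RULES_PER_VM_LIMIT:
    print("Exceeds the 4K per-VM DFW rule limit; vMotion may slow down or fail.")
if pods > MAX_PODS_PER_WORKER_TKGI:
    print("Exceeds the supported number of sub interfaces per worker node.")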

Environment

VMware NSX-T Data Center

VMware vDefend Firewall

Resolution

If an excess of DFW rules configured on a Kubernetes worker node is causing vMotion failures, the following options can be used to rectify the situation:

  1. Optimize the number of sub interfaces used on the worker node.
    1. This reduces the number of dvfilters (slot 1 or slot 2) configured on the VM and, as a result, the number of rules on the VM.
      1. For Kubernetes, this can be done by reducing the number of pods configured on the worker node.
  2. Optimize the number of rules configured on the worker nodes or namespaces.
    1. This can be done by using the "Applied To" field on the DFW rule, as shown in the sketch after this list.
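
For reference, a minimal sketch of setting the "Applied To" (scope) field on an existing DFW rule through the NSX-T Policy API (Python with requests). The manager address, credentials, policy ID, rule ID, and group path are placeholders, and the pre-existing policy/rule/group are assumptions for illustration:

import requests

NSX_MANAGER = "https://nsx-manager.example.com"   # placeholder
AUTH = ("admin", "REPLACE_ME")                    # placeholder credentials

# Assumed pre-existing objects: security policy "k8s-policy", rule "web-rule",
# and a group "k8s-web-pods" that matches only the intended workloads.
rule_url = (
    f"{NSX_MANAGER}/policy/api/v1/infra/domains/default/"
    "security-policies/k8s-policy/rules/web-rule"
)

# "scope" is the Applied To field; pinning it to a group keeps the rule off
# vNICs/sub interfaces that do not need it, reducing per-filter rule counts.
patch_body = {"scope": ["/infra/domains/default/groups/k8s-web-pods"]}

resp = requests.patch(rule_url, json=patch_body, auth=AUTH, verify=False)
resp.raise_for_status()
print("Applied To is now limited to the k8s-web-pods group.")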

Additional Information

Information on the use of slot 1 and slot 2 dvfilters.

  • The slot 2 dvfilter is used with the parent ports on VMs running Kubernetes.
    • The slot 2 dvfilter is the only slot used for non-Kubernetes VMs without child ports.
  • The slot 1 dvfilter is used for child ports or sub interfaces, to differentiate them from the parent ports; this is illustrated in the sketch after this list.
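
The relationship can be pictured with a small illustrative data model (Python, not an NSX API; the port names are made up):

from dataclasses import dataclass, field
from typing import List

@dataclass
class ChildPort:            # sub interface backing a pod
    name: str
    dvfilter_slot: int = 1  # slot 1 filter carries the pod's DFW rules

@dataclass
class ParentPort:           # the worker node's vNIC
    name: str
    dvfilter_slot: int = 2  # slot 2 filter carries the VM's DFW rules
    children: List[ChildPort] = field(default_factory=list)

# A non-Kubernetes VM has only the parent port / slot 2 filter.
worker_vnic = ParentPort("worker-node-eth0",
                         children=[ChildPort(f"pod-{i}") for i in range(3)])
print(f"{worker_vnic.name}: slot {worker_vnic.dvfilter_slot}, "
      f"{len(worker_vnic.children)} child ports with slot 1 filters")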

Exclusion list for both slot 2 and slot 1 dvfilters.

  • Placing a VM containing a slot 2 dvfilter into the DFW exclusion list can be done by following the documentation.
  • Placing a VM containing a slot 1 dvfilter into the DFW exclusion list needs to be done at the container port level (see the sketch after this list).
    • A group needs to be created with the container ports tagged, or with segment ports where the tag is ncp/xx and the scope is the Kubernetes cluster.
      • Example: if you'd like to put the VMs on a given segment into the exclusion list, group their container/segment ports by tag.
        • This is because creating a group does not have an option to directly select container/sub ports.
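
A minimal sketch of the group-based approach above through the NSX-T Policy API (Python with requests). The manager address, credentials, group ID, and tag scope/value are placeholders; the exclude-list path is an assumption, so confirm it against the API reference for your NSX version:

import requests

NSX_MANAGER = "https://nsx-manager.example.com"   # placeholder
AUTH = ("admin", "REPLACE_ME")                    # placeholder credentials

group_id = "k8s-excluded-ports"                   # placeholder group ID
group_path = f"/infra/domains/default/groups/{group_id}"

# 1. Create a group whose members are the segment/container ports carrying
#    the tag for the Kubernetes cluster (tag condition value is "scope|tag").
group_body = {
    "display_name": group_id,
    "expression": [
        {
            "resource_type": "Condition",
            "member_type": "SegmentPort",
            "key": "Tag",
            "operator": "EQUALS",
            "value": "my-k8s-cluster|ncp/xx",  # placeholder scope|tag, per the note above
        }
    ],
}
requests.patch(f"{NSX_MANAGER}/policy/api/v1{group_path}",
               json=group_body, auth=AUTH, verify=False).raise_for_status()

# 2. Add the group to the DFW exclusion list (path assumed; verify first).
exclude_url = f"{NSX_MANAGER}/policy/api/v1/infra/settings/firewall/security/exclude-list"
current = requests.get(exclude_url, auth=AUTH, verify=False).json()
members = current.get("members", [])
if group_path not in members:
    members.append(group_path)
    requests.patch(exclude_url, json={"members": members},
                   auth=AUTH, verify=False).raise_for_status()

print(f"Group {group_path} is in the DFW exclusion list.")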