ESXi host PSOD due to dvfilter race condition observed after container port vMotion

Article ID: 426847

Updated On:

Products

  • VMware NSX
  • VMware vDefend Firewall
  • VMware vSphere ESXi

Issue/Introduction

  • A PSOD related to a dvfilter race condition may be observed on the source ESXi host after a container port vMotion.

  • The PSOD displays messages similar to the following:

    #PF Exception 14 in world 2098285:vmnic2-pollW IP 0x42000e744ca5 addr 0x4

    Backtrace for current CPU ###, worldID=20###5, fp=0x433d91f5eb40
    [email protected]#1.0.8.0.24302014+0x559 stack: 0x100000, 0x453a5df9a712, 0x433d91f5eb40, 0x433d91f5eb50, 0x0
    [email protected]#1.0.8.0.24302014+0x6da stack: 0x1, 0x453a5df9a7f0, 0x0, 0x42000e7470ca, 0x42004cc00000
    [email protected]#1.0.8.0.24302014+0x3bd stack: 0x45bf11cc0ae0, 0x453a5df9a1b0, 0x453a5df9a71a, 0x0, 0x2
    [email protected]#1.0.8.0.24302014+0x4ab stack: 0x14, 0x453a5df9a706, 0x1, 0x0, 0x45bf11cc0ae0
    [email protected]#1.0.8.0.24302014+0x13e stack: 0x453a5df9a942, 0x0, 0x453a5df9a944, 0x0, 0x1
    [email protected]#1.0.8.0.24302014+0x1c0 stack: 0x453a5df9ab98, 0x433b6857c6a0, 0x4200518016c0, 0x453a5df9f000, 0x0
    [email protected]#1.0.8.0.24302014+0x1efb stack: 0x453a5df9ab68, 0x0, 0x0, 0x453a5df9ab6c, 0x45deab52d418
    [email protected]#1.0.8.0.24302014+0x7f8 stack: 0x0, 0x453a5df9aeb0, 0x0, 0x453a5df9b588, 0x0
    [email protected]#1.0.8.0.24302014+0x67 stack: 0x0, 0x0, 0x0, 0x420000000000, 0x0

  • On the ESXi host, /var/log/vmkernel.log shows entries similar to the following:

    <Timestamp> cpu1:2098107)WARNING: DVFilter: 1203: filter->portID <Port Number> doesn't exist
    <Timestamp> cpu1:2098107)Restore state called for filter nic-#######-eth0-vmware-sfw.2
    <Timestamp> cpu1:2098107)Importing nic-#######-eth0-vmware-sfw.2, Version 1000
    <Timestamp> cpu1:2098107)pfioctl: lowering export version of nic-#######-eth0-vmware-sfw.2 to 1000
    <Timestamp> cpu1:2098107)ImportStateTLV entry type 12, len 52, cnt 1
    <Timestamp> cpu1:2098107)Importing from source version RELEASEbuild-24302014
    <Timestamp> cpu1:2098107)ImportStateTLV entry type 1, len 1763689, cnt 61
    <Timestamp> cpu1:2098107)ImportStateTLV entry type 2, len 18237, cnt 56
    <Timestamp> cpu2:7790511)World: 3355: PRDA 0x############  ss 0x# ds 0x### es 0x### fs 0x### gs 0x#
    <Timestamp> cpu2:7790511)World: 3357: TR 0x758 GDT 0x############  (0xffff) IDT 0x############ (0xffff)
    <Timestamp> cpu62:2098298)World: 3355: PRDA 0x############  ss 0x# ds 0x### es 0x### fs 0x### gs 0x###
    <Timestamp> cpu2:7790511)World: 3359: CR0 0x######### CR3 0x########### CR4 0x#######
    <Timestamp> cpu62:2098298)World: 3357: TR 0x### GDT 0x############  (0xffff) IDT 0x############ (0xffff)
    <Timestamp> cpu62:2098298)World: 3359: CR0 0x######### CR3 0x###### CR4 0x######
    <Timestamp> cpu62:2098298)Panic: 630: Panic from another CPU (cpu 62, world 2098298): ip=0x############ randomOff=0x######:
    #PF Exception 14 in world 2098298:vmnic2-pollW IP 0x############  addr 0x#
    PTEs:0x##########;0x##########;0x##########;0x#; >>>>>>>>>>>>>
    <Timestamp> cpu62:2098298)Backtrace for current CPU #62, worldID=2098298, fp=0x3
    Module(s) involved in panic: [nsxt-vsip-24302014 Version 1.0.0-0 RELEASEbuild-24302014]
    <Timestamp> cpu2:7790511)cr0=0x######### cr2=0x# cr3=0x######## cr4=0x######
    <Timestamp> cpu2:7790511)FMS=06/55/7 uCode=0x#######
    <Timestamp> cpu2:7790511)frame=0x############  ip=0x############  err=0x# rflags=0x##### 
    <Timestamp> cpu2:7790511)rax=0x4# rbx=0x# rcx=0x############ 
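
The log pattern above can be checked for mechanically. The following is a minimal sketch, not a VMware tool: it assumes the log text has already been read into a string, and the function name matches_signature and the three patterns are hypothetical, derived only from the sample lines in this article. It reports a match when the DVFilter "doesn't exist" warning is later followed by both a #PF Exception 14 line and a panic line naming the nsxt-vsip module.

```python
import re

# Patterns derived from the sample log lines in this article (assumptions,
# not an official signature definition).
DVFILTER_WARNING = re.compile(
    r"WARNING: DVFilter: \d+: filter->portID .+ doesn't exist"
)
PAGE_FAULT = re.compile(r"#PF Exception 14 in world \d+:\S+")
PANIC_MODULE = re.compile(r"Module\(s\) involved in panic: \[nsxt-vsip-")


def matches_signature(log_text: str) -> bool:
    """Return True if the DVFilter warning is followed, anywhere later in
    the text, by a page-fault line and an nsxt-vsip panic module line."""
    warning_seen = False
    fault_seen = False
    module_seen = False
    for line in log_text.splitlines():
        if DVFILTER_WARNING.search(line):
            warning_seen = True
        elif warning_seen and PAGE_FAULT.search(line):
            fault_seen = True
        elif warning_seen and PANIC_MODULE.search(line):
            module_seen = True
    return warning_seen and fault_seen and module_seen
```

A match only indicates that the log resembles the sample in this article. The backtrace and build numbers should still be compared manually before concluding that a host hit this issue.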

Environment

  • VMware ESXi 8.0.x and 9.0.x
  • VMware NSX

Cause

When a VM with container ports is quiesced (paused) during a vMotion, its container ports are deleted. However, the dvfilter (the network filter that enforces security and traffic rules) can still attempt to collect statistics from those ports before the migration completes. Because the ports were deleted while the filter still referenced them, the filter encounters a null reference or an invalid memory access, which results in a PSOD.
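
The general shape of this failure can be illustrated with a toy model. The sketch below is not dvfilter or NSX code, and it makes no claim about how the fix is implemented. The names Port, PortTable, collect_stats_unchecked and collect_stats_checked are all hypothetical. It only shows why reading statistics from a port that has already been deleted fails unless the lookup result is checked first.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Port:
    port_id: int
    packets: int = 0


class PortTable:
    """Toy stand-in for a host port table (hypothetical)."""

    def __init__(self) -> None:
        self._ports: Dict[int, Port] = {}

    def add(self, port: Port) -> None:
        self._ports[port.port_id] = port

    def delete(self, port_id: int) -> None:
        # Models the container port being deleted when the VM is quiesced.
        self._ports.pop(port_id, None)

    def lookup(self, port_id: int) -> Optional[Port]:
        return self._ports.get(port_id)


def collect_stats_unchecked(table: PortTable, port_id: int) -> int:
    # Assumes the port still exists. After a delete, lookup() returns None
    # and the attribute access fails, analogous to a null dereference.
    port = table.lookup(port_id)
    return port.packets


def collect_stats_checked(table: PortTable, port_id: int) -> Optional[int]:
    # Tolerates a port that disappeared between scheduling and collection.
    port = table.lookup(port_id)
    if port is None:
        return None
    return port.packets
```

In this model, calling collect_stats_unchecked after table.delete() raises AttributeError, while collect_stats_checked returns None. In a kernel module the equivalent unchecked access is a page fault rather than a catchable exception, which is why the host panics.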

Resolution

This issue is resolved in ESXi 8.0.3 P10 and ESXi 9.0.2.