Intermittent packet loss and latency observed with traffic sent between two ESXi hosts.



Article ID: 410123


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • You may notice latency between endpoints.
  • Packets may be intermittently dropped.
  • Examination of NIC statistics using the command below from the ESXi command line reveals very high rxPciSignalIntegrity/txPciSignalIntegrity counts.
    • # /usr/lib/vmware/vm-support/bin/nicinfo.sh | less

      Then scroll to the private NIC statistics section for each NIC and look for entries named similarly to txPciSignalIntegrity and rxPciSignalIntegrity.
      NOTE: NOT ALL NICs report this statistic. Some NICs will have a different name for it. The example below is from the Mellanox line of cards.

      NIC:  vmnic1
         vmnic1 0000:5d:00.1 nmlx5_core Up Up 25000 Full 88:e9:a4:95:dc:37 9000 Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

         NICInfo:
            Advertised Auto Negotiation: true
            Advertised Link Modes: Auto, 1000BaseCX-SGMII/Full, 10000BaseKR/Full, 25000BaseTwinax/Full
      8<--------snip-------->8      
         
         NIC statistics for vmnic1:
            Packets received: 67622289
            Packets sent: 40620276
            Bytes received: 43382393373
            Bytes sent: 602329733154
            Receive packets dropped: 0
            Transmit packets dropped: 0
      8<--------snip-------->8
         
         NIC Private statistics:
            
            PSID: DELL_########
            firmware syndrome: 0x0000
            asicSensorTemperature: 63
            rxSwPackets: 67622289
            rxSwBytes: 43382393373
            txSwPackets: 40620276
            txSwTsoPackets: 11472046
            8<--------snip-------->8
            rxSymbolErrorsPhy: 0
            rxCorrectedBitsPhy: 0
            rxErrLane_0_Phy: 0
            rxErrLane_1_Phy: 0
            rxErrLane_2_Phy: 0
            rxErrLane_3_Phy: 0
            rxBufferPassedThresPhy: 0
            rxPciSignalIntegrity: 32323225123     <---------------- Number in the billions
            txPciSignalIntegrity: 46522122001     <---------------- Number in the billions
            outboundPciBufferOverflow: 0
            outboundPciStalledRd: 0
            outboundPciStalledWr: 0
            outboundPciStalledRdEvents: 0
            outboundPciStalledWrEvents: 0
            txPciTransportNonfatalMsg: 0
            txPciTransportFatalMsg: 0
      8<-----snip------->8 
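On hosts with many NICs, scrolling through the full nicinfo.sh output can be tedious. The sketch below filters out just the signal-integrity counters. The sample input is copied from the output above, and the [rt]xPciSignalIntegrity names assume the Mellanox (nmlx5_core) driver, so adjust the pattern for other drivers.

```shell
# Sample private-statistics lines, copied from the nicinfo.sh output above.
cat <<'EOF' > /tmp/nic_private_stats.txt
rxSymbolErrorsPhy: 0
rxPciSignalIntegrity: 32323225123
txPciSignalIntegrity: 46522122001
outboundPciBufferOverflow: 0
EOF

# Pull out only the PCI signal-integrity counters.
grep -E '[rt]xPciSignalIntegrity' /tmp/nic_private_stats.txt
```

On a live host, pipe the script output into the same filter instead of using a sample file: /usr/lib/vmware/vm-support/bin/nicinfo.sh | grep -E 'NIC:|[rt]xPciSignalIntegrity'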

Environment

vSphere in combination with any other product. The issue occurs at the ESXi host level.

Cause

  • If the number of reported errors for these two statistics is very large (in this case, in the "billions"), then a hardware issue is very likely at the root of the problem.

  • Errors in rxPciSignalIntegrity and/or txPciSignalIntegrity indicate a problem at the PCI bus level. This could be the physical NIC, the transceiver, or the motherboard, all of which are components of the PCI bus. Typically, if only one of the rx or tx signal-integrity counters is high (not both), the issue is likely related to the motherboard. If both are high, it is more likely that the issue is with the physical NIC or the transceiver itself.
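The rule of thumb above can be sketched as a small filter over the counter values. This is an illustrative sketch only: the counter names assume the Mellanox driver, and the one-billion threshold is taken from this example, not an official limit.

```shell
# Classify PCI signal-integrity errors per the rule of thumb above.
# The sample counter values are taken from the output in this article.
printf 'rxPciSignalIntegrity: 32323225123\ntxPciSignalIntegrity: 46522122001\n' |
awk -v thresh=1000000000 '
  /rxPciSignalIntegrity:/ { rx = $2 + 0 }   # receive-side counter
  /txPciSignalIntegrity:/ { tx = $2 + 0 }   # transmit-side counter
  END {
    if (rx > thresh && tx > thresh)
      print "both high: suspect NIC or transceiver"
    else if (rx > thresh || tx > thresh)
      print "one direction high: suspect motherboard / PCI bus"
    else
      print "counters look normal"
  }'
```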

Resolution

Contact your hardware vendor for assistance with investigating this further.

Additional Information

More information about Mellanox cards and collecting this information on a live ESXi host is available from the NIC vendor.

NOTE: NOT ALL NICs report this statistic. Some NICs will have a different name for it. The example is from the Mellanox line of cards. If the number of reported errors is huge (in this case, in the "billions"), then there is very likely a hardware issue that the customer and hardware vendor need to investigate.