VxRail vSAN hosts running Intel X710 NICs go into a not responding state

Article ID: 326615

Products

VMware vSAN

Issue/Introduction

This is a known issue with the Intel X710 NIC, confirmed by Intel and VMware. Refer to the public threads below:

https://communities.vmware.com/thread/552470
https://communities.intel.com/thread/124544


Symptoms:
VxRail vSAN hosts running Intel X710 NICs randomly go into a not responding state in the cluster.

Around the time a host stops responding, the vmkernel logs may show errors like the following for the Ethernet Controller X710 for 10GbE SFP+ NICs (vmnic0 and vmnic1 in the logs below):

2018-10-20T01:04:55.604Z cpu21:140896553)NetPort: 1662: enabled port 0x4000011 with mac 00:50:56:af:77:df
2018-10-20T01:05:15.214Z cpu42:678436159)WARNING: NetPort: 1934: failed to disable port 0x4000028 on DvsPortset-0: Busy
2018-10-20T01:05:15.214Z cpu42:678436159)netschedHClk: NetSchedHClkPortQuiesce:4918: vmnic1: received a force quiesce for port 0x4000028
2018-10-20T01:05:15.214Z cpu42:678436159)netschedHClk: NetSchedHClkPortQuiesce:4918: vmnic0: received a force quiesce for port 0x4000028
2018-10-20T01:05:15.214Z cpu42:678436159)netschedHClk: NetSchedHClkHashQuiesceHierarchyIter:396: vmnic0: dropped 506 pkts from queue netsched.pools.vm.67108904 while quiescing port 0x4000028
2018-10-20T01:05:15.215Z cpu42:678436159)Vmxnet3: 15916: There is still packet not transmitted when the device isdisabled, port:0x4000028, queue: 2
2018-10-20T01:05:15.215Z cpu42:678436159)NetPort: 1881: disabled port 0x4000028
2018-10-20T01:05:15.227Z cpu42:678436159)Vmxnet3: 17293: Disable Rx queuing; queue size 1024 is larger than Vmxnet3RxQueueLimit limit of 64.
2018-10-20T01:05:15.227Z cpu42:678436159)Vmxnet3: 17651: Using default queue delivery for vmxnet3 for port 0x4000028
2018-10-20T01:05:15.227Z cpu42:678436159)NetPort: 3208: resuming traffic on DV port 28679
2018-10-20T01:05:15.227Z cpu42:678436159)Team.etherswitch: TeamESPolicySet:5942: Port 0x4000028 frp numUplinks 2 active 2(max 2) standby 0
2018-10-20T01:05:15.227Z cpu42:678436159)Team.etherswitch: TeamESPolicySet:5950: Update: Port 0x4000028 frp numUplinks 2 active 2(max 2) standby 0
2018-10-20T01:05:15.227Z cpu42:678436159)NetPort: 1662: enabled port 0x4000028 with mac 00:50:56:af:f0:14
2018-10-20T01:05:15.228Z cpu42:678436159)NetPort: 1881: disabled port 0x4000028
2018-10-20T01:05:15.259Z cpu42:678436159)Vmxnet3: 17293: Disable Rx queuing; queue size 1024 is larger than Vmxnet3RxQueueLimit limit of 64.
2018-10-20T01:05:15.259Z cpu42:678436159)Vmxnet3: 17651: Using default queue delivery for vmxnet3 for port 0x4000028
2018-10-20T01:05:15.259Z cpu42:678436159)NetPort: 3208: resuming traffic on DV port 28679
2018-10-20T01:05:15.259Z cpu42:678436159)Team.etherswitch: TeamESPolicySet:5942: Port 0x4000028 frp numUplinks 2 active 2(max 2) standby 0
2018-10-20T01:05:15.259Z cpu42:678436159)Team.etherswitch: TeamESPolicySet:5950: Update: Port 0x4000028 frp numUplinks 2 active 2(max 2) standby 0

--------------------------------------------------------------------------------------------------------------------------------------------------------
vmkernel.0:2018-10-30T02:51:47.141Z cpu58:672058372)netschedHClk: NetSchedHClkHashQuiesceHierarchyIter:396: vmnic0: dropped 8 pkts from queue netsched.pools.persist.default while quiescing port 0x300003f
vmkernel.0:2018-10-30T02:51:47.141Z cpu58:672058372)netschedHClk: NetSchedHClkHashQuiesceHierarchyIter:396: vmnic0: dropped 503 pkts from queue netsched.pools.vm.50331711 while quiescing port 0x300003f
vmkernel.0:2018-10-30T02:51:52.066Z cpu53:74667289)netschedHClk: NetSchedHClkPortQuiesce:4918: vmnic0: received a force quiesce for port 0x3000012


--------------------------------------------------------------------------------------------------------------------------------------------------------
vmkernel.0:2018-10-30T02:51:47.140Z cpu58:672058372)netschedHClk: NetSchedHClkPortQuiesce:4918: vmnic1: received a force quiesce for port 0x300003f
vmkernel.0:2018-10-30T02:51:52.066Z cpu53:74667289)netschedHClk: NetSchedHClkPortQuiesce:4918: vmnic1: received a force quiesce for port 0x3000012


--------------------------------------------------------------------------------------------------------------------------------------------------------
2018-10-30T02:51:47.141Z cpu58:672058372)netschedHClk: NetSchedHClkHashQuiesceHierarchyIter:396: vmnic0: dropped 8 pkts from queue netsched.pools.persist.default while quiescing port 0x300003f
2018-10-30T02:51:47.141Z cpu58:672058372)netschedHClk: NetSchedHClkHashQuiesceHierarchyIter:396: vmnic0: dropped 503 pkts from queue netsched.pools.vm.50331711 while quiescing port 0x300003f


--------------------------------------------------------------------------------------------------------------------------------------------------------
grep  "while quiescing port" vmkernel.* | wc
   1128   15792  228683
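
The first column of the `wc` output above is the line count, so 1128 messages matched. To see how those messages are spread across uplinks, a per-vmnic breakdown can be sketched as below. This is a minimal sketch, not a supported tool: the function name `quiesce_drops_per_nic` is hypothetical, and the parsing assumes the message layout shown in the log excerpts above.

```shell
# Hypothetical helper: summarize "dropped N pkts ... while quiescing port"
# messages per uplink. Assumes the vmkernel message format shown above.
# Output columns: uplink, number of messages, total packets dropped.
quiesce_drops_per_nic() {
    grep -h "while quiescing port" "$@" \
        | sed -n 's/.*: \(vmnic[0-9][0-9]*\): dropped \([0-9][0-9]*\) pkts.*/\1 \2/p' \
        | awk '{ events[$1]++; pkts[$1] += $2 }
               END { for (nic in events) print nic, events[nic], pkts[nic] }' \
        | sort
}
```

Run it as `quiesce_drops_per_nic vmkernel.*` from the directory that holds the vmkernel logs.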

--------------------------------------------------------------------------------------------------------------------------------------------------------
grep "time since last heartbeat" vpxd-*.log
vpxd-681.log:2018-10-30T02:38:32.961Z info vpxd[7F086F66C700] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-7460ee5a] [VpxdHostCnx] No heartbeats received from host; cnx: 52dce226-1083-fa13-d11c-99cb60f68547, h: host-115, time since last heartbeat: 63271ms
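
The vpxd logs come from vCenter Server, while the vmkernel logs come from the ESXi host, so it helps to reduce the heartbeat messages to a timestamp, host ID, and gap that can be lined up against the vmkernel timestamps. The following is a minimal sketch under that assumption: the function name `missed_heartbeats` is hypothetical, and the pattern assumes the vpxd message layout shown above.

```shell
# Hypothetical helper: print "timestamp host-id milliseconds" for every
# missed-heartbeat message in the given vpxd logs.
# Assumes the "h: host-N, time since last heartbeat: Nms" layout shown above.
missed_heartbeats() {
    grep -h "time since last heartbeat" "$@" \
        | sed -n 's/^\([^ ]*\) .* h: \(host-[0-9][0-9]*\), time since last heartbeat: \([0-9][0-9]*\)ms.*/\1 \2 \3/p'
}
```

Run it as `missed_heartbeats vpxd-*.log`, then compare the timestamps with the quiesce messages in the vmkernel logs of the affected host.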

 


Environment

VMware vSAN 6.6.x

Cause

The Intel X710 NIC is running driver and firmware versions older than those listed in the VMware HCL. Install the latest supported driver and firmware.

Resolution

Upgrade the driver and firmware of the Intel X710 NIC to the versions listed in the VMware HCL for ESXi 6.5 Update 2.

http://partnerweb.vmware.com/comp_guide2/detail.php?deviceCategory=io&productid=38742&releaseid=338&deviceCategory=io&details=1&VID=8086&DID=1572&SVID=1028&SSID=1f9c&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

Recommended driver: i40en version 1.7.11

Recommended firmware: 18.5.0
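
Before and after the upgrade, the driver and firmware versions in use can be read from the host with `esxcli network nic get`. The sketch below filters that output down to the three relevant fields. It is illustrative only: the function name `nic_driver_info` is hypothetical, and it assumes the `Driver:`, `Firmware Version:`, and `Version:` fields appear under "Driver Info" as they normally do in that command's output.

```shell
# Hypothetical helper: reduce `esxcli network nic get -n <vmnic>` output
# to the driver name, firmware version, and driver version.
# Assumes "Driver:", "Firmware Version:" and "Version:" field labels.
nic_driver_info() {
    awk -F': ' '
        /^ *Driver: /           { print "driver=" $2 }
        /^ *Firmware Version: / { print "firmware=" $2 }
        /^ *Version: /          { print "version=" $2 }
    '
}

# Example, run on the ESXi host for each X710 uplink:
#   esxcli network nic get -n vmnic0 | nic_driver_info
```

Compare the reported values with the recommended driver and firmware versions listed above.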