ESXi 6.x with vSAN NIC enabled becomes unresponsive when vsanmgmt service is not running

Article ID: 326432

Updated On:

Products

VMware vSAN

Issue/Introduction

  • When vSAN is disabled at the cluster level but still enabled on one of the host's network adapters, the host may become unresponsive if the vsanmgmt service is stopped.
  • The vsanmgmt service is typically disabled when a host is rebooted after vSAN has been disabled, or when there is a network update on the host.
  • In vmkernel.log or hostd-probe.log, you may see a message similar to:
ALERT: hostd detected to be non-responsive

The issue occurs when hostd queries vSAN network adapter information through the vsanmgmt service while that service is offline. Because hostd expects a response from vsanmgmt and does not receive one, it keeps retrying the request. If several such queries run concurrently, they can block the hostd worker threads and cause hostd to lock up.
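
To check whether a host has logged the alert quoted above, the log files can be searched for the message text. The following is a minimal sketch, not an official VMware tool: the function name count_hostd_alerts is hypothetical, and the example paths assume the logs are in their default /var/log location on the host.

```shell
# Count lines containing the hostd non-responsive alert in each given log file.
# Log paths are passed as arguments, so copies of the logs can be checked too.
count_hostd_alerts() {
    pattern='hostd detected to be non-responsive'
    for log in "$@"; do
        if [ -r "$log" ]; then
            # grep -c prints the number of matching lines (0 if none)
            printf '%s: %s\n' "$log" "$(grep -c "$pattern" "$log")"
        else
            printf '%s: not readable\n' "$log"
        fi
    done
}

# Example (paths are an assumption about the default log location):
#   count_hostd_alerts /var/log/vmkernel.log /var/log/hostd-probe.log
```

A non-zero count only indicates that hostd was reported as non-responsive. It does not by itself confirm that the vsanmgmt condition described here is the cause.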

Environment

VMware vSAN 6.x
VMware vSAN 6.0.x
VMware vSAN 6.1.x
VMware vSAN 6.2.x
VMware vSAN 6.5.x
VMware vSAN 6.6.x

Resolution

This issue is resolved in ESXi 6.7 GA, available at VMware Downloads.

To work around this issue if you do not want to upgrade:
  1. Determine whether the host has a residual vSAN network adapter:
# localcli vsan network list 

In this example output, vmk4 is the vSAN vmkernel adapter for the host:
 
Interface
   VmkNic Name: vmk4
   IP Protocol: IP
   Interface UUID: 1dd11858-d4e1-f88d-ca19-90b11c441064
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
   Traffic Type: vsan
  2. Remove the vSAN network interface:
Note: Perform this step only if you are sure that vSAN is not used in the cluster and there are no plans to use it.

# localcli vsan network remove -i vmk4 

Note: After you remove the unneeded vmkernel adapter that has the vSAN tag set, the vsanmgmt service should stop on its own after a few minutes.
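
The two workaround steps can be combined into a small dry-run helper that reads the output of the list command and prints the matching remove command for each vSAN-tagged interface, without executing it. This is a sketch under assumptions: the function names vsan_vmknics and print_remove_commands are hypothetical, the parsing relies on the "VmkNic Name" and "Traffic Type" fields appearing as in the example output above, and awk is assumed to be available in the host shell.

```shell
# Print the VmkNic name of every interface whose Traffic Type is "vsan".
# Reads "localcli vsan network list" style output on stdin.
vsan_vmknics() {
    awk -F': ' '
        # Remember the most recent interface name, trimming trailing whitespace
        /VmkNic Name:/  { nic = $2; sub(/[ \t\r]+$/, "", nic) }
        # Traffic Type closes the interface block; report only vsan interfaces
        /Traffic Type:/ {
            t = $2; sub(/[ \t\r]+$/, "", t)
            if (t == "vsan" && nic != "") print nic
            nic = ""
        }
    '
}

# Print (do not run) the removal command for each vSAN-tagged interface.
print_remove_commands() {
    vsan_vmknics | while read -r nic; do
        printf 'localcli vsan network remove -i %s\n' "$nic"
    done
}

# Example on a host:
#   localcli vsan network list | print_remove_commands
```

Printing the commands instead of running them leaves the decision to remove an interface with the administrator, in line with the note in step 2.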