Symptoms:
- Customers may see the below error on the vSAN cluster summary page after upgrading from vSAN 6.6 to 6.7.
- All VMs and hosts appear to be working normally.
- In the vSAN health check tab, you may see the below error.
- The logs will show the following patterns:
On the ESXi hosts, /var/log/vsanmgmt.log will show errors similar to:
2019-04-27T13:50:53Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] entry = {'healthReason': 0, 'healthFlags': 0, 'timestamp': 127419771773}
2019-04-27T13:50:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] Failed to get disk encryption info Traceback (most recent call last): File "/build/mts/release/bora-12775454/bora/build/vsan/release/vsanhealth/usr/lib/vmware/vsan/perfsvc/VsanHealthSystemImpl.py", line 1813, in _QueryPhysicalDiskHealthSummary ValueError: Failed to open device /vmfs/devices/disks/naa.5002538a488c0a60
2019-04-30T21:33:07.993Z error hostd[5297249] [Originator@6876 sub=vmomi.soapStub[58]] Resetting stub adapter for server <cs p:0000001210900cb0, TCP:localhost.localdomain:9095> : service state request failed: N7Vmacore15SystemExceptionE(Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem, timeout, or service overload.)
2019-04-27T08:01:27Z VSANMGMTSVC: ERROR vsanperfsvc[906d9cca-68c2-11e9] [VsanEsxHclUtil::__init__] Failed to run tool storcli: Exception 'RunCommandError' occured running command '['/opt/lsi/storcli/storcli', '++group=host/vim/tmp', '/call', 'show', 'J']'
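The vsanmgmt.log entries above can be confirmed on a host with a simple grep. Below is a minimal sketch using an inline sample built from the excerpts above; on a live host, point the grep commands at /var/log/vsanmgmt.log instead of the sample file:

```shell
# Sample built from the log excerpts above (truncated with "...");
# on a real host, grep /var/log/vsanmgmt.log directly.
cat <<'EOF' > /tmp/vsanmgmt.sample
2019-04-27T13:50:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] Failed to get disk encryption info ... ValueError: Failed to open device /vmfs/devices/disks/naa.5002538a488c0a60
2019-04-27T08:01:27Z VSANMGMTSVC: ERROR vsanperfsvc[906d9cca-68c2-11e9] [VsanEsxHclUtil::__init__] Failed to run tool storcli: Exception 'RunCommandError' occured running command ...
EOF

# Count the two error signatures seen in this issue.
grep -c 'Failed to get disk encryption info' /tmp/vsanmgmt.sample
grep -c "Exception 'RunCommandError'" /tmp/vsanmgmt.sample

# List any device IDs the health service failed to open.
grep -o 'naa\.[0-9a-f]*' /tmp/vsanmgmt.sample | sort -u
```

The last command extracts the affected NAA device IDs so they can be cross-checked against the host's disk inventory.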
On the ESXi hosts, /var/log/hostd.log will show:
2019-05-02T08:29:05.352Z warning hostd[5297256] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x000000120dea5a70, h:120, <TCP '127.0.0.1 : 42447'>, <TCP '127.0.0.1 : 9095'>>, e: 111(Connection refused)
On the ESXi hosts, /var/log/syslog.log will show:
2019-05-02T09:56:35Z Unknown: out of memory [7124098] (This message repeated multiple times consecutively.)
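The extent of the repetition can be quantified with `uniq -c`. A sketch against an inline sample (the three-line repeat count here is hypothetical; on a live host, read /var/log/syslog.log):

```shell
# Sample built from the syslog line above, repeated to mimic the consecutive spam
# (repeat count is illustrative); on a real host use /var/log/syslog.log.
for i in 1 2 3; do
  echo '2019-05-02T09:56:35Z Unknown: out of memory [7124098]'
done > /tmp/syslog.sample

# Strip the leading timestamp so identical messages collapse,
# then show how often each message repeats.
sed 's/^[^ ]* //' /tmp/syslog.sample | sort | uniq -c
```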
Although the hostd logs may suggest a network issue, networking is not necessarily the root cause. Double-check that the NIC drivers are listed on the VMware HCL: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io
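To gather the driver details needed for the HCL lookup, run `esxcli network nic list` on the host (and `esxcli network nic get -n vmnicX` for the driver version). The parsing step below is a sketch against a hypothetical sample of that output; the column layout and driver names are assumptions, so verify against your host's actual output:

```shell
# Hypothetical sample of `esxcli network nic list` output
# (run the real command on the ESXi host).
cat <<'EOF' > /tmp/niclist.sample
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:02:00.0  ixgben  Up            Up           10000  Full    a0:36:9f:00:00:01  1500  Intel(R) 10GbE
vmnic1  0000:02:00.1  ixgben  Up            Up           10000  Full    a0:36:9f:00:00:02  1500  Intel(R) 10GbE
EOF

# Print NIC name and driver (columns assumed from the sample layout above);
# look each driver up against the HCL URL.
awk 'NR > 2 {print $1, $3}' /tmp/niclist.sample | sort -u
```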
You might also see the following in /var/log/hostd.log on the ESXi host:
2019-05-02T08:29:05.588Z info hostd[5297215] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 2764 : vSAN virtual NIC has been added.
And in /var/log/vmkernel.log on the ESXi host:
2019-05-02T08:28:56.188Z cpu40:7368912)WARNING: UserSocketInet: 2266: python: waiters list not empty!
2019-05-02T08:28:56.188Z cpu40:7368912)WARNING: UserSocketInet: 2266: python: waiters list not empty!
2019-05-02T08:29:00.349Z cpu70:7368912)WARNING: CMMDS: CMMDSArenaMemUnmapFromUser:194: Failed to unmap MPNs from world 7368917: Not found
2019-05-02T08:29:05.486Z cpu0:2099591)CMMDS: CMMDSVSIUpdateNetworkCbk:2836: RECONFIGURE of interface vmk2 with cmmds (Success).
2019-05-02T08:29:05.487Z cpu21:2099591)CMMDS: CMMDSUtil_PrintArenaEntry:41: [1035794]:Inserting (actDir:0):u:6217c35b-b6b7-53db-922e-6805ca7f6d1a o:5b9f2b83-6de1-4786-630f-6805ca7f6d1a r:23 t:NET_INTERFACE
2019-05-02T08:29:05.487Z cpu21:2099591)CMMDS: CMMDSUtil_PrintArenaEntry:41: [1035795]:Removing (actDir:0):u:6217c35b-b6b7-53db-922e-6805ca7f6d1a o:5b9f2b83-6de1-4786-630f-6805ca7f6d1a r:22 t:NET_INTERFACE
2019-05-02T08:29:09.198Z cpu42:2100048)WARNING: LSOM: LSOMVsiGetVirstoInstanceStats:800: Throttled: Attempt to get Virsto stats on unsupported disk52942248-4166-09b8-34ac-e5d4c1a8291b