The ESXi host generates core dumps with the format:
/var/core/sfcb-intelcim-zdump.XXX (where XXX is a sequence number)
In the /var/log/hostd.log file, you see entries similar to:
2019-08-27T10:44:50.738Z error hostd[2102747] [Originator@6876 sub=Hostsvc.NsxSpecTracker] Object not found/hostspec disabled
2019-08-27T10:45:05.727Z info hostd[2102747] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 82,
--> chainId = 357229688,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> datacenter = (vim.event.DatacenterEventArgument) null,
--> computeResource = (vim.event.ComputeResourceEventArgument) null,
--> host = (vim.event.HostEventArgument) {
--> name = "ESX-East2.interntnet.dk",
--> host = 'vim.HostSystem:ha-host'
--> },
--> vm = (vim.event.VmEventArgument) null,
--> ds = (vim.event.DatastoreEventArgument) null,
--> net = (vim.event.NetworkEventArgument) null,
--> dvs = (vim.event.DvsEventArgument) null,
--> fullFormattedMessage = <unset>,
--> changeTag = <unset>,
--> eventTypeId = "esx.problem.application.core.dumped",
--> severity = <unset>,
--> message = <unset>,
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "/bin/sfcbd"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "429451"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "3",
--> value = "/var/core/sfcb-intelcim-zdump.001"
--> }
--> ],
--> objectId = "ha-host",
--> objectType = "vim.HostSystem",
--> objectName = <unset>,
--> fault = (vmodl.MethodFault) null
--> }
2019-08-27T10:45:05.727Z info hostd[2102747] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 74848 : An application (/bin/sfcbd) running on ESXi host has crashed (429451 time(s) so far). A core file might have been created at /var/core/sfcb-intelcim-zdump.001.
Note: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
Environment
VMware vSphere 6.7.x
Cause
This issue occurs because the nicmgmtd process runs out of memory and is unable to fulfill requests.
Resolution
This issue is resolved in VMware vSphere ESXi 6.7 Patch release ESXi670-202004002, available in My Downloads at support.broadcom.com.
Workaround: If you do not want to upgrade, work around the issue as follows:
Copy the attached script restart-nicmgmtd.sh to a location of your choice and make it executable:
chmod +x restart-nicmgmtd.sh
Note: The attached script periodically monitors the memory usage of nicmgmtd and restarts nicmgmtd when its memory usage nears the memory limit.
Add a cron entry for the script (the required commands are the same as the local.sh entries shown in the Notes below). This makes crond run the restart script once every 5 minutes.
Note: The script takes an optional parameter: the path to a log file in which to record script executions and failures. If no parameter is specified, the script logs to stdout.
Once crond is restarted with the new entry, the script runs periodically and monitors the eminpeak page table value of nicmgmtd. If it hits the predetermined 500k threshold, the script restarts nicmgmtd. This prevents log spew in the vmkernel.log file.
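The attached script itself is not reproduced in this article, but its check-and-restart logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached script: the get_nicmgmtd_mem_kb helper and the commented-out restart command are placeholders that the real script replaces with the actual eminpeak query and restart mechanism.

```shell
#!/bin/sh
# Hypothetical sketch of the restart-nicmgmtd.sh logic (NOT the attached script).
LIMIT_KB=500000                 # 500k threshold described above
LOGFILE="${1:-/dev/stdout}"     # optional first parameter: path to a log file

# Placeholder helper: the real script queries nicmgmtd's eminpeak value here.
get_nicmgmtd_mem_kb() {
    echo 0
}

# Succeeds (exit 0) when measured usage has reached the limit.
should_restart() {
    usage_kb=$1
    limit_kb=$2
    [ "$usage_kb" -ge "$limit_kb" ]
}

usage_kb=$(get_nicmgmtd_mem_kb)
if should_restart "$usage_kb" "$LIMIT_KB"; then
    echo "$(date) nicmgmtd at ${usage_kb} kB, restarting" >> "$LOGFILE"
    # Placeholder: the attached script performs the actual nicmgmtd restart here.
fi
```

Because crond invokes the script every 5 minutes, a single threshold check per invocation is sufficient; no internal loop is needed.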
Notes:
The changes made to the file /var/spool/cron/crontabs/root are not persistent across reboots. If you wish these changes to persist across reboots, add the below entries to the local.sh file.
file path: /etc/rc.local.d/local.sh
Below are the entries:
# workaround script to restart nicmgmtd when it hits the memory limit
/bin/kill $(cat /var/run/crond.pid)
/bin/echo '*/5 * * * * /vmfs/volumes/datastore1/nicmgmtd/restart-nicmgmtd.sh /vmfs/volumes/datastore1/nicmgmtd/nicscript.log' >> /var/spool/cron/crontabs/root
/usr/lib/vmware/busybox/bin/busybox crond
The location of the script file and its log file would have to be decided based on your preference.
The local.sh file is executed only when Secure Boot is disabled.
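The echo in the local.sh entries appends unconditionally, so if local.sh were run more than once in the same session the crontab would accumulate duplicate entries. An optional refinement (not part of the original workaround) is to guard the append with a fixed-string grep so it is idempotent:

```shell
#!/bin/sh
# Idempotent append: add the cron entry to the crontab only if it is not
# already present. Optional refinement to the local.sh entries shown above.
add_cron_entry() {
    entry=$1
    crontab=$2
    grep -qF "$entry" "$crontab" 2>/dev/null || echo "$entry" >> "$crontab"
}

# Usage inside local.sh, with the same paths as the example above:
# /bin/kill $(cat /var/run/crond.pid)
# add_cron_entry '*/5 * * * * /vmfs/volumes/datastore1/nicmgmtd/restart-nicmgmtd.sh /vmfs/volumes/datastore1/nicmgmtd/nicscript.log' /var/spool/cron/crontabs/root
# /usr/lib/vmware/busybox/bin/busybox crond
```

grep -F treats the entry as a literal string, which matters here because the cron line contains the characters * and / that grep would otherwise interpret as pattern syntax.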