VMkernel log is filled with messages "Admission failure in path: vsan-scripts/python.****" events

Article ID: 326419

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • The VM creation workflow fails with the following errors:
[2019-08-27 15:38:40.409] [I] Disk 2 Datastore: Example-vSAN-Datastore-1
[2019-08-27 15:38:42.277] [I] Datastore: datastore-327516 is not compliant with policy 1e901cf8-3fb6-4401-a53a-f36a0bc2c806
[2019-08-27 15:38:46.520] [E] Error in (Workflow:Find Compliant Datastores / Find Compliant Datastores (item3)#47) Can't find compliant datastore for vmhome
  • During VM creation from the vCenter Server via the vSphere Client, the vSAN datastore is listed as "Incompatible".
  • The "On-disk format version" for the disk group on the host is shown as empty.
  • The log file /var/run/log/vmkernel.log contains the following log pattern (a quick check for these symptoms is sketched at the end of this list):
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14642: uw.16298917 (117469717) extraMin/extraFromParent: 193/193, vsan-scripts (789) childEmin/eMinLimit: 51077/51200
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14635: Admission failure in path: vsan-scripts/python.16298917/uw.16298917
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14642: uw.16298917 (117469717) extraMin/extraFromParent: 193/193, vsan-scripts (789) childEmin/eMinLimit: 51077/51200
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14635: Admission failure in path: vsan-scripts/python.16298917/uw.16298917
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14642: uw.16298917 (117469717) extraMin/extraFromParent: 256/256, vsan-scripts (789) childEmin/eMinLimit: 51077/51200
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14635: Admission failure in path: vsan-scripts/python.16298917/uw.16298917
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14642: uw.16298917 (117469717) extraMin/extraFromParent: 193/193, vsan-scripts (789) childEmin/eMinLimit: 51077/51200
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14635: Admission failure in path: vsan-scripts/python.16298917/uw.16298917
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14642: uw.16298917 (117469717) extraMin/extraFromParent: 193/193, vsan-scripts (789) childEmin/eMinLimit: 51077/51200
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14635: Admission failure in path: vsan-scripts/python.16298917/uw.16298917
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14642: uw.16298917 (117469717) extraMin/extraFromParent: 256/256, vsan-scripts (789) childEmin/eMinLimit: 51077/51200
2019-08-27T20:19:06.141Z cpu30:16298917)MemSched: 14635: Admission failure in path: vsan-scripts/python.16298917/uw.16298917
  • The log file /var/run/log/vobd.log reports multiple crash dumps:
2019-08-27T01:19:52.934Z: [UserWorldCorrelator] 4892856737536us: [vob.uw.core.dumped] /bin/python3.5(16073983) /var/core/python-zdump.003
2019-08-27T01:19:54.805Z: [UserWorldCorrelator] 4892858608293us: [vob.uw.core.dumped] /bin/esxcfg-advcfg(16041217) /var/core/esxcfg-advcfg-zdump.003
2019-08-27T01:26:31.742Z: [UserWorldCorrelator] 4893255511330us: [vob.uw.core.dumped] /bin/python3.5(15747859) /var/core/python-zdump.000
2019-08-27T01:26:45.602Z: [UserWorldCorrelator] 4893269369854us: [vob.uw.core.dumped] /bin/esxcfg-advcfg(16075670) /var/core/esxcfg-advcfg-zdump.000
2019-08-27T01:32:57.460Z: [UserWorldCorrelator] 4893641196016us: [vob.uw.core.dumped] /bin/esxcfg-advcfg(16076867) /var/core/esxcfg-advcfg-zdump.001
2019-08-27T01:33:00.455Z: [UserWorldCorrelator] 4893644190338us: [vob.uw.core.dumped] /bin/python3.5(16076993) /var/core/python-zdump.001
2019-08-02T03:00:11.763Z: [UserWorldCorrelator] 2739061434437us: [vob.uw.core.dumped] /bin/python3.5(9008562) /var/core/python-zdump.000
2019-08-02T03:00:11.763Z: [UserWorldCorrelator] 2739296903918us: [esx.problem.application.core.dumped] An application (/bin/python3.5) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/python-zdump.000.
2019-08-02T03:00:12.848Z: [UserWorldCorrelator] 2739062519249us: [vob.uw.core.dumped] /bin/cmmds-tool(9008563) /var/core/cmmds-tool-zdump.000
2019-08-02T03:00:12.848Z: [UserWorldCorrelator] 2739297988884us: [esx.problem.application.core.dumped] An application (/bin/cmmds-tool) running on ESXi host has crashed (5 time(s) so far). A core file may have been created at /var/core/cmmds-tool-zdump.000.
  • The following crash dumps can be observed in /var/core on the ESXi host:
$ ls -al

total 194
drwxrwxr-x 2 root  root 339 Aug 29 02:34 .
drwxrwxr-x 6 root  root 85 Aug 29 02:34 ..
-rw-rw-r-- 1 root  root 1617920 Aug 29 02:34 cmmds-tool-zdump.000
-rw-rw-r-- 1 root  root 1617920 Aug 29 02:34 cmmds-tool-zdump.001
-rw-rw-r-- 1 root  root 1630208 Aug 29 02:34 cmmds-tool-zdump.003
-rw-rw-r-- 1 root  root 2453504 Aug 29 02:34 esxcfg-advcfg-zdump.000
-rw-rw-r-- 1 root  root 2437120 Aug 29 02:34 esxcfg-advcfg-zdump.001
-rw-rw-r-- 1 root  root 2424832 Aug 29 02:34 esxcfg-advcfg-zdump.003
-rw-rw-r-- 1 root  root 1040384 Aug 29 02:34 python-zdump.000
-rw-rw-r-- 1 root  root 1482752 Aug 29 02:34 python-zdump.001
-rw-rw-r-- 1 root  root 1826816 Aug 29 02:34 python-zdump.003
  • The log file /var/run/log/hostd.log reports out-of-memory events:
2019-08-27T17:33:57.357Z error hostd[54081B70] [Originator@6876 sub=SysCommandPosix opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Failed to ForkExec /usr/bin/sh: Cannot allocate memory
2019-08-27T17:33:57.357Z error hostd[54081B70] [Originator@6876 sub=SysCommandPosix opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Invocation of process: '/usr/bin/sh' failed: Resource error - Process Creation - Error: VmkuserCompat_ForkExec - Error Cannot allocate memory
2019-08-27T17:33:57.357Z error hostd[54081B70] [Originator@6876 sub=SOAP command stub adapter /usr/bin/sh opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Error Resource error - Process Creation - Error: VmkuserCompat_ForkExec - Error Cannot allocate memory starting command
2019-08-27T17:33:57.357Z error hostd[54081B70] [Originator@6876 sub=Default opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] AdapterServer caught exception: Error Resource error - Process Creation - Error: VmkuserCompat_ForkExec - Error Cannot allocate memory starting command
2019-08-27T17:33:57.357Z error hostd[54081B70] [Originator@6876 sub=Default opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Backtrace:
--> [backtrace begin] product: VMware ESX, version: 6.5.0, build: build-13004031, tag: hostd, cpu: x86, os: esx, buildType: release
--> backtrace[00] libvmacore.so[0x002DC73F]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
--> backtrace[01] libvmacore.so[0x00125DA2]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref&)
--> backtrace[02] libvmacore.so[0x000DD98F]: Vmacore::Throwable::Throwable(std::string const&)
--> backtrace[03] libvmomi.so[0x0015B02F]
--> backtrace[04] libvmomi.so[0x000F9910]: Vmomi::StubImpl::_Invoke_Task(Vmomi::ManagedMethod*, std::vector, std::allocator > >&, Vmacore::Ref&)
--> backtrace[05] libvmomi.so[0x00162EF9]: Vmomi::StubBackedManagedObjectImpl::_Invoke(Vmomi::ManagedMethod*, std::vector, std::allocator > >&, Vmacore::Ref&)
--> backtrace[06] libvmomi.so[0x000C4FE5]
--> backtrace[07] hostd[0x008E423E]
--> backtrace[08] hostd[0x008E49DB]
--> backtrace[09] hostd[0x008E6BF6]
--> backtrace[10] hostd[0x008FBCEA]
--> backtrace[11] libvmacore.so[0x00201039]
--> backtrace[12] libvmacore.so[0x0012A5C2]: std::function::operator()() const
--> backtrace[13] libvmacore.so[0x001F9F19]
--> backtrace[14] libvmacore.so[0x001FB7E9]
--> backtrace[15] libvmacore.so[0x001FB9BB]
--> backtrace[16] libvmacore.so[0x00200AB6]
--> backtrace[17] libvmacore.so[0x002019CB]: std::_Function_handler (int)> >::_M_invoke(std::_Any_data const&)
--> backtrace[18] libvmacore.so[0x0012A5C2]: std::function::operator()() const
--> backtrace[19] libvmacore.so[0x001FEF16]
--> backtrace[20] libvmacore.so[0x0013724F]
--> backtrace[21] libvmacore.so[0x001FA79A]
--> backtrace[22] libvmacore.so[0x001FE535]
--> backtrace[23] libvmacore.so[0x002E4584]
--> backtrace[24] libpthread.so.0[0x00006CCA]
--> backtrace[25] libc.so.6[0x000D4C3E]
--> [backtrace end]
  • RVC commands fail to query the objects from the vCenter Server command line:

    C:\Program Files\VMware\vCenter Server\rvc>rvc
    Try "--help" or "-h" to view more information about RVC usage.
No argument is specified, connecting to localhost as [email protected]
0 /
1 localhost/
VMware virtual center <XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX>
*** command> vsan.support_information
************* BEGIN Support info for datacenter DC-NAME *************
************* BEGIN Support info for cluster CLUSTER-NAME *************
*** command>vsan.cluster_info CLUSTER-NAME
2018-01-01 00:01:00 +0900: Fetching host info from HOST1 (may take a moment) ...
2018-01-01 00:01:00 +0900: Fetching host info from HOST2 (may take a moment) ...
2018-01-01 00:01:00 +0900: Fetching host info from HOST3 (may take a moment) ...
2018-01-01 00:01:00 +0900: Fetching host info from HOST4 (may take a moment) ...
2018-01-01 00:01:00 +0900: Failed to gather from HOST3: RbVmomi::Fault: SystemError: A general system error occurred:
2018-01-01 00:01:00 +0900: Failed to gather from HOST1: RbVmomi::Fault: SystemError: A general system error occurred:
2018-01-01 00:01:00 +0900: Failed to gather from HOST4: RbVmomi::Fault: SystemError: A general system error occurred:
2018-01-01 00:01:00 +0900: Failed to gather from HOST2: RbVmomi::Fault: SystemError: A general system error occurred: 
Host: HOST1
Host: HOST2
Host: HOST3
Host: HOST4

No Fault Domains configured in this cluster

*** command>vsan.host_info HOST4
2018-01-01 00:01:00 +0900: Fetching host info from HOST4 (may take a moment) ...
2018-01-01 00:01:00 +0900: Failed to gather from HOST4: RbVmomi::Fault: SystemError: A general system error occurred:

 vsan.host_info HOST4 hit an error
undefined method `[]' for nil:NilClass

*** command>vsan.vm_object_info
2018-01-01 00:01:00 +0900: Fetching vSAN disk info from HOST1 (may take a moment) ...
2018-01-01 00:01:00 +0900: Fetching vSAN disk info from HOST2 (may take a moment) ...
2018-01-01 00:01:00 +0900: Fetching vSAN disk info from HOST3 (may take a moment) ...
2018-01-01 00:01:00 +0900: Fetching vSAN disk info from HOST4 (may take a moment) ...
2018-01-01 00:01:00 +0900: Done fetching vSAN disk infos
SystemError: A general system error occurred:
C:/Program Files/VMware/vCenter Server/rvc/lib/rvc/lib/vsangeneric.rb:820:in `parse_response'
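
A quick way to check whether a particular host is showing this pattern is to count the admission-failure messages and the related core dumps directly on that host. The commands below are only an illustrative sketch based on the log and core-file paths listed above:
# grep -c "Admission failure in path: vsan-scripts" /var/run/log/vmkernel.log
# ls /var/core/ | grep -cE "python-zdump|cmmds-tool-zdump|esxcfg-advcfg-zdump"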


Environment

VMware vSAN 6.5.x
VMware vSAN 6.6.x

Cause

SPBM invokes VsanInternalSystem.canProvisionObjects with a vSAN policy to determine whether the vSAN datastore is compatible with that policy. At the end of the backtrace, canProvisionObjects throws a SystemError with an empty fault reason, as shown in the events below.

The hostd.log on the ESXi host shows the following traces related to this (a grep to locate them is sketched after the excerpt):
 
2019-08-27T17:33:57.360Z info hostd[54081B70] [Originator@6876 sub=Solo.Vmomi opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Activation [N5Vmomi10ActivationE:0x54c06b10] : Invoke done [canProvisionObjects] on [vim.host.VsanInternalSystem:ha-vsan-internal-system]
2019-08-27T17:33:57.360Z verbose hostd[54081B70] [Originator@6876 sub=Solo.Vmomi opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Arg npbs:
--> (vim.host.VsanInternalSystem.NewPolicyBatch) [
--> (vim.host.VsanInternalSystem.NewPolicyBatch) {
--> size = (long) [
--> 1
--> ],

--> policy = "stripeWidthVSANstripeWidth1replicaPreferenceVSANreplicaPreferenceRAID-5/6 (Erasure Coding) - CapacityhostFailuresToTolerateVSANhostFailuresToTolerate2iopsLimitVSANiopsLimit2500vasa20 subprofileProfile-Driven Storage Management2019-08-27T17:33:57.318+00:000Profile-Driven Storage ManagementDummy Policy NameCREATE_SPEC"
--> }
--> ]
2019-08-27T17:33:57.360Z verbose hostd[54081B70] [Originator@6876 sub=Solo.Vmomi opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Arg ignoreSatisfiability:
--> false
2019-08-27T17:33:57.360Z info hostd[54081B70] [Originator@6876 sub=Solo.Vmomi opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Throw vmodl.fault.SystemError
2019-08-27T17:33:57.360Z info hostd[54081B70] [Originator@6876 sub=Solo.Vmomi opID=5192a947-7f-a84a user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] Result:
--> (vmodl.fault.SystemError) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = ,
--> reason = ""
--> msg = ""
--> }
2019-08-27T17:33:59.839Z verbose hostd[54081B70] [Originator@6876 sub=PropertyProvider opID=3937b1e6-8c-a84b user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] RecordOp ASSIGN: info, haTask--vim.host.VsanInternalSystem.canProvisionObjects-171239561. Applied change to temp map.
2019-08-27T17:33:59.840Z info hostd[54081B70] [Originator@6876 sub=SysCommandPosix opID=3937b1e6-8c-a84b user=vpxuser:VSPHERE.LOCAL\vpxd-extension-624a1e51-ab4a-4131-91f0-236023a6756e] ForkExec(/usr/bin/sh) 16266080
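
To confirm that these failing invocations are occurring on an affected host, the canProvisionObjects calls and the SystemError thrown for them can be pulled out of hostd.log with a single grep. This is only an illustrative sketch; adjust the path if the log has already rotated:
# grep -E "Invoke done \[canProvisionObjects\]|Throw vmodl.fault.SystemError" /var/run/log/hostd.log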

Resolution

  • In the 6.5 release, hostd forks a python child process to run VsanInternalSystem.py, which calls the VsanInternalSystem API. The child process is expected to be destroyed after the API call completes.
  • Some of these child processes are not destroyed. With multiple VsanInternalSystem.py python processes running concurrently, the memory of the vsan-scripts resource pool is exhausted and VsanInternalSystem.py generates the python core dumps.
  • Multiple VsanInternalSystem.py python processes run concurrently because SPBM calls the VsanInternalSystem canProvisionObjects API whenever it needs to verify vSAN datastore compatibility during a VM provisioning operation.


Workaround:
  • On ESXi 6.5.x, the lingering VsanInternalSystem.py processes can be stopped instead of rebooting the affected host.
  • List the processes by running the following command on the affected host:
# ps -cPTgjstz | grep VsanInternalSystem.py
  • Then stop each process by running the following command with a process ID obtained above (a combined one-line sketch follows this list):
# kill -9 <Process ID>
  • If no such process ID is listed, the remaining option is to place the host in maintenance mode and reboot it. Unaffected hosts will not show this process.
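
If several lingering processes are present, the two steps above can be combined into a single loop. This is only a sketch: it assumes the world ID appears in the first column of the ps output on your build, so verify the column against the ps header before terminating anything.
# for wid in $(ps -c | grep VsanInternalSystem.py | grep -v grep | awk '{print $1}'); do kill -9 "$wid"; done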


Additional Information

Impact/Risks:
New VM creation may fail because the vSAN datastore is reported as an "Incompatible" datastore.