Failure to enter maintenance mode & dumped python-zdump.000

Article ID: 326875

Updated On:

Products

VMware vSAN

Issue/Introduction



Symptoms:
  • Occasionally, an ESXi host stops responding while it is entering maintenance mode with the No data evacuation option.
  • You see messages similar to:
Status: A general system error occurred: HTTP error response: Service Unavailable
Status: An error occurred during communication with the remote host
  • In the vmkernel.log file on the ESXi host, you see entries similar to:
  2019-11-02T05:43:14.799Z cpu8:67617)User: 3089: vsanmgmtd-worke: wantCoreDump:vsanmgmtd-worke signal:6 exitCode:0 coredump:enabled
  2019-11-02T05:43:15.027Z cpu8:67617)UserDump: 3024: vsanmgmtd-worke: Dumping cartel 67603 (from world 67617) to file /var/core/python-zdump.000
  • In the vobd.log file on the ESXi host, you see entries similar to:
  2019-11-02T05:54:44.540Z: [UserWorldCorrelator] 21320988428682us: [vob.uw.core.dumped] /bin/python3.5(67603) /var/core/python-zdump.000
  2019-11-02T05:54:44.540Z: [UserWorldCorrelator] 21321354329937us: [esx.problem.application.core.dumped] An application (/bin/python3.5) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/python-zdump.000.
  • If the vsanmgmtd service crashes, it restarts automatically after the core dump has been written.
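
To check a host for the log entries shown above, a search along the following lines may help. This is a minimal sketch, not an official diagnostic: the function name find_zdump_entries is hypothetical, and the default paths assume the usual ESXi log locations /var/log/vmkernel.log and /var/log/vobd.log.

```shell
#!/bin/sh
# Sketch: print log lines that mention a python-zdump core file.
# find_zdump_entries is a hypothetical helper name, not a VMware tool.
find_zdump_entries() {
    # Use the files passed as arguments, or fall back to the assumed default logs.
    if [ "$#" -eq 0 ]; then
        set -- /var/log/vmkernel.log /var/log/vobd.log
    fi
    for log in "$@"; do
        # Skip files that do not exist or cannot be read.
        [ -r "$log" ] || continue
        # -H prefixes each match with the file name; "|| true" keeps going when nothing matches.
        grep -H 'python-zdump' "$log" || true
    done
    return 0
}
```

Calling find_zdump_entries with no arguments searches both default logs; passing one or more file names searches those files instead, which is useful for logs extracted from a support bundle.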


Environment

VMware vSAN 6.5.x
VMware vSAN 6.6.x
VMware vSAN 6.7.x

Cause

This issue is triggered by entering maintenance mode with the 'noAction' option (No data evacuation), which causes many objects to change status within a short time window.
The object monitor tries to report the status of every object, and vsanmgmtd consumes a very large amount of memory sending those statuses back.

Resolution

This issue is resolved in VMware vSAN 7.0. For more information, see Release Notes.

Workaround:
To work around this issue:
  1. Perform host maintenance in a rolling fashion, one host at a time.
  2. Place the host in maintenance mode with the Ensure data accessibility from other hosts option selected.
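
From the ESXi shell, step 2 can be approximated with esxcli, as in the sketch below. It assumes that the Ensure data accessibility from other hosts option corresponds to the ensureObjectAccessibility value of the --vsanmode parameter. The function name enter_maintenance_mode and the DRY_RUN variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: enter maintenance mode with a vSAN data handling mode other than noAction.
# enter_maintenance_mode and DRY_RUN are hypothetical names used only in this example.
enter_maintenance_mode() {
    vsan_mode="${1:-ensureObjectAccessibility}"
    case "$vsan_mode" in
        ensureObjectAccessibility|evacuateAllData)
            ;;
        noAction)
            # noAction is the trigger described in the Cause section, so refuse it.
            echo "refusing noAction: see the Cause section of this article" >&2
            return 1
            ;;
        *)
            echo "unknown vSAN mode: $vsan_mode" >&2
            return 2
            ;;
    esac
    cmd="esxcli system maintenanceMode set --enable true --vsanmode $vsan_mode"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo "$cmd"
    else
        $cmd
    fi
}
```

Running the function with DRY_RUN=1 prints the command without changing the host state, which is a reasonable first check before using it on a production host.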

Notes:
  • If you reboot or shut down all hosts in the vSAN cluster, use the workaround provided in the VMware Knowledge Base article A simultaneous reboot or shutdown of all hosts in the vSAN cluster may result in data unavailability after a single failure.
  • If you cannot perform host maintenance in a rolling fashion, disabling vSphere HA and DRS before entering maintenance mode may help mitigate the issue.
  • If your host has already stopped responding, try the following steps:
    1. Restart the services:
      1. # /etc/init.d/hostd restart
      2. # /etc/init.d/vsanmgmtd restart
      3. # /etc/init.d/vsanvpd restart
      4. # /etc/init.d/vpxa restart
    2. If step 1 does not resolve the issue, restart the ESXi host:
       # reboot
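
The four service restarts can also be run as one loop that stops at the first failure, as sketched below. The function name restart_mgmt_services and the DRY_RUN variable are hypothetical. The service order is taken from the steps above, and the reboot is deliberately left as a manual step.

```shell
#!/bin/sh
# Sketch: restart the management services in the order listed in this article.
# restart_mgmt_services and DRY_RUN are hypothetical names used only in this example.
restart_mgmt_services() {
    for svc in hostd vsanmgmtd vsanvpd vpxa; do
        script="/etc/init.d/$svc"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: show what would be executed.
            echo "$script restart"
            continue
        fi
        # Stop at the first service that fails to restart.
        "$script" restart || { echo "restart failed: $svc" >&2; return 1; }
    done
}
```

If the function returns a non-zero status, or the host still does not respond after all four services have restarted, continue with the reboot described in the steps above.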

If your problem still exists after trying the steps in this article, file a support request with VMware Support and note this Knowledge Base article ID (77181) in the problem description. For more information, see How to File a Support Request.