Unable to place ESXi host into maintenance mode after rebooting vCenter

Article ID: 416851

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • You reboot vCenter in an attempt to clear an "Enter maintenance mode" task after a host acting as a supervisor node failed to go into maintenance mode.

  • After the reboot, the task is cleared, but the supervisor nodes are in a Not ready, scheduling disabled state.

  • On the vCenter server, the log /var/log/vmware/wcp/wcpsvc.log shows the following entry repeating:
    <TIMESTAMP> error wcp [kubelifecycle/node_controller.go:1125] [opID=<UUID>-host-<ID>] Intent nodeEnterMaintModeIntent, step commitWork for supervisor <UUID> node host-<ID> returned error ServerFaultCode: The object 'vim.Task:task-<ID>' has already been deleted or has not been completely created

  • On the ESXi hosts, /var/log/vmware/spherelet.log shows that the spherelet service crashes after running for about 5 minutes:
    <TIMESTAMP> No(13) spherelet[2106321]: W1009 <TIMESTAMP> 2106294 reflector.go:456] k8s.io/client-go/informers/factory.go:145: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
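
To confirm that the wcpsvc.log entry above is repeating, the matching lines can be counted. This is a minimal sketch, assuming a POSIX shell on the vCenter server. The function name count_stale_task_errors is illustrative and not part of any VMware tool.

```shell
# Count lines in a WCP service log that carry the stale-task error.
# Defaults to the log path named in this article; pass another path
# to check a copy of the log instead.
count_stale_task_errors() {
    log_file="${1:-/var/log/vmware/wcp/wcpsvc.log}"
    grep 'nodeEnterMaintModeIntent' "$log_file" |
        grep -c 'has already been deleted or has not been completely created'
}
```

Running count_stale_task_errors twice, a few minutes apart, and seeing the number grow suggests the entry is still being logged.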

Environment

vSphere Kubernetes Service 8

Cause

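The maintenance mode state recorded by vCenter does not match the state of the hosts themselves: the WCP database still references an "Enter maintenance mode" task that no longer exists after the reboot.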

Resolution

1. SSH to the vCenter server.

2. Stop the WCP service: vmon-cli -k wcp

3. Log in to the vCenter database: /opt/vmware/vpostgres/current/bin/psql -d VCDB postgres

4. View the current host configs: select * from wcp.node_configs;

VCDB=# select * from wcp.node_configs;
   node_id   | desired_state   | cluster          | task      | maintenance_action  | instance_id
-------------+-----------------+------------------+-----------+---------------------+------------
 host-268505 | Ready           | domain-<ID>:<ID> | task-<ID> | noaction            | <ID>
 host-558236 | NodeMaintenance | domain-<ID>:<ID> | task-<ID> | noaction            | <ID>
 host-525033 | NodeMaintenance | domain-<ID>:<ID> | task-<ID> | ensureaccessibility | <ID>
 host-268502 | Ready           | domain-<ID>:<ID> |           |                     | <ID>
(4 rows)



5. For each host that has a task in the task column, clear the task and set the desired state to Ready by running the following statement once per affected host:

UPDATE wcp.node_configs SET desired_state = 'Ready', task = '' WHERE node_id = 'host-<ID>';
UPDATE wcp.node_configs SET desired_state = 'Ready', task = '' WHERE node_id = 'host-<ID>';
UPDATE wcp.node_configs SET desired_state = 'Ready', task = '' WHERE node_id = 'host-<ID>';

Run the query again to confirm that the task column is empty and every host shows Ready:

VCDB=# select * from wcp.node_configs;
   node_id   | desired_state | cluster          | task | maintenance_action  | instance_id
-------------+---------------+------------------+------+---------------------+------------
 host-268505 | Ready         | domain-<ID>:<ID> |      | noaction            | <ID>
 host-558236 | Ready         | domain-<ID>:<ID> |      | noaction            | <ID>
 host-525033 | Ready         | domain-<ID>:<ID> |      | ensureaccessibility | <ID>
 host-268502 | Ready         | domain-<ID>:<ID> |      |                     | <ID>
(4 rows)


6. Exit the database: \q

7. Start the WCP service: vmon-cli -i wcp

8. Attempt to place the host into maintenance mode.
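
When several hosts are affected, the UPDATE statements from step 5 can be generated rather than typed by hand. The following is a sketch only, assuming a POSIX shell with awk and the column order shown in step 4. The function names nodes_with_task and build_reset_sql are hypothetical helpers, not VMware tooling.

```shell
# Print the node IDs whose task column is non-empty.
# Reads the table printed by "select * from wcp.node_configs;" on stdin,
# assuming task is the fourth pipe-separated column as in step 4.
nodes_with_task() {
    awk -F'|' '
        $1 ~ /host-/ {
            gsub(/[ \t]/, "", $1)   # trim padding around node_id
            gsub(/[ \t]/, "", $4)   # trim padding around task
            if ($4 != "") print $1
        }'
}

# Print one UPDATE statement (as given in step 5) per node ID argument.
# Anything that is not "host-" followed by digits is skipped with a warning.
build_reset_sql() {
    for node_id in "$@"; do
        case "$node_id" in
            host-*[!0-9]* | host- | "")
                echo "skipping unexpected node id: $node_id" >&2 ;;
            host-*)
                printf "UPDATE wcp.node_configs SET desired_state = 'Ready', task = '' WHERE node_id = '%s';\n" "$node_id" ;;
            *)
                echo "skipping unexpected node id: $node_id" >&2 ;;
        esac
    done
}
```

For example, build_reset_sql host-558236 host-525033 prints two statements that can be reviewed and then pasted at the VCDB=# prompt in step 5, while the WCP service is still stopped.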