vCenter stops responding after re-configuring HA over cluster while customer is using KMS

Article ID: 318558

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

On a vCenter Server configured with a KMS, the "Hosts and Clusters" view may become unavailable, with no clusters, hosts, or VMs visible, until the vpxd process is restarted on vCenter Server.

For more information about KMS, see Add a KMS to vCenter Server.


Environment

VMware vCenter Server 6.7.x
VMware vCenter Server 7.0.x

Cause

When a High Availability (HA) enabled cluster is reconfigured, a synchronization operation called "update cluster keys" is invoked. This operation holds two locks, "vpxdstatelock" and "cluster lock", which are required for the reconfiguration to take place. A second operation, "refresh kms cache", runs periodically (every 5 minutes by default). It needs to hold two other locks, "kms cache lock" and "host manage object lock", and these may conflict with the "vpxdstatelock" held by the cluster reconfiguration operation.


In this scenario, if the "refresh kms cache" operation acquires the "kms cache lock" before the "update cluster keys" operation does, "update cluster keys", which also needs the "kms cache lock", is blocked by "refresh kms cache". This can lead to a deadlock in which both operations are blocked and unable to proceed, causing the reconfiguration process to hang until the vpxd process is restarted.
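
The lock-ordering conflict described above can be illustrated with a small, self-contained Python sketch. This is a toy model, not vpxd code: the function name simulate_lock_inversion and its timeout parameter are hypothetical, only two of the four locks are modeled, and the assumption that "refresh kms cache" ends up waiting on "vpxdstatelock" is inferred from the statement that the locks "may conflict". A timeout stands in for the indefinite hang so that the example terminates.

```python
import threading


def simulate_lock_inversion(timeout=0.5):
    """Run two toy operations that take the same two locks in opposite order.

    Returns a dict mapping each operation name to True if it obtained its
    second lock, or False if the attempt timed out.
    """
    # Toy stand-ins named after the locks in the article; not vpxd internals.
    vpxd_state_lock = threading.Lock()
    kms_cache_lock = threading.Lock()

    # Reusable two-party barrier used to line the threads up at each phase.
    barrier = threading.Barrier(2)
    results = {}

    def operation(name, first, second):
        with first:
            # Phase 1: wait until both operations hold their first lock.
            barrier.wait()
            # Phase 2: try for the other lock. A real deadlock would block
            # forever here; the timeout only exists so the demo terminates.
            acquired = second.acquire(timeout=timeout)
            results[name] = acquired
            # Phase 3: keep the first lock held until both attempts are over,
            # so neither side can succeed just because the other gave up.
            barrier.wait()
            if acquired:
                second.release()

    threads = [
        threading.Thread(
            target=operation,
            args=("update cluster keys", vpxd_state_lock, kms_cache_lock),
        ),
        threading.Thread(
            target=operation,
            args=("refresh kms cache", kms_cache_lock, vpxd_state_lock),
        ),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate_lock_inversion())
```

Both values in the returned dict are False: each operation waits on a lock that the other will not release. In the toy model the timeout breaks the standoff; in the scenario described in this article, nothing does until vpxd is restarted.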

Resolution

The issue is resolved in vCenter Server 7.0 Update 3l (build number 21477706).

Workaround:

vpxd.KMS.compatCheckInterval is a configuration parameter that controls how often vCenter Server checks the compatibility of the KMS cluster with vCenter Server. Increasing its value lengthens the interval between compatibility checks and so delays the KMS cache refresh operation.

Note: Make sure that you have a proper backup or snapshot before making the changes.

  1. Log in to the vCenter Server Web Client and navigate to the vCenter object.

  2. Go to the Configure tab, then select Advanced Settings, and click the Edit button.

  3. Search for the vpxd.KMS.compatCheckInterval setting and change its value from 5 to 20.

  4. After making the change, restart the vpxd service to apply it by running the command below:

     service-control --stop vpxd && service-control --start vpxd

  5. Re-configure HA and monitor the behavior.