High CPU utilization on vCenter Server(s) managing encrypted vSAN Datastores

Article ID: 318211

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • You have one or more VCSAs with Small to Large inventory sizes.
  • The vSphere environment includes vSAN datastores that use Data at Rest Encryption and are configured to use KMS servers hosted as VMs in the same environment.
  • CPU utilization on these VCSAs gradually increases and reaches 100% over a period of 2 to 3 weeks.
  • In the vCenter Performance chart, the VCSA VM is shown consuming 100% CPU across all cores.
  • Running the vimtop CLI utility from an SSH session shows the "vCenter Server" process (vpxd) consuming most of the CPU resources.

[Image: vpxd CPU usage]

  • vCenter UI performance and other tasks are not visibly impacted, and no apparent slowness is observed. However, "VM CPU usage" alerts are triggered on the ESXi host where the VCSA VM is running.
  • The vSAN Health page reports an alert for the "vCenter and all hosts are connected to Key Management Servers" test.
  • Rebooting the VCSA VM restores CPU usage to normal, after which it gradually increases again.
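
The gradual climb described in the symptoms can be quantified from exported CPU samples. The following is a minimal sketch, not a VMware tool: it fits a straight line to daily average CPU percentages and estimates how many days remain before the line reaches 100%. The function names and the sample values are hypothetical, and a linear trend is an assumption; real growth may not be linear.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(days: Sequence[float], cpu_pct: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of an ordinary least-squares line."""
    n = len(days)
    if n < 2 or n != len(cpu_pct):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(cpu_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, cpu_pct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_saturation(
    days: Sequence[float], cpu_pct: Sequence[float], ceiling: float = 100.0
) -> Optional[float]:
    """Days after the last sample at which the fitted line reaches `ceiling`.

    Returns None when usage is flat or falling, and 0.0 when the fitted
    line has already reached the ceiling by the time of the last sample.
    """
    slope, intercept = linear_fit(days, cpu_pct)
    if slope <= 0:
        return None
    crossing = (ceiling - intercept) / slope
    return max(0.0, crossing - days[-1])


if __name__ == "__main__":
    # Made-up daily averages, for illustration only.
    sample_days = [0, 1, 2, 3, 4, 5, 6]
    sample_cpu = [20, 25, 30, 35, 40, 45, 50]
    print(days_until_saturation(sample_days, sample_cpu))
```

A steadily positive slope that resets after each VCSA reboot would be consistent with the pattern described above, although it does not by itself confirm this issue.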

 


Environment

VMware vCenter Server 7.0.x

Cause

The high VCSA CPU usage is caused by a bug in the CryptoManagerKmip API, which vCenter uses to manage the KMS servers configured for vSAN datastore encryption.

When the KMS server has connectivity or other stability issues, a KMIP call can get stuck and potentially spin in an infinite loop.

Over a period of 2 to 3 weeks, the vpxd process accumulates more of these looping KMIP calls, until CPU usage hits the ceiling.
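
Because the trigger is unstable connectivity to the KMS, a basic reachability probe can help correlate KMS outages with the CPU climb. The sketch below is an assumption-laden illustration, not part of vCenter: it only checks that a TCP connection opens, and it does not perform the TLS handshake or any KMIP exchange. The function name is hypothetical, and 5696 is the IANA-registered KMIP port, which your KMS may not use.

```python
import socket

# IANA-registered KMIP port; a given KMS deployment may listen elsewhere.
KMIP_DEFAULT_PORT = 5696


def kms_reachable(host: str, port: int = KMIP_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This checks network reachability only. A True result does not prove
    that the KMS will answer KMIP requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `kms_reachable("kms01.example.com")` (a placeholder hostname) could be run on a schedule from a host on the same network as the VCSA, with failures logged alongside the vSAN Health alert timestamps.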

 

 

Resolution

 

The issue is resolved in vCenter Server 7.0 Update 3l (build number 21477706).
The issue is resolved in vCenter Server 8.0 Update 1 (build number 21457385).
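
To check an inventory of vCenter Servers against these fixed builds, a comparison along the following lines may be useful. This is a sketch under stated assumptions: the helper names are hypothetical, the version and build values must be obtained separately from each vCenter Server, and the comparison assumes that build numbers only increase within a release line.

```python
# Fixed builds quoted in this article, keyed by release line.
FIXED_BUILDS = {
    "7.0": 21477706,  # vCenter Server 7.0 Update 3l
    "8.0": 21457385,  # vCenter Server 8.0 Update 1
}


def release_line(version: str) -> str:
    """Reduce a version string such as '7.0.3' to its release line '7.0'."""
    parts = version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unrecognized version string: {version!r}")
    return ".".join(parts[:2])


def has_fix(version: str, build: int) -> bool:
    """True if `build` is at or above the fixed build for its release line.

    Assumption: build numbers only increase within a release line.
    Raises KeyError for release lines this article does not list.
    """
    line = release_line(version)
    if line not in FIXED_BUILDS:
        raise KeyError(f"no fixed build listed for release line {line}")
    return build >= FIXED_BUILDS[line]
```

For instance, `has_fix("7.0.3", 21477706)` evaluates to `True`, while any 7.0 build number below 21477706 evaluates to `False`.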


Workaround:

Restart your VCSA VM when the CPU usage alerts are triggered or periodically during a maintenance window.


Additional Information

Impact/Risks:

The "Virtual Machine CPU usage" alarm in vCenter is triggered periodically, which can cause concern for monitoring teams.