Host Disconnects and Reconnects - Connection Pool exhausted, vpxd in busy state

Article ID: 323194

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • Multiple hosts from several clusters in the workload domain are disconnected from vCenter.
  • Random ESXi hosts disconnect from and reconnect to vCenter.
  • The vpxd logs report that the connection pool is exhausted while vpxd is in a busy state; the vpxd log may rotate very quickly.
  • The vCenter inventory takes a long time to load.
  • Any activity in the vCenter web interface, such as connecting or browsing, is sluggish.
  • The search option in the UI does not load results.
  • Missed heartbeats exceed 700 seconds.
  • vpxd appears to be busy, as in the following log excerpt:
    2021-10-23T12:33:37.359Z - time the service was last started, Section for VMware VirtualCenter, pid=3420, version=6.7.0, build=18485185, option=Release
    <unset>,
    --> metricId = (vim.PerformanceManager.MetricId) [
    --> (vim.PerformanceManager.MetricId) {
    --> counterId = 440,
    --> instance = "4000"
    --> },
    --> (vim.PerformanceManager.MetricId) {
    --> counterId = 440,
    --> instance = ""
    --> }
    --> ],
    --> intervalId = 20,
    --> format = "csv"
    --> },
    --> (vim.PerformanceManager.QuerySpec) {
    --> entity = 'vim.VirtualMachine:vm-167940',
    --> startTime = "2021-10-23T09:45:01Z",
    --> endTime = "2021-10-23T12:31:32.447Z",
    --> maxSample = <unset>,
    --> metricId = (vim.PerformanceManager.MetricId) [
    --> (vim.PerformanceManager.MetricId) {
    --> counterId = 184,
    --> instance = ""

Second instance
2021-10-23T12:58:41.428Z [tomcat-exec-30 ERROR com.vmware.cis.server.util.ConnectionManager opId=] VPXD Connection Pool is exhausted

2021-10-23T12:58:41.428Z [tomcat-exec-30 ERROR com.vmware.cis.core.authz.accesscontrol.impl.CheckPrivilegesRouterRiseImpl opId=] Error occurred checking permissions for [urn:vmomi:Folder:group-d1:36b0668d-6eea-4867-ae5c-1b3e55650641] with userName= CORP.DOMAIN.COM\ReadOnly groups= [CORP.DOMAIN.COM\Users, VSPHERE.LOCAL\Everyone, CORP.DOMAIN.COM\USER, CORP.DOMAIN.COM\Domain Users] privileges= [System.Read]
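
To gauge how often the pool-exhaustion error occurs, a log file can be searched for the message shown above. The following is a minimal sketch, not an official tool: the function names `count_pool_exhausted` and `pool_exhausted_by_hour` are hypothetical, and the log file is passed as an argument because this article does not name the file that contains these entries. It assumes each line starts with an ISO 8601 timestamp, as in the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing the pool-exhaustion message
# in the log file given as the first argument.
count_pool_exhausted() {
    # -F matches the fixed string; -c prints the number of matching lines.
    grep -cF "VPXD Connection Pool is exhausted" "$1"
}

# Hypothetical helper: break the matches down by hour. The first 13 characters
# of a line such as 2021-10-23T12:58:41.428Z are the date and hour (2021-10-23T12).
pool_exhausted_by_hour() {
    awk '/VPXD Connection Pool is exhausted/ { n[substr($0, 1, 13)]++ }
         END { for (h in n) print h, n[h] }' "$1" | sort
}
```

Calling `pool_exhausted_by_hour` on a log file prints one line per hour with the number of matching errors, which can help correlate the errors with the host disconnects.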

Environment

VMware vCenter Server 6.7.x
VMware vCenter Server 7.0.0

Cause

A large DNS cache containing stale entries causes vpxd to time out while it attempts to find a working IP address for an ESXi host and connect to it.

Resolution

Connect to the vCenter Server via SSH.
Make the following configuration changes in /etc/dnsmasq.conf:

 - Enable negative caching (comment out the no-negcache line).
 - Increase the cache size (cache-size).
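
Before editing, it may be prudent to keep a copy of the existing file so the change can be reverted. The following is a minimal sketch under that assumption; the function name `backup_conf` and the `.bak.<timestamp>` naming are hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: copy a config file to a timestamped backup next to it
# and print the path of the backup.
backup_conf() {
    conf="$1"
    backup="${conf}.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves mode, ownership and timestamps where possible.
    cp -p "$conf" "$backup" && printf '%s\n' "$backup"
}

# Example (path taken from this article):
#   backup_conf /etc/dnsmasq.conf
```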

Here is a sample dnsmasq.conf that you can copy to the vCenter Server:
 cat /etc/dnsmasq.conf

listen-address=127.0.0.1
bind-interfaces
user=dnsmasq
group=dnsmasq

#no-negcache
no-hosts
log-queries=extra
log-facility=/var/log/vmware/dnsmasq.log
domain-needed
dns-forward-max=300
cache-size=16384
neg-ttl=86400
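
After editing, the file can be sanity-checked for the two changes before the service is restarted. The following is a minimal sketch; the function name `check_dnsmasq_conf` is hypothetical. It only inspects the text of the file: it reports whether an uncommented no-negcache line is present and prints the last uncommented cache-size value it finds.

```shell
#!/bin/sh
# Hypothetical helper: report whether a dnsmasq config file reflects
# the two changes described in this article.
check_dnsmasq_conf() {
    conf="$1"
    # An uncommented "no-negcache" line disables negative caching.
    if grep -qE '^[[:space:]]*no-negcache' "$conf"; then
        echo "negative caching: disabled"
    else
        echo "negative caching: enabled"
    fi
    # Extract the numeric value of the last uncommented cache-size line, if any.
    size=$(sed -n 's/^[[:space:]]*cache-size=\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    echo "cache-size: ${size:-not set}"
}

# Example (path taken from this article):
#   check_dnsmasq_conf /etc/dnsmasq.conf
```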

After you've updated dnsmasq.conf, restart the dnsmasq service:
# systemctl restart dnsmasq