ESXi Desired State Cluster Configuration Remediation failed with error "Apply plugin 'DELETE:esx:hardware:pci_devices' failed"

Article ID: 416708

Updated On:

Products

VMware vCenter Server

VMware vSphere ESXi

Issue/Introduction

  • ESXi Desired State cluster configuration remediation fails with the error "Apply plugin 'DELETE:esx:hardware:pci_devices' failed".

  • Review of /var/run/log/settingsd.log reveals the following sequence of events, indicating the failure of the esx:hardware:pci_devices plugin during the apply operation:

YYYY-MM-DDT22:40:34.627Z In(14) settingsd[2099396]: info [ConfigStore:6cf8044700] Forking plugin for id=esx:hardware:pci_devices
YYYY-MM-DDT22:40:34.627Z In(14) settingsd[2099396]: info [ConfigStore:6cf80c5700] Starting plugin monitor thread
YYYY-MM-DDT22:40:34.684Z In(14) settingsd[2099396]: info [ConfigStore:6cf80c5700] Stopping plugin monitor thread
YYYY-MM-DDT22:40:34.684Z In(14) settingsd[2099396]: info [ConfigStore:6cf8044700] Config Manager plugin=esx:hardware:pci_devices, finished successfully
YYYY-MM-DDT22:40:34.690Z In(14) settingsd[2099396]: info [ConfigStore:6cf8044700] Plugin esx:hardware:pci_devices completed operation APPLY in 0.066281 seconds.
YYYY-MM-DDT22:40:34.690Z Er(11) settingsd[2099396]: error [ConfigStore:6cf8044700] esx:hardware:pci_devices plugin failed to execute.
YYYY-MM-DDT22:40:34.714Z In(14) settingsd[2099396]: info [ConfigStore:6cf7fc3700] Task completed: apply$task:52916e09-7e4d-0f05-31b7-04a505877737
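
As a quick way to spot this failure without reading the whole log, the sketch below extracts the IDs of plugins that settingsd reported as failed. The function name failed_plugins is hypothetical, and the pattern assumes the "plugin failed to execute" message format shown in the excerpt above; other builds may word the message differently.

```shell
# Hypothetical helper (not part of ESXi): print the unique IDs of plugins
# that settingsd reported as failed.
# $1 = log file; defaults to the path named in this article.
failed_plugins() {
    log_file="${1:-/var/run/log/settingsd.log}"
    # Assumes lines of the form "... [ConfigStore:<id>] <plugin-id> plugin failed to execute."
    sed -n 's/.*] \([^ ]*\) plugin failed to execute.*/\1/p' "$log_file" | sort -u
}

# Example (run on the ESXi host):
#   failed_plugins
```

If the output includes esx:hardware:pci_devices, the host matches the symptom described in this article.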

Environment

VMware vSphere ESXi Host

VMware vCenter Server

Cause

The affected ESXi hosts have incorrect PCI device configurations. The ESXi configuration store (config store) contains entries for PCI devices that are not physically present on the server hardware. These phantom devices often have "null" hardware labels and missing device names: an entry typically shows a value such as "hardware label": "#" together with an sbdf address (e.g., "sbdf": "0000:d8:00.2"), but no corresponding physical device exists.

  • The /var/run/log/syslog.log provides further evidence of this discrepancy, showing errors when the pci_devices plugin attempts to locate these non-existent devices:

YYYY-MM-DDT22:40:33.823Z In(14) ConfigStore[9132184]: Impact plugin invoked for key pci_devices
YYYY-MM-DDT22:40:33.823Z Er(11) ConfigStore[9132184]: pci_devices : Did not find PCI device at 0000:d8:00.3
YYYY-MM-DDT22:40:33.823Z Er(11) ConfigStore[9132184]: pci_devices : Did not find PCI device at 0000:d8:00.2
YYYY-MM-DDT22:40:33.823Z In(14) ConfigStore[9132184]: info [ConfigStore:2cb0bf500] Module{libesx_hardware_pci_devices.so} returned status = 1
YYYY-MM-DDT22:40:33.826Z In(14) ConfigStore[9132184]: info [ConfigStore:2cb0bf500] dlclose(module=libesx_hardware_pci_devices.so) completed with rc=0

  • To confirm the presence of these unused and incorrectly labeled PCI devices, the following command can be executed on the ESXi host:

configstorecli config current get -c esx -g hardware -k pci_devices

This command will display entries similar to:

"dl_bus address": "s00000003.02",
"hardware label": " ",
"sbdf": "0000:d8:00.2"
  • These entries indicate PCI devices that are configured in the software but lack corresponding physical hardware, leading to the remediation failure.

    To confirm, run lspci | grep "<sbdf>" on the host; the output is empty when no physical device exists at that address.
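
The checks above can be combined into a rough sketch that collects the addresses the plugin could not find and keeps only those that are also absent from an lspci listing. The function names missing_sbdfs and absent_from_lspci and the /tmp/lspci.txt path are hypothetical. The sketch assumes the "Did not find PCI device at" message format shown above and that lspci prints the sbdf address on each device line.

```shell
# Hypothetical helper: list the unique SBDF addresses that the pci_devices
# plugin reported as missing.
# $1 = log file; defaults to the path named in this article.
missing_sbdfs() {
    log_file="${1:-/var/run/log/syslog.log}"
    sed -n 's/.*Did not find PCI device at \([0-9A-Fa-f:.]*\).*/\1/p' "$log_file" | sort -u
}

# Hypothetical helper: read SBDF addresses from stdin and print those that
# do not appear anywhere in a saved lspci listing.
# $1 = file holding lspci output (assumed to be captured beforehand).
absent_from_lspci() {
    lspci_file="$1"
    while read -r sbdf; do
        [ -n "$sbdf" ] || continue
        # -F treats the address as a fixed string, so "." and ":" are literal.
        grep -qF -- "$sbdf" "$lspci_file" || echo "$sbdf"
    done
}

# Example (run on the ESXi host):
#   lspci > /tmp/lspci.txt
#   missing_sbdfs | absent_from_lspci /tmp/lspci.txt
```

Any address printed by the pipeline is a candidate phantom entry. It should still be checked against the configstorecli output before anything is deleted.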

Resolution

  • To resolve this issue, delete the extraneous PCI device entries from the config store on all affected ESXi hosts using either of the commands below:

    • To clean up all PCI device configuration entries (used and unused):
      configstorecli config current delete -c esx -g hardware -k pci_devices --all


    • To clean up a specific PCI device configuration entry:
      configstorecli config current delete -c esx -g hardware -k pci_devices -i <sbdf>

Note: Either command can be executed directly on the ESXi host without requiring the host to be placed into maintenance mode. However, it is critically important to thoroughly validate that the PCI devices being targeted for deletion are indeed unused and do not correspond to any active hardware components. Incorrectly deleting entries for in-use PCI devices can lead to system instability or hardware malfunction. Always verify the status and purpose of each PCI device entry before executing either command.

  • Retry the ESXi Desired State cluster configuration remediation after the cleanup.
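
When several phantom entries need to be removed individually, a cautious approach is to generate the per-device delete commands and review them before running anything. The sketch below only prints the commands and does not execute them. The function name print_delete_commands is hypothetical, and the address check assumes the domain:bus:device.function form used in this article (for example 0000:d8:00.2).

```shell
# Hypothetical helper: read SBDF addresses from stdin and PRINT (not run)
# the per-device delete command from this article for each one.
print_delete_commands() {
    while read -r sbdf; do
        [ -n "$sbdf" ] || continue
        # Skip anything that does not look like dddd:bb:dd.f (hex digits, function 0-7).
        if ! printf '%s\n' "$sbdf" | grep -Eq '^[0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]$'; then
            echo "skipping malformed address: $sbdf" >&2
            continue
        fi
        echo "configstorecli config current delete -c esx -g hardware -k pci_devices -i $sbdf"
    done
}

# Example:
#   printf '0000:d8:00.2\n0000:d8:00.3\n' | print_delete_commands
```

Printing instead of executing keeps a manual review step in place, in line with the note above about verifying each entry before deletion.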