This article describes how to resolve a VPXD service crash caused by a duplicate key value in the vCenter Server database (VCDB).
Symptoms -
VPXD crashes with a VCDB error
vpxd.log entries -
[YYYY-MM-DDTHH:MM:SS] error vpxd[58211] [Originator@6876 sub=Default opID=HB-host-434785@158686-60699ede] [Vdb::IsRecoverableErrorCode] Unable to recover from 23505:1
[YYYY-MM-DDTHH:MM:SS] error vpxd[58211] [Originator@6876 sub=Default opID=HB-host-434785@158686-60699ede] [VdbStatement] SQLError was thrown: "ODBC error: (23505) - ERROR: duplicate key value violates unique constraint "pk_nfle_file_info"
--> DETAIL: Key (vm_id, key_val)=(1285563, 0) already exists.;
--> Error while executing the query" is returned when executing SQL statement "INSERT INTO VPX_NORM_VM_FLE_FILE_INFO (VM_ID,KEY_VAL,NAME,FILE_SIZE,FILE_UNIQUESIZE,TYPE) VALUES (?,?,?,?,?,?)"
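vpxd attempts to insert a row into VPX_NORM_VM_FLE_FILE_INFO whose key (vm_id, key_val) already exists in the VCDB. The resulting unique-constraint violation (PostgreSQL error 23505) cannot be recovered from, so the VPXD service crashes.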
Before making any changes to the vCenter Server database, take offline snapshots of the vCenter Server or a valid backup of the VCDB.
Refer to KB: Back up and restore vCenter Server Appliance/vCenter Server 6.x 7.0.x 8.0.x vPostgres database
Identify the impacted VM_ID in vpxd.log. It is the first number in the DETAIL line (1285563 in the example above). Then follow the steps below to resolve the issue.
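You can also extract the VM_ID from the log with standard text tools. The following is a minimal sketch. It assumes the default appliance log location /var/log/vmware/vpxd/vpxd.log and a DETAIL line in exactly the format shown above. extract_vm_ids is a hypothetical helper name, not a VMware-provided command.

```shell
# Hypothetical helper (not a VMware tool): print each distinct vm_id found in
# "Key (vm_id, key_val)=(<vm_id>, <key_val>) already exists" lines.
# Reads the files given as arguments, or standard input if none are given.
extract_vm_ids() {
  # -h: never prefix matches with a file name; -o: print only the match.
  # [0-9][0-9]* requires at least one digit after "=(".
  grep -ho 'Key (vm_id, key_val)=([0-9][0-9]*' "$@" \
    | sed 's/.*=(//' \
    | sort -u
}

# Usage, assuming the default vCenter Server Appliance log path:
# extract_vm_ids /var/log/vmware/vpxd/vpxd.log
```

If the command prints more than one ID, more than one VM is affected. Check each ID against the log before deleting anything.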
1. Connect to the vCenter Server over SSH as root.
If the VPXD service has not already crashed, stop it:
# service-control --stop vpxd
2. Connect to the VCDB:
# /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
3. (Optional, for information only) Run the following query to identify the virtual machine that corresponds to the vm_id found in vpxd.log:
VCDB=# select id, file_name from vpx_vm where id = 1285563;
id | file_name
---------+----------------------------------------------------------------------------------------------------
1285563 | ds:///vmfs/volumes/Datastore-UUID/VM_name/VM_name.vmx
(1 row)
4. Delete all entries associated with the VM_ID observed in vpxd.log by running the following statements. Replace 1285563 with the VM_ID from your log.
delete from VPX_COMPUTE_RESOURCE_DAS_VM where VM_ID=1285563;
delete from VPX_COMPUTE_RESOURCE_DRS_VM where VM_ID=1285563;
delete from VPX_COMPUTE_RESOURCE_ORC_VM where VM_ID=1285563;
delete from VPX_VM_SGXINFO where VM_ID=1285563;
delete from VPX_GUEST_DISK where VM_ID=1285563;
delete from VPX_VM_VIRTUAL_DEVICE where ID=1285563;
delete from VPX_VM_DS_SPACE where VM_ID=1285563;
delete from VPX_NON_ORM_VM_CONFIG_INFO where ID=1285563;
delete from VPX_NORM_VM_FLE_FILE_INFO where VM_ID=1285563;
delete from VPX_VDEVICE_BACKING_REL where VM_ID=1285563;
delete from VPX_VIRTUAL_DISK_IOFILTERS where VM_ID=1285563;
delete from VPX_VM_STATIC_OVERHEAD_MAP where VM_ID=1285563;
delete from VPX_VM_TEXT where VM_ID=1285563;
delete from VPX_VM where ID=1285563;
delete from VPX_ENTITY where ID=1285563;
delete from VPX_DVPORT where connectee='vm-1285563';
Note: Some of the delete statements may return DELETE 0 (no rows deleted). This is expected. Continue with the remaining statements.
5. Disconnect from the VCDB
VCDB=# \q
6. Start the VPXD service
# service-control --start vpxd
Note: If vpxd fails to start, restart all services instead:
# service-control --stop --all; service-control --start --all
Monitor the status of the services from a separate SSH session:
# watch service-control --status --all
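You can generate the step 4 delete statements for a given VM_ID instead of typing them by hand. Applying them in a single transaction means an unexpected error leaves the VCDB unchanged rather than partly cleaned up. This is a minimal sketch, not a VMware-supplied procedure. gen_cleanup_sql and /tmp/vm_cleanup.sql are hypothetical names. The statements are the same ones listed in step 4, in the same order. Use it in place of step 4, while vpxd is stopped and after the backup has been taken.

```shell
# Hypothetical helper: print the step 4 DELETE statements for one VM_ID,
# wrapped in BEGIN/COMMIT so they are applied all-or-nothing.
gen_cleanup_sql() {
  local vm_id=$1
  # Refuse anything that is not a plain decimal number.
  case $vm_id in
    ''|*[!0-9]*) echo "usage: gen_cleanup_sql <numeric VM_ID>" >&2; return 1 ;;
  esac
  echo "BEGIN;"
  # table:column pairs, in the same order as the statements in step 4
  for pair in \
    VPX_COMPUTE_RESOURCE_DAS_VM:VM_ID VPX_COMPUTE_RESOURCE_DRS_VM:VM_ID \
    VPX_COMPUTE_RESOURCE_ORC_VM:VM_ID VPX_VM_SGXINFO:VM_ID \
    VPX_GUEST_DISK:VM_ID VPX_VM_VIRTUAL_DEVICE:ID \
    VPX_VM_DS_SPACE:VM_ID VPX_NON_ORM_VM_CONFIG_INFO:ID \
    VPX_NORM_VM_FLE_FILE_INFO:VM_ID VPX_VDEVICE_BACKING_REL:VM_ID \
    VPX_VIRTUAL_DISK_IOFILTERS:VM_ID VPX_VM_STATIC_OVERHEAD_MAP:VM_ID \
    VPX_VM_TEXT:VM_ID VPX_VM:ID VPX_ENTITY:ID; do
    # ${pair%%:*} is the table name, ${pair#*:} is the key column
    echo "delete from ${pair%%:*} where ${pair#*:}=$vm_id;"
  done
  # VPX_DVPORT references the VM by its managed object name, vm-<id>
  echo "delete from VPX_DVPORT where connectee='vm-$vm_id';"
  echo "COMMIT;"
}

# Usage (VM_ID taken from vpxd.log):
# gen_cleanup_sql 1285563 > /tmp/vm_cleanup.sql
# /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -v ON_ERROR_STOP=1 -f /tmp/vm_cleanup.sql
```

With ON_ERROR_STOP=1, psql stops at the first failing statement and exits before COMMIT runs. The open transaction is then rolled back when the connection closes. Review /tmp/vm_cleanup.sql before running it.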
Note: To avoid this issue, set the DRS automation level on the cluster to Manual while patching or upgrading vCenter Server.