vpxd crashes following a triggered backup with Dell RecoverPoint installed

Article ID: 409602


Updated On:

Products

VMware vCenter Server

Issue/Introduction

Users cannot access the vCenter UI, or the inventory does not load.
The VMware vCenter Service (vpxd) is in a stopped state.

Restarting the vpxd service resolves the issue only until the next backup is triggered.

A backup is triggered on the vCenter Appliance.

A VirtualMachine.reconfigure task with LWD protect entries is logged in vpxd.log:

/var/log/vmware/vpxd.log

<timestamp> info vpxd[XXXXX] [Originator@XXXXX sub=vpxLro opID=XXXXXXX-XXXXX-XX] [VpxLRO] -- BEGIN task-####### -- vm-XXXX -- vim.VirtualMachine.reconfigure -- task-id)
<timestamp> info vpxd[XXXXX] [Originator@#### sub=provisioning opID=XXXXXXX-XXXXX-XX] Extra overhead memory required for LWD protect is : 30
<timestamp> info vpxd[XXXXX] [Originator@#### sub=provisioning opID=XXXXXXX-XXXXX-XX] Total extra overhead: 30 MB. (LWD:30 + Resv:0)
<timestamp> info vpxd[XXXXX] [Originator@#### sub=provisioning opID=XXXXXXX-XXXXX-XX] CurrentOverhead = 332, currentOverheadLimit = 1091, availableOverheadResv = 759
<timestamp> info vpxd[XXXXX] [Originator@6876 sub=provisioning opID=XXXXXXX-XXXXX-XX] ExtraOverheadForReconfig = 30, availableOverhead = 759
<timestamp> info vpxd[XXXXX] [Originator@6876 sub=VdbOpJournal opID=XXXXXXX-XXXXX-XX] Added new journal id=### type=1
<timestamp> info vpxd[XXXXX] [Originator@6876 sub=vpxLro opID=XXXXXXX-XXXXX-XX] [VpxLRO] -- BEGIN lro-XXXXXX -- vm-XXXX -- vim.VirtualMachine.reconfigure -- task-id)
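
To locate these entries quickly, the log can be filtered for the reconfigure task and the LWD protect overhead lines. The following is a minimal sketch, not a supported tool: the function name find_lwd_reconfig is hypothetical, and the default path is the one cited in this article.

```shell
# Hypothetical helper (not part of vCenter): print reconfigure tasks and
# LWD protect overhead lines from a vpxd log.
# The default path is the one cited in this article; pass another file
# (for example a copy from a support bundle) as the first argument.
find_lwd_reconfig() {
    log_file="${1:-/var/log/vmware/vpxd.log}"
    grep -E 'vim\.VirtualMachine\.reconfigure|LWD protect' "$log_file"
}
```

Running find_lwd_reconfig with no argument reads the default path. Matching lines that share an opID belong to the same operation.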



The vpxd service crashes and the vCenter UI is no longer available:

<timestamp> In(05) host-XXXXX <vpxd> Service is dumping core. Coredump count 2. CurrReq: 0
<timestamp> Wa(03) host-XXXXX [ReadSvcSubStartupData] No startup information from vpxd.
<timestamp> In(05) host-XXXXX <event-pub> Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonEventPublisher.py --eventdata vpxd,UNHEALTHY,HEALTHY,1
<timestamp> Wa(03) host-XXXXX <vpxd> Service exited. Exit code 1
<timestamp> Wa(03) host-XXXXX <vpxd> Service exited unexpectedly. Crash count 2. Taking configured recovery action.
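
The core-dump and exit events above can be pulled out of a larger log with a simple filter. This is a sketch under assumptions: the function name vpxd_crash_events is hypothetical, and because the article does not name the file this excerpt comes from, the log path is a required argument rather than a default.

```shell
# Hypothetical helper: print vpxd core-dump and exit events from a log file.
# The log path must be supplied by the caller, since the article does not
# state which file contains the excerpt above.
vpxd_crash_events() {
    log_file="$1"
    if [ -z "$log_file" ]; then
        echo "usage: vpxd_crash_events <log-file>" >&2
        return 2
    fi
    grep -E '<vpxd> Service (is dumping core|exited)' "$log_file"
}
```

A rising "Crash count" or "Coredump count" across successive matches suggests the service is failing repeatedly, not just once.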


On the ESXi host where the VM is registered, recent messages and events are missing from /var/run/log/dpd.log.


The tsdm process is generating core dumps:

/var/run/log/vobd.log

<timestamp> In(14) vobd[XXXXXX]:  [UserWorldCorrelator] 0000000000000us: [esx.problem.application.core.dumped.encrypted] An application (/opt/tsdm/bin/tsdm) running on ESXi host has crashed (1 time(s) so far)
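
To confirm on a given ESXi host that the dumped application is tsdm, the vobd log can be filtered on the event ID and the binary path shown above. This is a minimal sketch: the function name tsdm_core_dumps is hypothetical, and the default path is the one cited in this article.

```shell
# Hypothetical helper: print core-dump events for the tsdm binary.
# Uses fixed-string matching (-F) so the dots in the event ID are literal.
# The default path is the one cited in this article.
tsdm_core_dumps() {
    log_file="${1:-/var/run/log/vobd.log}"
    grep -F 'esx.problem.application.core.dumped' "$log_file" \
        | grep -F '/opt/tsdm/bin/tsdm'
}
```

The "time(s) so far" counter in each matching line shows how many times the process has crashed.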


tsdm hits a panic during the sync operation:

/var/run/log/tsdm.log

<timestamp> In(30) tsdm[XXXXX]: error TSDM[0x000000cXXXXXXX] [sub=SdmMemMgr]: baseConsumer.cpp:100 MonitorConsumerDeregistration: Consumer 'station_type: "SyncReaderStation" station_instance_id: XXXXXXXXXXXXXX, session: XXXXXXX index: 5, reg_session: XXXX connection_session: ######' didn't free all memory for long time and will be declared as zombie. usedCredits='257', totalLostCredits='1156'
<timestamp> In(30) tsdm[XXXXX]: none TSDM[0x000000cXXXXXX] [sub=SdmHooks]: appExitImpl.cpp:41 PanicExit:
<timestamp> In(30)[+] tsdm[XXXXX]:
<timestamp> In(30)[+] tsdm[XXXXX]: Panic: Total lost credits amount is bigger than allowed. TotalLostCredits=1413, AllowedLostCredits=0 - /space/######/workspace/mob_sdm-portable_release_XX.XX.X.X/src/core/src/memoryManager/memoryManager.cpp:XXXX
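
The zombie-consumer warning and the panic that follows it can be extracted together, which makes the sequence easier to read in a busy log. This is a sketch only: the function name tsdm_panic_lines is hypothetical, and the default path is the one cited in this article.

```shell
# Hypothetical helper: print the zombie-consumer warning, the PanicExit
# marker, and the panic message from a tsdm log.
# The default path is the one cited in this article.
tsdm_panic_lines() {
    log_file="${1:-/var/run/log/tsdm.log}"
    grep -E 'declared as zombie|PanicExit|Panic:' "$log_file"
}
```

In the excerpt above, the "declared as zombie" line precedes the panic, and the panic reports a TotalLostCredits value above AllowedLostCredits.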

The tsdm and dpd services are running, and restarting them does not change the behavior:

/etc/init.d/dpd status
/etc/init.d/tsdm status

/etc/init.d/dpd restart
/etc/init.d/tsdm restart

Implementing KB does not resolve the issue 




Environment

vCenter 8.x
RecoverPoint for Virtual Machines 6.x

Cause

Protected virtual machines (VMs) crash during backup operations because of a compatibility issue between Dell RecoverPoint and other backup solutions.

Resolution

Suspend RecoverPoint
Open a case with Dell

https://www.dell.com/support/kbdoc/en-ie/000293193/recoverpoint-for-vms-6-x-support-for-backup-solutions