SRM 8.x service crashed intermittently with several backtraces reported for "Failed to VirtualMachineReconfigure"


Article ID: 338471



Products

VMware Live Recovery, VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • The SRM service crashes intermittently and must be started manually, after which it keeps running for some time.
  • The vmware-dr logs contain several backtraces reported for 'Failed to VirtualMachineReconfigure':
2020-02-13T01:37:24.028-05:00 warning vmware-dr[30245] [SRM@6876 sub=VmDomain opID=33b2e106:e648:a60d] Failed to VirtualMachineReconfigure of VM vim.VirtualMachine:########-####-####-####-########e2b1:vm-9787, retries left 15
--> N2Dr16TimeoutExceptionE Operation timed out after '1080.007519' seconds
--> [context]zKq7AVECAAQAACF42wALdm13YXJlLWRyAACLxRhsaWJ2bWFjb3JlLnNvAAHChQdsaWJjb25uZWN0aW9uLWJhc2Uuc28AAcWTCwIyewZsaWJjb21tb24uc28AAq17BgATQjEAUZ4pAP+6KQAaNDcDlXQAbGlicHRocmVhZC5zby4wAASvAg9saWJjLnNvLjYA[/context]
--> [backtrace begin] product: VMware vCenter Site Recovery Manager, version: 8.2.0, build: build-14383137, tag: vmware-dr, cpu: x86_64, os: linux, buildType: release
--> backtrace[03] libvmacore.so[0x0018C58B]: Vmacore::Throwable::Throwable(std::string&&)
--> backtrace[04] libconnection-base.so[0x000785C2]
--> backtrace[05] libconnection-base.so[0x000B93C5]: Dr::Internal::RemoteTaskBase::CheckTimeCallback()
--> backtrace[06] libcommon.so[0x00067B32]
--> backtrace[07] libcommon.so[0x00067BAD]
--> backtrace[08] libvmacore.so[0x00314213]
--> backtrace[09] libvmacore.so[0x00299E51]
--> backtrace[10] libvmacore.so[0x0029BAFF]
--> backtrace[11] libvmacore.so[0x0037341A]
--> backtrace[12] libpthread.so.0[0x00007495]
--> backtrace[13] libc.so.6[0x000F02AF]
--> [backtrace end]
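
To check how often this warning appears, and for which VMs, the log can be scanned with a short script. The following is a minimal sketch in Python. The regular expression is derived only from the sample warning line above, so it is an assumption that other SRM builds format the line the same way. The function name `find_reconfigure_failures` is hypothetical and not part of any VMware tooling.

```python
import re

# Pattern derived from the sample warning line in this article (an assumption:
# other SRM builds may format the message differently).
FAILURE_RE = re.compile(
    r"^(?P<timestamp>\S+) warning .*"
    r"Failed to VirtualMachineReconfigure of VM (?P<vm>\S+), "
    r"retries left (?P<retries_left>\d+)"
)


def find_reconfigure_failures(lines):
    """Return one dict per matching warning line, in input order."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            failures.append({
                "timestamp": match.group("timestamp"),
                "vm": match.group("vm"),
                "retries_left": int(match.group("retries_left")),
            })
    return failures
```

The function accepts any iterable of lines, so an open log file can be passed in directly, for example `find_reconfigure_failures(open(path, errors="replace"))`, where `path` points to a copy of the vmware-dr log.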



Environment

VMware vSphere Replication 8.x

Cause

  • The crash is caused by an issue in the code in "vmDomainImpl.cpp", which runs during the 'VirtualMachineReconfigure' task.
  • The core dump generated during the crash looks like this:
#0 0x00007f25fe527134 in raise () from /lib/libc.so.6
#1 0x00007f25fe528535 in abort () from /lib/libc.so.6
#2 0x00007f2609a1dc4f in Vmacore::System::SignalTerminateHandler (info=<optimized out>, ctx=<optimized out>) at bora/vim/lib/vmacore/posix/defSigHandlers.cpp:64
#3 <signal handler called>
#4 0x00007f25feb4f070 in __dynamic_cast () from /lib/libstdc++.so.6
#5 0x00007f2603e2480b in Dr::VmDomainImpl::ProcessVmOperationRetryOrFail(Dr::FuncWorkHandle<boost::function<void ()>, boost::function<void (Dr::ExceptionHolder const&)> >*, boost::function<void ()> const&, Dr::TypedMoRef<Vim::VirtualMachine> const&, std::string const&, unsigned int, int, Dr::ExceptionHolder const&) (this=0x1b51cc0, wh=0x7f2594058820, retryOpFunc=..., vmRef=..., vmOpName=..., numRetriesLeft=numRetriesLeft@entry=15, retryDelaySec=retryDelaySec@entry=15,
    ex=...) at ../../src/connection/vc/domains/vmDomainImpl.cpp:1958
#6 0x00007f2603e24f33 in Dr::VmDomainImpl::__lambda23::operator() (ex=..., __closure=0x7f25b4007270) at ../../src/connection/vc/domains/vmDomainImpl.cpp:2237
#7 boost::detail::function::void_function_obj_invoker1<Dr::VmDomainImpl::ReconfigureInt(Dr::FuncWorkHandle<boost::function<void()> >*, unsigned int, const VirtualMachineRef&, Vim::Vm::ConfigSpec*)::__lambda23, void, const Dr::ExceptionHolder&>::invoke(boost::detail::function::function_buffer &, const Dr::ExceptionHolder &) (function_obj_ptr=..., a0=...)
    at /build/mts/release/bora-14383137/compcache/cayman_boost/ob-9345926/linux64/include/boost/function/function_template.hpp:159
#8 0x00007f2603b3fff9 in boost::function1<void, Dr::ExceptionHolder const&>::operator() (this=<optimized out>, a0=...)
    at /build/mts/release/bora-14383137/compcache/cayman_boost/ob-9345926/linux64/include/boost/function/function_template.hpp:771
#9 0x00007f2603b3fea9 in boost::function0<void>::operator() (this=this@entry=0x7f25f408c0e0) at /build/mts/release/bora-14383137/compcache/cayman_boost/ob-9345926/linux64/include/boost/function/function_template.hpp:771
#10 0x00007f2603b3ff62 in Dr::Internal::PreserveLogContextTransform<void>::operator()(boost::function<void ()> const&) (fn=..., this=<optimized out>) at ../../public/common/logOpId.h:137
#11 boost::detail::function::void_function_obj_invoker1<Dr::Internal::PreserveLogContextTransform<void>, void, boost::function<void ()> const&>::invoke(boost::detail::function::function_buffer&, boost::function<void ()> const&) (
    function_obj_ptr=..., a0=...) at /build/mts/release/bora-14383137/compcache/cayman_boost/ob-9345926/linux64/include/boost/function/function_template.hpp:159
#12 0x00007f2603b562a9 in boost::function1<void, boost::function<void ()> const&>::operator()(boost::function<void ()> const&) const (this=this@entry=0x7f25940b6450, a0=...)
    at /build/mts/release/bora-14383137/compcache/cayman_boost/ob-9345926/linux64/include/boost/function/function_template.hpp:771
#13 0x00007f2603b5664c in Dr::Internal::FuncWrapper0<void, boost::function<void (Dr::ExceptionHolder const&)> >::operator()<Dr::ExceptionHolder>(Dr::ExceptionHolder const&) const (this=0x7f25940b6430, t0=...)
    at ../../public/functional/async/funcTransform.h:381
#14 0x00007f2603b3fff9 in boost::function1<void, Dr::ExceptionHolder const&>::operator() (this=<optimized out>, a0=...)
    at /build/mts/release/bora-14383137/compcache/cayman_boost/ob-9345926/linux64/include/boost/function/function_template.hpp:771

Resolution

This is a rare race condition. The code fix is included in SRM 8.2.0.2 and SRM 8.3.

Upgrade to SRM 8.2.0.2 or later.
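
When auditing several sites, the version threshold can be checked programmatically. The sketch below assumes the SRM version is available as a plain dotted string, such as the `8.2.0` shown in the backtrace header. The helper names `parse_version` and `has_fix` are hypothetical, and retrieving the installed version is outside the scope of this example.

```python
# First release this article names as containing the fix.
FIXED_IN = (8, 2, 0, 2)


def parse_version(version):
    """Turn a dotted version string such as '8.2.0.2' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(parts, width):
    """Right-pad a version tuple with zeros so tuples compare at equal length."""
    return parts + (0,) * (width - len(parts))


def has_fix(version):
    """Return True if the version is 8.2.0.2 or later."""
    parsed = parse_version(version)
    width = max(len(parsed), len(FIXED_IN))
    return _pad(parsed, width) >= _pad(FIXED_IN, width)
```

With this sketch, `has_fix("8.2.0")` returns `False`, while `has_fix("8.2.0.2")` and `has_fix("8.3")` return `True`.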