- VCHA failovers occur repeatedly once more than 90 days have passed since the VCHA configuration was deployed.
- /var/log/vmware/vcha/vcha.log on the previous active node shows that the failover was initiated by vMon.
[DATE/TIME] info vcha[03167] [Originator@6876 sub=Agent] Triggered vMon initiated failover
[DATE/TIME] info vcha[02995] [Originator@6876 sub=Agent] Processing event kVmonFailover
[DATE/TIME] info vcha[02995] [Originator@6876 sub=Agent] vMon initiated failover
- The vmon log under /var/log/vmware/vmon/ shows the vpxd service crashing on the previous active node.
[DATE/TIME] Wa(03) host-2528 <vpxd> Service exited. Exit code 1
[DATE/TIME] Wa(03) host-2528 <vpxd> Service exited unexpectedly. Crash count 0. Taking configured recovery action.
[DATE/TIME] In(05) host-2528 SOCKET creating new socket, connecting to /storage/vmware-vmon/vchalistener
[DATE/TIME] In(05) host-2528 <vpxd> Initiated VCHA failover for service.
- The /var/core directory contains vpxd core dumps generated on the active and passive nodes.
(There may be many dump files if failover has occurred many times.)
-rw-rw-r-- 1 [USER] [GROUP] 824M [DATE/TIME] core.vpxd-worker.108474
-rw-rw-r-- 1 [USER] [GROUP] 918M [DATE/TIME] core.vpxd-worker.114953
-rw-rw-r-- 1 [USER] [GROUP] 1.3G [DATE/TIME] core.vpxd-worker.98069
- The backtrace of the core dump looks similar to the following.
Note the 'GetNetworkIfaceInfoPeer' function in frame #9.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007f88fba30546 in __GI_abort () at abort.c:79
#2 0x00007f8901d062c2 in Vmacore::System::SignalTerminateHandler (info=0x7f88f9f535b0, ctx=0x7f88f9f53480) at bora/vim/lib/vmacore/posix/defSigHandlers.cpp:62
#3 <signal handler called>
#4 std::char_traits<char>::copy (__n=37, __s2=0x90 <error: Cannot access memory at address 0x90>, __s1=0x7f882c02dec0 "}o\222ԏ\177")
at external/cayman_esx_toolchain_gcc12/usr/bin/../lib/gcc/x86_64-vmk-linux-gnu/12.1.0/../../../../x86_64-vmk-linux-gnu/include/c++/12.1.0/bits/char_traits.h:431
#5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__d=0x7f882c02dec0 "}o\222ԏ\177", __s=0x90 <error: Cannot access memory at address 0x90>, __n=37)
at external/cayman_esx_toolchain_gcc12/usr/bin/../lib/gcc/x86_64-vmk-linux-gnu/12.1.0/../../../../x86_64-vmk-linux-gnu/include/c++/12.1.0/bits/basic_string.h:423
#6 0x000056553a43103e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign (this=0x7f88f9f54038, __str=<error: Cannot access memory at address 0x90>)
at external/cayman_esx_toolchain_gcc12/usr/bin/../lib/gcc/x86_64-vmk-linux-gnu/12.1.0/../../../../x86_64-vmk-linux-gnu/include/c++/12.1.0/bits/basic_string.h:234
#7 0x000056553b519e37 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__str=<error: Cannot access memory at address 0x90>, this=0x7f88f9f54038)
at external/cayman_esx_toolchain_gcc12/usr/bin/../lib/gcc/x86_64-vmk-linux-gnu/12.1.0/../../../../x86_64-vmk-linux-gnu/include/c++/12.1.0/bits/basic_string.h:1571
#8 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator= (__str=<error: Cannot access memory at address 0x90>, this=0x7f88f9f54038)
at external/cayman_esx_toolchain_gcc12/usr/bin/../lib/gcc/x86_64-vmk-linux-gnu/12.1.0/../../../../x86_64-vmk-linux-gnu/include/c++/12.1.0/bits/basic_string.h:805
#9 Vpxd::Vcha::GetNetworkIfaceInfoPeer (peerIp="[WITNESS_IP]", ifName="eth0", nicInfo=...) at bora/vpx/vpxd/vcha/utils.cpp:448
#10 0x000056553b511e30 in Vpxd::Vcha::PopulatePlacement (info=std::shared_ptr<Com::Vmware::Vcenter::Vcha::ClusterSvc::Info> (use count 1, weak count 0) = {...}, configInfo=<optimized out>,
vcAccess=0x7f882c2c7920, partial=...)
at external/cayman_esx_toolchain_gcc12/usr/bin/../lib/gcc/x86_64-vmk-linux-gnu/12.1.0/../../../../x86_64-vmk-linux-gnu/include/c++/12.1.0/bits/new_allocator.h:80
#11 Vpxd::Vcha::FailoverClusterOperator::GetClusterInfo (this=<optimized out>, vcSpec=..., partial=...,
activation=std::shared_ptr<Vapi::Core::AsyncActivation> (use count 3, weak count 0) = {...},
resultInfo=std::shared_ptr<Com::Vmware::Vcenter::Vcha::ClusterSvc::Info> (use count 1, weak count 0) = {...}) at bora/vpx/vpxd/vcha/failoverClusterOperator.cpp:2277
#12 0x000056553b4f2117 in Vpxd::Vcha::ClusterSvc::AsyncClusterImpl::Get (this=<optimized out>, vcSpec=..., partial=...,
activation_=std::shared_ptr<Vapi::Core::AsyncActivation> (use count 3, weak count 0) = {...}, resultCb_=...) at bora/vpx/vpxd/vcha/AsyncClusterImpl.cpp:159
<vcha>
...
<witness>
<ip>[WITNESS_IP]</ip>
<uuid>Changing password for root.
########-####-####-####-##########</uuid>
</witness>
<witnessIP>[WITNESS_IP]</witnessIP>
(Replace [WITNESS_IP] with the actual IP address of the witness node when running this test.)
# ssh vcha@[WITNESS_IP] -i /home/vcha/.ssh/id_rsa
FIPS mode initialized
VMware vCenter Server 8.0.1.00000
Type: vCenter Server with an embedded Platform Services Controller
Last login:
sudo: Account or password is expired, reset your password and try again
Changing password for root.
Current password:
vCenter 8.x
VCHA replicates the entire /etc directory from the active node to the passive node.
Because /etc/shadow is part of that replication, any change to the root password (including changes to the minimum age, maximum age, warning period, and so on) is carried over to the passive node.
As long as the root account is not expired on the active node, it is not expired on the passive node either. The issue appears on the passive node only when the root password has also expired on the active node.
The witness node, by contrast, does not sync with the active node, so it never receives the updated /etc/shadow file.
As a result, even when the root password is updated on the active node, it is not updated on the witness node, and the root password on the witness node eventually expires.
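The aging arithmetic behind this can be sketched in a few lines. The following is a minimal illustration, assuming the standard shadow(5) field layout (third field is the last-change day counted from 1970-01-01, fifth field is the maximum age in days). The function name and the sample entries are hypothetical, and the check is simplified compared with what PAM actually evaluates.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def days_until_password_expiry(shadow_line, today):
    """Return the days left before the password in a shadow(5) entry expires.

    Returns None when the last-change or maximum-age field is empty
    (aging not configured). A negative result means already expired.
    """
    fields = shadow_line.strip().split(":")
    last_change, max_age = fields[2], fields[4]
    if not last_change or not max_age:
        return None
    today_days = (today - EPOCH).days
    return int(last_change) + int(max_age) - today_days


if __name__ == "__main__":
    # Hypothetical entries: same last-change day, different maximum age.
    today = EPOCH + timedelta(days=19100)
    aged = "root:HASH:19000:0:90:7:::"
    relaxed = "root:HASH:19000:0:99999:7:::"
    print(days_until_password_expiry(aged, today))     # -10, expired 10 days ago
    print(days_until_password_expiry(relaxed, today))  # 99899
```

On a witness node whose /etc/shadow is never refreshed, the last-change day stays fixed while the current day advances, so the result for a 90-day maximum age eventually turns negative.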
This issue has been fixed in vCenter Server 8.0 Update 3e.
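As an alternative workaround, disable password aging for the root account on the witness node, then confirm that the SSH login from the active node no longer prompts for a password change:
# sudo chage -I -1 -m 0 -M 99999 -E -1 root
# ssh vcha@[WITNESS_IP] -i /home/vcha/.ssh/id_rsa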
VCHA uses a private key on each node for communication, so updating the root password on the witness node does not affect VCHA functionality.
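One way to double-check the aging settings afterwards is to inspect the output of `chage -l root` on the witness node. The sketch below parses text in that format and reports whether the password is set to never expire. The helper names and the sample listing are illustrative assumptions, not output captured from a vCenter appliance.

```python
def parse_chage_listing(text):
    """Parse `chage -l` style output into a dict of label -> value."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as dates contain none,
        # but this keeps any later colon inside the value.
        label, sep, value = line.partition(":")
        if sep:
            info[label.strip()] = value.strip()
    return info


def password_never_expires(text):
    """True when the listing reports 'never' for 'Password expires'."""
    return parse_chage_listing(text).get("Password expires") == "never"


# Illustrative sample only; the date is made up.
SAMPLE = """\
Last password change                                : Jan 01, 2024
Password expires                                    : never
Password inactive                                   : never
Account expires                                     : never
Minimum number of days between password change      : 0
Maximum number of days between password change      : 99999
Number of days of warning before password expires   : 7
"""

if __name__ == "__main__":
    print(password_never_expires(SAMPLE))
```

In practice the text would come from running `chage -l root` on the witness node and passing its standard output to `password_never_expires`.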