Clomd service crashes on host and a restart fails to keep the service running

Article ID: 318128

Updated On:

Products

VMware vSAN

Issue/Introduction

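Symptoms:
The clomd service crashes on the host and does not stay running after a restart. clomd.log contains a panic similar to the following: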

2022-10-14T16:17:26.456Z PANIC: NOT_REACHED bora/lib/vsan/vsan_config_builder.c:744
2022-10-14T16:17:26.456Z Backtrace:
2022-10-14T16:17:26.456Z Backtrace[0] 0000030b4742c6a0 rip=000000bf0c7de98f rbx=0000030b4742c6a0 rbp=0000030b4742cad0 r12=000000bf0d677788 r13=0000030b4742cae8 r14=000000bf14ce052c r15=000000bf14ce3c2c
2022-10-14T16:17:26.456Z Backtrace[1] 0000030b4742cae0 rip=000000bf0c7dea5b rbx=000000bf14ce0544 rbp=0000030b4742cbb0 r12=000000becbd3e3e0 r13=0000000000000000 r14=000000bf14ce052c r15=000000bf14ce3c2c
2022-10-14T16:17:26.456Z Backtrace[2] 0000030b4742cbc0 rip=000000becb780fab rbx=000000bf14ce0544 rbp=0000030b4742cbc0 r12=000000becbd3e3e0 r13=0000000000000000 r14=000000bf14ce052c r15=000000bf14ce3c2c
2022-10-14T16:17:26.456Z Backtrace[3] 0000030b4742cbd0 rip=000000becb770311 rbx=000000bf14ce0544 rbp=0000030b4742cfe0 r12=000000becbd3e3e0 r13=0000000000000000 r14=000000bf14ce052c r15=000000bf14ce3c2c
2022-10-14T16:17:26.456Z Backtrace[4] 0000030b4742cff0 rip=000000becb77390e rbx=000000becbd3e3e0 rbp=0000030b4742d400 r12=000000bf14cc2cd0 r13=000000bf14cc31d0 r14=0000000000000000 r15=000000becbd3e4f8
2022-10-14T16:17:26.456Z Backtrace[5] 0000030b4742d410 rip=000000becb6ab245 rbx=000000becbd3e3e0 rbp=0000030b4742d850 r12=0000000000000000 r13=0000030b4742d450 r14=0000000000000001 r15=000000becbd3d9a0
2022-10-14T16:17:26.456Z Backtrace[6] 0000030b4742d860 rip=000000becb6abb24 rbx=000000000000000f rbp=0000030b4742d8d0 r12=0000000000000002 r13=000000becbd3d9a0 r14=0000030b4742dd70 r15=000000becbd3d9a0
2022-10-14T16:17:26.456Z Backtrace[7] 0000030b4742d8e0 rip=000000becb6ac6d8 rbx=0000000000000000 rbp=0000030b4742e920 r12=00000000000000b8 r13=000000000000005e r14=0000030b4742dd70 r15=000000becbd3d9a0
2022-10-14T16:17:26.456Z Backtrace[8] 0000030b4742e930 rip=000000becb6ae2d5 rbx=000000becbd3d9a0 rbp=0000030b4742e980 r12=0000003fc0000000 r13=0000030b4742e9e8 r14=0000000000000003 r15=000000bf1435a000
2022-10-14T16:17:26.456Z Backtrace[9] 0000030b4742e990 rip=000000becb6f0834 rbx=000000becbd3d9a0 rbp=0000030b4742ea30 r12=000000becbd3d9b8 r13=0000000000000000 r14=0000030b4742e9f0 r15=000000bf1435a000
2022-10-14T16:17:26.456Z Backtrace[10] 0000030b4742ea40 rip=000000becb6fe063 rbx=000000bece088550 rbp=0000030b4742ea50 r12=0000000000000000 r13=000000bf14bfe010 r14=0000009d7e9c0d57 r15=0000009d7e9c0587
2022-10-14T16:17:26.456Z Backtrace[11] 0000030b4742ea60 rip=000000becb77dc27 rbx=000000bece157d20 rbp=0000030b4742ea70 r12=0000000000000000 r13=000000bf14bfe010 r14=0000009d7e9c0d57 r15=0000009d7e9c0587
2022-10-14T16:17:26.456Z Backtrace[12] 0000030b4742ea80 rip=000000becb77e8cc rbx=000000bece157d20 rbp=0000030b4742eaf0 r12=0000000000000000 r13=000000bf14bfe010 r14=0000009d7e9c0d57 r15=0000009d7e9c0587
2022-10-14T16:17:26.456Z Backtrace[13] 0000030b4742eb00 rip=000000becb67fd2a rbx=0000000000000000 rbp=0000030b4742ec60 r12=0000000000000000 r13=0000030b4742ebe0 r14=0000000000002328 r15=000000becba10020
2022-10-14T16:17:26.456Z Backtrace[14] 0000030b4742ec70 rip=000000bf0d2ebd5d rbx=0000000000000000 rbp=0000000000000000 r12=000000becb6807f4 r13=0000030b4742ed40 r14=0000000000000000 r15=0000000000000000
2022-10-14T16:17:26.456Z Backtrace[15] 0000030b4742ed30 rip=000000becb68081d rbx=0000000000000000 rbp=0000000000000000 r12=000000becb6807f4 r13=0000030b4742ed40 r14=0000000000000000 r15=0000000000000000
2022-10-14T16:17:26.456Z Backtrace[16] 0000030b4742ed38 rip=0000000000000000 rbx=0000000000000000 rbp=0000000000000000 r12=000000becb6807f4 r13=0000030b4742ed40 r14=0000000000000000 r15=0000000000000000
2022-10-14T16:17:26.456Z SymBacktrace[0] 0000030b4742c6a0 rip=000000bf0c7de98f in function Panic_Panic in object /lib64/libvmlibs.so loaded at 000000bf0c638000
2022-10-14T16:17:26.456Z SymBacktrace[3] 0000030b4742cbd0 rip=000000becb770311 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[4] 0000030b4742cff0 rip=000000becb77390e in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[5] 0000030b4742d410 rip=000000becb6ab245 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[6] 0000030b4742d860 rip=000000becb6abb24 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[7] 0000030b4742d8e0 rip=000000becb6ac6d8 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[8] 0000030b4742e930 rip=000000becb6ae2d5 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[9] 0000030b4742e990 rip=000000becb6f0834 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[10] 0000030b4742ea40 rip=000000becb6fe063 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[11] 0000030b4742ea60 rip=000000becb77dc27 in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[12] 0000030b4742ea80 rip=000000becb77e8cc in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.456Z SymBacktrace[13] 0000030b4742eb00 rip=000000becb67fd2a in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.457Z SymBacktrace[14] 0000030b4742ec70 rip=000000bf0d2ebd5d in function __libc_start_main in object /lib64/libc.so.6 loaded at 000000bf0d2ca000
2022-10-14T16:17:26.457Z SymBacktrace[15] 0000030b4742ed30 rip=000000becb68081d in function (null) in object /usr/lib/vmware/vsan/bin/clomd loaded at 000000becb666000
2022-10-14T16:17:26.457Z SymBacktrace[16] 0000030b4742ed38 rip=0000000000000000
2022-10-14T16:17:26.457Z Failed to dump core: Failure.
2022-10-14T16:17:26.457Z Msg_Post: Error
2022-10-14T16:17:26.457Z [msg.log.error.unrecoverable] vSAN Cluster level Object Manager unrecoverable error: (host-5028003)
2022-10-14T16:17:26.457Z NOT_REACHED bora/lib/vsan/vsan_config_builder.c:744
2022-10-14T16:17:26.457Z [msg.panic.requestSupport.withoutLog] You can request support.
2022-10-14T16:17:26.457Z [msg.panic.requestSupport.vmSupport.vmx86]
2022-10-14T16:17:26.457Z To collect data to submit to VMware technical support, run "vm-support".
2022-10-14T16:17:26.457Z [msg.panic.response] We will respond on the basis of your support entitlement.
2022-10-14T16:17:26.457Z ----------------------------------------
2022-10-14T16:17:26.457Z Exiting
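
To check whether a host is hitting this signature, the panic line can be searched for in a copy of clomd.log. The following is a minimal sketch, not a supported tool. The function name find_clomd_panics is hypothetical, and the source line number is captured rather than hard-coded on the assumption that it may differ between builds.

```python
import re

# Signature taken from the log excerpt above. The number after the colon is
# captured instead of fixed to 744 (assumption: it can vary by build).
PANIC_RE = re.compile(
    r"^(?P<ts>\S+)\s+PANIC: NOT_REACHED\s+"
    r"(?P<src>\S*vsan_config_builder\.c):(?P<line>\d+)"
)


def find_clomd_panics(lines):
    """Return (timestamp, source_file, line_number) for each matching panic line."""
    hits = []
    for text in lines:
        match = PANIC_RE.match(text.strip())
        if match:
            hits.append(
                (match.group("ts"), match.group("src"), int(match.group("line")))
            )
    return hits


# Hypothetical usage against a copy of the log file:
#   with open("clomd.log", errors="replace") as fh:
#       for ts, src, line in find_clomd_panics(fh):
#           print(f"{ts} panic at {src}:{line}")
```

Only the line beginning with PANIC: is matched; the later NOT_REACHED line that repeats the same source location is ignored so that each crash is counted once.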

Environment

VMware vSAN 7.0.x
VMware vSAN 8.0.x

Cause

During an object format change, the object's configuration temporarily contains both the old and the new layout. In this case the old layout was only partially cleaned up, which leaves the configuration in an invalid state. As a result, clomd crashes whenever it tries to process that object as part of any reconfiguration other than cleanup.

In most observed cases the crash occurs during VOTES_REBALANCE. Although clomd posts a CLEANUP work item for the object, the VOTES_REBALANCE work item has a higher priority, so the CLEANUP work item is not processed before clomd crashes.
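
The ordering problem described above can be illustrated with a toy priority queue. This is only a sketch of the reasoning and not clomd's actual scheduler: the priority values, the config_valid flag, and all function names are hypothetical.

```python
import heapq


class InvalidConfigError(Exception):
    """Stands in for the NOT_REACHED panic in this toy model."""


# Hypothetical priorities: a lower number is dequeued first.
PRIORITY = {"VOTES_REBALANCE": 0, "CLEANUP": 1}


def process(work_type, obj):
    """Apply one work item to the object, failing on an invalid config."""
    if work_type == "CLEANUP":
        # Cleanup removes the leftover old layout and repairs the config.
        obj["config_valid"] = True
    elif not obj["config_valid"]:
        raise InvalidConfigError(f"{work_type} hit an invalid config")


def run_queue(posted, obj):
    """Process posted work items in priority order; return those completed."""
    heap = [(PRIORITY[w], index, w) for index, w in enumerate(posted)]
    heapq.heapify(heap)
    done = []
    while heap:
        _, _, work_type = heapq.heappop(heap)
        process(work_type, obj)
        done.append(work_type)
    return done
```

In this model, posting both CLEANUP and VOTES_REBALANCE for an object with an invalid config raises InvalidConfigError before CLEANUP is reached, which mirrors why a restart alone does not help: the same higher-priority work item is picked up first again.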

Resolution

Upgrade vCenter/ESXi to one of the following versions:
8.0a or later for 8.x
7.0U3i or later for 7.x
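
When auditing several hosts, the minimum versions above can be compared against each host's release label. The sketch below assumes labels are written in the same short form as this article (for example 8.0a, 7.0U3i, 8.0U1) and that a later update number or patch letter always means a later release. The label format, parse_label, and has_fix are assumptions made for illustration, so verify results against the official release notes.

```python
import re

# Assumed short label form: <major>.<minor>[U<update>][<patch letter>]
_LABEL_RE = re.compile(r"^(\d+)\.(\d+)\s*(?:U(\d+))?([a-z]?)$", re.IGNORECASE)


def parse_label(label):
    """Turn a label such as '7.0U3i' into a comparable tuple (7, 0, 3, 'i')."""
    match = _LABEL_RE.match(label.strip())
    if not match:
        raise ValueError(f"unrecognised release label: {label!r}")
    major, minor, update, patch = match.groups()
    return (int(major), int(minor), int(update or 0), patch.lower())


# Minimum fixed releases, taken from the Resolution section above.
MINIMUM_FIXED = {8: parse_label("8.0a"), 7: parse_label("7.0U3i")}


def has_fix(label):
    """Return True if the label is at or above the minimum fixed release."""
    parsed = parse_label(label)
    minimum = MINIMUM_FIXED.get(parsed[0])
    if minimum is None:
        raise ValueError(f"no minimum listed for major version {parsed[0]}")
    return parsed >= minimum
```

For example, has_fix("7.0U3c") and has_fix("8.0") return False, while has_fix("7.0U3i") and has_fix("8.0U1") return True under these assumptions.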

Additional Information

Impact/Risks:
clomd crashes and fails to start

Attachments

forceReconfig.py