- To stabilize the vSAN cluster and clean up the DISCARDED_COMPONENTS entries.
- To restart the EPD and CMMDS Arena services so that normal vSAN operations resume.
Symptoms:
- The cluster is partitioned and/or its node membership changes rapidly.
- The Sub-Cluster Membership Entry Revision increments rapidly because nodes constantly join and leave the cluster.
- Commands that query CMMDS (for example, cmmds-tool) intermittently fail and/or return constantly changing information.
- Active resync is not progressing.
- A high number of DISCARDED_COMPONENTS entries is present on multiple hosts.
Note: DISCARDED_COMPONENTS counts are tracked per node, and it is possible that only one node has a critical build-up of these entries.
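Because the counts are kept per node, looking only at a cluster-wide total can hide a single overloaded host. The following Python sketch flags every host whose own count reaches a threshold. The function name `nodes_with_buildup`, the host names, the sample counts, and the threshold value are all hypothetical, and how the per-host counts are collected is not covered here.

```python
from typing import Dict, List


def nodes_with_buildup(counts: Dict[str, int], threshold: int) -> List[str]:
    """Return (sorted) hosts whose own DISCARDED_COMPONENTS count is >= threshold.

    Each host is judged on its own count, because one host can carry a
    critical build-up while the others look normal.
    """
    return sorted(host for host, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical placeholder data, not taken from a real cluster.
    sample = {"esx-01": 120, "esx-02": 95, "esx-03": 48000}
    # The threshold is an arbitrary illustrative value, not a VMware limit.
    print(nodes_with_buildup(sample, threshold=10000))
```

Judging each host separately, rather than summing, matches the note above: one outlier host is enough to matter.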
In the vsantraces logs, you see messages similar to the following:
2023-06-30T06:43:47.833657 [162009952] [cpu75] [] DOMTraceProcessSubscrEntry:1927: {'obj':0x45dc29f3cd80, 'objType': 'COMP', 'queryType': 35, 'numFiringSubscrs-24': 0, 'numRetrySubscrs-24': 51, 'subscrOp-32': 0x243fd8c0, 'subscrEntry-32': 0x2a3d68c0, 'queryUuid': 'db2dd163-cace-ac06-a423-b483510025bc', 'status': 'VMK_NO_MEMORY', 'isDisabled': False, 'isShared': False, 'isRetry': True, 'processTimeMs': 0, 'fetchesRun': 1, 'unmarshalsRun': 0, 'role': 'DOM_ROLE_COMPONENT_SERVER'}
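To pick the affected subscriptions out of a large trace dump, a small filter on the `status` field can help. This Python sketch assumes the traces have already been decoded into plain text with one record per line, as in the DOMTraceProcessSubscrEntry record above; the function name `no_memory_query_uuids` is hypothetical, and the regular expressions rely only on the field layout visible in that excerpt.

```python
import re
from typing import Iterable, List

# Patterns based on the field layout of the DOMTraceProcessSubscrEntry record above.
_STATUS = re.compile(r"DOMTraceProcessSubscrEntry:\d+:.*?'status': '(\w+)'")
_QUERY_UUID = re.compile(r"'queryUuid': '([0-9a-f-]+)'")


def no_memory_query_uuids(lines: Iterable[str]) -> List[str]:
    """Return the queryUuid of every subscription record whose status is VMK_NO_MEMORY."""
    uuids: List[str] = []
    for line in lines:
        status = _STATUS.search(line)
        if status is None or status.group(1) != "VMK_NO_MEMORY":
            continue  # not a subscription record, or a different status
        query = _QUERY_UUID.search(line)
        if query is not None:
            uuids.append(query.group(1))
    return uuids
```

If the decoded output wraps records across several lines, join each record back into one line before filtering.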
In vmkernel.log:
2023-06-30T05:58:06.885Z cpu127:2099358)WARNING: exprmsh: Error unmarshaling structure CmmdsDiscardedComponentsEntry: Out of memory
2023-06-30T05:58:56.602Z cpu123:2099358)DOM: DOMComponentObjectDeletedEntryCb:12729: Failed to update DISCARDED_ENTRY entry for b22dd163-a815-ca68-cd35-b483510025bc: Out of memory
In cmmdsd.log:
2023-06-27T07:00:19.225Z 2099726 WARNING Traversing CMMDS entries returned error: Out of memory
2023-06-27T07:30:19.534Z 2099726 WARNING Traversing CMMDS entries returned error: Out of memory
2023-06-27T08:00:19.855Z 2099726 WARNING Traversing CMMDS entries returned error: Out of memory
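In this excerpt the cmmdsd warning recurs roughly every 30 minutes. To check that spacing over a longer log, one option is to parse the leading timestamps and compute the gaps. This Python sketch assumes each matching line begins with a UTC timestamp in the `YYYY-MM-DDTHH:MM:SS.mmmZ` form shown above; the function name `warning_gaps_minutes` is hypothetical.

```python
from datetime import datetime
from typing import Iterable, List

MARKER = "Traversing CMMDS entries returned error: Out of memory"
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2023-06-27T07:00:19.225Z


def warning_gaps_minutes(lines: Iterable[str]) -> List[float]:
    """Minutes elapsed between consecutive cmmdsd out-of-memory warnings."""
    stamps = [
        datetime.strptime(line.split()[0], TS_FORMAT)  # first token is the timestamp
        for line in lines
        if MARKER in line
    ]
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    excerpt = [
        "2023-06-27T07:00:19.225Z 2099726 WARNING Traversing CMMDS entries returned error: Out of memory",
        "2023-06-27T07:30:19.534Z 2099726 WARNING Traversing CMMDS entries returned error: Out of memory",
        "2023-06-27T08:00:19.855Z 2099726 WARNING Traversing CMMDS entries returned error: Out of memory",
    ]
    print([round(g, 1) for g in warning_gaps_minutes(excerpt)])  # [30.0, 30.0]
```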
In epd.log:
2023-06-26T17:20:24.763Z 4645130 PANIC: Unrecoverable memory allocation failure
2023-06-26T17:20:24.763Z 4645130 Backtrace:
2023-06-26T17:20:24.763Z 4645130 Backtrace[0] 0000030ecf9429a0 rip=000000fa299df98f rbx=0000030ecf9429a0 rbp=0000030ecf942dd0 r12=000000fa2a673788 r13=0000030ecf942de8 r14=000000f9e904e1e0 r15=000000f9e905c420
2023-06-27T16:00:46.740Z 4779858 Failed to dump core: Failure.
2023-06-27T16:00:46.740Z 4779858 Msg_Post: Error
2023-06-27T16:00:46.740Z 4779858 [msg.log.error.unrecoverable] VSAN CMMDS persistence daemon unrecoverable error: (epd)
2023-06-27T16:00:46.740Z 4779858 Unrecoverable memory allocation failure
2023-06-27T16:00:46.740Z 4779858 [msg.panic.requestSupport.withoutLog] You can request support.
2023-06-27T16:00:46.740Z 4779858 [msg.panic.requestSupport.vmSupport.vmx86]
2023-06-27T16:00:46.740Z 4779858 To collect data to submit to VMware technical support, run "vm-support".
2023-06-27T16:00:46.740Z 4779858 [msg.panic.response] We will respond on the basis of your support entitlement.
2023-06-27T16:00:46.740Z 4779858 ----------------------------------------
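To check a host, or an extracted support bundle, for all of the log signatures above in one pass, the following Python sketch counts matching lines in vmkernel.log, cmmdsd.log, and epd.log. It assumes the three files sit together in one directory whose path you pass in, and it does not read rotated or compressed logs. The `scan_logs` helper and its signature table are illustrative, not a VMware tool; only the match strings are copied from the excerpts in this article.

```python
from pathlib import Path
from typing import Dict, List

# Match strings copied from the log excerpts above.
SIGNATURES: Dict[str, List[str]] = {
    "vmkernel.log": [
        "Error unmarshaling structure CmmdsDiscardedComponentsEntry: Out of memory",
        "Failed to update DISCARDED_ENTRY entry for",
    ],
    "cmmdsd.log": ["Traversing CMMDS entries returned error: Out of memory"],
    "epd.log": ["Unrecoverable memory allocation failure"],
}


def scan_logs(log_dir: Path) -> Dict[str, int]:
    """Count lines per log file that contain any known signature.

    Files that do not exist in log_dir are skipped and left out of the result.
    """
    counts: Dict[str, int] = {}
    for name, patterns in SIGNATURES.items():
        path = log_dir / name
        if not path.is_file():
            continue
        with path.open(errors="replace") as fh:
            counts[name] = sum(1 for line in fh if any(p in line for p in patterns))
    return counts
```

For example, `scan_logs(Path("/var/log"))` would scan a host's log directory if the files live there, which is an assumption to verify for your ESXi build. A match only shows that the message is present; it does not by itself confirm the cause.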