This KB article describes an issue that may occur and directs you to contact VMware for assistance in resolving it.
Impact/Risks:
If a Failures to Tolerate (FTT) of 0 policy is in use, if data is already in a reduced-redundancy state, or if multiple events occur before data can resync or rebuild, this issue could lead to data unavailability or data loss.
Symptoms:
A vSAN disk group is taken offline, and the vmkernel log (/var/run/log/vmkernel.log) contains messages similar to the examples below. Specific dates, times, and IDs will differ in your environment.
Example 1:
2020-05-21T15:22:38.514Z cpu1:1000341425)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID ########-####-####-####-########9362, type 4, pbn 4398046515647
2020-05-21T15:22:38.514Z cpu0:1000214054)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback [email protected]#0.0.0.1, diskgroup ########-####-####-####-########8e53 txnScopeIdx 0
2020-05-21T15:22:38.514Z cpu0:1000214054)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: updateHashmap, Status: Success
2020-05-21T15:22:38.514Z cpu0:1000214054)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device ########-####-####-####-########9362:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device ########-####-####-####-########9362
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device ########-####-####-####-########8e53
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on ########-####-####-####-########9362
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########b3d6
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########8e53

Example 2:
2020-05-21T16:36:22.055Z cpu0:1000341426)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID ########-####-####-####-########bba0, type 3, pbn 3298534904346
2020-05-21T16:36:22.055Z cpu1:1000214313)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback [email protected]#0.0.0.1, diskgroup ########-####-####-####-########4c6a txnScopeIdx 0
2020-05-21T16:36:22.055Z cpu1:1000214313)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: addNewHash, Status: Success
2020-05-21T16:36:22.055Z cpu1:1000214313)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device ########-####-####-####-########bba0:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T16:36:22.058Z cpu0:1000214307)PLOG: PLOGElevHandleFailure:2325: Waiting till we process failure ... dev ########-####-####-####-########bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device ########-####-####-####-########bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device ########-####-####-####-########4c6a
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on ########-####-####-####-########bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########d4f3
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########4c6a
2020-05-21T16:36:25.915Z cpu0:1000214307)PLOG: PLOGRelogBase:226: RELOG: relogTask exit requested
2020-05-21T16:36:25.915Z cpu0:1000214307)PLOG: PLOGRelogExit:605: RELOG task exiting UUID ########-####-####-####-########4c6a Success

Example 3:
2020-05-21T16:56:00.941Z cpu1:1000341426)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID ########-####-####-####-########fae9, type 5, pbn 5497558160057
2020-05-21T16:56:00.941Z cpu0:1000213922)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback [email protected]#0.0.0.1, diskgroup ########-####-####-####-########cbee txnScopeIdx 0
2020-05-21T16:56:00.941Z cpu0:1000213922)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: readXmap, Status: Success
2020-05-21T16:56:00.941Z cpu0:1000213922)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device ########-####-####-####-########fae9:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T16:56:00.941Z cpu0:1000213916)PLOG: PLOGElevHandleFailure:2325: Waiting till we process failure ... dev ########-####-####-####-########fae9
2020-05-21T16:56:00.941Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device ########-####-####-####-########fae9
2020-05-21T16:56:00.941Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device ########-####-####-####-########cbee
2020-05-21T16:56:00.941Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T16:56:00.943Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on ########-####-####-####-########fae9
2020-05-21T16:56:00.943Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:56:00.943Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########17d0
2020-05-21T16:56:00.943Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:56:00.944Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########cbee
2020-05-21T16:56:04.066Z cpu0:1000213916)PLOG: PLOGRelogBase:226: RELOG: relogTask exit requested
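
To check a collected vmkernel log for the messages shown in the examples above, a small script along the following lines may help. This is only a sketch and not a VMware-provided tool: the function name find_ddp_metadata_errors and the SIGNATURES list are hypothetical, and the matched strings are simply fragments copied from the example log lines in this article. A match suggests the log is worth reviewing with VMware Support; it does not confirm the issue by itself.

```python
import re
from typing import Iterable, List, Tuple

# Message fragments copied from the example log lines in this article.
# Assumption: matching on these fragments is enough to flag candidate lines.
SIGNATURES = (
    "Trying to format a valid metadata block",
    "DDP write failed",
    "Permanent error event on",
)

# The example vmkernel lines begin with an ISO-8601 UTC timestamp.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)")


def find_ddp_metadata_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, matched fragment) for each PLOG line that matches."""
    hits = []
    for line in lines:
        # Only PLOG messages are of interest here.
        if "PLOG" not in line:
            continue
        match = TIMESTAMP.match(line)
        stamp = match.group(1) if match else "unknown"
        for signature in SIGNATURES:
            if signature in line:
                hits.append((stamp, signature))
                break  # record each log line at most once
    return hits
```

The function accepts any iterable of lines, so it can be passed an open file object for a copy of /var/run/log/vmkernel.log taken from the affected host.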