This KB article advises that this issue may occur and directs you to contact VMware for assistance in resolving it.
If a storage policy with Failures to Tolerate (FTT) set to 0 is in use, if data is already in a reduced redundancy state, or if multiple events occur before a data resync or rebuild can complete, this issue could lead to data unavailability or data loss.
Symptoms: A vSAN disk group is taken offline, and the vmkernel log contains messages similar to the examples below (note that the specific dates, times, and IDs will differ in your environment):
Example 1:
2020-05-21T15:22:38.514Z cpu1:1000341425)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID ########-####-####-####-########9362, type 4, pbn 4398046515647
2020-05-21T15:22:38.514Z cpu0:1000214054)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback
[email protected]#0.0.0.1, diskgroup ########-####-####-####-########8e53 txnScopeIdx 0
2020-05-21T15:22:38.514Z cpu0:1000214054)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: updateHashmap, Status: Success
2020-05-21T15:22:38.514Z cpu0:1000214054)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device ########-####-####-####-########9362:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device ########-####-####-####-########9362
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device ########-####-####-####-########8e53
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on ########-####-####-####-########9362
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########b3d6
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########8e53
Example 2:
2020-05-21T16:36:22.055Z cpu0:1000341426)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID ########-####-####-####-########bba0, type 3, pbn 3298534904346
2020-05-21T16:36:22.055Z cpu1:1000214313)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback
[email protected]#0.0.0.1, diskgroup ########-####-####-####-########4c6a txnScopeIdx 0
2020-05-21T16:36:22.055Z cpu1:1000214313)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: addNewHash, Status: Success
2020-05-21T16:36:22.055Z cpu1:1000214313)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device ########-####-####-####-########bba0:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T16:36:22.058Z cpu0:1000214307)PLOG: PLOGElevHandleFailure:2325: Waiting till we process failure ... dev ########-####-####-####-########bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device ########-####-####-####-########bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device ########-####-####-####-########4c6a
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on ########-####-####-####-########bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########d4f3
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########4c6a
2020-05-21T16:36:25.915Z cpu0:1000214307)PLOG: PLOGRelogBase:226: RELOG: relogTask exit requested
2020-05-21T16:36:25.915Z cpu0:1000214307)PLOG: PLOGRelogExit:605: RELOG task exiting UUID ########-####-####-####-########4c6a Success
Example 3:
2020-05-21T16:56:00.941Z cpu1:1000341426)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID ########-####-####-####-########fae9, type 5, pbn 5497558160057
2020-05-21T16:56:00.941Z cpu0:1000213922)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback
[email protected]#0.0.0.1, diskgroup ########-####-####-####-########cbee txnScopeIdx 0
2020-05-21T16:56:00.941Z cpu0:1000213922)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: readXmap, Status: Success
2020-05-21T16:56:00.941Z cpu0:1000213922)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device ########-####-####-####-########fae9:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T16:56:00.941Z cpu0:1000213916)PLOG: PLOGElevHandleFailure:2325: Waiting till we process failure ... dev ########-####-####-####-########fae9
2020-05-21T16:56:00.941Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device ########-####-####-####-########fae9
2020-05-21T16:56:00.941Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device ########-####-####-####-########cbee
2020-05-21T16:56:00.941Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T16:56:00.943Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on ########-####-####-####-########fae9
2020-05-21T16:56:00.943Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:56:00.943Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########17d0
2020-05-21T16:56:00.943Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:56:00.944Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on ########-####-####-####-########cbee
2020-05-21T16:56:04.066Z cpu0:1000213916)PLOG: PLOGRelogBase:226: RELOG: relogTask exit requested
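To check whether a host shows the signature seen in the examples above, a saved copy of the vmkernel log can be scanned for the three recurring message fragments. The following is a minimal sketch, not a VMware-provided tool: the function name scan_vmkernel_lines and the signature labels are hypothetical, and the fragments are copied from the example log lines in this article, so they may differ between builds.

```python
import re
from collections import defaultdict

# Message fragments copied from the example log lines in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "format_valid_block": "Trying to format a valid metadata block",
    "ddp_write_failed": "DDP write failed on device",
    "permanent_error": "Permanent error event on",
}

# Matches a UUID; '#' is accepted so the masked UUIDs in the examples also match.
UUID_RE = re.compile(r"[0-9a-fA-F#]{8}(?:-[0-9a-fA-F#]{4}){3}-[0-9a-fA-F#]{12}")
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)")


def scan_vmkernel_lines(lines):
    """Return {device_uuid: [(timestamp, signature_label), ...]} for matching lines."""
    hits = defaultdict(list)
    for line in lines:
        for label, fragment in SIGNATURES.items():
            position = line.find(fragment)
            if position == -1:
                continue
            timestamp = TIMESTAMP_RE.match(line)
            # Only look for the UUID after the fragment, where the device ID appears.
            uuid = UUID_RE.search(line, position)
            if timestamp and uuid:
                hits[uuid.group(0)].append((timestamp.group(1), label))
    return dict(hits)
```

Passing the lines of a log file to scan_vmkernel_lines (for example, `scan_vmkernel_lines(open(path))`, where path points to a copy of the vmkernel log) returns the affected device UUIDs with the time of each matching message. A device that reports all three signatures at roughly the same time matches the pattern in the examples; an empty result only means these exact fragments were not found.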