Use one of these methods to resolve the issue:
- Disable OnRetryErrors using an SATP claim rule (Preferred method)
- Disable OnRetryErrors for existing devices (quick fix - non-preferred method)
Disable OnRetryErrors using an SATP claim rule (Preferred method)
Create an SATP claim rule to change the default behavior of action_OnRetryErrors to "off". Apply this setting on every ESXi 6.7 host that has LUNs from an ALUA-based storage array mapped to it.
- Add the claimrule with an option to disable OnRetryErrors.
esxcli storage nmp satp rule add -V COMPELNT -P VMW_PSP_RR -s VMW_SATP_ALUA -o disable_action_OnRetryErrors
- Reload the claim rules to enforce the change:
esxcli storage core claimrule load
Note: The change will take effect immediately for any new LUNs presented but requires a reboot in order to reclaim existing storage devices with the new ruleset.
- List the claimrule table to confirm the changes are there:
esxcli storage nmp satp rule list |grep -i comp
Example output:
VMW_SATP_ALUA COMPELNT disable_action_OnRetryErrors user VMW_PSP_RR
- To verify the setting against a device, capture the naa id of the device and run this command:
esxcli storage nmp device list | grep -A2 naa.deviceIDhere
For example:
Device Display Name: COMPELNT Fibre Channel Disk (naa.60000000000000000000000000000000)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=off; {TPG_id=61445,TPG_state=AO}{TPG_id=61446,TPG_state=AO}}
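The steps above could be wrapped in a small guard so the rule is only added when it is missing. The following is a minimal sketch, not a supported procedure: the function names satp_rule_present and ensure_satp_rule are hypothetical, and it assumes the rule list output contains the vendor string and option on the same line, as in the example output shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli storage nmp satp rule list` output on
# stdin and returns 0 if a COMPELNT line carries disable_action_OnRetryErrors.
satp_rule_present() {
    grep -i 'COMPELNT' | grep -q 'disable_action_OnRetryErrors'
}

# Hypothetical wrapper: adds and loads the rule only when it is not already
# listed. Intended to be run on each ESXi host; it is defined here, not called.
ensure_satp_rule() {
    if esxcli storage nmp satp rule list | satp_rule_present; then
        echo "SATP rule already present"
    else
        esxcli storage nmp satp rule add -V COMPELNT -P VMW_PSP_RR -s VMW_SATP_ALUA -o disable_action_OnRetryErrors
        esxcli storage core claimrule load
    fi
}
```

Calling ensure_satp_rule on a host runs the same two commands as the manual steps. A reboot is still needed before existing devices are reclaimed with the new rule.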
Disable OnRetryErrors for existing devices (quick fix - non-preferred method)
Disable this setting for existing devices on a live host without a reboot.
- Run the following to list all devices from the ALUA-based storage array and check their current setting.
esxcli storage nmp device list | grep -A2 COMPELNT
Example output:
Device Display Name: COMPELNT Fibre Channel Disk (naa.60000000000000000000000000000000)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=61445,TPG_state=AO}{TPG_id=61446,TPG_state=AO}}
- Extract the naa device names where appropriate and run the following to change OnRetryErrors to "off" on a per-device per-host basis.
esxcli storage nmp satp generic deviceconfig set -c disable_action_OnRetryErrors -d naa.xxx
- Once all devices are configured, rerun the command from step 1 to validate that the change worked.
- Repeat steps 1-3 for the remaining hosts.
Note: No reboot is required for these changes to take effect. Additionally, the SATP claim rule workaround will overwrite this one upon reboot.
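On hosts with many LUNs, the per-device step can be scripted. The following is a minimal sketch under stated assumptions: the function names devices_with_onretry_on and apply_quick_fix are hypothetical, and the parser assumes each device's "Device Display Name" line contains the NAA ID in parentheses and is followed by its "Storage Array Type Device Config" line, as in the example output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli storage nmp device list` output on stdin
# and prints the NAA ID of each COMPELNT device whose config line still shows
# action_OnRetryErrors=on.
devices_with_onretry_on() {
    awk '
        /Device Display Name:/ {
            id = ""
            # Track only devices whose display name contains the vendor string.
            if ($0 ~ /COMPELNT/ && match($0, /\(naa\.[0-9a-fA-F]+\)/)) {
                id = substr($0, RSTART + 1, RLENGTH - 2)
            }
        }
        /Storage Array Type Device Config:/ {
            if (id != "" && $0 ~ /action_OnRetryErrors=on/) print id
            id = ""
        }
    '
}

# Hypothetical wrapper: applies the per-device setting to every match.
# Intended to be run on the ESXi host; it is defined here, not called.
apply_quick_fix() {
    esxcli storage nmp device list | devices_with_onretry_on | while read -r dev; do
        esxcli storage nmp satp generic deviceconfig set -c disable_action_OnRetryErrors -d "$dev"
    done
}
```

After apply_quick_fix completes, running `esxcli storage nmp device list | devices_with_onretry_on` again should print nothing if every device was updated.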