A clustered VMDK enabled datastore randomly loses its reservation, causing the clustered VMs to lose access to the shared disk.
Cluster lost its PR keys; Reregistering
The following is logged in the vmkernel log:
YYYY-MM-DDThh:mm:ss cpuXX:xxxxxx)NMP: nmpCheckForMatchingKey:nnnn: Target reported more keys (0xnnn) than allocated slots: key nnnnxxxxnnnnxxxx
YYYY-MM-DDThh:mm:ss cpuXX:xxxxxx)ScsiDevice: nnnnn: VMFS notified as a result of WEAR reservation removal for device: naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. Reason: "Device Probe".
YYYY-MM-DDThh:mm:ss cpuXX:xxxxxx)FDS: nnnn: FDS_AnnounceQuesceDevice called for device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
YYYY-MM-DDThh:mm:ss cpuXX:xxxxxx)StorageDevice: nnnn: End path evaluation for device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
YYYY-MM-DDThh:mm:ss cpuXX:xxxxxx)Vol3: nnn: SCSI3 reservation conflict detected on a closed or invalid volume
Over an SSH session to a host that is missing the datastore, run the following commands:
vmkfstools -L readkeys /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | grep "Key " | awk '{print $4}' | wc -l
vmkfstools -L readkeys /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | grep "Key " | awk '{print $4}' | sort | uniq | wc -l
Sample output:
vmkfstools -L readkeys /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | grep "Key " | awk '{print $4}' | wc -l
192
vmkfstools -L readkeys /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | grep "Key " | awk '{print $4}' | sort | uniq | wc -l
48
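If several devices back clustered VMDK datastores, the same check can be repeated for each of them. The snippet below is a minimal, illustrative sketch only (not a supported procedure): it assumes the LUNs appear as whole-disk NAA entries under /vmfs/devices/disks and skips partition entries.

# Illustrative sketch: report total vs. unique registered keys for each NAA device
for dev in /vmfs/devices/disks/naa.*; do
    case "$dev" in
        *:*) continue ;;   # skip partition entries such as naa.xxx:1
    esac
    total=$(vmkfstools -L readkeys "$dev" | grep "Key " | awk '{print $4}' | wc -l)
    unique=$(vmkfstools -L readkeys "$dev" | grep "Key " | awk '{print $4}' | sort | uniq | wc -l)
    echo "$dev  total keys: $total  unique keys: $unique"
done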
The first command counts the total number of registered keys on the device (192), and the second counts the unique keys (48), which means the datastore is mapped to 48 hosts. There is a limit of 160 keys per device, and the total number of keys registered on this device exceeds that limit.
As a workaround, limit the number of hosts to which clustered VMDK datastores are presented (to about 5 hosts), and unmap these LUNs or unmount the datastores from the remaining hosts, as shown in the example below.
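As an illustration of the unmount step (the datastore label below is a placeholder), the standard esxcli commands can be used on each host that should no longer access the datastore:

# List mounted VMFS datastores and their backing devices
esxcli storage filesystem list

# Unmount the clustered VMDK datastore from this host
esxcli storage filesystem unmount -l <datastore_label>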
A fix is underway and will be included in a future release.