VM in HPE SimpliVity Cluster fails to boot after migrating to another host, from a host displaying "SimpliVity Cluster Available Physical Capacity 10 Percent or Less" Alarm

Article ID: 383759

Updated On: 12-11-2024

Products

VMware vSphere ESXi

Issue/Introduction

  • VMs in an ESXi host cluster using HPE SimpliVity NFS storage enter a failed state:
    • The VM is powered on but unresponsive
    • The VM is powered off and cannot be powered on or modified

  • The UI error message may vary, but typically resembles:

    General System error occurred: Launch failure <timestamp> Out of resources ThrowableProxy.cause: Out of resources
    Unable to write VMX file: /vmfs/volumes/<DatastoreUUID>/<VM>/<VM>.vmx

  • An attempt to create a tmp directory in the VM's folder fails with: No space left on device (see the shell sketch after this list).

  • In /var/log/hostd.log on the ESXi host attempting to boot the VM, the following message may appear:

    Hostlog_Flush: Failed to truncate hostlog /vmfs/volumes/<datastoreUUID>/<VM>/<VM>.hlog: No space left on device

  • In /var/log/vmkwarning.log on the ESXi host, a swap file creation failure may appear:

    WARNING: Swap: 3672: Failed to create swap file '/vmfs/volumes/<datastoreUUID>/<VM>/<VM>.vswp': No space left on device
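
To confirm the out-of-space condition from the ESXi shell, attempt a small write in the affected VM's directory. A minimal sketch, where <DatastoreUUID> and <VM> are placeholders for your environment:

    # On an exhausted datastore, any write attempt in the VM's folder
    # fails with "No space left on device".
    touch /vmfs/volumes/<DatastoreUUID>/<VM>/write-test
    mkdir /vmfs/volumes/<DatastoreUUID>/<VM>/tmp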

Environment

VMware ESXi 7.0
VMware ESXi 8.0

Cause

This error can occur when the HPE SimpliVity NFS storage runs out of logical space at the cluster level. The host can fill its Available Physical Capacity completely, at which point it can no longer issue writes to the NFS datastore. Because the affected blocks remain locked, the VMs stay in an error state until the storage space issue is resolved.
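
The host-side view of this exhaustion can be checked from the ESXi shell. A minimal sketch, assuming the SimpliVity NFS datastore is mounted on the host:

    # Show free space for all mounted filesystems, including NFS datastores.
    df -h

    # Per-filesystem detail (type, UUID, mount point, free bytes).
    esxcli storage filesystem list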

Resolution

Normally, utilities at the array level allow this space to be manually unmapped, but at this time HPE provides no such utility.

VMware recommends evacuating the affected ESXi host and powering it off completely (i.e., performing a COLD BOOT) to ensure that any unmapped blocks are released by the host on power-off.
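
If the evacuation and shutdown are performed from the host's ESXi shell rather than from vCenter, a minimal sketch follows. Note that entering maintenance mode from the shell does not evacuate running VMs by itself, so migrate them off first (for example, with vMotion from vCenter):

    # Enter maintenance mode (assumes all VMs have already been migrated off).
    esxcli system maintenanceMode set --enable true

    # Power the host off completely (a cold boot, not a warm reboot), so that
    # locked blocks are released on power-off. Requires maintenance mode.
    esxcli system shutdown poweroff --reason "SimpliVity capacity recovery"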

After the host is powered back on, any affected VMs should be able to power on. If multiple hosts are in this condition, a rolling reboot of the cluster may be required.
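
Once a host is back up, recovery can be verified from its ESXi shell; the VM ID used below is returned by the first command:

    # List VMs registered on this host along with their IDs.
    vim-cmd vmsvc/getallvms

    # Power on an affected VM by ID.
    vim-cmd vmsvc/power.on <VmID>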