Virtual machine deployment fails with "Unable to Write VMX File" error on NFS datastore due to All Paths Down (APD)



Article ID: 394147


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • While deploying a virtual machine on an NFS datastore, the following error is displayed:
    Unable to write VMX file: /vmfs/volumes/<NFS Datastore>/VMFolder/VM.vmx. An error occurred while syncing configuration file "/vmfs/volumes/<NFS Datastore>/VMFolder/VM.vmx~": 5 (Input/output error).

Validation:

  • The ESXi host experienced an All Paths Down (APD) condition, disconnecting the NFS datastore from the storage array. This is visible in /var/run/log/vmkernel.log and /var/run/log/vobd.log as a series of events recording the loss and subsequent restoration of the connection to the NFS server. The dirty-buffer alert during the outage indicates potential data loss from incomplete file operations.

    /var/run/log/vmkernel.log
    
    YYYY-MM-DDTHH:MM:SS.447Z cpu2:2097622)StorageApdHandlerEv: 110: Device or filesystem with identifier [<NFS UUID>] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM:SS.448Z cpu7:2100001 opID=5b442ffb)World: 12077: VC opID sps-Main-######-###-#####-##-##-#### maps to vmkernel opID 5b442ffb
    YYYY-MM-DDTHH:MM:SS.448Z cpu7:2100001 opID=5b442ffb)SunRPC: 3291: Synchronous RPC cancel for client #x############ IP <NFS Target IP>.8.1 proc 3 xid #x######## attempt 1 of 3
    YYYY-MM-DDTHH:MM:SS.448Z cpu6:2098511)WARNING: NFS: 338: Lost connection to the server <server name> mount point <nfs mount point>, mounted as <NFS UUID> ("<NFS Datastore Name>")
    YYYY-MM-DDTHH:MM:SS.510Z cpu48:2102161 opID=85a04160)ALERT: BC: 3177: File TEST.vmx~ closed with dirty buffers. Possible data loss.
    YYYY-MM-DDTHH:MM:SS.937Z cpu48:2098513)NFS: 347: Restored connection to the server <server name> mount point <NFS Mount point>, mounted as <NFS UUID> ("<NFS Datastore Name>")
    
    /var/run/log/vobd.log
    
    YYYY-MM-DDTHH:MM:SS.376Z: [APDCorrelator] 5521000780us: [vob.storage.apd.start] Device or filesystem with identifier [<NFS UUID>] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM:SS.376Z: [APDCorrelator] 5521103372us: [esx.problem.storage.apd.start] Device or filesystem with identifier [<NFS UUID>] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM:SS.379Z: [vmfsCorrelator] 5605001637us: [vob.vmfs.nfs.server.disconnect] Lost connection to the server <server name> mount point <NFS Mount point>, mounted as <NFS UUID> ("<NFS Datastore Name>")
    YYYY-MM-DDTHH:MM:SS.379Z: [vmfsCorrelator] 5605105979us: [esx.problem.vmfs.nfs.server.disconnect] <server name> <NFS Mount point> <NFS UUID> <NFS Datastore Name>
    YYYY-MM-DDTHH:MM:SS.380Z: [APDCorrelator] 5661001644us: [vob.storage.apd.timeout] Device or filesystem with identifier [<NFS UUID>] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
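After connectivity is restored, the datastore state and the APD window can be confirmed directly on the host. A minimal check (run in an ESXi shell; the log paths are those cited above):

```shell
# List NFS mounts and confirm the affected datastore shows Accessible: true
esxcli storage nfs list

# Review the APD window and NFS disconnect/restore events in the logs
grep -i "All Paths Down" /var/run/log/vmkernel.log
grep -i "nfs.server" /var/run/log/vobd.log
```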

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

  • This issue occurs due to a mismatch in MTU (Maximum Transmission Unit) settings along the network path used by the ESXi host to reach the NFS storage.
  • If the path includes components configured with conflicting MTU sizes (for example, 1500 on one hop and 9000 on another), large frames are silently dropped, leading to APD (All Paths Down) events and failures to write VM configuration (.vmx) files.

Resolution

  1. Validate Jumbo Frame Connectivity

    Run the following command from the affected ESXi host:

    vmkping -I vmkX -s 8972 -d <NFS_Target_IP> -c 30

    • Replace vmkX with the VMkernel interface used for NFS traffic.

    • If the large-frame test shows 100% packet loss while a standard-size ping (e.g., -s 1472) succeeds, an MTU mismatch along the path is confirmed.
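The payload size 8972 in the command above is not arbitrary: it is the 9000-byte jumbo-frame MTU minus the 20-byte IP header and 8-byte ICMP header, i.e., the largest ICMP payload that can traverse the path unfragmented. As arithmetic:

```shell
# Jumbo-frame MTU target
MTU=9000
# Largest ICMP payload = MTU - IP header (20 bytes) - ICMP header (8 bytes)
PAYLOAD=$((MTU - 20 - 8))
echo "$PAYLOAD"   # 8972

# Corresponding vmkping invocation (run on the ESXi host; vmk1 and the
# target IP are placeholders for your NFS VMkernel interface and array):
# vmkping -I vmk1 -s "$PAYLOAD" -d <NFS_Target_IP> -c 30
```

The -d flag sets the don't-fragment bit, so an undersized hop anywhere on the path causes the large frames to be dropped rather than fragmented.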

  2. Remediate MTU Configuration

    Choose one of the following based on the environment's capability and design:

    • Option 1: Lower the MTU on the ESXi host's VMkernel and associated vSwitch to 1500 bytes.

    • Option 2: Ensure jumbo frames (MTU 9000) are consistently configured end-to-end, including:

      • ESXi VMkernel interfaces

      • vSwitch/Distributed Switch

      • Physical network switches

      • Storage target interfaces
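On the ESXi side, Option 2 can be sketched with esxcli. The vSwitch name (vSwitch1) and VMkernel interface (vmk1) below are placeholders; substitute the names carrying your NFS traffic, and align the physical switches and storage target interfaces separately through their own management tools:

```shell
# Raise the standard vSwitch MTU to 9000 (placeholder name: vSwitch1)
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000

# Raise the VMkernel interface MTU to 9000 (placeholder name: vmk1)
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# Verify the new MTU values
esxcli network ip interface list
```

For a vSphere Distributed Switch, set the MTU in vCenter under the switch's advanced properties instead of esxcli. Re-run the vmkping test from step 1 afterward to confirm jumbo frames pass end-to-end.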