vMotion Fails with "Invalid Fault" Error on ESXi Host updated to 7.0 or newer

Article ID: 371579

Products

- VMware vSphere ESXi
- VMware vSphere ESXi 7.0
- VMware vSphere ESXi 8.0

Issue/Introduction

When attempting to migrate virtual machines (VMs) to a newly upgraded ESXi host using vMotion, the operation fails with the error message: "a general system error occurred - invalid fault".

 

This issue prevents successful VM migration and can impact the ability to balance workloads or perform maintenance on the cluster.

To confirm this issue, look for the following entries in the log files listed below:

1. In hostd.log on the affected destination ESXi host:
  

 [Timestamp] info hostd[ProcessID] [Originator details] ResolveCb: Failed with fault: (vim.fault.GenericVmConfigFault) {
      faultMessage = (vmodl.LocalizableMessage) [
         (vmodl.LocalizableMessage) {
            key = "msg.checkpoint.initmigration",
            arg = (vmodl.KeyAnyValue) [
               (vmodl.KeyAnyValue) {
                  key = "1",
                  value = "Necessary module isn't loaded"
               }
            ],
            message = "Failed to start migration: Necessary module isn't loaded."
         }
      ],
      reason = "Failed to start migration: Necessary module isn't loaded.",
      msg = "Failed to start migration: Necessary module isn't loaded."
   }

2. In vmkernel.log:

[Timestamp] cpu[X]:[ProcessID] opID=[OperationID])VmMemXfer: vm [ProcessID]: [ID]: Evicting VM with path:[VM_PATH]
[Timestamp] cpu[X]:[ProcessID] opID=[OperationID])VmMemXfer: [ID]: Creating crypto hash
[Timestamp] cpu[X]:[ProcessID] opID=[OperationID])VmMemXfer: vm [ProcessID]: [ID]: Could not find MemXferFS region for [VM_PATH]

3. In the vmware.log of the VM being migrated:

[Timestamp] Wa(03) vmx - MigratePlatformInitMigration: Necessary module isn't loaded
[Timestamp] In(05) vmx - [msg.checkpoint.initmigration] Failed to start migration: Necessary module isn't loaded.

4. In syslog.* files:

[Timestamp] jumpstart[ProcessID]: FcoePnic: ALERT: Configuring Software FCoE client on [vmnic]. The NIC driver does not export DCB capability required for lossless ethernet for FCOE. Hence the FCOE connection is NOT RECOMMENDED for production use.
 

Taken together, these log entries indicate that vMotion fails because a required module is not loaded on the destination host, most likely as a result of the Software FCoE misconfiguration.
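The signatures listed above can be searched for in one pass. The following is a minimal sketch, assuming a POSIX shell with grep available; the function names `count_signature` and `scan_vmotion_logs` are hypothetical helpers introduced here for illustration, and the log file paths must be supplied by the caller.

```shell
#!/bin/sh
# Illustrative helpers (not part of any VMware tooling).

# count_signature FILE PATTERN
# Prints the number of lines in FILE containing the fixed string PATTERN.
# Prints 0 if FILE does not exist or is not readable.
count_signature() {
    file=$1
    pattern=$2
    if [ -r "$file" ]; then
        # -F: treat the pattern as a literal string; -c: print the match count.
        # grep exits non-zero on zero matches, so "|| true" keeps the status clean.
        grep -c -F -- "$pattern" "$file" || true
    else
        echo 0
    fi
}

# scan_vmotion_logs HOSTD_LOG VMKERNEL_LOG VMWARE_LOG SYSLOG
# Prints one line per log with the count of its signature from this article.
scan_vmotion_logs() {
    echo "hostd.log: $(count_signature "$1" "Necessary module isn't loaded")"
    echo "vmkernel.log: $(count_signature "$2" "Could not find MemXferFS region")"
    echo "vmware.log: $(count_signature "$3" "msg.checkpoint.initmigration")"
    echo "syslog: $(count_signature "$4" "Configuring Software FCoE client")"
}
```

A non-zero count for each log is consistent with the symptoms described in this article. The function takes four paths in the order hostd.log, vmkernel.log, the VM's vmware.log, and a syslog file; where these files live on a given host is not stated in this article and should be verified locally.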

Environment

- VMware vSphere environment
- Newly upgraded ESXi host
- vMotion enabled
- FCoE configured in ESXi host configstore

- vmkernel.log on the host contains the message "Could not find MemXferFS region"

Cause

The issue is caused by a misconfiguration related to Software Fibre Channel over Ethernet (FCoE). Although Software FCoE is not supported in vSphere 7.0 and later versions, some systems may still have this feature incorrectly enabled in their configuration.

Resolution

To resolve this issue, follow these steps to disable Software FCoE:

1. Log in to the ESXi host using SSH or the Direct Console User Interface (DCUI).

2. Run the following command to check for any existing Software FCoE configurations:

configstorecli config current get -c esx -g storage_fcoe -k fcoe_activation_nic_policies

3. If any configurations are found, remove them using one of the following commands:
   
   a. To remove Software FCoE on all NICs:

configstorecli config current delete -c esx -g storage_fcoe -k fcoe_activation_nic_policies --all

   b. To remove Software FCoE on a specific NIC (replace <vmnic-#> with the actual NIC name):

configstorecli config current delete -c esx -g storage_fcoe -k fcoe_activation_nic_policies -i <vmnic-#>

4. Verify that the Software FCoE configuration has been removed by running the command from step 2 again. The output should be empty ({}).

5. Reboot the ESXi host.

6. After the host has restarted, attempt the vMotion operation again.
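The check in steps 2 and 4 can be scripted. The following is a minimal sketch, assuming only what is stated above, namely that an empty result is shown as `{}`; the function name `fcoe_policies_cleared` is a hypothetical helper, and it also treats completely blank output as empty, which is an assumption.

```shell
#!/bin/sh
# Illustrative helper (not part of any VMware tooling).

# fcoe_policies_cleared OUTPUT
# Returns 0 (success) if OUTPUT, with all whitespace removed, is "{}" or blank.
# Returns 1 otherwise, meaning some Software FCoE policy is still configured.
fcoe_policies_cleared() {
    compact=$(printf '%s' "$1" | tr -d ' \t\r\n')
    [ -z "$compact" ] || [ "$compact" = "{}" ]
}

# Example use on the ESXi host (commented out so the helper can be sourced anywhere):
# out=$(configstorecli config current get -c esx -g storage_fcoe -k fcoe_activation_nic_policies)
# if fcoe_policies_cleared "$out"; then
#     echo "No Software FCoE policies found; reboot the host and retry vMotion."
# else
#     echo "Software FCoE policies still present:"
#     echo "$out"
# fi
```

Comparing the whitespace-stripped output to `{}` keeps the check independent of how the command formats an empty result, without assuming anything about the structure of a non-empty one.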

Additional Information

If the issue persists after following these steps, consider checking for other potential causes such as network connectivity, host compatibility, or resource constraints.