Quiesced snapshots corrupted on Huawei servers with iBMA Net Controller NIC

Article ID: 323018

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • A quiesced snapshot is reported as corrupted after the snapshot task completes. The log shows entries similar to the following:

...................vmx| I125: SnapshotVMX_TakeSnapshot start: 'OS Patch', deviceState=0, lazy=0, quiesced=1, forceNative=0, tryNative=1, saveAllocMaps=0
..................vcpu-3| I125: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'OS Patch': 2
................. vcpu-3| I125: VigorTransport_ServerSendResponse opID=kawv4ote-2621484-auto-1k6r1-h5:70259286-cb-71-XXXX seq=208364: Completed Snapshot request.
.................. vcpu-7| I125: [msg.hbacommon.corruptredo] The redo log of 'xxxxxxxxxxx-000001.vmdk' is corrupted. If the problem persists, discard the redo log.

  • ESXi is installed on a Huawei 2288H server.
  • localcli network nic list returns an iBMA Net Controller NIC:
        vmnic   PCI bus address  MAC address        Name
        ------  ---------------  -----------------  -------------------------------------------------
        vmnic6  0000:03:00.0     9c:7d:a3:28:XX:XX  Huawei Technologies Co., Ltd. iBMA Net Controller



Cause


The issue is caused by the "Huawei Technologies Co., Ltd. iBMA Net Controller" network interface card, which presents a NIC with the same MAC address on multiple ESXi hosts.

VMware installation recommendations require that no duplicate MAC addresses exist on hosts that use a VMFS distributed file system.

A duplicate MAC address conflict causes corruption of the ESXi file system and data loss.
Verify whether the uuid string of every ESXi host ends with the same value.
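
The duplicate MAC condition can be checked offline once the output of localcli network nic list has been collected from each host. The following Python sketch is illustrative only. The function name find_duplicate_macs and the host names are hypothetical, and the sketch assumes the listing shows MAC addresses in colon-separated hexadecimal form, as in the sample output in this article.

```python
import re
from collections import defaultdict

# Matches a colon-separated MAC address such as 9c:7d:a3:28:00:01.
# The lookarounds stop the pattern from matching inside a longer hex/colon run.
MAC_RE = re.compile(
    r"(?<![0-9A-Fa-f:])[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}(?![0-9A-Fa-f:])"
)


def find_duplicate_macs(listings):
    """Return {mac: [hosts]} for every MAC address seen on more than one host.

    listings maps a host name (hypothetical) to the text captured from
    'localcli network nic list' on that host.
    """
    hosts_by_mac = defaultdict(set)
    for host, text in listings.items():
        for mac in MAC_RE.findall(text):
            hosts_by_mac[mac.lower()].add(host)
    return {
        mac: sorted(hosts)
        for mac, hosts in hosts_by_mac.items()
        if len(hosts) > 1
    }
```

Any MAC address returned by this helper appears on at least two hosts and is a candidate for the conflict described in this section.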

Resolution

Check the NIC information for the iBMA driver on the system:

  • localcli network nic list returns an iBMA Net Controller NIC:
        vmnic   PCI bus address  MAC address        Name
        ------  ---------------  -----------------  -------------------------------------------------
        vmnic6  0000:03:00.0     9c:7d:a3:28:XX:XX  Huawei Technologies Co., Ltd. iBMA Net Controller

Confirm the uuid:
Run vsish -e get /system/fsSwitch/uuid on every host in the cluster and compare the curHostID values. If the uuid string on all hosts ends with the same MAC reference, the curHostID values are identical.
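
The comparison of curHostID values across hosts can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it assumes the uuid string is hyphen-separated and that the MAC reference is its last segment, which should be confirmed against the actual vsish output. The function name hosts_sharing_suffix and the host names are hypothetical.

```python
def hosts_sharing_suffix(uuid_by_host):
    """Group hosts whose uuid strings end with the same final segment.

    uuid_by_host maps a host name (hypothetical) to the uuid string read
    from 'vsish -e get /system/fsSwitch/uuid' on that host.
    Assumption: the MAC reference is the last hyphen-separated segment.
    """
    groups = {}
    for host, uuid in uuid_by_host.items():
        suffix = uuid.strip().rsplit("-", 1)[-1].lower()
        groups.setdefault(suffix, []).append(host)
    # Keep only the suffixes that more than one host shares.
    return {
        suffix: sorted(hosts)
        for suffix, hosts in groups.items()
        if len(hosts) > 1
    }
```

An empty result means that no two hosts share the trailing segment. Any entry in the result lists hosts whose curHostID values are likely to collide.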

Uninstall the iBMA driver:

1. Run esxcli software vib remove -n net-ibma-driver to uninstall the iBMA driver.

2. Restart the server for the uninstallation to take effect.

3. Run localcli network nic list to check whether the iBMA driver was uninstalled successfully.

4. Run vsish -e get /system/fsSwitch/uuid to check the curHostID. If it no longer ends with the MAC reference, the curHostID values are unique.
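
The check in step 4 can be sketched in Python as follows. This is illustrative only: the function name uuid_ends_with_mac is hypothetical, and the sketch assumes the MAC reference appears in the uuid string as the same hexadecimal digits as the NIC MAC address, with or without separators.

```python
import re


def uuid_ends_with_mac(uuid, mac):
    """Return True if the hex digits of uuid end with the hex digits of mac.

    Assumption: the MAC reference is embedded in the uuid string as plain
    hexadecimal digits, so separators such as ':' and '-' are ignored.
    """
    def hex_only(value):
        return re.sub(r"[^0-9a-f]", "", value.lower())

    mac_digits = hex_only(mac)
    # An empty MAC never counts as a match.
    return bool(mac_digits) and hex_only(uuid).endswith(mac_digits)
```

After the driver is removed and the host is restarted, this check is expected to return False for the iBMA MAC address on every host.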

Additional Information

Impact/Risks:
Query whether the curHostID values of the hosts in the cluster are the same. If the uuid string on all hosts ends with the same MAC reference, the curHostID values are the same.

In that case, the root cause is that the iBMA virtual MAC addresses are identical across hosts, which makes the curHostID values identical and causes the problem.