Deploying a cluster with FCoE storage fails in SDDC Manager 5.x with the error "Checking the Datastore availability"

Article ID: 429570

Products

VMware SDDC Manager / VCF Installer

Issue/Introduction

When deploying a cluster with FCoE storage from SDDC Manager, the task fails with the error "Checking the Datastore availability".

The vmkernel.log snippet below shows repeated FIP discovery attempts for an explicit VLAN ID (vlan_id = 0x3ea):
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe:vmhba4:CreateFabric:112:Info: Fabric created: 0x431f7a571bb0
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe_spin_lock_init: lock = RportEvListLock
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe:vmhba4:SendFCoEDiscoverySolicitation:1204:Info: Sending FIP discovery for vlan_id = 0x3ea (0x431f7a571bb0
<date&&time>Z In(182) vmkernel: cpu64:2098487)ql_fcoe:vmhba3:SendFCoEVlanSolicitation:1477:Info: Sending FCoEVlanSolicitation request (0x3)
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe:vmhba4:SendFCoEDiscoverySolicitation:1204:Info: Sending FIP discovery for vlan_id = 0x3ea (0x431f7a571bb0
<date&&time>Z Wa(180) vmkwarning: cpu64:2098487)WARNING: ql_fcoe:vmhba3:FipVlanTimeoutWork:254: FIP VLAN Max Retries reached, cur vlan and pri: <VLAN_TBD>
<date&&time>Z In(182) vmkernel: cpu64:2098487)ql_fcoe_spin_lock_init: lock = SessionListLock
<date&&time>Z In(182) vmkernel: cpu64:2098487)ql_fcoe:vmhba3:CreateFabric:112:Info: Fabric created: 0x431f7a59d830
<date&&time>Z In(182) vmkernel: cpu64:2098487)ql_fcoe_spin_lock_init: lock = RportEvListLock
<date&&time>Z In(182) vmkernel: cpu64:2098487)ql_fcoe:vmhba3:SendFCoEDiscoverySolicitation:1204:Info: Sending FIP discovery for vlan_id = 0x3ea (0x431f7a59d830
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe:vmhba4:SendFCoEDiscoverySolicitation:1204:Info: Sending FIP discovery for vlan_id = 0x3ea (0x431f7a571bb0
<date&&time>Z In(182) vmkernel: cpu64:2098487)ql_fcoe:vmhba3:SendFCoEDiscoverySolicitation:1204:Info: Sending FIP discovery for vlan_id = 0x3ea (0x431f7a59d830


The discovery then fails after reaching the maximum number of retries:
<date&&time>Z Wa(180) vmkwarning: cpu0:2098488)WARNING: ql_fcoe:vmhba4:FipDiscoveryTimeoutWork:125: Max retry exhausted for Fabric = 0x431f7a571bb0 vlan_id = 0x3ea
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe:vmhba4:StartPortLogout:1236:Info: Sess 0x431f7a58c740 port_id fffffe
<date&&time>Z In(182) vmkernel: cpu0:2098488)ql_fcoe:vmhba4:CancelExchangeHandling:196:Info: Enter for Sess = 0x431f7a58c740
<date&&time>Z In(182) vmkernel: cpu2:2098489)ql_fcoe:vmhba4:DeleteFabric:123:Info: Fabric 0x431f7a571bb0 000000 destroyed
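
When reviewing a longer vmkernel.log, it can help to summarize these messages per HBA. The following is a minimal Python sketch, assuming the ql_fcoe lines have the same layout as the snippets above; the function name summarize_fip and the regular expression are illustrative and not part of any VMware tooling. It counts FIP discovery attempts, flags an exhausted retry limit, and converts the hexadecimal vlan_id to decimal.

```python
import re

# Assumed line layout, taken from the snippets above:
#   ql_fcoe:<vmhbaN>:<FunctionName>:<line>:... vlan_id = 0x<hex>
FIP_LINE = re.compile(
    r"ql_fcoe:(?P<hba>vmhba\d+):(?P<func>\w+):\d+:.*?vlan_id = (?P<vlan>0x[0-9a-fA-F]+)"
)


def summarize_fip(lines):
    """Return {hba: {"vlan": int, "attempts": int, "exhausted": bool}}."""
    summary = {}
    for line in lines:
        match = FIP_LINE.search(line)
        if not match:
            continue  # not a ql_fcoe line that carries a vlan_id
        entry = summary.setdefault(
            match["hba"], {"vlan": None, "attempts": 0, "exhausted": False}
        )
        entry["vlan"] = int(match["vlan"], 16)  # e.g. 0x3ea -> 1002
        if match["func"] == "SendFCoEDiscoverySolicitation":
            entry["attempts"] += 1
        elif match["func"] == "FipDiscoveryTimeoutWork":
            entry["exhausted"] = True
    return summary


# Two lines copied from the snippets above (timestamps omitted).
sample = [
    "vmkernel: cpu0:2098488)ql_fcoe:vmhba4:SendFCoEDiscoverySolicitation:1204:"
    "Info: Sending FIP discovery for vlan_id = 0x3ea (0x431f7a571bb0",
    "vmkwarning: cpu0:2098488)WARNING: ql_fcoe:vmhba4:FipDiscoveryTimeoutWork:125: "
    "Max retry exhausted for Fabric = 0x431f7a571bb0 vlan_id = 0x3ea",
]
print(summarize_fip(sample))
# {'vmhba4': {'vlan': 1002, 'attempts': 1, 'exhausted': True}}
```

The decimal value (1002 in this log) is the number to compare against the FCoE VLAN configured on the switch or in the adapter firmware.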

Although the cluster is created, the task cannot complete because the datastore is inaccessible from the host/vCenter perspective.

Environment

SDDC Manager 5.x, cluster with FCoE storage

Cause

According to VMware Engineering, the ESX storage stack does not affect the VLAN configuration for the FCoE solution.
The VLAN should be configured either on the switch side or on the FCoE firmware side, which is owned by the vendor.

The default deployment sets LLDP to Both (available options: Listen, Advertise, Both), which appears to be the cause of the failure.
 

Resolution

VMware Engineering is aware of the issue and is awaiting an update from the vendor.
VMware recommends engaging the vendor of the FCoE solution.


WORKAROUND 

1. To complete the SDDC Manager task from the vSphere side, deploy a cluster with FCoE storage.

2. Expect the deployment to fail at the sub-task "Checking the Datastore availability":
Description              Checking the datastore availability
Progress Messages        Datastore <Datastore_name> of datacenter datacenter_<TBD> is inaccessible

3. Wait until the hosts are deployed and joined to the cluster and the sub-task above has failed.

4. Change LLDP to Listen on the DVS for the cluster:
vCenter > Networking > DVS > Configure > Edit > Advanced > Discovery Protocol > Link Layer Discovery Protocol > select Listen



5. Reboot the hosts. They will come up with the HBA online.

6. Wait for the reboot to complete.

7. Retry the task from SDDC Manager to complete the cluster deployment.