vSAN File Service VMs do not deploy post migration of their assigned network port group

Article ID: 408244

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • The vSAN File Service VM's (FSVM) assigned network port group may be migrated for different reasons such as:

    • N-VDS to VDS (vSphere Distributed Switch) migration of network port groups during an NSX-T upgrade.

    • Migration of the network port groups from one VDS to another.

  • After migrating the network port group assigned to the FSVM, if a host loses its FSVM (due to maintenance mode and reboot, ESXi re-installation, or the addition of a new node to the vSAN cluster) no new FSVMs will deploy on that host.

  • The FSVM deployment will fail with the error "Cannot complete the operation. See the event log for details. Unable to enable the vSAN file service. FSVM does not exist from the beginning."

  • vSAN Skyline Health will report the alert "Infrastructure Health - File service VM not found on this host." for the host that is missing its FSVM.

  • Clicking the "Remediate" button in this health check does not deploy the FSVM on the host either. It fails with the same error: "Cannot complete the operation. See the event log for details. Unable to enable the vSAN file service. FSVM does not exist from the beginning."

  • The "Network" field for vSAN File Service (vSphere Client > vSAN Cluster > Configure > vSAN - File Service) shows up as blank.

  • If an upgrade of the FSVMs is attempted in this state, the upgrade does not take effect and the FSVMs remain on the same version.

Environment

VMware vSAN 7.x

VMware vSAN 8.x

Cause

  • The FSVMs fail to deploy because the network port group assigned to them can no longer be found. This can be validated in the vCenter Server log /var/log/vmware/vsan-health/vmware-vsan-health-service.log:

    YYYY-MM-DDTHH:MM:SS.SSSZ INFO vsan-mgmt[12345] [VsanEamUtil::_WaitForAgencyProgress opID=########-####] EAM: Agent 'eam.Agent:########-####-####-####-############': Runtime =
    {
            status = red,
            host = 'vim.HostSystem:host-####',
            issue = (eam.issue.Issue) [
       (eam.issue.NoCustomAgentVmNetwork) {
          dynamicType = <unset>,
          dynamicProperty = (vmodl.DynamicProperty) [],
          key = 17,
          description = 'Agent network(s) "network-####" not available on host',
          time = YYYY-MM-DDTHH:MM:SS.SSSZ,
          agency = 'eam.Agent:########-####-####-####-############',
          agencyName = 'vsan-file-services',
          solutionId = 'com.vmware.vsan.health',
          solutionName = 'com.vmware.vsan.health',
          agent = 'eam.Agent:########-####-####-####-############',
          agentName = '########-####-####-####-############',
          host = 'vim.HostSystem:host-#####',
          hostName = '<hostname>',
          customAgentVmNetwork = (vim.Network) [
             'vim.OpaqueNetwork:network-####'
          ],
          customAgentVmNetworkName = (str) [
             'network-####'
          ]
       }
    ],
            vmHook = None
    }
  • The moID (managed object ID) of the FSVM network port group changes during the migration of the network.

  • The network port group configuration for the vSAN FSVMs still holds the old moID. It is not updated during the migration of the network port group.

  • When EAM (VMware ESX Agent Manager) tries to deploy the FSVM, it looks at this configuration and fails to find the network port group which was configured for FSVM, and thus fails to deploy the FSVM.

  • The current moID of the FSVM's network port group can be identified by navigating to vSphere Client > Select an FSVM from the affected vSAN cluster > Select the Network tab for the VM > Select the assigned network port group. On the network port group view, the URL in the browser shows that the network port group now has a different moID (for example, "dvportgroup-#########").
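The log check described above can be scripted. The following is a minimal sketch, assuming the log text has already been copied from the vCenter Server and that the EAM issue is formatted as in the excerpt shown in this article. The function name find_missing_agent_networks is hypothetical and is not part of any VMware tool.

```python
import re

# Pattern for the EAM issue description shown in the log excerpt above.
MISSING_NETWORK_RE = re.compile(
    r'Agent network\(s\) "(?P<network>[^"]+)" not available on host'
)
# Pattern for the hostName field that follows the description in the same issue.
HOSTNAME_RE = re.compile(r"hostName = '(?P<host>[^']*)'")


def find_missing_agent_networks(log_text):
    """Return (network, host_name) pairs for each missing-network issue.

    Assumes the description line precedes the hostName line inside each
    issue block, as in the excerpt from vmware-vsan-health-service.log.
    """
    results = []
    pending_network = None
    for line in log_text.splitlines():
        network_match = MISSING_NETWORK_RE.search(line)
        if network_match:
            pending_network = network_match.group("network")
            continue
        host_match = HOSTNAME_RE.search(line)
        if host_match and pending_network is not None:
            results.append((pending_network, host_match.group("host")))
            pending_network = None
    return results


if __name__ == "__main__":
    # Hypothetical sample values standing in for the masked fields.
    sample = """
       (eam.issue.NoCustomAgentVmNetwork) {
          description = 'Agent network(s) "network-1234" not available on host',
          host = 'vim.HostSystem:host-1001',
          hostName = 'esx01.example.com',
       }
    """
    for network, host in find_missing_agent_networks(sample):
        print(f"{host}: agent network {network} not found")
```

Each reported network is the old moID that EAM is still trying to use for the listed host.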

 

Note:

  • The existing FSVMs will continue to run as there is no re-deployment of FSVM during the migration of the network. But if an FSVM were to be re-deployed, EAM would check the FSVM configuration and be unable to find the network port group with the old moID and thus fail to deploy the new FSVMs.

  • An FSVM upgrade also fails because it involves deploying new FSVMs.
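The moID mismatch behind this behavior can be confirmed by comparing the moID reported in the log with the one visible in the vSphere Client URL. The sketch below is illustrative only. It assumes the URL embeds the moID in the form "dvportgroup-<digits>" or "network-<digits>", and the helper names moid_from_url and network_moid_is_stale are hypothetical.

```python
import re

# Assumption: port group and network moIDs appear in the URL as
# "dvportgroup-<digits>" or "network-<digits>".
MOID_RE = re.compile(r"(?:dvportgroup|network)-\d+")


def moid_from_url(url):
    """Return the first port group or network moID found in a URL, or None."""
    match = MOID_RE.search(url)
    return match.group(0) if match else None


def network_moid_is_stale(configured_moid, client_url):
    """Return True when the configured moID differs from the one in the URL."""
    current_moid = moid_from_url(client_url)
    if current_moid is None:
        raise ValueError("no port group moID found in the URL")
    return configured_moid != current_moid


if __name__ == "__main__":
    # Hypothetical URL and moIDs used only for illustration.
    url = (
        "https://vcenter.example.com/ui/app/dvportgroup;nav=n/"
        "urn:vmomi:DistributedVirtualPortgroup:dvportgroup-1234:"
        "00000000-0000-0000-0000-000000000000/summary"
    )
    print(moid_from_url(url))
    print(network_moid_is_stale("network-5678", url))
```

A result of True indicates that the file service configuration still references the pre-migration moID, which matches the cause described in this article.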

Resolution

If the above symptoms and cause match, open a technical case with Broadcom Technical Support to investigate further.

Attachments

update_fs_networkCfg_on_vc.py