Understanding network rollback and recovery in vSphere 5.1 and later

Article ID: 311145

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

This article provides an overview of the automatic rollback and recovery feature in vSphere 5.1 and later.

Note: For more information, see the vSphere Networking Guide. This guide contains definitive information. If there is a discrepancy between the guide and the article, assume the guide is correct.


Environment

VMware vSphere ESXi 5.1.x
VMware vSphere ESXi 5.5.x
VMware vSphere ESXi 6.0.x
VMware vSphere ESXi 6.5.x
VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 8.0.x
VMware vCenter Server 5.1.x
VMware vCenter Server 5.5.x
VMware vCenter Server 6.0.x
VMware vCenter Server 6.5.x
VMware vCenter Server 7.0.x
VMware vCenter Server 8.0.x

 

Resolution

vSphere 5.1 and later allows you to roll back to a previous networking configuration if a networking misconfiguration occurs. You can also recover from a misconfiguration by connecting directly to a host and fixing the networking issue through the Direct Console User Interface (DCUI). Rollback is available on both standard and distributed switches.

Background

The management network is configured on every host and is used to communicate with vCenter Server and to interact with other hosts during vSphere HA configuration and operation. It is critical to centrally managing hosts through vCenter Server. If the management network on the host goes down or there is a misconfiguration, vCenter Server cannot connect to the host and therefore cannot centrally manage the vSphere infrastructure.

In a vSphere standard switch (VSS) environment, you can recover from a management network failure on a host by reconfiguring the host management network through the DCUI.

However, in the VDS environment, where multiple hosts are connected to a distributed switch, any network failure or misconfiguration of the management port group can potentially disconnect all hosts from the vCenter Server system. In this situation, vCenter Server cannot centrally make any changes to the VDS port group configuration and push those changes to hosts. The only way to recover from this situation is by going to individual hosts and building a VSS with a proper management network configuration. After all the hosts’ management networks have been reconfigured with a VSS and are able to communicate on the management network, vCenter Server can again manage the hosts and reconfigure the VDS.

To avoid having to fall back to a VSS in this way, and provided the hosts are not limited in physical network adapters, you can use a VSS for the management network and a VDS for all other virtual infrastructure and virtual machine traffic. Such a deployment requires at least four network adapters per host: two connected to the VSS and two to the VDS.

The automatic rollback and recovery feature introduced in vSphere 5.1 addresses concerns regarding use of the management network on a VDS. First, the automatic rollback feature automatically detects any configuration changes on the management network. If the host cannot reach the vCenter Server system, it does not allow the changes to take effect. Second, you also have an option to reconfigure the management network of the VDS per host through the DCUI.

vSphere Network Rollback

Rollback is enabled by default. However, you can enable or disable rollbacks at the vCenter Server level.
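As a rough illustration of the vCenter Server-level switch, the sketch below reads a rollback flag from an XML fragment. It assumes the setting is exposed under a key shaped like config.vpxd.network.rollback; verify the exact key name and location against the vSphere Networking Guide for your version. The sample fragment and the helper name are hypothetical, and the code only parses text. It does not connect to vCenter Server.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like a config.vpxd.network.rollback key.
# The real file layout may differ; check the vSphere Networking Guide.
SAMPLE_CFG = """
<config>
  <vpxd>
    <network>
      <rollback>false</rollback>
    </network>
  </vpxd>
</config>
"""


def rollback_enabled(cfg_xml: str) -> bool:
    """Return the rollback flag; a missing key means the default (enabled)."""
    root = ET.fromstring(cfg_xml)
    node = root.find("./vpxd/network/rollback")
    if node is None or node.text is None:
        # The article states that rollback is enabled by default.
        return True
    return node.text.strip().lower() == "true"


if __name__ == "__main__":
    print(rollback_enabled(SAMPLE_CFG))   # flag explicitly set to false
    print(rollback_enabled("<config/>"))  # key absent, default applies
```

Treating an absent key as "enabled" mirrors the default behavior described above.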

Several networking events can trigger a rollback. The events are grouped into these categories:

  • Host networking rollbacks (virtual switches or network system)
  • Distributed switch rollbacks

Host Networking Rollbacks

Host networking rollbacks occur when an invalid change is made to the host networking configuration. Every network change that disconnects a host also triggers a rollback. These changes to the host networking configuration are examples of what might trigger a rollback:

  • Updating the speed or duplex of a physical NIC
  • Updating DNS and routing settings
  • Updating teaming and failover policies or traffic shaping policies of a standard port group that contains the management VMkernel network adapter
  • Updating the VLAN of a standard port group that contains the management VMkernel network adapter
  • Increasing the MTU of management VMkernel network adapters and their switch to values not supported by the physical infrastructure
  • Changing the IP settings of management VMkernel network adapters
  • Removing the management VMkernel network adapter from a standard or distributed switch
  • Removing a physical NIC of a standard or distributed switch containing the management VMkernel network adapter

If a network disconnects for any of these reasons, the task fails and the host reverts to the last valid configuration.
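The apply-then-revert behavior can be pictured with a small simulation. This is a conceptual sketch only, not how ESXi implements rollback: the HostNetConfig fields, the apply_with_rollback helper, and the reachability rule (physical network supports an MTU of at most 1500 and trunks VLAN 10) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class HostNetConfig:
    """Hypothetical, simplified view of a host's management networking."""
    mgmt_vlan: int
    mgmt_mtu: int
    mgmt_ip: str


def apply_with_rollback(
    current: HostNetConfig,
    proposed: HostNetConfig,
    can_reach_vcenter: Callable[[HostNetConfig], bool],
) -> Tuple[HostNetConfig, bool]:
    """Keep `proposed` only if the host can still reach vCenter Server.

    Returns (active_config, task_succeeded).
    """
    if can_reach_vcenter(proposed):
        return proposed, True
    # Connectivity lost: the task fails and the last valid config stays active.
    return current, False


def reachable(cfg: HostNetConfig) -> bool:
    """Assumed physical network: MTU up to 1500, only VLAN 10 trunked."""
    return cfg.mgmt_mtu <= 1500 and cfg.mgmt_vlan == 10


last_valid = HostNetConfig(mgmt_vlan=10, mgmt_mtu=1500, mgmt_ip="192.0.2.10")

# Raising the MTU beyond what the physical network supports is reverted.
active, ok = apply_with_rollback(last_valid, replace(last_valid, mgmt_mtu=9000), reachable)
print(active == last_valid, ok)

# A change that keeps the host reachable is kept.
new_ip = replace(last_valid, mgmt_ip="192.0.2.11")
active2, ok2 = apply_with_rollback(last_valid, new_ip, reachable)
print(active2 == new_ip, ok2)
```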

Distributed Switch Rollbacks

Distributed switch rollbacks occur when invalid updates are made to distributed switch-related objects, such as distributed switches, distributed port groups, or distributed ports. These changes to the distributed switch configuration might trigger a rollback:

  • Changing the MTU of a distributed switch
  • Changing the following settings in the distributed port group of the management VMkernel network adapter:
      • Teaming and failover
      • VLAN
      • Traffic shaping
  • Blocking all ports in the distributed port group containing the management VMkernel network adapter
  • Overriding the policies above for the distributed port to which the management VMkernel network adapter is connected

If any of these changes results in an invalid configuration, one or more hosts might be out of synchronization with the distributed switch.

If you know where the conflicting configuration setting is located, you can manually correct the setting. For example, if you migrated a management VMkernel network adapter to a new VLAN incorrectly, the VLAN might not be trunked on the physical switch. When you correct the physical switch configuration, the next distributed switch-to-host synchronization will resolve the configuration issue.
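For the VLAN example, a quick sanity check is to compare the management VLAN ID against the list of VLANs allowed on the physical switch trunk. The sketch below assumes the allowed list is available as a string such as "1-10,20,30-32"; that format and both helper names are hypothetical, and the list itself would have to come from your physical switch configuration.

```python
def parse_vlan_ranges(spec: str) -> set:
    """Parse a trunk allow-list such as "1-10,20,30-32" into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def mgmt_vlan_trunked(mgmt_vlan: int, trunk_allowed: str) -> bool:
    """True if the management VLAN is carried on the physical trunk."""
    return mgmt_vlan in parse_vlan_ranges(trunk_allowed)


if __name__ == "__main__":
    # VLAN 15 is not in the allowed list, so the management VMkernel
    # adapter would lose connectivity after the migration.
    print(mgmt_vlan_trunked(15, "1-10,20,30-32"))
    print(mgmt_vlan_trunked(31, "1-10,20,30-32"))
```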

If you are not sure where the problem exists, you can roll back the distributed switch or distributed port group to a previous configuration. Both the manual correction and the rollback are actions that you perform yourself.

Note: For information on rolling back to a previous configuration with the vSphere Web Client or disabling network rollback using the vSphere Web Client, see vSphere Networking Rollback in the vSphere Networking Guide.

Recovering From Network Configuration Errors Using the Direct Console User Interface (DCUI)

vSphere 5.1 and later allows you to connect directly to a host to fix distributed switch properties or other networking misconfigurations using the Direct Console User Interface (DCUI).
 
Notes:
  • Recovery is not supported on stateless ESXi instances.
  • The Management Network must be configured on a distributed switch. This is the only way you can fix distributed switch configuration errors using the DCUI.

To restore the VDS from the DCUI:

  1. Connect to the DCUI.
  2. From the Network Restore Options menu, select Restore vDS.
  3. Type the correct values for the VLAN, uplink, and blocked properties, where appropriate.
  4. Press Enter.

The DCUI clones a host local port from the existing misconfigured port and applies the values you provided for VLAN and Blocked. The DCUI changes the Management Network to use the new host local port to restore connectivity to vCenter Server. vCenter Server picks up the new host local port and updates its database with the new information. vCenter Server creates a standalone port that is connected to the Management Network.
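The clone-and-override step described above can be modeled as follows. This is a conceptual sketch only: the Port fields, the port ID suffix, and the restore_mgmt_port helper are hypothetical and do not reflect how the DCUI or vCenter Server name or store ports.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Port:
    """Hypothetical, simplified distributed port."""
    port_id: str
    vlan: int
    blocked: bool
    host_local: bool = False


def restore_mgmt_port(misconfigured: Port, vlan: int, blocked: bool) -> Port:
    """Clone the misconfigured port as a host local port with corrected values."""
    return replace(
        misconfigured,
        port_id=misconfigured.port_id + "-hostlocal",  # hypothetical naming
        vlan=vlan,
        blocked=blocked,
        host_local=True,
    )


# The management port was moved to the wrong VLAN and blocked.
broken = Port(port_id="100", vlan=999, blocked=True)

# Values typed in the DCUI replace only the VLAN and Blocked properties.
fixed = restore_mgmt_port(broken, vlan=10, blocked=False)
print(fixed)
```

The original port object is left unchanged; the Management Network is simply pointed at the new host local port.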