The vCenter server fails to connect to its default gateway

Article ID: 370693

Products

VMware vCenter Server 7.0, VMware vCenter Server 6.0, VMware vCenter Server 8.0

Issue/Introduction

This article provides steps for troubleshooting a vCenter Server that is unreachable over the network. Typical symptoms:

  • vCenter Server cannot connect to its gateway.
  • Accessing the vCenter server fails from the jump host.
  • vCenter Server lost network connectivity after an unplanned or planned outage.

Cause

Several different scenarios can lead to these symptoms.

Please follow the steps in the Resolution section to further identify which scenario applies to your situation.

Resolution

Steps to troubleshoot vCenter connection issues:

1) Connect a vSphere client directly to the ESXi host on which the vCenter Server VM is running, using an account with root-level privileges.

2) Identify which network is expected to carry packets to and from the vCenter Server: a port group if it is on a standard switch, or a dvPort group if it is on a vSphere Distributed Switch (vDS).

3) SSH into the host (or use a virtual console) with root privileges, and confirm exactly which packets the vCenter Server VM is sending and receiving, using the techniques described in "Packet capture on ESXi using the pktcap-uw tool". For example:

a) Enter the command

net-stats -l

b) In the output of that command, determine the virtual ethernet adapter(s) being used by the vCenter; example:

PortNum          Type SubType SwitchName       MACAddress         ClientName
#########           5       9 #######         00:50:56:##:##:##  ###-####-#####
  • Where:
    • ######### is the number under the "PortNum" heading, which we will use for the --switchport parameter in the pktcap-uw tool mentioned in the packet capture KB
    • ####### is the name of the standard switch or the vDS; such as the generic "vSwitch0" or "DvsPortset-N" (where N is a number like 0, 1, 2, etc.)
    • 00:50:56:##:##:## is the MAC address of the virtual ethernet adapter
    • ###-####-##### is the name of the virtual machine -- in this case, the vCenter VM.
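
The lookup in step b) can be sketched as a small filter. This is a minimal sketch, not part of ESXi: the function name find_switchport, the VM name "vcsa01", and the sample rows are all made up for illustration, and it assumes the column layout shown above (ClientName as the sixth column).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ESXi): reads `net-stats -l` output
# on stdin and prints the PortNum of each port whose ClientName contains $1.
find_switchport() {
    awk -v vm="$1" '
        $1 ~ /^[0-9]+$/ {                  # data rows start with a numeric PortNum
            name = $6                      # ClientName is column 6 onward
            for (i = 7; i <= NF; i++) name = name " " $i
            if (index(name, vm) > 0) print $1
        }'
}

# Demonstration with made-up sample output. On a host you would instead run:
#   net-stats -l | find_switchport "vcsa01"
find_switchport "vcsa01" <<'EOF'
PortNum          Type SubType SwitchName       MACAddress         ClientName
33554434            4       0 vSwitch0         ac:1f:6b:11:22:33  vmnic0
33554441            5       9 vSwitch0         00:50:56:aa:bb:cc  vcsa01.eth0
EOF
```

The printed number is the value to pass to --switchport in step c). If the VM has more than one virtual adapter, one line is printed per adapter.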

c) Run the following command to display the packets entering / exiting the VM:

pktcap-uw --switchport ######### --capture VnicTx,VnicRx -o - | tcpdump-uw -r - -enn
  • If the capture shows only ARP (Address Resolution Protocol) requests sent by the vCenter Server to resolve the MAC address of its default gateway, and no replies, this confirms the cause of the connectivity loss: no device on the network outside the physical uplink carrying the traffic is answering with the expected ARP reply.
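
The "requests but no replies" check can be sketched as a counter over the decoded capture text. This is an illustrative sketch only: arp_summary is a hypothetical helper, the addresses are documentation examples, and it assumes the usual tcpdump text format for ARP ("Request who-has <ip> tell <ip>" and "Reply <ip> is-at <mac>").

```shell
#!/bin/sh
# Hypothetical helper: reads tcpdump-style text on stdin and counts ARP
# requests for the gateway IP given in $1 and ARP replies coming from it.
arp_summary() {
    awk -v gw="$1" '
        index($0, "who-has " gw " ")    { req++ }   # request asking for the gateway
        index($0, "Reply " gw " is-at") { rep++ }   # reply sent by the gateway
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Demonstration with made-up capture lines (192.0.2.1 plays the gateway).
arp_summary "192.0.2.1" <<'EOF'
10:15:01.000001 00:50:56:aa:bb:cc > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.0.2.1 tell 192.0.2.10, length 46
10:15:02.000001 00:50:56:aa:bb:cc > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.0.2.1 tell 192.0.2.10, length 46
EOF
```

A non-zero request count with zero replies matches the symptom described above. On a host, the capture pipeline from step c) would be piped into the function. The capture then has to end on its own so the summary is printed, for example by limiting the packet count with pktcap-uw's -c option, assuming that option is available in your build.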

d) Run the command below to determine which physical uplink (vmnic) is meant to carry the traffic:

esxtop (and select "n" for networking)
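
As a non-interactive alternative to reading the esxtop screen, the uplink can also be read from esxcli. This swaps in a different technique from the one the article names: it assumes the output of "esxcli network vm port list -w <world ID>" contains a "Team Uplink:" line, with the world ID taken from "esxcli network vm list". The helper name team_uplink and the sample values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli network vm port list -w <world ID>`
# output on stdin and prints the value of each "Team Uplink:" line.
team_uplink() {
    awk -F': *' '/Team Uplink:/ { print $2 }'
}

# Demonstration with made-up sample output. On a host you would instead run:
#   esxcli network vm list                      # find the VM's world ID
#   esxcli network vm port list -w <world ID> | team_uplink
team_uplink <<'EOF'
   Port ID: 33554441
   vSwitch: vSwitch0
   Portgroup: VM Network
   DVPort ID:
   MAC Address: 00:50:56:aa:bb:cc
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: 33554434
   Active Filters:
EOF
```

The vmnic printed here is the physical uplink whose switchport the networking team should inspect in the next step.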

e) Have the networking team log into the physical switch and check the switchport configuration for the switchport to which the physical uplink is connected.

  • A very common cause of this symptom is a mismatch between the VLAN ID set in the virtual or distributed virtual port group and the VLAN configuration of the physical switchport (reference: VLAN configuration on virtual switches, physical switches, and virtual machines).
  • For example, if the VLAN ID in the virtual or distributed virtual port group is set to a non-zero number, the Virtual Switch Tagging (VST) method is implied, per the KB. However, if the physical switchport to which the vmnic or vmnics are connected is configured as an access port, or the intent is to use the physical switch's native VLAN, the VLAN ID in the port group should be set to zero.
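
The VLAN rule above can be written out as a small decision helper. This is a sketch of the reasoning only: vlan_check is a hypothetical function, the two mode names "access" and "trunk" are simplifications, and the real values must come from the port group settings and from the physical switch configuration.

```shell
#!/bin/sh
# Hypothetical helper: compares the port group VLAN ID ($1, 0 = untagged)
# with the physical switchport mode ($2: "access" or "trunk").
vlan_check() {
    pg_vlan="$1"
    port_mode="$2"
    case "$port_mode" in
        access)
            if [ "$pg_vlan" -eq 0 ]; then
                echo "consistent"
            else
                echo "mismatch: access port expects port group VLAN ID 0"
            fi ;;
        trunk)
            if [ "$pg_vlan" -eq 0 ]; then
                echo "untagged: traffic uses the trunk's native VLAN"
            else
                echo "consistent if VLAN $pg_vlan is allowed on the trunk"
            fi ;;
        *)
            echo "unknown switchport mode: $port_mode" ;;
    esac
}

vlan_check 20 access   # tagged port group behind an access port
vlan_check 0 access    # untagged port group behind an access port
vlan_check 20 trunk    # Virtual Switch Tagging behind a trunk
```

The first call reproduces the mismatch described in the example: the port group tags frames with VLAN 20 while the access port expects untagged frames.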

 

If the above is not the issue:

  • Verify that the management services (hostd and vpxa) on the ESXi host are running:

/etc/init.d/hostd status
/etc/init.d/vpxa status

  • Verify that the vCenter Server Appliance Management Interface (VAMI) can be accessed from the client.
  • Verify that the port group of the vCenter is configured with two vNICs, to rule out a NIC or physical configuration issue. To isolate a possible issue:
    • If the load balancing policy is set to Default Virtual Port ID at the vSwitch or vDS level:
      Leave one vNIC connected with one uplink on the vSwitch or vDS, then try different vNIC and pNIC combinations until you determine where the vCenter VM is losing connectivity.
    • If the load balancing policy is set to IP Hash:
      Ensure the physical switch ports are configured as a port channel. For more information on verifying the configuration on the physical switch, see Configure Distributed Switch with an EtherChannel (or port channel) and Example Configuration of LACP on VMware, Cisco, HP, Dell switches. Shut down all but one of the physical ports the NICs are connected to, and rotate through the ports so that only one port is connected at a time. Take note of the port/NIC combination where the vCenter VM loses network connectivity.
    • You can also check the esxtop output using the n option (for networking) to see which pNIC the virtual machine is using. Try shutting down the ports on the physical switch one at a time to determine where the vCenter VM is losing network connectivity. This also rules out any misconfiguration on the physical switch port(s).
  • To recover the vCenter network when it is connected to a distributed switch with a single vmnic, build a temporary standard switch and connect the vCenter VM to it to recover from the network-down scenario. Then make the necessary changes in the vDS to return it to a good state.
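
Building the temporary standard switch could look roughly like the following. This is a sketch under stated assumptions, not a verified recovery procedure: the names vSwitchTemp, vmnic0 and vCenter-Recovery and the VLAN ID 0 are placeholders, the run and build_temp_vswitch helpers are hypothetical, and the uplink is assumed to have already been released from the distributed switch. By default the script only prints the esxcli commands (dry run).

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments (all placeholders): switch name, uplink, port group name, VLAN ID.
build_temp_vswitch() {
    vswitch="$1"; uplink="$2"; portgroup="$3"; vlan="$4"
    run esxcli network vswitch standard add --vswitch-name="$vswitch"
    run esxcli network vswitch standard uplink add --uplink-name="$uplink" --vswitch-name="$vswitch"
    run esxcli network vswitch standard portgroup add --portgroup-name="$portgroup" --vswitch-name="$vswitch"
    run esxcli network vswitch standard portgroup set --portgroup-name="$portgroup" --vlan-id="$vlan"
}

build_temp_vswitch vSwitchTemp vmnic0 vCenter-Recovery 0
```

Review the printed commands before running with DRY_RUN=0, and set the VLAN ID to match the physical switchport. The vCenter VM's network adapter still has to be reassigned to the new port group afterwards, which this sketch does not cover.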

Additional Information

NOTE: If the vmnics are in an LACP configuration, the LACP configuration must first be broken on the physical switch to avoid downtime. Follow Configuring LACP on a vSphere Distributed Switch Port Group for steps on how to work with an LACP configuration.

If the ESXi host does not have two vmnics, it is recommended that you perform these steps from the DCUI shell. Otherwise, you will lose SSH access when you run the command that brings the vmnic down, and you will not be able to continue with the process.