vSAN Cluster heartbeat timeouts and RDT connection failures After Enabling gdbserver Firewall Rule

Article ID: 436189

Updated On:

Products

VMware vSAN

Issue/Introduction

  • A vSAN cluster becomes unresponsive immediately after the gdbserver rule is enabled in the ESXi firewall settings with a restricted Allowed IP list.
  • Virtual machines (VMs) become unavailable or experience high I/O latency.
  • The following events are observed in vobd.log: [vob.net.firewall.config.changed] Firewall configuration has changed. Operation 'enable' for rule set gdbserver succeeded.
  • VMkernel logs report heartbeat timeouts and RDT (Reliable Data Transport) connection failures: [esx.problem.vmfs.heartbeat.timedout] [Volume UUID]
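
One way to confirm these symptoms from the ESXi Shell is to search the logs for the two event strings quoted above. The following is a minimal sketch, assuming the default log location /var/log/vobd.log; the function name find_gdbserver_symptoms is illustrative and is not part of any VMware tool.

```shell
# Hypothetical helper: print log lines that match the two events quoted above.
# Assumption: the log path defaults to /var/log/vobd.log; pass another file to override.
find_gdbserver_symptoms() {
    log_file="${1:-/var/log/vobd.log}"
    # -F treats each pattern as a fixed string, so dots and brackets are literal.
    grep -F -e "vob.net.firewall.config.changed" \
            -e "esx.problem.vmfs.heartbeat.timedout" "$log_file"
}
```

Calling find_gdbserver_symptoms with no argument searches vobd.log; passing /var/log/vmkernel.log as the argument searches the VMkernel log instead.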

Impact/Risks

Enabling the gdbserver rule with IP restrictions on a production vSAN node will cause a cluster partition. This can lead to VM downtime if the remaining partition does not hold a quorum of components and therefore cannot maintain object availability.

Environment

VMware vSAN (All versions)

Cause

The gdbserver service in the ESXi firewall is defined with a broad port range (typically TCP ports 1000 through 65535).

vSAN uses TCP port 2233 for RDT traffic, which carries data synchronization and communication between nodes. When the gdbserver firewall rule is enabled and restricted to a specific IP address (such as a management VM), the ESXi firewall drops any traffic to ports in the 1000-65535 range that does not originate from the allowed IP.

This causes the host to drop vSAN RDT traffic from its peer hosts, leading to immediate cluster isolation and object inaccessibility.
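
The effect can be illustrated with a small model of the matching logic. This is a simplified sketch of the behavior described above, not the ESXi firewall implementation; it assumes the 1000-65535 range and a single allowed IP, and the function name and the 192.0.2.x addresses are placeholders.

```shell
# Simplified model of the behavior described above, NOT the ESXi firewall code.
# Assumptions: the gdbserver rule covers TCP 1000-65535 and has one allowed IP.
RANGE_START=1000
RANGE_END=65535

# Print "drop" if the restricted rule would drop the packet, otherwise
# "not-affected" (meaning only that this rule does not drop it).
gdbserver_rule_verdict() {
    dst_port="$1"; src_ip="$2"; allowed_ip="$3"
    if [ "$dst_port" -ge "$RANGE_START" ] && [ "$dst_port" -le "$RANGE_END" ] \
        && [ "$src_ip" != "$allowed_ip" ]; then
        echo "drop"
    else
        echo "not-affected"
    fi
}

# vSAN RDT traffic (TCP 2233) from a peer host that is not on the allowed list:
gdbserver_rule_verdict 2233 192.0.2.11 192.0.2.50   # prints "drop"
```

Because 2233 lies inside the range and the peer host is not the allowed address, the model reports the RDT connection as dropped, which matches the isolation seen in the cluster.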

Resolution

To resolve the isolation and prevent future occurrences, the gdbserver rule must be disabled.

Step 1: Disable the gdbserver Rule

Run the following command on each affected ESXi host from an SSH or ESXi Shell session. Review this command before running it.

esxcli network firewall ruleset set -e false -r gdbserver
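
To confirm that the change took effect, the ruleset state can be read back with esxcli network firewall ruleset list. The helper below is a minimal sketch that extracts the Enabled value for one ruleset; it assumes the ruleset name is the first column of the output and Enabled is the second, and the function name is hypothetical.

```shell
# Hypothetical helper: read `esxcli network firewall ruleset list` output on
# stdin and print the Enabled column for the named ruleset.
# Assumption: column 1 is the ruleset name and column 2 is the Enabled flag.
ruleset_enabled() {
    awk -v name="$1" '$1 == name { print $2 }'
}
```

On a host this would be used as: esxcli network firewall ruleset list | ruleset_enabled gdbserver. The result should be false once the rule has been disabled.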

Step 2: Verify vSAN Health

  1. Log in to the vSphere Client.
  2. Navigate to the vSAN Cluster > Monitor > vSAN > Skyline Health.
  3. Click Retest and ensure the Network and Cluster categories return to a Healthy (Green) status.
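
As a command-line cross-check, the cluster membership seen by a host can be read from esxcli vsan cluster get. The sketch below pulls out the Sub-Cluster Member Count field; the function name is hypothetical, and the parsing assumes the field appears on its own line in "Name: value" form.

```shell
# Hypothetical helper: read `esxcli vsan cluster get` output on stdin and print
# the value of the "Sub-Cluster Member Count" field.
# Assumption: the field is printed on one line as "Sub-Cluster Member Count: N".
vsan_member_count() {
    sed -n 's/^ *Sub-Cluster Member Count: *//p'
}
```

On a host this would be used as: esxcli vsan cluster get | vsan_member_count. A host that has rejoined the cluster should report the same count as the number of hosts in the vSAN cluster; a lower number suggests the host is still partitioned.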

Step 3: Clear IP Restrictions (Best Practice)

If the rule must be used for temporary debugging, ensure the "Allowed IP" list is reverted to "All" before enabling it, or explicitly include all vSAN VMkernel IP addresses in the allowed list. However, it is strongly recommended to keep this rule disabled in production environments.
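
If an explicit allowed list is used, the esxcli commands involved can be generated first and reviewed before anything is run. The following is a dry-run sketch: it only prints commands and does not execute them. The function name is hypothetical and the IP addresses passed to it are placeholders for the vSAN VMkernel addresses (and any debugging client) in your environment.

```shell
# Dry-run sketch: print (do not execute) the esxcli commands that restrict the
# gdbserver ruleset to an explicit list of allowed IP addresses.
# Assumption: the arguments are placeholder addresses supplied by the caller.
print_gdbserver_allowlist_cmds() {
    echo "esxcli network firewall ruleset set --ruleset-id=gdbserver --allowed-all=false"
    for ip in "$@"; do
        echo "esxcli network firewall ruleset allowedip add --ruleset-id=gdbserver --ip-address=$ip"
    done
    # Final command lists the allowed IPs so the result can be checked.
    echo "esxcli network firewall ruleset allowedip list --ruleset-id=gdbserver"
}
```

To revert the list to "All" instead, the corresponding command is: esxcli network firewall ruleset set --ruleset-id=gdbserver --allowed-all=true. In either case, make the change while the rule is still disabled.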