ESXi firewall rule refresh operations may result in vSAN CMMDS and/or RDT firewall rules being disabled

Article ID: 315514


Products

VMware vSAN

Issue/Introduction

This article describes the symptoms, cause and resolution of an issue in which ESXi firewall rule refresh operations leave the vSAN CMMDS and/or RDT firewall rules disabled.

Symptoms:
vSAN CMMDS (UDP port 12321) and RDT (TCP port 2233) traffic are both critical to the functioning of vSAN nodes.
If CMMDS traffic is not permitted to or from a node, the node becomes isolated from the cluster, even though connectivity tests such as vmkping will not indicate an issue.
If RDT traffic is not permitted to or from a node, the node is unable to send or receive data read/write requests for vSAN objects.
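
Because vmkping does not reveal this condition, it can help to read the state of both rule sets directly from `esxcli network firewall ruleset list`. The following is a minimal sketch rather than an official procedure: the helper name `ruleset_state` is hypothetical, and it assumes the listing keeps the rule set name in the first column and the enabled flag in the second.

```shell
# ruleset_state NAME (hypothetical helper): read an
# `esxcli network firewall ruleset list` style table on stdin and print
# the value of the Enabled column for rule set NAME.
# Prints "unknown" if the rule set is not listed.
ruleset_state() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Example usage on an ESXi host (shown as a comment, not executed here):
#   esxcli network firewall ruleset list | ruleset_state cmmds
#   esxcli network firewall ruleset list | ruleset_state rdt
```

A result of `false` for either rule set on a vSAN node would be consistent with the symptoms described above.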

Environment

VMware vSAN 7.0.x

Cause

Under some circumstances there can be a disparity between two sets of ESXi ConfigStore data pertaining to the configured firewall rules of some services.
Because of the order in which firewall rules are re-applied from the ConfigStore data, an ESXi firewall rule refresh operation (either pushed from vCenter or performed manually by an ESXi administrator) can result in an unintended 'enabled: false' value being applied to the firewall rule of a service (here, CMMDS and RDT). The firewall rule for that service then remains disabled until the host is rebooted.

Example:

Problematic node example output:

    [root@hostname:~] cd /etc/vmware/configstore
    [root@hostname:~] /usr/lib/vmware/sqlite/bin/sqlite3 current-store-1
    SQLite version 3.7.17 2013-05-20 00:56:22
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite> .mode line
    sqlite> select  * from Config where Name='firewall_rule_sets' and Identifier='rdt';
        Component = esx
      ConfigGroup = network
             Name = firewall_rule_sets
       Identifier = rdt
     ModifiedTime = 2023-10-11 11:10:19
     CreationTime = 2023-04-23 09:12:10
          Version = 1.3
          Success = 1
    AutoConfValue = {"name": "rdt", "enabled": true}                  <---Normal
        UserValue = {"name": "rdt", "enabled": false}                 <---Abnormal
       VitalValue = {"name": "rdt", "num_clients": 0}
      CachedValue =
     DesiredValue =
         Revision = 35
         
         
    sqlite> select  * from Config where Name='firewall_rule_sets' and Identifier='cmmds';
        Component = esx
      ConfigGroup = network
             Name = firewall_rule_sets
       Identifier = cmmds
     ModifiedTime = 2023-10-11 11:10:19
     CreationTime = 2023-04-23 09:12:10
          Version = 1.3
          Success = 1
    AutoConfValue = {"name": "cmmds", "enabled": true}                  <---Normal
        UserValue = {"name": "cmmds", "enabled": false}                 <---Abnormal
       VitalValue = {"name": "cmmds", "num_clients": 0}
      CachedValue =
     DesiredValue =
         Revision = 35



Non-problematic node example output:

    [root@hostname:~] cd /etc/vmware/configstore
    [root@hostname:~] /usr/lib/vmware/sqlite/bin/sqlite3 current-store-1
    SQLite version 3.7.17 2013-05-20 00:56:22
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite> .mode line
    sqlite> select  * from Config where Name='firewall_rule_sets' and Identifier='rdt';
        Component = esx
      ConfigGroup = network
             Name = firewall_rule_sets
       Identifier = rdt
     ModifiedTime = 2023-10-12 15:03:02
     CreationTime = 2023-04-23 11:06:19
          Version = 1.3
          Success = 1
    AutoConfValue =
        UserValue = {"name": "rdt", "enabled": true}                 <---Normal
      CachedValue =
     DesiredValue =
         Revision = 35
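
The comparison illustrated by the outputs above can also be scripted. The sketch below is an assumption-laden illustration, not a supported tool: the helper name `disabled_user_values` is hypothetical, and it assumes the query output is in sqlite3 "line" mode with the `Identifier` line appearing before the `UserValue` line of each record, as in the examples. It prints the identifier of every record whose UserValue contains `"enabled": false`.

```shell
# disabled_user_values (hypothetical helper): read sqlite3 line-mode output
# on stdin and print the Identifier of each record whose UserValue holds
# "enabled": false.
disabled_user_values() {
    awk '
        $1 == "Identifier" { id = $3 }
        $1 == "UserValue" && /"enabled": *false/ { print id }
    '
}

# Example usage on an ESXi host (shown as a comment, not executed here).
# The -line option selects the same output mode as ".mode line":
#   cd /etc/vmware/configstore
#   /usr/lib/vmware/sqlite/bin/sqlite3 -line current-store-1 \
#     "select * from Config where Name='firewall_rule_sets' and Identifier in ('cmmds','rdt');" \
#     | disabled_user_values
```

No output means that neither record carries a disabled UserValue. The `<---Normal` and `<---Abnormal` markers in the examples are annotations and are not expected in real query output.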



These values can also be checked in less detail (but more quickly) using:

# configstorecli config current get -c esx -g network -k firewall_rule_sets -i cmmds
# configstorecli config current get -c esx -g network -k firewall_rule_sets -i rdt
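
To run the two commands above for both rule sets and summarise the result, a small wrapper such as the following could be used. This is a sketch under assumptions: the function name `report_rule` is hypothetical, and the exact output format of `configstorecli` is not shown in this article, so the code only assumes that the output contains an `"enabled"` key followed by `true` or `false`.

```shell
# report_rule ID (hypothetical helper): query one firewall rule set with
# the configstorecli command from this article and print
# "ID: disabled", "ID: enabled" or "ID: undetermined".
report_rule() {
    out=$(configstorecli config current get -c esx -g network \
          -k firewall_rule_sets -i "$1" 2>/dev/null)
    if printf '%s\n' "$out" | grep -Eq '"enabled"[[:space:]]*:[[:space:]]*false'; then
        echo "$1: disabled"
    elif printf '%s\n' "$out" | grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true'; then
        echo "$1: enabled"
    else
        echo "$1: undetermined"
    fi
}

# Example usage on an ESXi host (shown as a comment, not executed here):
#   for id in cmmds rdt; do report_rule "$id"; done
```

An "undetermined" result means only that the assumed key was not found in the output, so the raw command output should be inspected in that case.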

Resolution

This issue is resolved in ESXi 8.0 U2 (build 22380479) and will be resolved in an upcoming patch for ESXi 7.0 U3 (P09).

Workaround:
If this issue is encountered, rebooting the ESXi host (either warm or cold) should clear the disabled firewall settings. To remediate these settings in situ and to avoid recurrence, please open a Support Request with VMware GS and reference this KB article.

Additional Information

Impact/Risks:
An ESXi host with the above settings may become isolated from the vSAN cluster and/or be unable to send data traffic to the other nodes after these firewall rules are refreshed.