AVI GSLB service pools marked down with error "OPER_UNAVAIL"

Article ID: 407472

Products

VMware Avi Load Balancer

Issue/Introduction

In large-scale GSLB environments with a large number of Service Engines, it is beneficial to configure health monitor sharding. Sharding lets a selected set of Service Engines health check a small subset of GSLB services, which reduces load on the system.

For more information on this feature, see the Health Monitor Sharding documentation.

In a configuration with multiple GSLB sites, one or more sites mark a service down even though the service is operational and reachable from the Service Engines that host the DNS virtual service at the affected site.

Example:

From the CLI, the GSLB service runtime shows the member oper_status as "OPER_UP" but the datapath_status for the affected site as "OPER_UNAVAIL".

Command: 

show gslbservice <GS_NAME> runtime

 

Example:

| groups[1]                 |                                                                                |
|   name                    | EXAMPLE-gs-pool-10                                                             |
|   members[1]              |                                                                                |
|     cluster_uuid          | cluster-UUID                                                                   |
|     site_name             | site-A                                                                         |
|     fqdn                  | example1.local                                                                 |
|     ip                    | x.x.x.x                                                                        |
|     oper_ips[1]           | x.x.x.x                                                                        |
|     vip_type              | NON_AVI_VIP                                                                    |
|     ip_value_to_se        | xxxxxxxxx                                                                      |
|     oper_status           |                                                                                |
|       state               | OPER_UP                                                                        |
|       last_changed_time   | Wed Jul 23 19:56:13 2025 ms(422836) UTC                                        |
--SNIP---
|     datapath_status[4]    |                                                                                |
|       site_uuid           | cluster-UUID                                                                   |
|       oper_status         |                                                                                |
|         state             | OPER_UNAVAIL                                                                   |
--SNIP---

| groups[2]                 |                                                                                |
|   name                    | EXAMPLE-gs-pool-9                                                              |
|   members[1]              |                                                                                |
|     cluster_uuid          | cluster-UUID                                                                   |
|     site_name             | site-A                                                                         |
|     fqdn                  | example2.local                                                                 |
|     ip                    | x.x.x.x                                                                        |
|     oper_ips[1]           | x.x.x.x                                                                        |
|     vip_type              | NON_AVI_VIP                                                                    |
|     ip_value_to_se        | xxxxxxxxx                                                                      |
|     oper_status           |                                                                                |
|       state               | OPER_UP                                                                        |
|       last_changed_time   | Wed Jul 23 19:56:13 2025 ms(423346) UTC                                        |
--SNIP---
|     datapath_status[4]    |                                                                                |
|       site_uuid           | cluster-UUID                                                                   |
|       oper_status         |                                                                                |
|         state             | OPER_UNAVAIL                                                                   |
|       location            |                                                                                |
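
The mismatch described above can also be checked programmatically. The following is a minimal Python sketch, not an official tool: it assumes the output of "show gslbservice <GS_NAME> runtime" has been saved as text and keeps the "| key | value |" layout shown in the example, with nesting expressed by indentation. The function names (parse_rows, find_suspect_members) and the trimmed SAMPLE are hypothetical.

```python
import re


def parse_rows(text):
    """Yield (indent, key, value) for every '| key | value |' row."""
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skips blank lines and markers such as --SNIP---
        cells = line.strip("|").split("|", 1)
        if len(cells) != 2:
            continue
        key_raw, value = cells
        indent = len(key_raw) - len(key_raw.lstrip())
        yield indent, key_raw.strip(), value.strip()


def find_suspect_members(text):
    """Return members that are OPER_UP overall but OPER_UNAVAIL on a datapath."""
    stack = []    # (indent, key) of the rows enclosing the current row
    members = []
    group = None
    for indent, key, value in parse_rows(text):
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        # Drop list indices so "members[1]" and "members[2]" compare equal.
        path = [re.sub(r"\[\d+\]$", "", k) for _, k in stack]
        if path == ["groups", "name"]:
            group = value
        elif path == ["groups", "members"]:
            members.append({"group": group, "fqdn": None,
                            "member_state": None, "datapath_states": []})
        elif path == ["groups", "members", "fqdn"]:
            members[-1]["fqdn"] = value
        elif path == ["groups", "members", "oper_status", "state"]:
            members[-1]["member_state"] = value
        elif path == ["groups", "members", "datapath_status",
                      "oper_status", "state"]:
            members[-1]["datapath_states"].append(value)
    return [m for m in members
            if m["member_state"] == "OPER_UP"
            and "OPER_UNAVAIL" in m["datapath_states"]]


# Trimmed sample based on the example output above (column widths shortened).
SAMPLE = """\
| groups[1]                 |                    |
|   name                    | EXAMPLE-gs-pool-10 |
|   members[1]              |                    |
|     fqdn                  | example1.local     |
|     oper_status           |                    |
|       state               | OPER_UP            |
--SNIP---
|     datapath_status[4]    |                    |
|       site_uuid           | cluster-UUID       |
|       oper_status         |                    |
|         state             | OPER_UNAVAIL       |
"""

for member in find_suspect_members(SAMPLE):
    print(member["group"], member["fqdn"])
```

The sketch only flags candidates. A flagged member should still be confirmed against the health monitor status described in the next step.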

From the GSLB health monitor status you will observe the following items:

  • No "shm_runtime" entry exists for the configured active health monitor (for example, TCP or HTTP).
  • The system default "System-GSLB-Health-Monitor" is marked "OPER_DOWN" with the error "State not derived locally".
  • All Service Engines have hm_off set to True. In a working health monitor sharding environment, at least one SE has hm_off set to False, and that SE performs the health check.

Command: 

show virtualservice <DNS_VS_NAME> gslbservicehmonstat filter gs_ref <GS_NAME> disable_aggregate se

Example: In this example, the DNS VS used for GSLB is hosted on two Service Engines.

SE: Avi-se-pqlyk

| se_uuid                           | se-UUID                                                               |
| uuid                              | gslbservice-UUID                                                      |
| hm_off                            | True                                                                  |

| groups[1]                         |                                                                       |
|   name                            | EXAMPLE-gs-pool-9                                                     |
|   hmon[1]                         |                                                                       |
|     last_transition_timestamp_3   | Tue Jul 22 18:29:40 2025 ms(703430) UTC                               |
|     last_transition_timestamp_2   | Tue Jul 22 18:29:40 2025 ms(689533) UTC                               |
|     server_hm_stat[1]             |                                                                       |
|       server_name                 | x.x.x.x:0                                                             |
|       oper_status                 |                                                                       |
|         state                     | OPER_DOWN                                                             |
|         reason[1]                 | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
|       last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(703426) UTC                               |
|       last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(689645) UTC                               |
|       shm_runtime[1]              |                                                                       |
|         health_monitor_name       | System-GSLB-Health-Monitor                                            |
|         health_monitor_type       | HEALTH_MONITOR_GSLB                                                   |
|     se_uuid                       | Avi-se-pqlyk:se-UUID                                                  |

| groups[2]                         |                                                                       |
|   name                            | EXAMPLE-gs-pool-10                                                    |
|   hmon[1]                         |                                                                       |
|     last_transition_timestamp_3   | Tue Jul 22 18:29:40 2025 ms(707440) UTC                               |
|     last_transition_timestamp_2   | Tue Jul 22 18:29:40 2025 ms(693994) UTC                               |
|     server_hm_stat[1]             |                                                                       |
|       server_name                 | x.x.x.x:0                                                             |
|       oper_status                 |                                                                       |
|         state                     | OPER_DOWN                                                             |
|         reason[1]                 | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
|       last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(707437) UTC                               |
|       last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(694050) UTC                               |
|       shm_runtime[1]              |                                                                       |
|         health_monitor_name       | System-GSLB-Health-Monitor                                            |
|     se_uuid                       | Avi-se-pqlyk:se-UUID                                                  |

SE: Avi-se-osgrc

| se_uuid                           | se-UUID                                                               |
| uuid                              | gslbservice-UUID                                                      |
| hm_off                            | True                                                                  |

| groups[1]                         |                                                                       |
|   name                            | EXAMPLE-gs-pool-10                                                    |
|   hmon[1]                         |                                                                       |
|     last_transition_timestamp_3   | Tue Jul 22 18:29:40 2025 ms(699898) UTC                               |
|     last_transition_timestamp_2   | Tue Jul 22 18:29:40 2025 ms(696941) UTC                               |
|     server_hm_stat[1]             |                                                                       |
|       server_name                 | x.x.x.x:0                                                             |
|       oper_status                 |                                                                       |
|         state                     | OPER_DOWN                                                             |
|         reason[1]                 | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
|       last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(699892) UTC                               |
|       last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(696967) UTC                               |
|       shm_runtime[1]              |                                                                       |
|         health_monitor_name       | System-GSLB-Health-Monitor                                            |
|         health_monitor_type       | HEALTH_MONITOR_GSLB                                                   |
|     se_uuid                       | Avi-se-osgrc:se-UUID                                                  |

| groups[2]                         |                                                                       |
|   name                            | EXAMPLE-gs-pool-9                                                     |
|   hmon[1]                         |                                                                       |
|     last_transition_timestamp_3   | Tue Jul 22 18:29:40 2025 ms(700413) UTC                               |
|     last_transition_timestamp_2   | Tue Jul 22 18:29:40 2025 ms(693564) UTC                               |
|     server_hm_stat[1]             |                                                                       |
|       server_name                 | x.x.x.x:0                                                             |
|       oper_status                 |                                                                       |
|         state                     | OPER_DOWN                                                             |
|         reason[1]                 | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
|       last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(700409) UTC                               |
|       last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(693633) UTC                               |
|       shm_runtime[1]              |                                                                       |
|         health_monitor_name       | System-GSLB-Health-Monitor                                            |
|         health_monitor_type       | HEALTH_MONITOR_GSLB                                                   |
|     se_uuid                       | Avi-se-osgrc:se-UUID                                                  |
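
The three observations listed above can be checked against saved output from the gslbservicehmonstat command. This is a minimal Python sketch under the assumption that the output keeps the "| key | value |" row layout shown in the example. The names summarize_hmonstat and matches_symptom and the trimmed SAMPLE are hypothetical and not part of the product.

```python
DEFAULT_MONITOR = "System-GSLB-Health-Monitor"


def summarize_hmonstat(text):
    """Collect hm_off flags, monitor names and 'not derived locally' reasons."""
    hm_off = []        # one boolean per hm_off row (one row per SE section)
    monitors = set()   # health_monitor_name values seen under shm_runtime
    not_local = 0      # reasons containing "State not derived locally"
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skips "SE: ..." labels and blank lines
        cells = [c.strip() for c in line.strip("|").split("|", 1)]
        if len(cells) != 2:
            continue
        key, value = cells
        if key == "hm_off":
            hm_off.append(value == "True")
        elif key == "health_monitor_name":
            monitors.add(value)
        elif key.startswith("reason[") and "State not derived locally" in value:
            not_local += 1
    return {
        "all_hm_off": bool(hm_off) and all(hm_off),
        "only_default_monitor": monitors <= {DEFAULT_MONITOR},
        "not_derived_locally": not_local,
    }


def matches_symptom(summary):
    """True when all three observations from this article are present."""
    return (summary["all_hm_off"]
            and summary["only_default_monitor"]
            and summary["not_derived_locally"] > 0)


# Trimmed sample based on the example output above (two SEs, one group each).
SAMPLE = """\
SE: Avi-se-pqlyk
| hm_off                      | True                                                                  |
|         reason[1]           | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
|         health_monitor_name | System-GSLB-Health-Monitor                                            |
SE: Avi-se-osgrc
| hm_off                      | True                                                                  |
|         reason[1]           | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
|         health_monitor_name | System-GSLB-Health-Monitor                                            |
"""

print(matches_symptom(summarize_hmonstat(SAMPLE)))
```

If at least one SE reports hm_off as False, or a monitor other than the system default appears under shm_runtime, the sketch reports no match and a different cause should be investigated.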

Environment

Affects Version(s):

22.1.1 - 22.1.7-2p10
30.1.1
30.1.2 - 30.1.2-2p3
30.2.1 - 30.2.1-2p6
30.2.2 - 30.2.2-2p6
30.2.3 - 30.2.3-2p4
30.2.4
31.1.1 - 31.1.1-2p3
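
As a convenience, a Controller version string can be compared against the list above. The following Python sketch rests on two assumptions that are not stated in this article: that every entry with two versions is an inclusive range, and that patch suffixes such as "2p10" order numerically (so 2p3 sorts before 2p10). The names AFFECTED_RANGES, version_key and is_affected are hypothetical.

```python
import re

# Inclusive (first, last) ranges transcribed from the list above.
# Single versions are written as a range with identical endpoints.
AFFECTED_RANGES = [
    ("22.1.1", "22.1.7-2p10"),
    ("30.1.1", "30.1.1"),
    ("30.1.2", "30.1.2-2p3"),
    ("30.2.1", "30.2.1-2p6"),
    ("30.2.2", "30.2.2-2p6"),
    ("30.2.3", "30.2.3-2p4"),
    ("30.2.4", "30.2.4"),
    ("31.1.1", "31.1.1-2p3"),
]

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\d+)p(\d+))?$")


def version_key(version):
    """Turn '30.2.1-2p6' into (30, 2, 1, 2, 6); a missing patch becomes 0, 0."""
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version):
    """Assumption: tuple ordering reflects the real release ordering."""
    key = version_key(version)
    return any(version_key(first) <= key <= version_key(last)
               for first, last in AFFECTED_RANGES)


for candidate in ("30.2.1-2p3", "30.2.1-2p7", "22.1.7-2p10"):
    print(candidate, is_affected(candidate))
```

The release notes remain the authoritative source. The sketch only mirrors the list as written here.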

Cause

This is a product issue with health monitor sharding: the mapping is not sent to the SE elected to perform health monitoring. As a result, no SE at the affected site runs the configured health check for the service.

Resolution

This product issue will be fixed in the next GA releases of VMware Avi Load Balancer. Look for the ID below in the VMware Avi Load Balancer product release notes.

ID: AV-246215

Workaround(s):

The workaround is to restart the shard service, and only on the leader node of the Controller cluster at the affected GSLB site.

Step(s):

  1. SSH to the Controller leader node of the affected GSLB site as the local admin user.
  2. Run the following command to restart the shard service and restore health monitoring:

    sudo systemctl restart shard_server.service


  3. Validate the health monitor status of the GSLB pool in the GUI, or in the CLI with the following steps:

    a. SSH to the leader Controller of the affected GSLB site as admin, then launch the CLI (shell) and log in, also as the admin user.

    b. Run the following commands to check the status of the GSLB service from the DNS VS of the affected site.

    switchto tenant *
    show gslbservice <GS_NAME> runtime
    show virtualservice <DNS_VS_NAME> gslbservicehmonstat filter gs_ref <GS_NAME> disable_aggregate se