In large-scale GSLB environments with a large number of Service Engines, it is beneficial to configure the system with health monitor sharding. This allows a selected subset of Service Engines to health check a small subset of GSLB services, reducing load on the system.
More information on this feature can be found here: Health Monitor Sharding
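As a quick check before troubleshooting, it can help to confirm that health monitor sharding is actually enabled for the GSLB configuration. The example below is an illustrative sketch, not taken from this article: it assumes the setting is visible in the GSLB object returned by the controller API, and <controller-ip>, <user>, <password>, and <api-version> are placeholders; consult the Health Monitor Sharding documentation referenced above for the exact field name in your release.
Example:
# Dump the GSLB configuration object and review the health monitor sharding setting.
curl -sk -u '<user>:<password>' -H 'X-Avi-Version: <api-version>' 'https://<controller-ip>/api/gslb' | python3 -m json.tool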
In a configuration with multiple GSLB sites, only some of the GSLB sites mark a service down, even though the actual service is operational and reachable from the DNS virtual service Service Engines of the affected site.
Example:
In the GSLB service runtime output from the CLI, the member on the affected site will show an operational status of "OPER_UP" but a data path status of "OPER_UNAVAIL".
Command:
show gslbservice <GS_NAME> runtime
Example:
| groups[1] | |
| name | EXAMPLE-gs-pool-10 |
| members[1] | |
| cluster_uuid | cluster-UUID |
| site_name | site-A |
| fqdn | example1.local |
| ip | x.x.x.x |
| oper_ips[1] | x.x.x.x |
| vip_type | NON_AVI_VIP |
| ip_value_to_se | xxxxxxxxx |
| oper_status | |
| state | OPER_UP |
| last_changed_time | Wed Jul 23 19:56:13 2025 ms(422836) UTC |
--SNIP---
| datapath_status[4] | |
| site_uuid | cluster-UUID |
| oper_status | |
| state | OPER_UNAVAIL |
--SNIP---
| groups[2] | |
| name | EXAMPLE-gs-pool-9 |
| members[1] | |
| cluster_uuid | cluster-UUID |
| site_name | site-A |
| fqdn | example2.local |
| ip | x.x.x.x |
| oper_ips[1] | x.x.x.x |
| vip_type | NON_AVI_VIP |
| ip_value_to_se | xxxxxxxxx |
| oper_status | |
| state | OPER_UP |
| last_changed_time | Wed Jul 23 19:56:13 2025 ms(423346) UTC |
--SNIP---
| datapath_status[4] | |
| site_uuid | cluster-UUID |
| oper_status | |
| state | OPER_UNAVAIL |
| location | |
From the GSLB health monitor status on the Service Engines hosting the DNS virtual service, you will observe the member marked OPER_DOWN by the System-GSLB-Health-Monitor with the reason "State not derived locally":
Command:
show virtualservice <DNS_VS_NAME> gslbservicehmonstat filter gs_ref <GS_NAME> disable_aggregate se
Example: In this example, the DNS virtual service used for GSLB is hosted on two Service Engines.
SE: Avi-se-pqlyk
| se_uuid | se-UUID |
| uuid | gslbservice-UUID |
| hm_off | True |
| groups[1] | |
| name | EXAMPLE-gs-pool-9 |
| hmon[1] | |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(703430) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(689533) UTC |
| server_hm_stat[1] | |
| server_name | x.x.x.x:0 |
| oper_status | |
| state | OPER_DOWN |
| reason[1] | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(703426) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(689645) UTC |
| shm_runtime[1] | |
| health_monitor_name | System-GSLB-Health-Monitor |
| health_monitor_type | HEALTH_MONITOR_GSLB |
| se_uuid | Avi-se-pqlyk:se-UUID |
| groups[2] | |
| name | EXAMPLE-gs-pool-10 |
| hmon[1] | |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(707440) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(693994) UTC |
| server_hm_stat[1] | |
| server_name | x.x.x.x:0 |
| oper_status | |
| state | OPER_DOWN |
| reason[1] | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(707437) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(694050) UTC |
| shm_runtime[1] | |
| health_monitor_name | System-GSLB-Health-Monitor |
| se_uuid | Avi-se-pqlyk:se-UUID |
SE: Avi-se-osgrc
| se_uuid | se-UUID |
| uuid | gslbservice-UUID |
| hm_off | True |
| groups[1] | |
| name | EXAMPLE-gs-pool-10 |
| hmon[1] | |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(699898) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(696941) UTC |
| server_hm_stat[1] | |
| server_name | x.x.x.x:0 |
| oper_status | |
| state | OPER_DOWN |
| reason[1] | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(699892) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(696967) UTC |
| shm_runtime[1] | |
| health_monitor_name | System-GSLB-Health-Monitor |
| health_monitor_type | HEALTH_MONITOR_GSLB |
| se_uuid | Avi-se-osgrc:se-UUID |
| groups[2] | |
| name | EXAMPLE-gs-pool-9 |
| hmon[1] | |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(700413) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(693564) UTC |
| server_hm_stat[1] | |
| server_name | x.x.x.x:0 |
| oper_status | |
| state | OPER_DOWN |
| reason[1] | Marked down by System-GSLB-Health-Monitor [State not derived locally] |
| last_transition_timestamp_3 | Tue Jul 22 18:29:40 2025 ms(700409) UTC |
| last_transition_timestamp_2 | Tue Jul 22 18:29:40 2025 ms(693633) UTC |
| shm_runtime[1] | |
| health_monitor_name | System-GSLB-Health-Monitor |
| health_monitor_type | HEALTH_MONITOR_GSLB |
| se_uuid | Avi-se-osgrc:se-UUID |
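To rule out an actual backend failure, you can also confirm manually that the member is reachable from the affected site, independent of the GSLB health monitor. The probes below are an illustrative sketch, not taken from this article: <member-ip>, <port>, and <health-check-path> are placeholders, and the probe should match the protocol, port, and path that the configured health monitor uses.
Example:
# Simple manual reachability probes, run from a host in the affected site.
ping -c 3 <member-ip>
curl -vk 'https://<member-ip>:<port>/<health-check-path>'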
Affects Version(s):
22.1.1 - 22.1.7-2p10
30.1.1
30.1.2 - 30.1.2-2p3
30.2.1 - 30.2.1-2p6
30.2.2 - 30.2.2-2p6
30.2.3 - 30.2.3-2p4
30.2.4
31.1.1 - 31.1.1-2p3
This has been identified as a product issue with health monitor sharding, where the sharding mapping is not sent to the Service Engine elected for health monitoring.
This product issue will be fixed in an upcoming GA release of VMware Avi Load Balancer. Please look for the ID below in the VMware Avi Load Balancer product release notes.
ID: AV-246215
Workaround(s):
The workaround for this issue is to restart the shard service on the controller cluster leader node of the affected GSLB site only. If that leader node is not already known, see the sketch after the steps for identifying it.
Step(s):
1. On the controller cluster leader node of the affected GSLB site, restart the shard service:
sudo systemctl restart shard_server.service
2. From the Avi shell, verify that the GSLB service runtime and health monitor status have recovered:
switchto tenant *
show gslbservice <GS_NAME> runtime
show virtualservice <DNS_VS_NAME> gslbservicehmonstat filter gs_ref <GS_NAME> disable_aggregate se
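If the cluster leader node of the affected site needs to be identified first, it can be read from the cluster runtime information of that site's controller. The example below is an illustrative sketch, not taken from this article: it assumes API access with basic authentication, <user>, <password>, and <affected-site-controller-ip> are placeholders, and the output format should be verified against your release.
Example:
# List the controller cluster nodes of the affected site with their roles;
# run the shard service restart on the node whose role indicates it is the leader.
curl -sk -u '<user>:<password>' 'https://<affected-site-controller-ip>/api/cluster/runtime' | python3 -m json.tool
After the restart, re-run the verification commands above and confirm that the data path status of the affected site member recovers to OPER_UP and that the health monitor status no longer reports "State not derived locally".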