Network Latency Alarms in Skyline Health when no issue present in 7.0 U2 and U3


Article ID: 326408


Updated On:

Products

VMware vSAN

Issue/Introduction

This article describes this issue and provides a workaround and a resolution.

Symptoms:
The vSAN Network Latency Test in Skyline Health reports a Yellow status for the latency check on the cluster leader node. 

This can be seen in the UI by going to Skyline Health > Network Latency Check > Failed (Yellow).

In the following log excerpts, the details (host names, fault domains, and latency values) will differ in your environment:

On vCenter Server, the vmware-vsan-health-summary-result.log file contains entries similar to the following:
      Test siteconnectivity health : yellow
         SiteLatency: FaultDomain  FaultDomain  ObservedLatency(Ms)  Threshold(Ms)  LatencyResult
                      (WitnessHost, Vcf-W1C1_Primary-Az-Faultdomain, 12.84, 200, Green), (WitnessHost, Vcf-W1C1_Secondary-Az-Faultdomain, 0.37, 200, Green), (Vcf-W1C1_Primary-Az-Faultdomain, Vcf-W1C1_Secondary-Az-Faultdomain, 14.30, 5, Yellow),
         NetworkLatencyAmongHosts: FromHost  SourceDomain  ToHost  DestinationDomain  NetworkLatency(Ms)  Threshold(Ms)  NetworkLatencyCheckResult
                                   (Host-2495, Vcf-W1C1_Secondary-Az-Faultdomain, Host-2116, Vcf-W1C1_Primary-Az-Faultdomain, 0.23, 5, Green), (Host-2495, Vcf-W1C1_Secondary-Az-Faultdomain, Host-2128, Vcf-W1C1_Primary-Az-Faultdomain, 0.21, 5, Green),
..............
 (Host-2124, Vcf-W1C1_Primary-Az-Faultdomain, Host-2398, WitnessHost, 0.50, 200, Green), (Host-2124, Vcf-W1C1_Primary-Az-Faultdomain, Host-2479, Vcf-W1C1_Secondary-Az-Faultdomain, 0.20, 5, Green),
                                   (Host-2132, Vcf-W1C1_Primary-Az-Faultdomain, Host-2487, Vcf-W1C1_Secondary-Az-Faultdomain, 14.30, 5, Yellow), (Host-2132, Vcf-W1C1_Primary-Az-Faultdomain, Host-2471, Vcf-W1C1_Secondary-Az-Faultdomain, 0.61, 5, Green),

A second example:

Test siteconnectivity health : yellow
SiteLatency: FaultDomain  FaultDomain  ObservedLatency(Ms)  Threshold(Ms)  LatencyResult
(Dc-K, Dc-S, 22.05, 5, Yellow)
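To pick out the non-Green entries from a long health summary log, a short script can help. The following is a minimal Python sketch, assuming each entry is a parenthesised, comma-separated tuple whose last three fields are the observed latency, the threshold, and the result, as in the excerpts above. The names parse_latency_entries, flagged, and LatencyEntry are hypothetical and are not part of any VMware tool.

```python
import re
from typing import NamedTuple


class LatencyEntry(NamedTuple):
    endpoints: tuple[str, ...]  # fault domains or host/domain names, as logged
    observed_ms: float
    threshold_ms: float
    result: str  # e.g. "Green" or "Yellow"


# Matches the body of each parenthesised group, e.g. "(Dc-K, Dc-S, 22.05, 5, Yellow)".
_TUPLE_RE = re.compile(r"\(([^()]*)\)")


def parse_latency_entries(text: str) -> list[LatencyEntry]:
    """Extract tuples whose last three fields are latency, threshold, and result (assumed layout)."""
    entries = []
    for body in _TUPLE_RE.findall(text):
        fields = [f.strip() for f in body.split(",")]
        if len(fields) < 5:
            continue  # skips header fragments such as "(Ms)"
        try:
            observed, threshold = float(fields[-3]), float(fields[-2])
        except ValueError:
            continue  # not a latency tuple
        entries.append(LatencyEntry(tuple(fields[:-3]), observed, threshold, fields[-1]))
    return entries


def flagged(entries: list[LatencyEntry]) -> list[LatencyEntry]:
    """Return the entries the health check did not report as Green."""
    return [e for e in entries if e.result != "Green"]


if __name__ == "__main__":
    sample = """
    SiteLatency: FaultDomain  FaultDomain  ObservedLatency(Ms)  Threshold(Ms)  LatencyResult
    (Dc-K, Dc-S, 22.05, 5, Yellow)
    """
    for e in flagged(parse_latency_entries(sample)):
        print(", ".join(e.endpoints), f": {e.observed_ms} ms observed, threshold {e.threshold_ms} ms ({e.result})")
```

The script only reports what the health check logged; as described below, a Yellow entry here may still be a false positive.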

However, manual ping tests to the hosts listed in the logs show no elevated latency.
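To compare a manual ping result against the threshold reported in the health log (for example 5 ms between data sites), the summary line of the ping output can be checked. The following Python sketch assumes the output of a command such as vmkping -I <vSAN vmknic> <peer vSAN IP> ends with a line like "round-trip min/avg/max = 0.190/0.210/0.230 ms"; that format, and the helper names ping_stats and within_threshold, are assumptions for illustration.

```python
import re

# Assumed summary line formats: BusyBox/ESXi style "round-trip min/avg/max = a/b/c ms"
# or Linux iputils style "rtt min/avg/max/mdev = a/b/c/d ms".
_SUMMARY_RE = re.compile(r"(?:round-trip|rtt) min/avg/max(?:/mdev)? = ([\d.]+)/([\d.]+)/([\d.]+)")


def ping_stats(output: str) -> tuple[float, float, float]:
    """Return (min, avg, max) round-trip times in ms from ping output."""
    match = _SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no round-trip summary line found")
    low, avg, high = (float(v) for v in match.groups())
    return low, avg, high


def within_threshold(output: str, threshold_ms: float) -> bool:
    """True if even the slowest (max) round trip stays at or under the threshold."""
    _, _, worst = ping_stats(output)
    return worst <= threshold_ms
```

Using the maximum rather than the average is a deliberately conservative choice: if even the worst round trip is well under the threshold, a Yellow result in the health check is more likely to be the false positive described in this article.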
 


Environment

VMware vSAN 7.0.x

Cause

This issue is caused by CPU preemption between services on the vSAN cluster leader node, which results in a false positive latency alarm.

Resolution

To fix this issue, upgrade both vCenter Server and ESXi to version 7.0 U3f or later.
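When checking many environments, the version requirement can be tested programmatically. This is a minimal Python sketch, assuming versions are available as labels such as "7.0 U3f" or "7.0U2a" and that update letters sort alphabetically within an update. The helpers parse_version and has_fix are hypothetical; comparing build numbers is the more precise approach where they are available.

```python
import re

# Hypothetical parser for labels such as "7.0 U3f", "7.0U2a", "7.0 U3" or "8.0".
_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)\s*(?:U(\d+)([a-z]?))?\s*$", re.IGNORECASE)


def parse_version(text: str) -> tuple[int, int, int, str]:
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, update, letter = match.groups()
    # No "U" suffix is treated as update 0; an empty letter sorts before "a".
    return int(major), int(minor), int(update or 0), (letter or "").lower()


FIXED_IN = parse_version("7.0 U3f")


def has_fix(vcenter: str, esxi: str) -> bool:
    """Both vCenter Server and ESXi must be at 7.0 U3f or later."""
    return parse_version(vcenter) >= FIXED_IN and parse_version(esxi) >= FIXED_IN
```

For example, has_fix("7.0 U3f", "7.0 U3c") returns False because ESXi has not yet been upgraded.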

Workaround:
Disabling and re-enabling the vSAN performance service can temporarily alleviate this issue. Because the performance data object is removed and recreated, all vSAN performance data collected before this point is lost. If you need this data, collect a log bundle from the cluster leader node before disabling the service.

It may be necessary to leave the service disabled to stop the alerts. If so, note that no performance data is collected while the service is disabled.

Additional Information

Impact/Risks:
There is no direct impact caused by this false alarm. 

However, while these false positive alarms persist, a genuine alarm caused by a real high-latency connection could be overlooked.