Collecting vSAN Performance Service data for vSAN performance issues
Article ID: 326959
Products
VMware vSAN
Issue/Introduction
This article describes how to collect vSAN Performance Service data and upload it to a VMware Support (GS) ticket, so that Support can analyze and review the performance metrics of your vSAN/VxRail clusters.
Symptoms:
Customer Experience Improvement Program (CEIP) cannot be enabled due to security concerns
vCenter Server does not have an internet connection to upload CEIP data to VMware
Performance on the vSAN cluster has deteriorated, and VMware Support is needed to identify and resolve performance-related issues in the vSAN/VxRail environment.
Environment
VMware vSAN 6.x
VMware vSAN 7.x
VMware vSAN 8.x
Cause
Observing any of the following:
high latency on vSAN performance charts
VMs experiencing high latency
slow throughput
very high outstanding I/O
Resolution
Notes:
The metrics collected by this process cover only the last 2 days.
The collection runs a number of queries and can take a while to finish.
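The collected metrics cover only the last 2 days. Before you start, it can help to confirm that the performance event still falls inside that window. The following is a minimal sketch. It assumes that event times are recorded as timezone-aware datetimes. `event_in_collection_window` is a hypothetical helper and is not part of any VMware tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Metrics window stated in the notes: only the last 2 days are collected.
RETENTION = timedelta(days=2)


def event_in_collection_window(event_time: datetime,
                               now: Optional[datetime] = None) -> bool:
    """Return True if event_time lies within the last 2 days (hypothetical helper).

    Both datetimes are assumed to be timezone-aware so they compare safely.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    # Inclusive on both ends: an event exactly 2 days old still counts.
    return now - RETENTION <= event_time <= now
```

If the function returns False, the performance data for that event is likely no longer available. Reproducing the issue, as described in the reproducible-case steps, is then the more useful path.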
Collection steps:
Ensure the performance issue is actively occurring.
Enable Verbose and Network Diagnostics mode via vSAN cluster -> Configure -> vSAN -> Services -> Performance Service -> Edit
Let the system run for 10 to 15 minutes.
Collect the ESXi host logs from all hosts in the cluster, and also the vCenter Server logs
Disable the Verbose and Network Diagnostics mode
Upload the logs to VMware for analysis
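The "let the system run for 10 to 15 minutes" step can be expressed as a simple timing check. The following is a minimal sketch under assumptions. `collection_status` is a hypothetical helper, and "overdue" only means that the suggested 15 minutes have passed. In that case, collect right away, because verbose mode keeps growing the log files.

```python
from datetime import datetime, timedelta

# "Let the system run for 10 to 15 minutes" after enabling verbose mode.
MIN_RUN = timedelta(minutes=10)
MAX_RUN = timedelta(minutes=15)


def collection_status(verbose_enabled_at: datetime, now: datetime) -> str:
    """Return 'wait', 'collect' or 'overdue' (hypothetical helper).

    Both datetimes must be either naive or timezone-aware, not mixed.
    """
    elapsed = now - verbose_enabled_at
    if elapsed < MIN_RUN:
        return "wait"       # not enough verbose data gathered yet
    if elapsed <= MAX_RUN:
        return "collect"    # inside the suggested 10 to 15 minute window
    # Past the suggested window: collect now, then disable verbose mode,
    # since verbose logging keeps enlarging the logs.
    return "overdue"
```

For example, suppose verbose mode was enabled at 09:00. The helper then returns "wait" at 09:05 and "collect" at 09:12.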
Notes:
Collect the host logs before the vCenter logs. Host logs tend to wrap (rotate) faster than vCenter logs, so collecting them first ensures that all the required host logging from the time of the event is captured.
When troubleshooting performance issues, collect the logs from the Master/Leader host first, so that the performance data from the time of the performance event is captured.
For large clusters (20 or more hosts) where logs are collected through vCenter, collect the host logs in small batches of no more than 5 hosts at a time, to avoid corrupted log bundles.
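The ordering guidance in these notes can be sketched as a small planning helper:

* Collect from the Master/Leader host first.
* Collect host logs before vCenter logs.
* In large clusters, collect in batches of no more than 5 hosts.

This is a minimal sketch under assumptions. Host names are plain strings, and the Master host is already known, for example from the Stats Master Election health check. `plan_log_collection` and the `VCENTER` placeholder are hypothetical and not part of any VMware tool. The helper only computes an order; it does not collect anything.

```python
from typing import List

MAX_BATCH = 5               # no more than 5 hosts per batch
LARGE_CLUSTER = 20          # clusters with 20+ hosts count as "large"
VCENTER = "vCenter Server"  # placeholder entry for the vCenter log bundle


def plan_log_collection(hosts: List[str], master: str) -> List[List[str]]:
    """Return log-collection batches in the order to run them (hypothetical helper)."""
    if master not in hosts:
        raise ValueError(f"master host {master!r} is not in the host list")
    # 1. Master/Leader host first: its bundle holds the cluster performance stats.
    batches: List[List[str]] = [[master]]
    others = [h for h in hosts if h != master]
    # 2. Remaining hosts; split into batches of at most MAX_BATCH in large clusters.
    if len(hosts) >= LARGE_CLUSTER:
        batches.extend(others[i:i + MAX_BATCH]
                       for i in range(0, len(others), MAX_BATCH))
    elif others:
        batches.append(others)
    # 3. vCenter Server logs last, because host logs wrap faster.
    batches.append([VCENTER])
    return batches
```

For example, take a 22-host cluster with `master="esx07"`. The plan starts with `["esx07"]`, followed by four batches of five hosts and one batch of one host, and ends with the vCenter entry.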
If the issue is not actively occurring but can be reproduced on demand, do the following:
Enable Verbose and Network Diagnostics mode via vSAN cluster -> Configure -> vSAN -> Services -> Performance Service -> Edit
Reproduce the performance issue
Let the system run for 10 to 15 minutes.
Collect the ESXi host logs from all hosts in the cluster and also logs from vCenter
Disable the Verbose and Network Diagnostics mode
Upload the logs to VMware for analysis
If the performance issue is affecting log collection, we need at a bare minimum the logs from the Master host of the cluster, because the Master host's bundle contains the performance statistics for the whole cluster.
To identify the Master host of the cluster, go to vCenter -> vSAN Cluster -> Monitor -> vSAN -> Health -> Performance Service -> Stats Master Election.
Additional Information
Impact/Risks:
For the required performance data to be available, the vSAN Performance Service must be enabled on all production vSAN deployments.
This helps to identify bottlenecks/issues at various layers of the vSAN stack.
Verbose mode makes the log files much larger. These files are placed on the scratch partition and in the performance stats database object.
Because the bundles are larger, ESXi log collection may take a long time to finish.