Notes:
- The metrics collected by this process cover only the last 2 days.
- The collection operation invokes a number of queries and can take a while to finish.
Collection steps:
- Ensure the performance issue is actively occurring.
- Enable Verbose and Network Diagnostics mode via vSAN cluster -> Configure -> vSAN -> Services -> Performance Service -> Edit
- Let the system run for 10 to 15 minutes.
- Collect the logs from all hosts in the cluster, as well as the vCenter logs
- Disable the Verbose and Network Diagnostics mode
- Upload the logs to VMware for analysis
Notes:
- It's highly recommended to collect the cluster host logs before the vCenter logs. Host logs tend to wrap faster than vCenter logs, so collecting them first ensures we get all the required host logging from the time of the event.
- When troubleshooting performance issues, it's best to collect logs from the Master/Leader host first to ensure we get the performance data from the time of the event.
- When dealing with large clusters (20+ hosts) and collecting logs via vCenter, it's best to collect the host logs in small batches of no more than 5 hosts at a time to ensure the logs don't get corrupted.
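As an illustration of the batching note above, the following is a minimal sketch that splits a host list into groups of at most 5. The helper name batch_hosts and the host names are hypothetical, and the script only prints the batches; it does not collect any logs. Listing the Master/Leader host first puts it in the first batch.

```shell
#!/bin/sh
# Sketch only: print host names in batches of at most BATCH_SIZE, one batch per line.
# batch_hosts and the esxiNN host names are hypothetical, for illustration.
BATCH_SIZE=5

batch_hosts() {
    count=0
    line=""
    for h in "$@"; do
        # Flush the current batch once it is full.
        if [ "$count" -eq "$BATCH_SIZE" ]; then
            printf '%s\n' "$line"
            line=""
            count=0
        fi
        line="${line:+$line }$h"
        count=$((count + 1))
    done
    # Print the final, possibly partial, batch.
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
}

# List the Master/Leader host first so it lands in the first batch.
batch_hosts esxi01 esxi02 esxi03 esxi04 esxi05 esxi06 esxi07
```

Each printed line is one batch of hosts to select when exporting logs via vCenter.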
If the issue is not actively occurring but is reproducible on demand, do the following:
- Enable Verbose and Network Diagnostics mode via vSAN cluster -> Configure -> vSAN -> Services -> Performance Service -> Edit
- Reproduce the performance issue
- Let the system run for 10 to 15 minutes.
- Collect the ESXi host logs from all hosts in the cluster, as well as the vCenter logs
- Disable the Verbose and Network Diagnostics mode
- Upload the logs to VMware for analysis
If the performance issue is affecting the log collection, we need at a bare minimum the logs from the Master host in the cluster, because the Master's log bundle contains the performance statistics for the cluster.
To determine the Master host in the cluster, check the following:
vCenter -> vSAN Cluster -> Monitor -> vSAN -> Health -> Performance Service -> Stats Master Election
If the performance issue affects specific VM(s), run the command below from any host in the cluster so we have the complete object layout for the impacted VM(s):
esxcli vsan debug object list --vm-name=<exact vm name> > /tmp/<vm-name>_objects.txt
For example: esxcli vsan debug object list --vm-name=msVM2 > /tmp/msVM2_objects.txt
Note: The exact VM name is the name used for the VM's files, which may differ from the display name if the VM was renamed in vCenter after it was initially created. The name is case sensitive; if it contains spaces, enclose it in double quotes ("").
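When several VMs are impacted, the same command can be repeated per VM. The sketch below is one possible way to do that from the ESXi shell. The helper names safe_name and collect_objects, the DRY_RUN switch, and the VM name "my VM 3" are assumptions made for illustration; only the esxcli command itself comes from this article. By default the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) the object-list command for several VMs.
# safe_name, collect_objects and DRY_RUN are hypothetical helpers, not part of esxcli.

safe_name() {
    # Replace spaces with underscores so the output file name contains no spaces.
    printf '%s' "$1" | tr ' ' '_'
}

collect_objects() {
    vm="$1"
    out="/tmp/$(safe_name "$vm")_objects.txt"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): show the command that would be executed.
        printf 'esxcli vsan debug object list --vm-name="%s" > %s\n' "$vm" "$out"
    else
        # The VM name is quoted so that names containing spaces are passed intact.
        esxcli vsan debug object list --vm-name="$vm" > "$out"
    fi
}

# Example VM names; "my VM 3" is a made-up name that shows the quoting.
for vm in "msVM2" "my VM 3"; do
    collect_objects "$vm"
done
```

Setting DRY_RUN=0 before running the script on a host executes the commands and writes one file per VM under /tmp.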
Once the command completes and the file(s) are created, download the file(s) from the host using an SCP/SFTP client such as WinSCP, then upload them to the SR along with the collected log bundles.
This is especially helpful when dealing with VMs that have multiple or large VMDKs.
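As an alternative to a graphical client, the file can be copied from a workstation that has the scp command available. This is a sketch under assumptions: SSH is enabled on the ESXi host, root login is used, and the host name esxi01.example.com, the helper fetch_objects_file, and the DRY_RUN switch are hypothetical. By default it only prints the scp command.

```shell
#!/bin/sh
# Sketch only: copy a <vm-name>_objects.txt file from the ESXi host to a workstation.
# Assumes SSH is enabled on the host; host name and helper name are hypothetical.
ESXI_HOST="${ESXI_HOST:-esxi01.example.com}"
DEST_DIR="${DEST_DIR:-.}"

fetch_objects_file() {
    vm_file="$1_objects.txt"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): show the command that would be executed.
        printf 'scp root@%s:/tmp/%s %s\n' "$ESXI_HOST" "$vm_file" "$DEST_DIR"
    else
        scp "root@${ESXI_HOST}:/tmp/${vm_file}" "$DEST_DIR"
    fi
}

fetch_objects_file msVM2
```

Set ESXI_HOST to the host where the esxcli command was run, and DRY_RUN=0 to perform the copy.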