Understanding Slow Storage vMotion Performance Through Log Analysis


Article ID: 383770


Products

VMware vSphere ESXi
VMware vCenter Server

Issue/Introduction

Storage vMotion operations may exhibit significantly degraded performance when migrating virtual machines between datastores, even under optimal conditions (such as during off-peak hours). This can result in extended migration times that far exceed expected durations.

Environment

  • VMware ESXi
  • Storage vMotion operations
  • Virtual machines of any size
  • Extended migration times for Storage vMotion operations
  • Poor throughput during data transfer
  • Higher than normal storage latency
  • Increased host CPU utilization during migrations

 

Cause

Several factors can contribute to degraded Storage vMotion performance:

  1. Disabled or misconfigured VAAI (vStorage APIs for Array Integration)
  2. High storage latency
  3. Time synchronization issues
  4. Outdated system firmware or drivers
  5. Suboptimal storage network configuration

Resolution

Step 1: Analyze Key Log Files

1. Check VAAI Status in /var/log/hostd.log:
```
YYYY-MM-DDThh:mm:ss.###Z: Unable to connect to vaai-nasd socket [No such file or directory]
```
This message typically indicates that the VAAI NAS plugin daemon (vaai-nasd) is not available, so VAAI hardware offload is not functioning for NFS datastores.
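
To confirm VAAI state from the ESXi shell, a quick check along these lines can help (where NAS VAAI applies, the plugin is a vendor-supplied VIB, so the exact package name varies):
```
# Look for VAAI-related errors in the host daemon log
grep -i vaai /var/log/hostd.log

# Check whether a vendor NAS VAAI plugin VIB is installed (name is vendor-specific)
esxcli software vib list | grep -i vaai

# For block (VMFS) devices, report per-device VAAI primitive support
esxcli storage core device vaai status get
```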

2. Monitor Storage Latency through /var/log/vmkernel.log:
```
2024-11-19T17:35:45.709Z cpu14:2098284)WARNING: NFS: 5015: NFS volume #######_01 performance has deteriorated. I/O latency increased from average value of 5085(us) to 117068(us).
```
This warning shows average I/O latency rising from roughly 5 ms (5,085 µs) to 117 ms (117,068 µs), a significant degradation in storage performance.

High latency indicators:

  • Maximum latency > 1000 ms (1,000,000 µs) indicates severe performance issues and can cause virtual machines to crash or become unresponsive
  • Latency > 40 ms (40,000 µs) results in noticeably slow performance for intensive operations such as Storage vMotion
  • Median latency > 25 ms (25,000 µs) suggests consistent performance problems
  • Latency < 20 ms (20,000 µs) is usually noticeable only to very I/O-intensive applications
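
A quick way to pull these warnings out of the logs is sketched below (rotated, compressed copies of the log typically live under /var/run/log on ESXi):
```
# Search the VMkernel log for latency degradation warnings
grep "performance has deteriorated" /var/log/vmkernel.log

# Also check rotated copies if the migration happened some time ago
zcat /var/run/log/vmkernel.*.gz 2>/dev/null | grep "performance has deteriorated"
```
Live device latency can also be observed in esxtop (disk device view, key u), where the DAVG/cmd and KAVG/cmd columns show device and kernel latency per command.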

3. Examine Migration Progress in /var/log/vpxa.log:
```
YYYY-MM-DDThh:mm:ss.###Z verbose vpxa[#####] Immigrating VM at path /vmfs/volumes/volume-id/vm-name/vm-name.vmx has vmid ##
...
YYYY-MM-DDThh:mm:ss.###Z verbose vpxa[#####] Finished tracking destination of migration
```
These entries help track migration duration and completion status.
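
To estimate the migration duration, the matching start and end entries can be pulled from the log; for example:
```
# List migration tracking entries; the first and last timestamps bracket the
# Storage vMotion duration for the VM in question
grep -E "Immigrating VM|Finished tracking destination of migration" /var/log/vpxa.log
```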

4. Review Available Storage Space:
```
NFS    Public    volume-id    volume_name    Total      Used      Free     % Free
                                             2560.00    2306.91   253.09   9.89%
```
In this example only about 10% of the volume's capacity remains free; low free space can impact Storage vMotion performance.
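
Datastore capacity can also be checked directly on the host rather than from the array side; for example:
```
# Report size, used, and free space for all mounted datastores
esxcli storage filesystem list

# Quick human-readable view; datastores appear under /vmfs/volumes
df -h
```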

Step 2: Performance Metrics Analysis

Calculate throughput using these formulas:
1. Transfer Duration = (Migration End Time - Start Time)
2. Average Throughput = (Total Data Size / Transfer Duration)

Example calculation:
- For a 100 GB VM that takes 27 minutes to migrate:
  - Throughput = 100,000 MB / (27 × 60 s) ≈ 62 MB/s, which indicates poor performance on 10GbE networking
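
To avoid doing the arithmetic by hand, a small shell snippet such as the following performs the same calculation (the values shown are the hypothetical 100 GB / 27 minute example):
```
# Substitute your own VM size and measured migration duration
DATA_GB=100        # data moved, in GB (1 GB treated as 1000 MB)
DURATION_MIN=27    # migration duration, in minutes

awk -v gb="$DATA_GB" -v min="$DURATION_MIN" \
    'BEGIN { printf "Average throughput: %.1f MB/s\n", (gb * 1000) / (min * 60) }'
```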

This detailed log analysis helps identify:

  • Storage array performance issues
  • Network bottlenecks
  • Resource constraints
  • Configuration problems

Use these throughput values as an expected baseline for 10GbE networks:

  • Good performance: > 500 MB/s
  • Acceptable performance: 200-500 MB/s
  • Poor performance: < 100 MB/s

Step 3: Implement Performance Optimizations

Based on log analysis results:

  1. If VAAI issues are detected:
    • Engage storage vendor to enable and configure VAAI support
    • Verify storage configuration

  2. If high latency is observed:
    • Review storage network configuration
    • Check for network or storage path congestion
    • Verify storage array performance

  3. If system-level issues are found:
    • Update system firmware and drivers
    • Configure proper time synchronization (see the check sketched after this list)
    • Perform system maintenance as needed
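
For the time synchronization item above, NTP configuration and service state can be checked from the ESXi shell; a minimal sketch (commands available on current ESXi releases):
```
# Current host time
esxcli system time get

# Configured NTP servers
cat /etc/ntp.conf

# NTP daemon service state
/etc/init.d/ntpd status
```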

Additional Information