Seeing Bulk/RAV Migration Slowness

Article ID: 410180


Products

VMware HCX

Issue/Introduction

  • Seeing Bulk/RAV migration times of more than 30 days to seed and migrate.
  • Transport Analytics shows "up to 86 migrations" supported with a delta transfer size of "519GB".
  • Available bandwidth has diminished while Bulk/RAV migrations are ongoing.
  • The following is observed when logging into HCX Manager via SSH and using CCLI to navigate to the IX appliance servicing these migrations:
  • Viewing the RX queue output from the IX appliance at /opt/vmware/cgw/files/techS*, some RX queues are more heavily utilized than others (a parsing sketch follows the sample output below).
    RX Queue 0 :        ucast pkts rx: 5087469
    RX Queue 1 :        ucast pkts rx: 83540783
    RX Queue 2 :        ucast pkts rx: 73231604
    RX Queue 3 :        ucast pkts rx: 5358422
    RX Queue 4 :        ucast pkts rx: 50398273
    RX Queue 5 :        ucast pkts rx: 14378617
    RX Queue 6 :        ucast pkts rx: 9680470
    RX Queue 7 :        ucast pkts rx: 2298687
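
The queue counters above can be checked programmatically. The following Python sketch is illustrative only (it is not an HCX utility, and the input format is assumed to match the sample lines shown above): it parses the "ucast pkts rx" counters from the tech support output and flags any RX queue receiving a disproportionate share of packets.

    # Minimal sketch (not an official HCX tool): parse per-queue
    # "ucast pkts rx" counters from IX appliance tech support output
    # and flag RX queue imbalance. The input format is assumed to
    # match the sample lines shown in this article.
    import re
    import sys

    QUEUE_RE = re.compile(r"RX Queue (\d+)\s*:\s*ucast pkts rx:\s*(\d+)")

    def rx_queue_counts(text):
        """Return {queue_index: ucast_pkts_rx} parsed from tech support text."""
        return {int(q): int(n) for q, n in QUEUE_RE.findall(text)}

    def report_imbalance(counts):
        total = sum(counts.values())
        even_share = 100.0 / len(counts)  # ideal percentage per queue
        for q, n in sorted(counts.items()):
            share = 100.0 * n / total
            flag = "  <-- hot queue" if share > 2 * even_share else ""
            print(f"RX Queue {q}: {n:>12,} pkts  ({share:5.1f}% of total){flag}")

    if __name__ == "__main__":
        # Example usage: python rx_queue_check.py < techsupport.txt
        counts = rx_queue_counts(sys.stdin.read())
        if counts:
            report_imbalance(counts)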

Environment

VMware HCX 4.11

Cause

  • In HCX 4.11 there is a one-to-one mapping of CPU to RX queue.
  • In HCX 4.11 a condition exists that can send significantly more packets to one RX queue than to the others.
    • This lopsided distribution of packets causes the CPU assigned to that RX queue to become overutilized, capping throughput for the whole appliance (see the illustration below).
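
To illustrate the effect with the sample counters above: the hottest queue carries roughly a third of all received packets, so the single CPU serving it does nearly three times its fair share of work and becomes the bottleneck. The following Python sketch is a back-of-the-envelope calculation based on those sample values, not HCX code.

    # Back-of-the-envelope sketch using the sample counters above:
    # with a 1:1 CPU-to-RX-queue mapping, the hottest queue's CPU
    # saturates first and caps aggregate receive throughput.
    counts = {
        0: 5_087_469, 1: 83_540_783, 2: 73_231_604, 3: 5_358_422,
        4: 50_398_273, 5: 14_378_617, 6: 9_680_470, 7: 2_298_687,
    }

    total = sum(counts.values())
    hot_queue, hot_pkts = max(counts.items(), key=lambda kv: kv[1])
    hot_share = hot_pkts / total   # fraction of all packets on the hottest queue
    fair_share = 1 / len(counts)   # 0.125 with 8 queues

    print(f"Hottest queue: RX Queue {hot_queue} carries {hot_share:.1%} of packets "
          f"({hot_share / fair_share:.1f}x its fair share)")
    # Once that one CPU hits 100%, the appliance is limited to roughly
    # fair_share / hot_share of the throughput an even spread would deliver.
    print(f"Estimated throughput relative to an even spread: {fair_share / hot_share:.0%}")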

Resolution

This is a known issue impacting VMware HCX.

Workaround

  • Enabling APR (Application Path Resiliency) on the service mesh may provide better distribution of packets across RX queues.
  • Utilize multiple service meshes (IX appliances) to distribute the migration workload.