Seeing Bulk/RAV Migration Slowness

Article ID: 410180


Products

VMware HCX

Issue/Introduction

  • Seeing Bulk/RAV migration times of more than 30 days to seed and migrate.
  • Transport Analytics shows "up to 86 migrations" supported with a delta transfer size of "519GB".
  • Available bandwidth has diminished while Bulk/RAV migrations are ongoing.
  • The following is observed when logging into HCX Manager via SSH and using CCLI to navigate to the IX appliance servicing these migrations:
  • Viewing the RX queue output from the IX appliance at /opt/vmware/cgw/files/techS*, some RX queues are more heavily utilized than others (a parsing sketch follows the sample output below).
    RX Queue 0 :        ucast pkts rx: 5087469
    RX Queue 1 :        ucast pkts rx: 83540783
    RX Queue 2 :        ucast pkts rx: 73231604
    RX Queue 3 :        ucast pkts rx: 5358422
    RX Queue 4 :        ucast pkts rx: 50398273
    RX Queue 5 :        ucast pkts rx: 14378617
    RX Queue 6 :        ucast pkts rx: 9680470
    RX Queue 7 :        ucast pkts rx: 2298687
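
The queue counters above can be checked programmatically. The following Python sketch is illustrative only (it is not an HCX utility, and the input format is assumed to match the sample lines shown above): it parses the "ucast pkts rx" counters from the tech support output and flags any RX queue receiving a disproportionate share of packets.

    # Minimal sketch (not an official HCX tool): parse per-queue
    # "ucast pkts rx" counters from IX appliance tech support output
    # and flag RX queue imbalance. The input format is assumed to
    # match the sample lines shown in this article.
    import re
    import sys

    QUEUE_RE = re.compile(r"RX Queue (\d+)\s*:\s*ucast pkts rx:\s*(\d+)")

    def rx_queue_counts(text):
        """Return {queue_index: ucast_pkts_rx} parsed from tech support text."""
        return {int(q): int(n) for q, n in QUEUE_RE.findall(text)}

    def report_imbalance(counts):
        total = sum(counts.values())
        even_share = 100.0 / len(counts)  # ideal percentage per queue
        for q, n in sorted(counts.items()):
            share = 100.0 * n / total
            flag = "  <-- hot queue" if share > 2 * even_share else ""
            print(f"RX Queue {q}: {n:>12,} pkts  ({share:5.1f}% of total){flag}")

    if __name__ == "__main__":
        # Example usage: python rx_queue_check.py < techsupport.txt
        counts = rx_queue_counts(sys.stdin.read())
        if counts:
            report_imbalance(counts)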

Environment

VMware HCX 4.11

Cause

  • In HCX 4.11 there is a one-to-one mapping of CPU to RX queue.
  • In HCX 4.11 a condition exists that can send significantly more packets to one RX queue than to the others.
    • This lopsided distribution of packets causes the CPU assigned to that RX queue to become overutilized, capping throughput for the whole appliance (see the illustration below).
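
To illustrate the effect with the sample counters above: the hottest queue carries roughly a third of all received packets, so the single CPU serving it does nearly three times its fair share of work and becomes the bottleneck. The following Python sketch is a back-of-the-envelope calculation based on those sample values, not HCX code.

    # Back-of-the-envelope sketch using the sample counters above:
    # with a 1:1 CPU-to-RX-queue mapping, the hottest queue's CPU
    # saturates first and caps aggregate receive throughput.
    counts = {
        0: 5_087_469, 1: 83_540_783, 2: 73_231_604, 3: 5_358_422,
        4: 50_398_273, 5: 14_378_617, 6: 9_680_470, 7: 2_298_687,
    }

    total = sum(counts.values())
    hot_queue, hot_pkts = max(counts.items(), key=lambda kv: kv[1])
    hot_share = hot_pkts / total   # fraction of all packets on the hottest queue
    fair_share = 1 / len(counts)   # 0.125 with 8 queues

    print(f"Hottest queue: RX Queue {hot_queue} carries {hot_share:.1%} of packets "
          f"({hot_share / fair_share:.1f}x its fair share)")
    # Once that one CPU hits 100%, the appliance is limited to roughly
    # fair_share / hot_share of the throughput an even spread would deliver.
    print(f"Estimated throughput relative to an even spread: {fair_share / hot_share:.0%}")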

Resolution

This is a known issue impacting VMware HCX.

Workaround

  • Enabling APR (Application Path Resiliency) on the service mesh may provide better distribution of packets across RX queues.
  • Utilize multiple service meshes (IX appliances) to distribute the migration workload.