Programming the Verbs API for a PVRDMA Device on ESXi 6.5 / ESXi 6.7 Hosts

Article ID: 328676

Products

VMware vSphere ESXi

Issue/Introduction

VMware vSphere 6.5 and later supports remote direct memory access (RDMA) between virtual machines with paravirtual RDMA (PVRDMA) network adapters. You program RDMA devices using the Verbs API. VMware supports a subset of the Verbs API, as described below.

Resolution

A paravirtual RDMA (PVRDMA) device is a virtual NIC that implements RDMA and supports the Verbs API.

ESXi 6.5 hosts support PVRDMA for Linux virtual machines of hardware version 13 or later, if the kernel supports RDMA and the PVRDMA module is installed in the OS. Additionally, the virtual machine must be configured with vSphere Distributed Switch. Currently, PVRDMA is not supported for Windows virtual machines.

To configure an ESXi 6.5 host for PVRDMA, see the vSphere 6.5 Doc Center topic Configure an ESXi Host for PVRDMA. To install PVRDMA components on Linux, scroll to the bottom of this article.

To connect PVRDMA-based queue pairs (QPs) you can use an RDMA connection manager such as rdma_cm(7) on Linux.
You can edit an RDMA uplink configuration only with virtual machines powered off. A virtual machine can have only one PVRDMA device.

RDMA applications deployed in PVRDMA-enabled virtual machines can communicate only with peers running in other PVRDMA-enabled virtual machines. They cannot communicate with RDMA applications running on physical machines.

Verbs APIs Supported for VMware PVRDMA

Supported RDMA Work Requests:
Send/Receive
RDMA Write
RDMA Read
Local Invalidate (kernel-only)
Fast Register Memory Region (kernel-only)
Send With Invalidate (kernel-only)

Supported APIs for User Space:
Description of API                      Verbs API Name
Query device                            ibv_query_device
Query port                              ibv_query_port
Allocate protection domain              ibv_alloc_pd
Deallocate protection domain            ibv_dealloc_pd
Register memory region                  ibv_reg_mr
Deregister memory region                ibv_dereg_mr
Create completion queue                 ibv_create_cq
Poll completion queue                   ibv_poll_cq
Request notify for completion queue     ibv_req_notify_cq
Destroy completion queue                ibv_destroy_cq
Create queue pair                       ibv_create_qp
Modify queue pair                       ibv_modify_qp
Destroy queue pair                      ibv_destroy_qp
Post send work request                  ibv_post_send
Post receive work request               ibv_post_recv
Create address handle                   ibv_create_ah
Destroy address handle                  ibv_destroy_ah
Get CQ events                           ibv_get_cq_events
Create CQ completion channel            ibv_create_comp_channel
Acknowledge CQ events                   ibv_ack_cq_events

Supported APIs for Kernel Space:
Description of API                      Verbs API Name
Get DMA memory region                   ib_get_dma_mr
Allocate memory region                  ib_alloc_mr

User-space verbs are also supported in the kernel, where they use the prefix ib_ instead of ibv_ (for example, ib_post_send instead of ibv_post_send). The kernel also supports additional verbs, such as those listed above, as alternatives to ibv_reg_mr.

Caution: The kernel verbs might vary between Linux distributions.

Limitations of the VMware PVRDMA Implementation

The Verbs API has these limitations:
  • Only Reliable Connected (RC) and Unreliable Datagram (UD) QPs are supported.
  • Up to 3 UD QPs are supported.
  • Shared Receive Queues (SRQs) are supported in HWv14 VMs in ESXi 6.7.
  • Remote Read and Remote Write flags are not supported on DMA Memory Regions.
  • The rdma_create_ep verb is not supported.
The status of RDMA over converged Ethernet (RoCE) support is as follows:
  • MAC-based and IP-based GIDs are supported.
  • RoCEv2 is supported in HWv14 VMs in ESXi 6.7.
  • Guest VLAN Tagging is not supported.
Message Passing Interface (MPI) applications have the following limitations:
  • Intel MPI-based applications cannot be deployed on ESXi 6.5 because PVRDMA on ESXi 6.5 does not support SRQs.
  • When running OpenMPI applications, additional parameters must be passed to the mpirun executable:
--mca btl openib,self,sm --mca btl_openib_verbose 1 \
--mca btl_openib_receive_queues P,65536,512,256,256 \
--mca btl_openib_use_eager_rdma 1 \
--mca btl_openib_eager_rdma_threshold 1 \
--mca btl_openib_cpc_include rdmacm --mca orte_base_help_aggregate 0 \
--mca btl_openib_max_inline_data 0
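The OpenMPI parameters above are easy to mistype when entered by hand. The following is a minimal sketch, not a VMware-provided tool, that assembles the mpirun argument list from those parameters. The function name build_mpirun_argv, the program path ./my_mpi_app, and the host file hosts.txt are hypothetical; the -np and --hostfile options are standard mpirun options that are not part of this article.

```python
import shlex

# MCA parameters copied from the list above.
PVRDMA_MCA_PARAMS = [
    ("btl", "openib,self,sm"),
    ("btl_openib_verbose", "1"),
    ("btl_openib_receive_queues", "P,65536,512,256,256"),
    ("btl_openib_use_eager_rdma", "1"),
    ("btl_openib_eager_rdma_threshold", "1"),
    ("btl_openib_cpc_include", "rdmacm"),
    ("orte_base_help_aggregate", "0"),
    ("btl_openib_max_inline_data", "0"),
]


def build_mpirun_argv(program, num_procs, hostfile=None):
    """Return an mpirun argument list that includes the PVRDMA parameters."""
    argv = ["mpirun", "-np", str(num_procs)]
    if hostfile is not None:
        argv += ["--hostfile", hostfile]
    for name, value in PVRDMA_MCA_PARAMS:
        argv += ["--mca", name, value]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # Hypothetical program and host file; print the command instead of running it.
    print(shlex.join(build_mpirun_argv("./my_mpi_app", 4, "hosts.txt")))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell quoting problems with the comma-separated values.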
VMware PVRDMA currently supports these Linux distributions for RoCE v1:
  • CentOS 7.2 or later
  • RHEL 7.2 or later
  • SLES 12 SP1 or later
  • Oracle Linux 7 UEKR4 or later
  • Ubuntu LTS Releases 14.04 or later
VMware PVRDMA currently supports these Linux distributions for RoCE v2:
  • CentOS 7.3 or later
  • RHEL 7.3 or later
  • SLES 12 SP3 or later
  • Oracle Linux 7 UEKR5 or later
  • Ubuntu LTS Releases 16.04.2 or later

Installing PVRDMA Support in Linux

Use OFED version 4.8 or above
  • Both the PVRDMA library and driver are part of a larger software package, the OpenFabrics Enterprise Distribution (OFED), which installs RDMA support in Linux. The OFED software can be downloaded here: http://downloads.openfabrics.org/OFED/. This is the recommended method for installing RDMA and PVRDMA support in Linux.
 
Use other open source locations
  • Alternatively, the PVRDMA kernel driver is available in Linux kernel version 4.10 and above.
The PVRDMA library is available through a new set of common libraries called rdma-core: http://github.com/linux-rdma/rdma-core.
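Before choosing between the two installation paths, it can help to check what the guest already provides. The sketch below assumes a Linux guest and uses hypothetical helper names. It compares the running kernel release against the 4.10 threshold mentioned above and lists any RDMA devices the kernel has registered under /sys/class/infiniband. The version comparison is only a heuristic: a distribution kernel older than 4.10 can still have the driver if OFED is installed.

```python
import os
import platform
import re

# Kernel version that first includes the PVRDMA driver, per the text above.
MIN_INTREE_KERNEL = (4, 10)


def kernel_version(release):
    """Parse the (major, minor) pair from a release string such as '4.10.0-42-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_intree_pvrdma_driver(release):
    """Heuristic: True if the kernel release is 4.10 or later."""
    return kernel_version(release) >= MIN_INTREE_KERNEL


def rdma_devices(sysfs_dir="/sys/class/infiniband"):
    """Return the RDMA device names registered with the kernel, or [] if none."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []


if __name__ == "__main__":
    release = platform.release()
    try:
        print("kernel", release, "is 4.10 or later:", has_intree_pvrdma_driver(release))
    except ValueError as err:
        print(err)
    print("registered RDMA devices:", rdma_devices())
```

An empty device list on a kernel that passes the version check usually means the module is not loaded or the virtual machine has no PVRDMA adapter.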

Information about RDMA programming is available on the Web. See mellanox.com for a complete Verbs API reference.

New PVRDMA Features in ESXi 6.7
  • Starting with HWv14 VMs in ESXi 6.7, the PVRDMA device supports Shared Receive Queues (SRQs) and RoCE Protocol Version 2 (RoCEv2).
  • An HWv14 PVRDMA device can operate in RoCEv1 or RoCEv2 mode, depending on the protocol selected in the UI.
  • Upgrading an HWv13 VM with a PVRDMA device to HWv14 maintains the RoCE mode as v1.
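The version rules in this article can be summarized as a small lookup. The sketch below is one way to express them, assuming that ESXi releases after 6.7 behave like 6.7 (this article covers only 6.5 and 6.7). The function name pvrdma_features and the dictionary keys are hypothetical.

```python
def pvrdma_features(esxi_version, hw_version):
    """Return the PVRDMA features this article describes for a VM.

    esxi_version is a (major, minor) tuple such as (6, 7);
    hw_version is the virtual hardware version as an integer.
    """
    if esxi_version < (6, 5) or hw_version < 13:
        raise ValueError("PVRDMA requires ESXi 6.5 or later and hardware version 13 or later")
    # SRQs and RoCEv2 need both ESXi 6.7 and HWv14.
    # Assumption: later ESXi releases are treated like 6.7.
    newer = esxi_version >= (6, 7) and hw_version >= 14
    return {
        "qp_types": ("RC", "UD"),
        "rocev1": True,
        "rocev2": newer,
        "srq": newer,
    }


if __name__ == "__main__":
    print(pvrdma_features((6, 5), 13))
    print(pvrdma_features((6, 7), 14))
```

An HWv13 VM on an ESXi 6.7 host reports no SRQ or RoCEv2 support, which matches the upgrade behavior described above: the hardware version of the VM gates the new features, not the host version alone.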
 New Userspace Verbs Supported

 In addition to the list of verbs mentioned above, we support the following verbs in ESXi 6.7 for an HWv14 VM.
 
Name                                      Verb Function
Create Shared Receive Queue               ibv_create_srq
Destroy Shared Receive Queue              ibv_destroy_srq
Modify Shared Receive Queue               ibv_modify_srq
Query Shared Receive Queue                ibv_query_srq
Post Receive WR to Shared Receive Queue   ibv_post_srq_recv
Note: SRQs are not supported in the Kernel.
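When porting an existing RDMA application, one way to spot problems early is to compare the user-space verbs it calls against the lists in this article. The sketch below is a hypothetical helper, not part of any VMware or rdma-core tooling. The verb names are taken from the tables in this article, except that the SRQ post call uses its libibverbs name, ibv_post_srq_recv. Calls that the article does not list at all, such as device discovery functions, are also reported and need manual judgment.

```python
# User-space verbs supported on all PVRDMA devices, per this article.
BASE_USER_VERBS = frozenset({
    "ibv_query_device", "ibv_query_port",
    "ibv_alloc_pd", "ibv_dealloc_pd",
    "ibv_reg_mr", "ibv_dereg_mr",
    "ibv_create_cq", "ibv_poll_cq", "ibv_req_notify_cq", "ibv_destroy_cq",
    "ibv_create_qp", "ibv_modify_qp", "ibv_destroy_qp",
    "ibv_post_send", "ibv_post_recv",
    "ibv_create_ah", "ibv_destroy_ah",
    "ibv_get_cq_events", "ibv_create_comp_channel", "ibv_ack_cq_events",
})

# Additional verbs for HWv14 VMs on ESXi 6.7, per this article.
SRQ_USER_VERBS = frozenset({
    "ibv_create_srq", "ibv_destroy_srq", "ibv_modify_srq",
    "ibv_query_srq", "ibv_post_srq_recv",
})


def unsupported_verbs(used, srq_available):
    """Return the verbs in 'used' that are not in this article's supported lists."""
    allowed = (BASE_USER_VERBS | SRQ_USER_VERBS) if srq_available else BASE_USER_VERBS
    return sorted(set(used) - allowed)


if __name__ == "__main__":
    # Hypothetical list of verbs called by an application.
    app_verbs = ["ibv_create_qp", "ibv_post_send", "ibv_create_srq"]
    print(unsupported_verbs(app_verbs, srq_available=False))
```

Pass srq_available=True only for an HWv14 VM on ESXi 6.7, where the SRQ verbs are supported.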