Article ID: 316474
Issue/Introduction
To maximize the performance and reliability of your Bitfusion deployment, VMware makes several best-practice recommendations. For more information, see the VMware Bitfusion 2.0 documentation and the Bitfusion Performance Best Practices guide. The top best practices are:
- Disable virtual NIC coalescing within both the client and Bitfusion Server virtual machines.
  - For more information on virtual NIC coalescing in the vmxnet3 adapter, see the “Virtual Network Interrupt Coalescing” and “Running Network Latency Sensitive Workloads” sections of the “Performance Best Practices for VMware vSphere 7.0” guide (pages 47-50 at the time of writing).
- Ensure that the client-side NVIDIA CUDA libraries and drivers are the correct versions. Note: correct versions are enforced on the Bitfusion server appliance.
- Ensure that end-to-end bandwidth between the Bitfusion client(s) and server(s) is at least 10 Gb/s. A client that accesses more than two GPUs may benefit from additional bandwidth.
  - You can measure the bandwidth with iperf, an open-source network utility. For more information, see the iperf documentation.
- Configure ESXi hosts, as applicable, for latency-sensitive workloads. Specifically:
  - Disable vSphere HA for Bitfusion servers. Because Bitfusion server appliances depend on the physical GPUs passed through to them, they are out of scope for vSphere HA.
  - Disable physical NIC interrupt moderation or coalescing.
  - Disable virtual NIC LRO (Large Receive Offload).
  - Set “Latency Sensitivity” to “High” in the configuration of each VM.
  - Consider setting client-side VM memory reservations.
  - For more information on latency-sensitive workload tuning, see the “Performance Best Practices for VMware vSphere 7.0” guide.
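
The 10 Gb/s bandwidth recommendation above can be checked with a short script. The following is a minimal sketch, not an official VMware tool. It assumes iperf3 (rather than an older iperf release) is installed on the client, that an iperf3 server (`iperf3 -s`) is already running on the Bitfusion server, and that the test uses TCP, in which case the JSON report carries the receiver-side rate under `end.sum_received.bits_per_second`. The function names and the `MIN_GBPS` constant are illustrative only.

```python
import json
import subprocess

# Threshold taken from the recommendation above (at least 10 Gb/s end to end).
MIN_GBPS = 10.0


def received_gbps(report: dict) -> float:
    """Return the receiver-side throughput in Gb/s from an iperf3 JSON report.

    Assumes a TCP test, where iperf3 reports the aggregate receive rate
    under end.sum_received.bits_per_second.
    """
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def meets_minimum(report: dict, min_gbps: float = MIN_GBPS) -> bool:
    """Return True if the measured throughput reaches the minimum."""
    return received_gbps(report) >= min_gbps


def run_iperf3(server: str, seconds: int = 10) -> dict:
    """Run an iperf3 client test against `server` and return the parsed report.

    Uses the iperf3 options -c (client mode), -t (duration in seconds)
    and -J (JSON output). Raises CalledProcessError if iperf3 fails.
    """
    completed = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

As a usage example, `meets_minimum(run_iperf3("bitfusion-server.example.com"))` returns `True` when the measured rate is at least 10 Gb/s; the host name here is a placeholder. A single TCP stream may not saturate a fast link, so a result slightly below the threshold can reflect the test setup rather than the network.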
Environment
VMware vSphere Bitfusion 2.x