Understanding VMware Bitfusion Configuration Files

Article ID: 336872

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

The primary Bitfusion executable is controlled by YAML configuration files. These files handle three groups of options: the RDMA transport, the automatic release of GPU resources, and cache storage.


Environment

VMware vSphere Bitfusion 2.x

Resolution

There are two configuration files available to Bitfusion:

  • /etc/bitfusion/bitfusion.yaml
  • ~/.bitfusion/bitfusion.yaml

Either of these files can be configured, and both are read during application startup. Both files use the same options.
Note: If both files are present and populated, both are used to configure the application. If the same parameter is defined in both files, the value from the user directory (~/.bitfusion/bitfusion.yaml) takes precedence.
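
The precedence rule above can be illustrated with a minimal Python sketch. It assumes precedence is applied per parameter (a nested, key-by-key overlay), which is one reading of the note and not confirmed Bitfusion behavior. The function name merge_config and the dictionary contents are hypothetical stand-ins for the parsed YAML files.

```python
from copy import deepcopy


def merge_config(system_cfg, user_cfg):
    """Overlay user_cfg on system_cfg; user values win for any shared key."""
    merged = deepcopy(system_cfg)
    for key, value in user_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides define this section: merge it parameter by parameter.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parsed contents of /etc/bitfusion/bitfusion.yaml
system_cfg = {
    "rdma": {"enabled": True, "hops": "2"},
    "srs": {"auto_release_timeout": "10m"},
}
# Hypothetical parsed contents of ~/.bitfusion/bitfusion.yaml
user_cfg = {"rdma": {"hops": "3"}}

effective = merge_config(system_cfg, user_cfg)
print(effective)
```

In this sketch the user file changes only rdma.hops, so every other parameter keeps its value from the system-wide file.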

There are three groups of configuration options:

  1. rdma: whether the server connects through InfiniBand RDMA, and the number of hops
  2. srs: the duration after which requested GPU resources are automatically released
  3. cache_store: the cache file locations and cleanup thresholds

A sample configuration file with all available options (values in angle brackets are placeholders) is:

      rdma:
            enabled: <true|false>
            hops: "<1|2|3>"
      srs:
            auto_release_timeout: "<time in the format of 2m30s>"
      cache_store:
            client_root: "~/cache"
            client_cleanup_threshold_MB: 5120
            server_root: "/var/cache/bitfusion"
            server_cleanup_threshold_MB: 5120
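
As a rough aid for checking a file before deploying it, the sketch below validates a parsed configuration against the sample above. It is not part of Bitfusion. The helper names parse_duration and validate_config are hypothetical, and the rules are assumptions drawn from the sample: that durations may also carry an hour component, and that the thresholds must be positive integers.

```python
import re

# Assumed duration grammar: optional hours, minutes, seconds, e.g. "2m30s".
_DURATION_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text):
    """Convert a string such as '2m30s' to seconds; raise ValueError otherwise."""
    match = _DURATION_RE.fullmatch(text) if isinstance(text, str) else None
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    rdma = cfg.get("rdma", {})
    if "enabled" in rdma and not isinstance(rdma["enabled"], bool):
        problems.append("rdma.enabled must be true or false")
    if "hops" in rdma and str(rdma["hops"]) not in {"1", "2", "3"}:
        problems.append("rdma.hops must be 1, 2, or 3")
    timeout = cfg.get("srs", {}).get("auto_release_timeout")
    if timeout is not None:
        try:
            parse_duration(timeout)
        except ValueError as exc:
            problems.append(f"srs.auto_release_timeout: {exc}")
    cache = cfg.get("cache_store", {})
    for key in ("client_cleanup_threshold_MB", "server_cleanup_threshold_MB"):
        value = cache.get(key)
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if value is not None and (not is_int or value <= 0):
            problems.append(f"cache_store.{key} must be a positive integer")
    return problems


# The sample file above with its placeholders filled in (illustrative values).
sample = {
    "rdma": {"enabled": True, "hops": "2"},
    "srs": {"auto_release_timeout": "2m30s"},
    "cache_store": {
        "client_root": "~/cache",
        "client_cleanup_threshold_MB": 5120,
        "server_root": "/var/cache/bitfusion",
        "server_cleanup_threshold_MB": 5120,
    },
}
print(validate_config(sample))
```

Returning a list of problems instead of raising on the first one lets a single run report every issue in the file.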

Additional Information

The number of RDMA hops determines the path that data takes between the local and remote GPUs.

  • 1 hop: Local GPU -> Remote GPU
    • This option has the lowest latency but also lower bandwidth
    • Not supported by all hardware
  • 2 hops: Local GPU -> Local Host -> Remote GPU
    • This is the default value
    • Requires NVIDIA GPUDirect-compatible adapters
    • Falls back gracefully to 3 hops if not supported
  • 3 hops: Local GPU -> Local Host -> Remote Host -> Remote GPU
    • Most compatible and always supported
    • Slower than 1 or 2 hops
    • Not hardware-dependent
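
The hop paths and the documented fallback from 2 hops to 3 can be summarized in a short Python sketch. It is only a model of the list above, not Bitfusion code. The names HOP_PATHS, effective_hops, and describe, and the gpudirect_supported flag, are hypothetical. How an unsupported 1-hop request is handled is not stated here, so the sketch leaves that case unchanged.

```python
# Data path for each rdma.hops value, as listed above.
HOP_PATHS = {
    1: ["Local GPU", "Remote GPU"],
    2: ["Local GPU", "Local Host", "Remote GPU"],
    3: ["Local GPU", "Local Host", "Remote Host", "Remote GPU"],
}


def effective_hops(requested, gpudirect_supported):
    """Return the hop count in effect, modelling only the 2 -> 3 fallback."""
    if requested not in HOP_PATHS:
        raise ValueError(f"hops must be 1, 2, or 3, got {requested!r}")
    if requested == 2 and not gpudirect_supported:
        # Documented behavior: 2 hops falls back to 3 when unsupported.
        return 3
    return requested


def describe(hops):
    """Render the data path for a hop count as a readable string."""
    return " -> ".join(HOP_PATHS[hops])


hops_in_use = effective_hops(2, gpudirect_supported=False)
print(hops_in_use, describe(hops_in_use))
```

Each path contains exactly as many transfers (arrows) as its hop count.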