VMware Cloud Foundation (VCF) Operations 8.18 Sizing Guidelines

Article ID: 369262

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

This article provides sizing guidelines for VCF Operations 8.18 to help determine the configuration to use at installation or when resizing after installation.

Notes

  • An object in this table represents a basic entity in Aria Operations, characterized by properties and metrics collected from adapter data sources. Examples of objects include a virtual machine, a host, or a datastore for a VMware vCenter adapter; a storage switch port for a storage device adapter; an Exchange server, a Microsoft SQL Server, a Hyper-V server, or a Hyper-V virtual machine for a Hyperic adapter; and an AWS instance for an AWS adapter.
  • For other versions of VMware Aria Operations, see VMware Aria Operations Sizing Guidelines.

To check the recommended sizing, enter your values in the Operations Sizing Tool or in the attached spreadsheet.

Environment

VCF Operations 8.18

Resolution

By default, VMware offers Extra Small, Small, Medium, Large, and Extra Large configurations during installation. You can size the environment according to the existing infrastructure to be monitored. After the VMware Aria Operations instance outgrows its current size, you must expand the cluster by adding nodes of the same size.

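Columns labeled "node" refer to VMware Aria Operations nodes; columns labeled "CP" refer to Cloud Proxies. Numbers in parentheses refer to the numbered notes that follow the tables.

Objects and Metrics

| | Extra Small node | Small node | Medium node | Large node | Extra Large node | Small CP | Standard CP |
|---|---|---|---|---|---|---|---|
| Single-Node Maximum Objects | 700 | 10,000 | 30,000 | 44,000 | 100,000 | 8,000 (4) | 40,000 (4) |
| Single-Node Maximum Collected Metrics (1) | 140,000 | 1,600,000 | 5,000,000 | 8,000,000 | 20,000,000 | 1,200,000 | 6,000,000 |
| Maximum number of nodes in a cluster | 1 | 2 | 8 | 16 | 12 | 200 | 200 |
| Multi-Node Maximum Objects Per Node | N/A | 6,000 | 17,000 | 36,000 | 88,000 | N/A | N/A |
| Multi-Node Maximum Metrics Per Node | N/A | 1,400,000 | 4,000,000 | 6,000,000 | 15,000,000 | N/A | N/A |
| Maximum number of objects in a cluster | 700 | 12,000 | 136,000 | 576,000 | 1,056,000 | N/A | N/A |
| Maximum number of metrics in a cluster | 140,000 | 2,800,000 | 32,000,000 | 81,600,000 | 126,000,000 | N/A | N/A |
| Maximum number of objects in an extended cluster (2) | 840 | 13,200 | 149,600 | 633,600 | 1,161,600 | N/A | N/A |
| Maximum number of metrics in an extended cluster (2) | 188,000 | 3,080,000 | 35,200,000 | 89,760,000 | 138,600,000 | N/A | N/A |

Configuration

| | Extra Small node | Small node | Medium node | Large node | Extra Large node | Small CP | Standard CP |
|---|---|---|---|---|---|---|---|
| vCPU | 2 | 4 | 8 | 16 | 24 | 2 | 4 |
| Default Memory (GB) | 8 | 16 | 32 | 48 | 128 | 8 | 32 |
| Maximum Memory (GB) (2) | 16 | 32 | 64 | 96 | 256 | N/A | N/A |
| Network latency (5) | < 5 ms | < 5 ms | < 5 ms | < 5 ms | < 5 ms | < 500 ms | < 500 ms |
| Network bandwidth (Mbps) (6) | N/A | N/A | N/A | N/A | N/A | 15 | 60 |

| Setting | Value |
|---|---|
| vCPU: physical core ratio for data nodes (3) | 1 vCPU to 1 physical core at scale maximums |
| Network latency for agents (to VMware Aria Operations node or CP) (5) | < 20 ms |
| Datastore latency | < 10 ms, with possible occasional peaks up to 15 ms |
| IOPS | See the Sizing Guide Worksheet for details |
| Disk Space | See the Sizing Guide Worksheet for details |

Other maximums

| | Extra Small node | Small node | Medium node | Large node | Extra Large node | Small CP | Standard CP |
|---|---|---|---|---|---|---|---|
| Maximum number of telegraf agents per node | N/A | N/A | N/A | N/A | N/A | 500 | 3,000 |
| Maximum number of vCenter on a single collector | 5 | 25 | 50 | 100 | 120 | 25 | 100 |
| Maximum number of the Service Discovery objects | N/A | N/A | N/A | N/A | N/A | 3,000 | 3,000 |
| Maximum number of concurrent users per node (7) | 10 | 10 | 10 | 10 | 10 | N/A | N/A |

| Setting | Value |
|---|---|
| Maximum certified number of concurrent users (8) | 300 |
| Maximum number of concurrent API calls per client | 50 |
| Maximum number of concurrent API calls per node | 300 |
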
  1. This is the total number of metrics from all adapter instances. To get this number, go to the Administration page and open the Audit page.
  2. With the maximum memory configuration (Extended Cluster), the maximum number of objects and metrics is 20% higher for single-node and 10% higher for multi-node.
  3. It is critical to allocate enough CPU for environments running at scale maximums to avoid performance degradation. Refer to the VMware Aria Operations Cluster Node Best Practices.
  4. Based on the VMware vCenter adapter.
  5. The latency limits apply between nodes, and between nodes and Cloud Proxies, measured as Round Trip Time (RTT).
  6. Network bandwidth requirements are provided between nodes and Cloud Proxies, assuming they are operating at their respective maximum sizes.
  7. The maximum number of concurrent users is 10 per node with objects or metrics at maximum levels (for example, 16 Large nodes with 200K objects can support 160 concurrent users).
  8. The maximum certified number of concurrent users is achieved on a system configured with the objects and metrics at 50% of the supported maximums (for example, 4 Large nodes with 32K objects).
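
The object limits and the extended-cluster uplift from note 2 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: `NODE_SIZES` and `max_objects` are hypothetical names, and the values are copied from the sizing table in this article. The Operations Sizing Tool and the attached spreadsheet remain the authoritative sources.

```python
# Object limits copied from the sizing table in this article.
# size: (single-node max objects, multi-node max objects per node, max nodes)
NODE_SIZES = {
    "Extra Small": (700, None, 1),
    "Small": (10_000, 6_000, 2),
    "Medium": (30_000, 17_000, 8),
    "Large": (44_000, 36_000, 16),
    "Extra Large": (100_000, 88_000, 12),
}


def max_objects(size: str, nodes: int, extended: bool = False) -> int:
    """Return the object limit for a cluster of `nodes` nodes of one size.

    `extended` applies note 2: with maximum memory, the limit is
    20% higher for a single node and 10% higher for multiple nodes.
    """
    single, per_node, max_nodes = NODE_SIZES[size]
    if not 1 <= nodes <= max_nodes:
        raise ValueError(f"{size} supports 1 to {max_nodes} nodes, got {nodes}")
    if nodes == 1:
        base, uplift_pct = single, 20
    else:
        base, uplift_pct = per_node * nodes, 10
    # Integer arithmetic avoids floating-point rounding surprises.
    return base * (100 + uplift_pct) // 100 if extended else base


print(max_objects("Large", 16))                 # 576000
print(max_objects("Large", 16, extended=True))  # 633600
```

The two printed values match the "Maximum number of objects in a cluster" and "extended cluster" rows for Large nodes.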


Sizing guidelines for VMware Aria Operations Continuous Availability

Continuous Availability (CA) allows the cluster nodes to be stretched across two fault domains, enabling them to experience up to one fault-domain failure and recover without causing cluster downtime. CA requires an equal number of nodes in each fault domain and a witness node in a third site to monitor split-brain scenarios.

| VMware Aria Operations Node | Small | Medium | Large | Extra Large |
|---|---|---|---|---|
| Maximum number of nodes in each Continuous Availability fault-domain (*) | 1 | 4 | 8 | 6 |

* Each Continuous Availability cluster must have one Witness node, which will require 2 vCPUs and 8 GB of Memory.

| | Between fault-domains | Between witness node and fault-domains |
|---|---|---|
| Latency | < 10 ms, with peaks up to 20 ms during 20-second intervals | < 30 ms, with peaks up to 60 ms during 20-second intervals |
| Packet Loss | Peaks up to 2% during 20-second intervals | Peaks up to 2% during 20-second intervals |
| Bandwidth | 10 Gbit/s | 10 Mbit/s |
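
As a rough illustration of what a Continuous Availability deployment consumes, the sketch below totals vCPU and memory for both fault domains plus the witness node. It is an assumption-laden example, not an official calculator: `ca_footprint` is a hypothetical helper, the per-node vCPU and default memory figures are taken from the sizing table in this article, and memory overrides are ignored.

```python
# Maximum nodes per fault domain, from the CA table in this article.
CA_MAX_NODES_PER_DOMAIN = {"Small": 1, "Medium": 4, "Large": 8, "Extra Large": 6}

# (vCPU, default memory in GB) per node, from the sizing table.
NODE_RESOURCES = {
    "Small": (4, 16),
    "Medium": (8, 32),
    "Large": (16, 48),
    "Extra Large": (24, 128),
}

# Every CA cluster needs one witness node: 2 vCPUs and 8 GB of memory.
WITNESS_VCPU, WITNESS_MEMORY_GB = 2, 8


def ca_footprint(size: str, nodes_per_domain: int) -> dict:
    """Total resources for two equal fault domains plus one witness node."""
    limit = CA_MAX_NODES_PER_DOMAIN[size]
    if not 1 <= nodes_per_domain <= limit:
        raise ValueError(f"{size} allows 1 to {limit} nodes per fault domain")
    vcpu, memory_gb = NODE_RESOURCES[size]
    data_nodes = 2 * nodes_per_domain  # CA requires equal node counts per domain
    return {
        "data_nodes": data_nodes,
        "vcpu": data_nodes * vcpu + WITNESS_VCPU,
        "memory_gb": data_nodes * memory_gb + WITNESS_MEMORY_GB,
    }


print(ca_footprint("Large", 8))
# {'data_nodes': 16, 'vcpu': 258, 'memory_gb': 776}
```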

 

VDI use case

  • A large node can collect up to 20,000 VMware Aria Operations for Horizon objects when a dedicated Cloud Proxy is used.
  • A large node can collect up to 20,000 VMware Aria Operations for Published Apps objects when a dedicated Cloud Proxy is used.

Constraints

  • If the cluster has more than one node, all nodes must be the same size. Do not mix nodes of different sizes.
  • Snapshots impact performance. Snapshots on the disk cause slow I/O performance and high CPU co-stop values, which degrade the performance of VMware Aria Operations.
  • In HA, each object is replicated across nodes of the cluster, so the limit for an HA-based instance is half that of a non-HA instance.
  • VMware Aria Operations HA supports only one node failure. Avoid a single point of failure by placing each node on a different host in the vSphere cluster.
  • In CA, each object is replicated in paired nodes of the cluster, so the limit for a CA-based instance is half that of a non-CA instance.
  • VMware Aria Operations CA supports up to one fault domain failure. Avoid a single point of failure by placing fault domains across a stretched vSphere cluster.
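
The HA and CA constraints above reduce to a simple halving of the cluster limit. A minimal sketch, assuming the limit being halved is the cluster object maximum from the sizing table; `usable_objects` is a hypothetical helper name:

```python
def usable_objects(cluster_max_objects: int, mode: str = "none") -> int:
    """Return the object limit after accounting for HA or CA replication.

    mode: "none", "HA", or "CA". With HA or CA, each object is stored
    twice, so only half of the non-replicated limit is available.
    """
    if mode not in ("none", "HA", "CA"):
        raise ValueError(f"unknown mode: {mode!r}")
    return cluster_max_objects // 2 if mode in ("HA", "CA") else cluster_max_objects


# 16 Large nodes support 576,000 objects without replication.
print(usable_objects(576_000, "HA"))  # 288000
```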

Scaling Tips

  • Scale up vertically (adding more vCPU/Memory), not horizontally (adding more nodes).
    • Use the configuration with the fewest nodes.
    • Example: For 352,000 objects, deploy 4 Extra Large nodes instead of 10 Large nodes. You will save almost half the CPU.
  • You can increase RAM size instead of increasing both RAM and CPU.
    • This is useful if the number of objects is close to the upper limit. Check that there is enough RAM on the underlying hardware.
    • Example: A Large node has 48 GB, and the number of objects is close to 44,000. You can increase memory up to 96 GB. This assumes the underlying ESXi host has more than 96 GB per socket.
  • Scale down CPU configuration.
    • The cluster will perform better if the nodes stay within a single socket (don't cross NUMA boundaries).
    • Example: Reclaim up to 4 vCPUs from the Large and Extra Large node VMs if the cluster is not at its upper limits and CPU usage in the node VMs is less than 60%.
 

Collectors

The collection process on a node supports adapter instances with up to 700, 10,000, 30,000, 44,000, and 100,000 objects on extra small, small, medium, large, and extra large multi-node VMware Aria Operations clusters, respectively. For example, a 2-node system of small nodes will support a total of 20,000 objects. However, if an adapter instance needs to collect 15,000 objects, a collector running on a small node cannot support it, as a small node can only handle 10,000 objects. In this situation, you can add a large cloud proxy and pin the adapter instance to it, or scale up using a configuration that supports more objects.
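
The collector check described above can be expressed as a lookup against the single-collector object limits. A minimal sketch, assuming the single-node and Cloud Proxy object maximums from the sizing table in this article; `collectors_that_fit` is a hypothetical helper name:

```python
# Maximum objects a single collector can handle for one adapter instance,
# taken from the single-node and Cloud Proxy rows of the sizing table.
COLLECTOR_LIMITS = {
    "Extra Small node": 700,
    "Small node": 10_000,
    "Medium node": 30_000,
    "Large node": 44_000,
    "Extra Large node": 100_000,
    "Small Cloud Proxy": 8_000,
    "Standard Cloud Proxy": 40_000,
}


def collectors_that_fit(adapter_objects: int) -> list:
    """Return the collector types able to host an adapter instance of this size."""
    return [
        name
        for name, limit in COLLECTOR_LIMITS.items()
        if adapter_objects <= limit
    ]


# An adapter instance with 15,000 objects does not fit on a Small node.
print(collectors_that_fit(15_000))
# ['Medium node', 'Large node', 'Extra Large node', 'Standard Cloud Proxy']
```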

Cloud Proxy

  • Collect from larger vCenter Servers with up to 65,000 objects by scaling up a large Cloud Proxy to 8 vCPUs and 32 GB of RAM.
  • CPU and Memory configurations can both be doubled to achieve a higher collection of objects and metrics.
    • A vCenter Server may exceed the object or metric counts that a Cloud Proxy supports; in that case, the CPU and memory configuration can be doubled to meet demand.
    • Example: If a vCenter Server has 12,000 objects, then a Small configuration Cloud Proxy can have its CPU and Memory doubled to 4 vCPUs and 16 GB RAM to support those objects.
  • The number of objects and metrics collected by the VMware Aria Application Monitoring (telegraf) agents should be within the supported maximum limits.
    • Each telegraf agent may collect many objects and metrics directly, depending on the services running on the Operating System.
    • Example: 100 Linux Virtual Machines with Apache Tomcat configured will each add 10 additional objects, bringing the overall count to 1000, including the vCenter objects.
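
The telegraf guidance above can be sketched as a combined check on agent count and total object count. This example rests on two assumptions that the article does not state explicitly: that objects added by telegraf agents count against the same Cloud Proxy object limit as vCenter objects, and that every agent adds the same number of objects. `fits_cloud_proxy` is a hypothetical helper, and the limits are the default (not doubled) values from the sizing table.

```python
# Default Cloud Proxy limits from the sizing table in this article.
CLOUD_PROXY_LIMITS = {
    "Small": {"objects": 8_000, "agents": 500},
    "Standard": {"objects": 40_000, "agents": 3_000},
}


def fits_cloud_proxy(size: str, vcenter_objects: int,
                     agents: int, objects_per_agent: int) -> bool:
    """Check agent count and total objects against a Cloud Proxy size.

    Assumption: telegraf-discovered objects are added to vCenter objects
    and compared against a single object limit.
    """
    limits = CLOUD_PROXY_LIMITS[size]
    total_objects = vcenter_objects + agents * objects_per_agent
    return agents <= limits["agents"] and total_objects <= limits["objects"]


# 100 Linux VMs with Apache Tomcat, each adding 10 objects (1,000 in total),
# on top of a hypothetical 5,000 vCenter objects.
print(fits_cloud_proxy("Small", 5_000, 100, 10))  # True
print(fits_cloud_proxy("Small", 7_500, 100, 10))  # False
```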

Attachments

VCFoperationssizing_8.18_Updated.xlsx