VMware Aria Operations 8.17.1 Sizing Guidelines

Article ID: 324346

Updated On:

Products

VMware Aria Suite

Issue/Introduction

This article describes how to use the sizing guidelines for VMware Aria Operations 8.17.1 to choose a configuration during installation or to adjust it after installation.

Notes
  • An object in this table is a basic entity in VMware Aria Operations, characterized by properties and metrics that are collected from adapter data sources. Examples of objects include a virtual machine, a host, or a datastore for a VMware vCenter adapter; a storage switch port for a storage devices adapter; an Exchange server, a Microsoft SQL Server, a Hyper-V server, or a Hyper-V virtual machine for a Hyperic adapter; and an AWS instance for an AWS adapter.
  • For VMware Aria Operations (SaaS) (formerly known as vRealize Operations Cloud) sizing, see VMware Aria Operations (SaaS) Sizing Guidelines.
  • For other versions of VMware Aria Operations, see VMware Aria Operations Sizing Guidelines.
To check the recommended sizing, enter your values in the Operations Sizing Tool or in the attached spreadsheet.

Resolution

By default, VMware offers Extra Small, Small, Medium, Large, and Extra Large configurations during installation. You can size the environment according to the existing infrastructure to be monitored. When the VMware Aria Operations instance outgrows its current size, expand the cluster by adding nodes of the same size.

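Node columns are VMware Aria Operations node sizes; CP columns are Cloud Proxy (CP) sizes. Numbers in parentheses refer to the notes that follow.

Objects and Metrics

| | Node: Extra Small | Node: Small | Node: Medium | Node: Large | Node: Extra Large | CP: Small | CP: Standard |
|---|---|---|---|---|---|---|---|
| Single-Node Maximum Objects | 350 | 5,000 | 15,000 | 22,000 | 50,000 | 8,000 (4) | 40,000 (4) |
| Single-Node Maximum Collected Metrics (1) | 70,000 | 800,000 | 2,500,000 | 4,000,000 | 10,000,000 | 1,200,000 | 6,000,000 |
| Maximum number of nodes in a cluster | 1 | 2 | 8 | 16 | 12 | 200 | 200 |
| Multi-Node Maximum Objects Per Node | N/A | 3,000 | 8,500 | 18,000 | 44,000 | N/A | N/A |
| Multi-Node Maximum Metrics Per Node | N/A | 700,000 | 2,000,000 | 3,000,000 | 7,500,000 | N/A | N/A |
| Maximum number of objects in a cluster | 350 | 6,000 | 68,000 | 288,000 | 528,000 | N/A | N/A |
| Maximum number of metrics in a cluster | 70,000 | 1,400,000 | 16,000,000 | 40,800,000 | 63,000,000 | N/A | N/A |
| Maximum number of objects in an extended cluster (2) | N/A | 6,600 | 74,800 | 316,800 | 580,800 | N/A | N/A |
| Maximum number of metrics in an extended cluster (2) | N/A | 1,540,000 | 17,600,000 | 44,880,000 | 69,300,000 | N/A | N/A |

Configuration

| | Node: Extra Small | Node: Small | Node: Medium | Node: Large | Node: Extra Large | CP: Small | CP: Standard |
|---|---|---|---|---|---|---|---|
| vCPU | 2 | 4 | 8 | 16 | 24 | 2 | 4 |
| Default Memory (GB) | 8 | 16 | 32 | 48 | 128 | 8 | 32 |
| Maximum Memory (GB) (2) | N/A | 32 | 64 | 96 | 256 | N/A | N/A |
| Network latency (5) | < 5 ms | < 5 ms | < 5 ms | < 5 ms | < 5 ms | < 500 ms | < 500 ms |
| Network bandwidth (Mbps) (6) | N/A | N/A | N/A | N/A | N/A | 15 | 60 |

Settings with a single value:
  • vCPU: physical core ratio for data nodes (3): 1 vCPU to 1 physical core at scale maximums
  • Network latency for agents (to VMware Aria Operations node or CP) (5): < 20 ms
  • Datastore latency: < 10 ms, with possible occasional peaks up to 15 ms
  • IOPS: See the Sizing Guide Worksheet for details
  • Disk Space: See the Sizing Guide Worksheet for details

Other maximums

| | Node: Extra Small | Node: Small | Node: Medium | Node: Large | Node: Extra Large | CP: Small | CP: Standard |
|---|---|---|---|---|---|---|---|
| Maximum number of telegraf agents per node | N/A | N/A | N/A | N/A | N/A | 500 | 3,000 |
| Maximum number of vCenter on a single collector | 5 | 25 | 50 | 100 | 120 | 25 | 100 |
| Maximum number of the Service Discovery objects | N/A | N/A | N/A | N/A | N/A | 3,000 | 3,000 |
| Maximum number of concurrent users per node (7) | 10 | 10 | 10 | 10 | 10 | N/A | N/A |

Settings with a single value:
  • Maximum certified number of concurrent users (8): 300
  • Maximum number of concurrent API calls per client: 50
  • Maximum number of concurrent API calls per node: 300

Notes: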
  1. This is the total number of metrics from all adapter instances. To get this number, go to the Administration page and open the Audit page.
  2. With the maximum memory configuration (Extended Cluster), the maximum number of objects and metrics is 20% higher for single-node deployments and 10% higher for multi-node deployments.
  3. It is critical to allocate enough CPU for environments running at scale maximums to avoid performance degradation. Refer to the VMware Aria Operations Cluster Node Best Practices.
  4. Based on the VMware vCenter adapter.
  5. Latency limits are given as Round Trip Time (RTT) between nodes, and between nodes and Cloud Proxies.
  6. Network bandwidth requirement numbers are provided between nodes and Cloud Proxies working at their respective maximum sizings.
  7. The maximum number of concurrent users is 10 per node with objects or metrics at maximum levels (for example, 16 Large nodes with 200K objects can support 160 concurrent users).
  8. The maximum certified number of concurrent users is achieved on a system configured with the objects and metrics at 50% of supported maximums (for example, 4 Large nodes with 32K objects).
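
The arithmetic behind notes 2 and 7 can be sketched in a few lines of Python. This is only an illustration: the function and dictionary names are hypothetical, and the base values are copied from the cluster rows of the sizing table.

```python
# Illustrative sketch; names are hypothetical, values come from the sizing table.
CLUSTER_MAX_OBJECTS = {
    "Small": 6_000,
    "Medium": 68_000,
    "Large": 288_000,
    "Extra Large": 528_000,
}


def extended_cluster_max(base: int) -> int:
    """Multi-node maximum with the maximum memory configuration (note 2): base + 10%."""
    return base * 110 // 100


def max_concurrent_users(node_count: int, users_per_node: int = 10) -> int:
    """Concurrent users supported at maximum object/metric levels (note 7)."""
    return node_count * users_per_node


for size, base in CLUSTER_MAX_OBJECTS.items():
    print(f"{size}: {base:,} objects, {extended_cluster_max(base):,} extended")

print(max_concurrent_users(16))  # 16 Large nodes
```

The computed extended values match the extended-cluster rows of the table (for example, 288,000 becomes 316,800 for Large), and 16 nodes give the 160 concurrent users quoted in note 7.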


Sizing guidelines for VMware Aria Operations Continuous Availability

Continuous Availability (CA) allows the cluster nodes to be stretched across two fault domains, with the ability to experience up to one fault domain failure and to recover without causing cluster downtime. CA requires an equal number of nodes in each fault domain and a witness node, in a third site, to monitor split brain scenarios.

| VMware Aria Operations Node | Small | Medium | Large | Extra Large |
|---|---|---|---|---|
| Maximum number of nodes in each Continuous Availability fault domain (*) | 1 | 4 | 8 | 6 |

* Each Continuous Availability cluster must have one witness node, which requires 2 vCPUs and 8 GB of memory.

| | Between fault domains | Between witness node and fault domains |
|---|---|---|
| Latency | < 10 ms, with peaks up to 20 ms during 20-second intervals | < 30 ms, with peaks up to 60 ms during 20-second intervals |
| Packet Loss | Peaks up to 2% during 20-second intervals | Peaks up to 2% during 20-second intervals |
| Bandwidth | 10 Gbits/sec | 10 Mbits/sec |
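
As a rough illustration of the CA layout described above, the following sketch totals the VMs, vCPUs, and default memory of a CA cluster: an equal number of nodes in each of the two fault domains, plus one witness node. The function and dictionary names are hypothetical; the per-node vCPU and default memory figures are taken from the sizing table, and the witness figures from the note above.

```python
# Illustrative sketch; names are hypothetical, figures come from this article.
NODE_SPECS = {  # size: (vCPU, default memory in GB)
    "Small": (4, 16),
    "Medium": (8, 32),
    "Large": (16, 48),
    "Extra Large": (24, 128),
}
MAX_NODES_PER_FAULT_DOMAIN = {"Small": 1, "Medium": 4, "Large": 8, "Extra Large": 6}
WITNESS_VCPU = 2
WITNESS_MEMORY_GB = 8


def ca_cluster_footprint(size: str, nodes_per_fault_domain: int) -> dict:
    """Total VMs, vCPUs, and memory for two equal fault domains plus one witness."""
    limit = MAX_NODES_PER_FAULT_DOMAIN[size]
    if not 1 <= nodes_per_fault_domain <= limit:
        raise ValueError(f"{size} supports 1 to {limit} nodes per fault domain")
    vcpu, memory_gb = NODE_SPECS[size]
    data_nodes = 2 * nodes_per_fault_domain  # equal count in each fault domain
    return {
        "vms": data_nodes + 1,  # plus the witness node
        "vcpu": data_nodes * vcpu + WITNESS_VCPU,
        "memory_gb": data_nodes * memory_gb + WITNESS_MEMORY_GB,
    }


print(ca_cluster_footprint("Large", 8))
```

For the largest Large CA cluster this gives 17 VMs, 258 vCPUs, and 776 GB of default memory.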

VDI use case

  • A large node can collect up to 20,000 VMware Aria Operations for Horizon objects when a dedicated Cloud Proxy is used.
  • A large node can collect up to 20,000 VMware Aria Operations for Published Apps objects when a dedicated Cloud Proxy is used.

Constraints

  • The Extra Small configuration is designed for test environments and proofs of concept (POC). We do not recommend scaling an Extra Small node horizontally.
  • If the cluster has more than one node, all nodes must be the same size. Do not mix nodes of different sizes.
  • Snapshots impact performance. Snapshots on the disk cause slow I/O performance and high CPU co-stop values, which degrade the performance of VMware Aria Operations.
  • In HA, each object is replicated on some nodes of the cluster, so the limit for an HA-based instance is half that of a non-HA instance.
  • VMware Aria Operations HA supports only one node failure. Avoid a single point of failure by placing each node on a different host in the vSphere cluster.
  • In CA, each object is replicated on paired nodes of the cluster, so the limit for a CA-based instance is half that of a non-CA instance.
  • VMware Aria Operations CA supports up to one fault domain failure. Avoid a single point of failure by placing fault domains across a stretched vSphere cluster.
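
The effect of the HA and CA constraints on capacity can be illustrated with a short sketch. It assumes that the halving applies to the cluster-wide object maximums from the sizing table; the names are hypothetical.

```python
# Illustrative sketch; assumes the HA/CA halving applies to the cluster object maximum.
CLUSTER_MAX_OBJECTS = {
    "Small": 6_000,
    "Medium": 68_000,
    "Large": 288_000,
    "Extra Large": 528_000,
}


def effective_max_objects(size: str, replicated: bool) -> int:
    """Cluster object limit; halved when HA or CA replication is enabled."""
    base = CLUSTER_MAX_OBJECTS[size]
    return base // 2 if replicated else base


print(effective_max_objects("Large", replicated=False))
print(effective_max_objects("Large", replicated=True))
```

Under this assumption, a Large cluster that supports 288,000 objects without replication would support 144,000 objects with HA or CA enabled.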

Scaling Tips

  • Scale up vertically (add more vCPU and memory) rather than horizontally (add more nodes).
    Use the configuration that has the fewest nodes.
    Example: For 180,000 objects, deploy 4 Extra Large nodes instead of 12 Large nodes. This saves half the vCPUs.
  • You can increase RAM alone instead of increasing both RAM and CPU.
    This is useful if the number of objects is close to the upper limit. Check that the underlying hardware has enough RAM.
    Example: A Large node has 48 GB and the number of objects is close to 20,000. You can increase memory up to 96 GB. This assumes the underlying ESXi host has more than 96 GB per socket.
  • Scale down the CPU configuration.
    The cluster performs better if the nodes stay within a single socket (do not cross NUMA boundaries).
    Example: Reclaim up to 4 vCPUs from the Large and Extra Large node VMs if the cluster is not running at the upper limits and the CPU usage in the node VMs is less than 60%.
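
The CPU saving in the first tip can be checked with a small calculation. The sketch below uses the vCPU counts from the sizing table and the node counts from the example; the function names are hypothetical.

```python
# Illustrative sketch; vCPU counts come from the sizing table, names are hypothetical.
VCPU_PER_NODE = {"Large": 16, "Extra Large": 24}


def total_vcpu(size: str, node_count: int) -> int:
    """Total vCPUs allocated to a cluster of identical nodes."""
    return VCPU_PER_NODE[size] * node_count


def vcpu_saving(before: int, after: int) -> float:
    """Fraction of vCPUs saved when moving from one layout to another."""
    return 1 - after / before


large_total = total_vcpu("Large", 12)
extra_large_total = total_vcpu("Extra Large", 4)
print(large_total, extra_large_total, vcpu_saving(large_total, extra_large_total))
```

Twelve Large nodes use 192 vCPUs and four Extra Large nodes use 96 vCPUs, which is the 50% saving the tip refers to.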

Collectors

The collection process on a node will support adapter instances where the total number of objects is not more than 3,000, 8,500, 18,000, and 44,000 on small, medium, large, and extra large multi-node VMware Aria Operations clusters, respectively. For example, a 4-node system of medium nodes will support a total of 34,000 objects. However, if an adapter instance needs to collect 12,000 objects, a collector that runs on a medium node cannot support that, because a medium node can only handle 8,500 objects. In this situation, you can add a large cloud proxy and pin the adapter instance to the cloud proxy, or scale up by using a configuration that supports more objects.
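
The check described in this section can be sketched as follows. The names are hypothetical; the per-node limits are the multi-node values quoted above, and the Cloud Proxy limits are taken from the Single-Node Maximum Objects row of the sizing table, which is based on the VMware vCenter adapter.

```python
# Illustrative sketch; names are hypothetical, limits come from this article.
MULTI_NODE_MAX_OBJECTS_PER_NODE = {
    "Small": 3_000,
    "Medium": 8_500,
    "Large": 18_000,
    "Extra Large": 44_000,
}
CLOUD_PROXY_MAX_OBJECTS = {"Small": 8_000, "Standard": 40_000}


def cluster_capacity(size: str, node_count: int) -> int:
    """Total objects a multi-node cluster of identical nodes supports."""
    return MULTI_NODE_MAX_OBJECTS_PER_NODE[size] * node_count


def adapter_fits_on_node(size: str, adapter_objects: int) -> bool:
    """True if one adapter instance stays within a single node's collector limit."""
    return adapter_objects <= MULTI_NODE_MAX_OBJECTS_PER_NODE[size]


def cloud_proxies_that_fit(adapter_objects: int) -> list[str]:
    """Cloud Proxy sizes whose object limit covers the adapter instance."""
    return [
        name
        for name, limit in CLOUD_PROXY_MAX_OBJECTS.items()
        if adapter_objects <= limit
    ]


print(cluster_capacity("Medium", 4))
print(adapter_fits_on_node("Medium", 12_000))
print(cloud_proxies_that_fit(12_000))
```

For the example in the text, the 4-node medium cluster supports 34,000 objects in total, but a single 12,000-object adapter instance does not fit on one medium node. Of the Cloud Proxy sizes in the sizing table, only the one with the 40,000-object limit can take it.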

Attachments

OperationsSizing_8.17.1