By default, VMware offers Extra Small, Small, Medium, Large, and Extra Large configurations during installation. Size the environment according to the existing infrastructure you plan to monitor. If the vRealize Operations instance outgrows its current size, expand the cluster by adding nodes of the same size.
In the following table, the first five columns are vRealize Operations node sizes and the last two are Remote Collector (RC) sizes.

| Configuration | Extra Small | Small | Medium | Large | Extra Large | RC Standard | RC Large |
|---|---|---|---|---|---|---|---|
| vCPU | 2 | 4 | 8 | 16 | 24 | 2 | 4 |
| Default Memory (GB) | 8 | 16 | 32 | 48 | 128 | 4 | 16 |
| Maximum Memory Configuration (GB) | N/A | 32 | 64 | 96 | N/A | 8 | 32 |
| vCPU : physical core ratio for data nodes (*) | 1 vCPU to 1 physical core at scale maximums | | | | | | |
| Network latency for data nodes (******) | < 5 ms | | | | | | |
| Network latency for remote collectors (******) | < 200 ms | | | | | | |
| Network latency for agents (to vRealize Operations node or RC) (******) | < 20 ms | | | | | | |
| Network bandwidth (Mbps) (*******) | N/A | N/A | N/A | N/A | N/A | 25 | 80 |
| Datastore latency | < 10 ms, with possible occasional peaks up to 15 ms | | | | | | |
| IOPS | See the attached Sizing Guidelines worksheet for details. | | | | | | |
| Disk Space | See the attached Sizing Guidelines worksheet for details. | | | | | | |
| Objects and Metrics | | | | | | | |
| Single-Node Maximum Objects | 350 | 5,000 | 15,000 | 20,000 | 45,000 | 6,000 (****) | 32,000 (****) |
| Single-Node Maximum Collected Metrics (**) | 70,000 | 800,000 | 2,500,000 | 4,000,000 | 10,000,000 | 1,200,000 | 6,500,000 |
| Maximum number of nodes in a cluster | 1 | 2 | 8 | 16 | 8 | 60 | 60 |
| Multi-Node Maximum Objects Per Node (***) | N/A | 3,000 | 8,500 | 16,500 | 40,000 | N/A | N/A |
| Multi-Node Maximum Collected Metrics Per Node (***) | N/A | 700,000 | 2,000,000 | 3,000,000 | 7,500,000 | N/A | N/A |
| Maximum Objects for the configuration with the maximum supported number of nodes (***) | 350 | 6,000 | 68,000 | 200,000 | 320,000 | N/A | N/A |
| Maximum Metrics for the configuration with the maximum supported number of nodes (***) | 70,000 | 1,400,000 | 16,000,000 | 37,500,000 | 45,000,000 | N/A | N/A |
| End Point Operations agent: maximum number of agents per node | 100 | 300 | 1,200 | 2,500 | 2,500 | 250 | 2,000 |
| vRealize Application Remote Collector (*****) telegraf agent: maximum number of agents per node | 100 | 500 | 1,500 | 3,000 | 4,000 | 250 | 2,500 |
| Network latency for Application Remote Collector (to vRealize Operations node or RC) (******) | < 10 ms | | | | | | |
- * It is critical to allocate enough CPU for environments running at scale maximums to avoid performance degradation. Refer to the vRealize Operations Manager Cluster Node Best Practices in the vRealize Operations Manager 8.1 Help for more guidelines regarding CPU allocation.
- ** This is the total number of metrics from all adapter instances. To get this number, go to the Administration page, expand History, and generate an Audit Report.
- *** In large configurations with more than 8 nodes, the maximum metrics and objects have been reduced to allow some headroom. This adjustment is already accounted for in the calculations.
- **** Based on the VMware vCenter adapter.
- ***** vRealize Application Remote Collector discovers applications running on Virtual Machines and collects run-time metrics of the operating system and applications.
- ****** The latency limits are provided in Round Trip Time (RTT).
- ******* Network bandwidth requirement numbers are provided for Remote Collectors working at their respective maximum sizings.
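To make the arithmetic above concrete, here is a minimal Python sketch (not a VMware tool; the helper name is hypothetical) that estimates how many nodes of each size a target object count requires, using the multi-node per-node maxima from the table. It deliberately ignores the headroom reduction described in footnote (***) for clusters with more than 8 nodes, so treat the result as a first approximation.

```python
import math

# Multi-node per-node object maxima and cluster node limits,
# copied from the sizing table above (vRealize Operations 8.1).
MULTI_NODE_MAX_OBJECTS = {"Small": 3_000, "Medium": 8_500,
                          "Large": 16_500, "Extra Large": 40_000}
MAX_NODES = {"Small": 2, "Medium": 8, "Large": 16, "Extra Large": 8}

def nodes_needed(objects, size):
    """Smallest node count of `size` that covers `objects`, or None if
    even the maximum cluster size cannot reach that object count.
    Conservative: uses the multi-node figure even for a single node."""
    n = math.ceil(objects / MULTI_NODE_MAX_OBJECTS[size])
    return n if n <= MAX_NODES[size] else None

# Example: 30,000 objects fit on 4 Medium nodes or 2 Large nodes.
for size in MULTI_NODE_MAX_OBJECTS:
    print(size, nodes_needed(30_000, size))
```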
Sizing guidelines for vRealize Operations Continuous Availability
Continuous Availability (CA) allows the cluster nodes to be stretched across two fault domains, so the cluster can survive the failure of one fault domain and recover without downtime. CA requires an equal number of nodes in each fault domain, plus a witness node in a third site to monitor for split-brain scenarios.
| vRealize Operations Node | Small | Medium | Large | Extra Large |
|---|---|---|---|---|
| Maximum number of nodes in each Continuous Availability fault domain (*) | 1 | 4 | 8 | 5 (**) |
* Each Continuous Availability cluster must have one Witness node, which requires 2 vCPUs and 8 GB of memory.
** 10 Extra Large nodes are supported only in a Continuous Availability cluster.
| | Between fault domains | Between witness node and fault domains |
|---|---|---|
| Latency | < 10 ms, with peaks up to 20 ms during 20-second intervals | < 30 ms, with peaks up to 60 ms during 20-second intervals |
| Packet Loss | Peaks up to 2% during 20-second intervals | Peaks up to 2% during 20-second intervals |
| Bandwidth | 10 Gbits/sec | 10 Mbits/sec |
Important Notes
- The sizing guides are version-specific; use the sizing guide that matches the vRealize Operations version you plan to deploy.
- An object in this table represents a basic entity in vRealize Operations that is characterized by the properties and metrics collected from adapter data sources. Examples of objects include a virtual machine, host, or datastore for the VMware vCenter adapter; a storage switch port for a storage devices adapter; an Exchange server, Microsoft SQL Server, Hyper-V server, or Hyper-V virtual machine for a Hyperic adapter; and an AWS instance for the AWS adapter.
Other Maximums
| Limit | Value |
|---|---|
| Maximum number of remote collectors | 60 |
| Maximum number of vCenter adapter instances | 120 |
| Maximum number of vCenters on a single collector | 100 |
| Maximum number of concurrent users per node (*) | 10 |
| Maximum number of certified concurrent users (**) | 300 |
| Maximum number of vRealize Application Remote Collector telegraf agents | 10,000 |
| Maximum number of End Point Operations agents | 10,000 |
| Maximum number of Service Discovery objects | 3,000 |
* The maximum number of concurrent users is 10 per node when objects or metrics are at maximum levels (for example, a 16-node Large cluster with 200K objects can support 160 concurrent users).
** The maximum number of certified concurrent users is achieved on a system configured with objects and metrics at 50% of the supported maximums (for example, a 4-node Large cluster with 32K objects).
VDI use case
- A Large node can collect up to 20,000 vRealize Operations for Horizon objects when a dedicated remote collector is used.
- A Large node can collect up to 20,000 vRealize Operations for Published Apps objects when a dedicated remote collector is used.
Constraints
- The Extra Small configuration is designed for test environments and proofs of concept; we do not recommend scaling Extra Small nodes horizontally.
- If you have more than one node, all nodes must be sized equally. Do not mix nodes of different sizes.
- Snapshots impact performance. Snapshots on disk cause slow I/O performance and high CPU co-stop values, which degrade the performance of vRealize Operations.
- In HA, each object is replicated on a second node of the cluster, so the limits for an HA instance are half those of a non-HA instance (see the sketch after this list).
- vRealize Operations HA supports only one node failure. Avoid a single point of failure by placing each node on a different host in the vSphere cluster.
- In CA, each object is replicated on its paired node in the other fault domain, so the limits for a CA instance are likewise half those of a non-CA instance.
- vRealize Operations CA supports up to one fault domain failure. Avoid a single point of failure by placing the fault domains across a stretched vSphere cluster.
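The HA/CA halving rule above reduces to a one-line calculation. The sketch below (illustrative only; the function name is made up) applies it to an 8-node Medium cluster, whose non-HA maximum of 68,000 objects matches the sizing table.

```python
# With HA or CA enabled, each object is stored twice,
# so the usable object capacity of the cluster is halved.
def effective_max_objects(per_node_max, nodes, ha_or_ca):
    total = per_node_max * nodes
    return total // 2 if ha_or_ca else total

print(effective_max_objects(8_500, 8, ha_or_ca=False))  # 68,000 (non-HA)
print(effective_max_objects(8_500, 8, ha_or_ca=True))   # 34,000 (HA or CA)
```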
Scaling Tips
- Scale up vertically (add more vCPU/memory) rather than horizontally (add more nodes).
Use the configuration that has the fewest nodes.
Example: for 180,000 objects, deploy 4 Extra Large nodes instead of 12 Large nodes; you will use half the vCPUs (checked in the snippet after these tips).
- You can increase RAM alone instead of increasing both RAM and CPU.
This is useful when the number of objects is close to the upper limit. Verify that the underlying hardware has enough RAM.
Example: a Large node has 48 GB and the number of objects is close to 20,000; you can increase the memory up to 96 GB, provided the underlying ESXi host has more than 96 GB per socket.
- Scale down the CPU configuration.
The cluster performs better when each node stays within a single socket (does not cross NUMA boundaries).
Example: reclaim up to 4 vCPUs from Large and Extra Large node VMs if the cluster is not running at its upper limits and CPU usage in the node VMs is below 60%.
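The vCPU saving claimed in the first tip can be checked directly from the vCPU column of the sizing table; the snippet below is just that arithmetic.

```python
# vCPU counts per node size, from the sizing table above.
VCPU = {"Large": 16, "Extra Large": 24}

print(4 * VCPU["Extra Large"])   # 96 vCPUs for 4 Extra Large nodes
print(12 * VCPU["Large"])        # 192 vCPUs for 12 Large nodes: twice as many
```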
Collectors
The collection process on a node supports adapter instances whose total number of objects does not exceed 3,000, 8,500, 16,500, and 40,000 on Small, Medium, Large, and Extra Large multi-node vRealize Operations clusters respectively. For example, a 4-node cluster of Medium nodes supports a total of 34,000 objects. However, if a single adapter instance needs to collect 12,000 objects, a collector running on a Medium node cannot support it, because a Medium node can handle only 8,500 objects. In this situation, you can add a Large remote collector and pin the adapter instance to it, or scale up to a configuration that supports more objects.
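As a rough illustration of this placement rule (a hypothetical helper, not part of any VMware API), the check only needs to compare the adapter instance's object count against the per-node limit of the node or remote collector that would run it.

```python
# Per-node object limits for collection, from the paragraph above.
NODE_OBJECT_LIMIT = {"Small": 3_000, "Medium": 8_500,
                     "Large": 16_500, "Extra Large": 40_000}

def collector_can_host(adapter_objects, node_size):
    """True if a single adapter instance fits on a node of this size."""
    return adapter_objects <= NODE_OBJECT_LIMIT[node_size]

# A 12,000-object adapter instance overflows a Medium node but fits a Large one.
print(collector_can_host(12_000, "Medium"))  # False
print(collector_can_host(12_000, "Large"))   # True
```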
Telegraf agents using vRealize Application Remote Collector
vRealize Application Remote Collector Maximums
| | Small | Medium | Large |
|---|---|---|---|
| vCPU | 4 | 8 | 16 |
| Default Memory (GB) | 8 | 16 | 24 |
| Maximum number of supported telegraf agents | 500 | 3,000 | 6,000 |
Note: A vRealize Application Remote Collector can be configured for multiple vCenters, but each vCenter can be monitored by only one Application Remote Collector.
If you have more than 6,000 telegraf agents, increase the vCPU and memory of the Large configuration to monitor up to 10,000 telegraf agents.
Keep about 20% of vCPU free when installing 1,000 telegraf agents simultaneously, because the vCPU usage of the vRealize Operations VM increases by 10-18%.
The increase in memory usage depends on the number of services and their configurations on the monitored VMs; memory usage increases by at least 1-1.5 GB when monitoring 1,000 OS objects.
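Putting the table and notes above together, a hedged sketch of size selection might look like this (the helper is hypothetical; the 10,000-agent ceiling assumes the scaled-up Large configuration described above):

```python
# ARC agent maxima from the table above; dicts preserve insertion order,
# so iteration goes from smallest to largest size.
ARC_MAX_AGENTS = {"Small": 500, "Medium": 3_000, "Large": 6_000}

def arc_size_for(agents):
    """Smallest ARC size that covers `agents`, or the scaled-up Large
    configuration (supported up to ~10,000 agents)."""
    for size, limit in ARC_MAX_AGENTS.items():
        if agents <= limit:
            return size
    return "Large (scaled up)" if agents <= 10_000 else None

print(arc_size_for(2_000))   # Medium
print(arc_size_for(8_000))   # Large (scaled up)
```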