VCF Operations 9.0 Sizing Guidelines

Products

VCF Operations

Issue/Introduction

This article describes how to use the sizing guidelines for VCF Operations 9.0 to determine the configuration to use during installation or after installation.

Notes:

An object in this table represents a basic entity in VCF Operations that is characterized by properties and metrics collected from adapter data sources. Examples of objects include a virtual machine, a host, or a datastore for a VMware vCenter adapter; a storage switch port for a storage devices adapter; an Exchange server, a Microsoft SQL Server, a Hyper-V server, or a Hyper-V virtual machine for a Hyperic adapter; and an AWS instance for an AWS adapter.
For other versions, see VMware Aria Operations Sizing Guidelines.

You can check the recommended sizing by entering values in the Operations Sizing Tool or in the attached spreadsheet.

Environment

VCF Operations 9.0

Resolution

VCF Operations comes in five sizes, selected during installation: Extra Small, Small, Medium, Large, and Extra Large. Size the environment according to the existing infrastructure to be monitored. When the VMware Cloud Foundation Operations instance outgrows its current size, you must expand the cluster by adding nodes of the same size.

Operations

	VCF Operations Node					Cloud Proxy (CP)		Unified Cloud Proxy (UCP)
	Extra Small	Small	Medium	Large	Extra Large	Small	Standard	Small	Standard
Objects and Metrics
Single-Node maximum objects	700	10,000	30,000	44,000	100,000	16,000 ⁽⁴⁾	80,000 ⁽⁴⁾	16,000⁽⁴⁾	80,000⁽⁴⁾
Single-Node maximum collected metrics ⁽¹⁾	140,000	1,600,000	5,000,000	8,000,000	20,000,000	2,400,000	12,000,000	2,400,000	12,000,000
Maximum number of nodes in a cluster	1	2	8	16	12	1000⁽⁹⁾
Multi-Node maximum objects per node	N/A	6,000	17,000	36,000	88,000	N/A
Multi-Node maximum metrics per node	N/A	1,400,000	4,000,000	6,000,000	15,000,000
Maximum number of objects in a cluster	700	12,000	136,000	576,000	1,056,000
Maximum number of metrics in a cluster	140,000	2,800,000	32,000,000	81,600,000	126,000,000
Maximum number of objects in an extended cluster ⁽²⁾	840	13,200	149,600	633,600	1,161,600
Maximum number of metrics in an extended cluster ⁽²⁾	168,000	3,080,000	35,200,000	89,760,000	138,600,000
Configuration
vCPU	2	4	8	16	24	2	4	4	8
Default Memory (GB)	8	16	32	48	128	8	32	16	48
Maximum Memory (GB) ⁽²⁾	16	32	64	96	256	N/A
vCPU: physical core ratio for nodes ⁽³⁾	1 vCPU to 1 physical core at scale maximums
Network latency ⁽⁵⁾	< 5 ms					< 300 ms
VCF Operations communicates with the components of a VCF instance through the VCF Operations Collector	< 50 ms
Network bandwidth (Mbps) ⁽⁶⁾	N/A					15	60	80	200
Datastore latency	< 10 ms, with possible occasional peaks up to 15 ms
IOPS	See the Sizing Guide Worksheet for details
Disk space	See the Sizing Guide Worksheet for details
Log Forwarder
Maximum logs per second traffic (eps)	N/A							20,000	40,000
Maximum number of connections	N/A							300	600
Other Maximums
Maximum number of Telegraf agents per node	N/A					500	3,000	500	3,000
Maximum number of vCenter adapter instances on a single collector	5	25	50	100	120	25	100	25	100
Maximum number of Service Discovery objects	N/A	3,000
Maximum number of concurrent users per node ⁽⁷⁾	10					N/A
Maximum certified number of concurrent users ⁽⁸⁾	300					N/A
Maximum number of concurrent API calls per client	50
Maximum number of concurrent API calls per node	300

⁽¹⁾ This is the total number of metrics from all adapter instances. To get this number, go to the Administration page and open the Audit page.
⁽²⁾ With the maximum memory configuration (extended cluster), the maximum numbers of objects and metrics are 20% higher for a single node and 10% higher for a multi-node cluster.
⁽³⁾ It is critical to allocate enough CPU for environments running at scale maximums to avoid performance degradation. Refer to the VCF Operations Cluster Node Best Practices.
⁽⁴⁾ Based on the VMware vCenter Adapter.
⁽⁵⁾ Latency limits are given as round-trip time (RTT) between nodes, and between nodes and Cloud Proxies.
⁽⁶⁾ Network bandwidth requirements are given between nodes and Cloud Proxies working at their respective maximum sizings.
⁽⁷⁾ The maximum number of concurrent users is 10 per node with objects or metrics at maximum levels (for example, 16 Large nodes with 200K objects can support 160 concurrent users).
⁽⁸⁾ The maximum certified number of concurrent users is achieved on a system configured with objects and metrics at 50% of the supported maximums (for example, 4 Large nodes with 32K objects).
⁽⁹⁾ The maximum number of Cloud Proxies and Unified Cloud Proxies per cluster node is 100 for Small and Medium configurations, and 200 for Large and Extra Large configurations.
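The extended-cluster rows follow from footnote ⁽²⁾, and the concurrent-user figure follows from footnote ⁽⁷⁾. The Python sketch below reproduces that arithmetic for object counts only; the size keys and function names are illustrative and are not part of any VCF Operations API. Metric maximums for Large and Extra Large clusters are not a simple product of the per-node values, so they are deliberately left out.

```python
# Object maximums per size, copied from the sizing table above.
SINGLE_NODE_MAX_OBJECTS = {
    "extra_small": 700,
    "small": 10_000,
    "medium": 30_000,
    "large": 44_000,
    "extra_large": 100_000,
}
MULTI_NODE_MAX_OBJECTS_PER_NODE = {
    "small": 6_000,
    "medium": 17_000,
    "large": 36_000,
    "extra_large": 88_000,
}
MAX_NODES = {"extra_small": 1, "small": 2, "medium": 8, "large": 16, "extra_large": 12}
CONCURRENT_USERS_PER_NODE = 10  # footnote (7)


def max_objects(size: str, nodes: int, extended: bool = False) -> int:
    """Object ceiling for a cluster of `nodes` nodes of one size.

    extended=True models the maximum memory configuration from footnote (2):
    +20% for a single node, +10% for a multi-node cluster.
    """
    if not 1 <= nodes <= MAX_NODES[size]:
        raise ValueError(f"{size} supports 1 to {MAX_NODES[size]} nodes")
    if nodes == 1:
        base, uplift_pct = SINGLE_NODE_MAX_OBJECTS[size], 20
    else:
        base, uplift_pct = nodes * MULTI_NODE_MAX_OBJECTS_PER_NODE[size], 10
    # Integer arithmetic avoids floating-point rounding surprises.
    return base * (100 + uplift_pct) // 100 if extended else base


def max_concurrent_users(nodes: int) -> int:
    """Footnote (7): 10 concurrent users per node at maximum object/metric levels."""
    return nodes * CONCURRENT_USERS_PER_NODE


print(max_objects("large", 16, extended=True))  # 633600
print(max_concurrent_users(16))  # 160
```

The results match the "Maximum number of objects in an extended cluster" row and the 160-user example in footnote ⁽⁷⁾.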

Sizing Guidelines for VCF Operations Continuous Availability

Continuous Availability (CA) allows the cluster nodes to be stretched across two fault domains, so that the cluster can tolerate the failure of one fault domain and recover without downtime. CA requires an equal number of nodes in each fault domain and a witness node, in a third site, to monitor for split-brain scenarios.

	VCF Operations Node
	Small	Medium	Large	Extra Large
Maximum number of nodes in each Continuous Availability fault-domain *	1	4	8	6

* Each Continuous Availability cluster must have one witness node, which requires 2 vCPUs and 8 GB of memory.

	Between fault-domains	Between witness node and fault-domains
Latency	< 10 ms, with peaks up to 20 ms during 20-second intervals	< 30 ms, with peaks up to 60 ms during 20-second intervals
Packet loss	Peaks up to 2% during 20-second intervals	Peaks up to 2% during 20-second intervals
Bandwidth	10 Gbit/s	10 Mbit/s
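As a rough illustration of the CA footprint, the sketch below totals vCPUs and memory for two equal fault domains plus the witness node. It assumes that every data node runs at the default memory from the main sizing table; the helper name and its return format are hypothetical.

```python
# Maximum nodes per fault domain, from the Continuous Availability table above.
CA_MAX_NODES_PER_FAULT_DOMAIN = {"small": 1, "medium": 4, "large": 8, "extra_large": 6}
# (vCPU, default memory in GB) per node, from the main sizing table.
NODE_RESOURCES = {
    "small": (4, 16),
    "medium": (8, 32),
    "large": (16, 48),
    "extra_large": (24, 128),
}
WITNESS_VCPU, WITNESS_MEMORY_GB = 2, 8  # from the witness node note


def ca_cluster_resources(size: str, nodes_per_fault_domain: int) -> dict:
    """Total footprint of a CA cluster: two equal fault domains plus one witness."""
    limit = CA_MAX_NODES_PER_FAULT_DOMAIN[size]
    if not 1 <= nodes_per_fault_domain <= limit:
        raise ValueError(f"{size} allows 1 to {limit} nodes per fault domain")
    data_nodes = 2 * nodes_per_fault_domain  # equal node count in each fault domain
    vcpu, memory_gb = NODE_RESOURCES[size]
    return {
        "data_nodes": data_nodes,
        "witness_nodes": 1,
        "vcpu": data_nodes * vcpu + WITNESS_VCPU,
        "memory_gb": data_nodes * memory_gb + WITNESS_MEMORY_GB,
    }


print(ca_cluster_resources("medium", 4))
```

For a fully populated Medium CA cluster this gives 8 data nodes and 1 witness, with 66 vCPUs and 264 GB of memory in total.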

VDI use case

A large node can collect up to 20,000 VCF Operations for Horizon objects when a dedicated Cloud Proxy is used.
A large node can collect up to 20,000 VCF Operations for Published Apps objects when a dedicated Cloud Proxy is used.

Constraints

If you have more than one node, all nodes must be the same size. Mixing nodes of different sizes is not supported.
Snapshots on the disk cause slow I/O performance and high CPU co-stop values, which degrade the performance of VCF Operations.
With an HA configuration, each object is replicated on another node of the cluster, so the limit for an HA-based instance is half that of a non-HA configuration.
VCF Operations HA supports only one node failure. Avoid a single point of failure by placing each node on a different host in the vSphere cluster.
With a CA configuration, each object is replicated to a pair of nodes of the cluster, so the limit for a CA-based instance is half that of a non-CA configuration.
VCF Operations CA supports up to one fault domain failure. Avoid a single point of failure by placing the fault domains across a stretched vSphere cluster.
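The constraints above can be expressed as a quick check on a cluster plan. This is a sketch under stated assumptions: it uses the table's multi-node per-node object maximums and treats the HA/CA limit as exactly half of the non-protected capacity. The function is hypothetical and is not a VCF Operations API.

```python
# "Multi-Node maximum objects per node" row from the sizing table above.
MULTI_NODE_MAX_OBJECTS_PER_NODE = {
    "small": 6_000,
    "medium": 17_000,
    "large": 36_000,
    "extra_large": 88_000,
}


def usable_object_capacity(node_sizes: list[str], protection: str = "none") -> int:
    """Approximate object capacity of a multi-node cluster under the constraints above."""
    if len(node_sizes) < 2:
        raise ValueError("this sketch only models multi-node clusters")
    if len(set(node_sizes)) != 1:
        raise ValueError("all nodes must be the same size")
    if protection not in ("none", "ha", "ca"):
        raise ValueError("protection must be 'none', 'ha', or 'ca'")
    capacity = len(node_sizes) * MULTI_NODE_MAX_OBJECTS_PER_NODE[node_sizes[0]]
    # HA and CA keep a second copy of every object, which halves usable capacity.
    return capacity if protection == "none" else capacity // 2


print(usable_object_capacity(["large"] * 4))                   # 144000
print(usable_object_capacity(["large"] * 4, protection="ha"))  # 72000
```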

Scaling Tips

Use the configuration which has the least number of nodes, but is able to accommodate the number of objects to be monitored.
Example: to monitor 352,000 objects, deploy 4 Extra Large nodes instead of 10 Large nodes. This uses 96 vCPUs instead of 160, saving about 40% of the CPU.
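To apply this tip, compare how many nodes of each size a given object count needs. The sketch below assumes a multi-node cluster and uses only the multi-node object maximums, node limits, and vCPU counts from the sizing table; metrics, HA, and CA are ignored, and the function name is illustrative.

```python
import math

# (multi-node max objects per node, max nodes per cluster, vCPUs per node),
# copied from the sizing table above.
SIZES = {
    "small": (6_000, 2, 4),
    "medium": (17_000, 8, 8),
    "large": (36_000, 16, 16),
    "extra_large": (88_000, 12, 24),
}


def cluster_options(objects: int) -> list[tuple[int, int, str]]:
    """Return (node_count, total_vcpu, size) for every size that fits, fewest nodes first."""
    options = []
    for size, (per_node, max_nodes, vcpu) in SIZES.items():
        # This sketch only models multi-node clusters, so at least 2 nodes.
        nodes = max(2, math.ceil(objects / per_node))
        if nodes <= max_nodes:
            options.append((nodes, nodes * vcpu, size))
    return sorted(options)


for nodes, vcpu, size in cluster_options(352_000):
    print(f"{nodes} x {size}: {vcpu} vCPUs")
# 4 x extra_large: 96 vCPUs
# 10 x large: 160 vCPUs
```

Small and Medium clusters are excluded for 352,000 objects because they would need more nodes than their maximum cluster size allows.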

You can increase RAM size instead of increasing both RAM and CPU. This is useful if the number of objects is close to the upper limit. Check that there is enough RAM on the underlying hardware.
Example: a Large node requires 48 GB of RAM. If the number of objects to be monitored is close to 44,000, the upper limit for a Large node, you can increase the node memory to 96 GB. This assumes that the underlying ESXi host has more than 96 GB of memory per socket.

Scale down the CPU configuration. The cluster performs better if the nodes stay within a single socket (that is, they do not cross NUMA boundaries).
Example: up to 4 vCPUs may be reclaimed from Large and Extra Large size nodes if the cluster is not running at the upper limits, and the CPU usage of node VMs is under 60%.

Collectors

The collection process on a node supports adapter instances whose total number of objects does not exceed 700, 10,000, 30,000, 44,000, and 100,000 on Extra Small, Small, Medium, Large, and Extra Large multi-node VCF Operations clusters, respectively. For example, a 2-node cluster of Small nodes supports a total of 2 × 6,000 = 12,000 objects. However, if an adapter instance needs to collect 15,000 objects, the collector running on a Small node cannot support it, because the maximum object count supported by a Small node is 6,000. The solution is either to use a Cloud Proxy and pin the adapter instance to it, or to scale up the cluster to a configuration that supports more objects.
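The worked example above can be checked mechanically. The hypothetical helper below uses the table's multi-node per-node object maximums (6,000 for Small, as in the example) as the node collector limit; it is a planning sketch, not a description of how VCF Operations itself assigns adapter instances.

```python
# "Multi-Node maximum objects per node" row from the sizing table above.
MULTI_NODE_MAX_OBJECTS_PER_NODE = {
    "small": 6_000,
    "medium": 17_000,
    "large": 36_000,
    "extra_large": 88_000,
}


def collection_issues(node_size: str, nodes: int, adapter_objects: list[int]) -> list[str]:
    """List problems with collecting the given adapter instances on cluster node collectors."""
    per_node = MULTI_NODE_MAX_OBJECTS_PER_NODE[node_size]
    issues = []
    total = sum(adapter_objects)
    if total > nodes * per_node:
        issues.append(f"total of {total} objects exceeds the cluster limit of {nodes * per_node}")
    for index, count in enumerate(adapter_objects):
        if count > per_node:
            issues.append(
                f"adapter instance {index} ({count} objects) exceeds the {per_node}-object "
                "node collector limit: pin it to a Cloud Proxy or scale up"
            )
    return issues


# The example above: one 15,000-object adapter instance on a 2-node Small cluster.
for issue in collection_issues("small", 2, [15_000]):
    print(issue)
```

An empty list means that every adapter instance fits on a node collector and the cluster total is within limits.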

Cloud Proxy

Collect from larger vCenter Servers with up to 65,000 objects by scaling up a large Cloud Proxy to 8 vCPUs and 32 GB of RAM.
Both the CPU and memory configurations can be doubled to collect more objects and metrics.

If a vCenter exceeds the objects or metrics supported by a Cloud Proxy, you can double the Cloud Proxy's CPU and memory configuration to cover the demand.
Example: If a vCenter Server has 12,000 objects, a Small Cloud Proxy can have its CPU and memory doubled to 4 vCPUs and 16 GB of RAM to support those objects.

The number of objects and metrics collected by the VCF Operations Application Monitoring (Telegraf) agents should stay within the supported maximum limits.

Each Telegraf agent may collect many objects and metrics, depending on the services running on the operating system.
Example: 100 Linux virtual machines with Apache Tomcat configured each bring 10 additional objects, adding 1,000 objects to the overall count alongside the vCenter objects.
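A minimal sketch of the Telegraf object arithmetic, assuming the per-proxy agent limits from the "Maximum number of Telegraf agents per node" row (500 for Small, 3,000 for Standard). The function name is hypothetical, and the 5,000-object vCenter in the usage line is a made-up figure for illustration.

```python
# Telegraf agent limits per proxy, from the "Other Maximums" rows of the table
# (the same values are listed for Cloud Proxy and Unified Cloud Proxy sizes).
TELEGRAF_AGENTS_PER_PROXY = {"small": 500, "standard": 3_000}


def monitored_object_total(vcenter_objects: int, objects_per_agent: list[int], proxy_size: str) -> int:
    """Add the objects discovered by each Telegraf agent to the vCenter object count."""
    limit = TELEGRAF_AGENTS_PER_PROXY[proxy_size]
    if len(objects_per_agent) > limit:
        raise ValueError(f"{len(objects_per_agent)} agents exceed the {proxy_size} proxy limit of {limit}")
    return vcenter_objects + sum(objects_per_agent)


# The example above: 100 Tomcat VMs with 10 extra objects each, on a hypothetical 5,000-object vCenter.
print(monitored_object_total(5_000, [10] * 100, "small"))  # 6000
```

Compare the total with the object maximum of the node or Cloud Proxy that collects it.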

VCF Operations for logs

By default, the VCF Operations for logs virtual appliance uses preset values for all configurations.

Standalone Deployment

During deployment, you can change the appliance settings to meet the needs of the environment from which you intend to collect logs.
VCF Operations for logs provides preset VM (virtual machine) sizes that you can select to meet the ingestion requirements of your environment. These presets are certified combinations of compute and disk resources, though you can add extra resources afterward. The Small configuration is suitable only for demos.

To size virtual appliances to XL, XXL, and XXXL, see Vertical scaling in Aria Operations for Logs (Formerly vRealize Log Insight) 8.2 And Newer.

	Node type
Preset Size	Small	Medium	Large
Log Ingestion Rate	30 GB/day	75 GB/day	225 GB/day
Virtual CPUs	4	8	16
Memory	8 GB	16 GB	32 GB
IOPS	500	1000	1500
Syslog Connections (Active TCP Connections)	100	250	750
Events per Second	2000	5000	15,000

You can use a syslog aggregator to increase the number of syslog connections through which events are sent to VCF Operations for Logs. However, the maximum number of events per second is fixed and does not depend on the use of a syslog aggregator. A VCF Operations for logs instance cannot be used as a syslog aggregator. The sizing is based on the following assumptions.

Each virtual CPU is at least 2 GHz.
Each ESXi host sends up to 10 messages per second with an average message size of 170 bytes/message, which is roughly equivalent to 150 MB per day, per host.
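The per-host assumption works out as 10 × 170 bytes × 86,400 seconds ≈ 147 MB per day. The sketch below uses that figure to pick the smallest preset that covers a given number of ESXi hosts. It counts 1 GB as 10⁹ bytes and considers only ESXi host traffic, both simplifying assumptions, and the function names are illustrative. Remember that the Small preset is intended only for demos.

```python
SECONDS_PER_DAY = 86_400
# (preset, max ingestion in GB/day, max events per second) from the preset table above.
PRESETS = [("small", 30, 2_000), ("medium", 75, 5_000), ("large", 225, 15_000)]


def host_bytes_per_day(msgs_per_sec: int = 10, avg_msg_bytes: int = 170) -> int:
    """Daily log volume of one ESXi host under the stated assumptions."""
    return msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY


def pick_preset(hosts: int, msgs_per_sec: int = 10, avg_msg_bytes: int = 170):
    """Smallest preset whose GB/day and EPS limits both cover `hosts` ESXi hosts, else None."""
    gb_per_day = hosts * host_bytes_per_day(msgs_per_sec, avg_msg_bytes) / 1e9  # decimal GB
    eps = hosts * msgs_per_sec
    for name, max_gb_per_day, max_eps in PRESETS:
        if gb_per_day <= max_gb_per_day and eps <= max_eps:
            return name
    return None  # beyond a single Large node: consider a cluster


print(host_bytes_per_day())  # 146880000, roughly 150 MB per day
print(pick_preset(300))  # medium
```

With 300 hosts, the 3,000 events per second exceed the Small preset, so Medium is the first preset that covers both limits.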

NOTE

For large installations, you must upgrade the virtual hardware version of the VCF Operations for logs virtual machine. VCF Operations for logs supports virtual hardware version 7 or later. Virtual hardware version 7 can support up to 8 virtual CPUs. Therefore, if you plan to provision 16 virtual CPUs, you must upgrade to virtual hardware version 8 or later for ESXi 7.x. You use the vSphere Client to upgrade the virtual hardware. If you want to upgrade the virtual hardware to the latest version, read and understand the information in the VMware knowledge base article Upgrading a virtual machine to the latest hardware version.

Cluster Deployment

Use the Medium configuration, or larger, for the primary and worker nodes in a VCF Operations for logs cluster. The number of events per second increases linearly with the number of nodes. For example, in a cluster of 18 Large nodes (clusters must have a minimum of three nodes), the ingestion rate is 18 × 15,000 = 270,000 events per second (EPS), or about 4 TB of events per day.
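Because ingestion scales linearly with node count, cluster capacity is a simple multiplication. This sketch assumes the per-node rates of the Medium and Large presets and the 3 to 18 node range stated in this article; the function name is illustrative.

```python
# Per-node limits for the presets allowed in a cluster (Medium or larger).
EPS_PER_NODE = {"medium": 5_000, "large": 15_000}
GB_PER_DAY_PER_NODE = {"medium": 75, "large": 225}
MIN_NODES, MAX_NODES = 3, 18  # minimum from this section, maximum from Configuration Maximums


def cluster_ingestion(size: str, nodes: int) -> tuple[int, int]:
    """(events per second, GB per day) for a cluster, assuming linear scaling."""
    if size not in EPS_PER_NODE:
        raise ValueError("use the Medium configuration or larger for cluster nodes")
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"a cluster needs {MIN_NODES} to {MAX_NODES} nodes")
    return nodes * EPS_PER_NODE[size], nodes * GB_PER_DAY_PER_NODE[size]


print(cluster_ingestion("large", 18))  # (270000, 4050), about 4 TB per day
```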

Reducing the Memory Size

Use the Small configuration of the appliance in a proof-of-concept or test environment, but not in a production environment.

VCF Operations for logs Sizing Calculator

An estimator to help you determine sizing for VCF Operations for logs, including calculations for network bandwidth and storage utilization, is also available. This sizing estimator is intended for guidance only. Many environment inputs are site-specific, so the calculator necessarily uses estimations in some areas. See https://vrlisizer.broadcom.com.

NOTE

The overall performance of VCF Operations for logs might degrade if forwarders are defined against the text field with complex or multiple conditions involving regular expressions, for example text=~"Deleting the machine". In such cases, particularly when the overall load on the cluster is high, forwarding might be delayed, and disk blocks might accumulate on each node of the cluster.

Configuration Maximums

Item	Maximum
Node Configuration
CPU	16 vCPUs
Memory	32 GB
Storage device (vmdk)	2 TB - 512 bytes
Total addressable storage	6 TB (+ OS drive) A maximum of 6 TB addressable log storage on Virtual Machine Disks (VMDKs) with a maximum size of 2 TB each. ⁽¹⁾
Number of syslog connections per node	750
Cluster Configuration
Nodes	18 (Primary + 17 Workers)
Virtual IP addresses	60
Ingestion
Events per second	15,000 eps per node
Syslog message length	10 KB (text field) per log
Ingestion API HTTP POST request	16 KB (text field); 4 MB per HTTP Post request
Integrations
VCF Operations	1
vCenter	15 per node
VMware SSO	1
Active Directory domains	1
Email servers	1
DNS servers	2
NTP servers	4
Forwarders	10
Index Partition Configuration
Index partitions	10

⁽¹⁾ You can have two 2 TB VMDKs or four 1 TB VMDKs, and so on. When you reach the maximum, you must scale out to a larger cluster instead of adding more disks to existing VMs.
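The storage limits can be checked before adding disks. The sketch below reads "TB" as binary terabytes (2⁴⁰ bytes), which is an assumption, and excludes the OS drive as the table does; the function name is hypothetical.

```python
TIB = 2**40  # assumption: "TB" in the table is read as binary terabytes
MAX_VMDK_BYTES = 2 * TIB - 512   # "Storage device (vmdk): 2 TB - 512 bytes"
MAX_LOG_STORAGE_BYTES = 6 * TIB  # "Total addressable storage: 6 TB (+ OS drive)"


def log_disk_layout_ok(vmdk_sizes: list[int]) -> bool:
    """True if every log VMDK and their total fit the node limits (OS drive excluded)."""
    if any(size > MAX_VMDK_BYTES for size in vmdk_sizes):
        return False
    return sum(vmdk_sizes) <= MAX_LOG_STORAGE_BYTES


print(log_disk_layout_ok([TIB] * 4))             # True: four 1 TB VMDKs
print(log_disk_layout_ok([MAX_VMDK_BYTES] * 4))  # False: over 6 TB, so scale out instead
```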

NOTE

- For more information on vertical scaling of CPU and memory node configurations in VCF Operations for logs, see Vertical scaling in Aria Operations for Logs (Formerly vRealize Log Insight) 8.2 And Newer.
- From VCF Operations for logs version 8.18 and later, you can increase the number of log forwarders up to 20 per cluster. For more information, see Configure "The Number Of Log Forwarders".

Attachments

VCFOperationsSizing_9.0.xlsx