VCF Operations 9.0 Sizing Guidelines

Article ID: 397782

Updated On:

Products

VCF Operations

Issue/Introduction

This article explains how to use the sizing guidelines for VCF Operations 9.0 to determine the configuration to deploy during installation or to adjust after installation.

Notes

  • An object in this table represents a basic entity in VCF Operations that is characterized by properties and metrics that are collected from adapter data sources. Examples of objects include a virtual machine, a host, or a datastore for a VMware vCenter adapter; a storage switch port for a storage devices adapter; an Exchange server, a Microsoft SQL Server, a Hyper-V server, or a Hyper-V virtual machine for a Hyperic adapter; and an AWS instance for an AWS adapter.
  • For other versions, see VMware Aria Operations Sizing Guidelines.

To check the recommended sizing, enter your values in the Operations Sizing Tool or in the attached spreadsheet.

Environment

VCF Operations 9.0

Resolution

VCF Operations can be deployed in five sizes: Extra Small, Small, Medium, Large, and Extra Large. Size the environment according to the existing infrastructure to be monitored. When the VCF Operations instance outgrows its current size, expand the cluster by adding nodes of the same size.



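In the tables below, the first five columns are VCF Operations Node sizes (Extra Small to Extra Large), followed by the Cloud Proxy (CP) and Unified Cloud Proxy (UCP) sizes. Numbers in parentheses refer to the notes that follow the tables.

Objects and Metrics

| | Extra Small | Small | Medium | Large | Extra Large | CP Small | CP Standard | UCP Small | UCP Standard |
|---|---|---|---|---|---|---|---|---|---|
| Single-Node maximum objects | 700 | 10,000 | 30,000 | 44,000 | 100,000 | 16,000 (4) | 80,000 (4) | 16,000 (4) | 80,000 (4) |
| Single-Node maximum collected metrics (1) | 140,000 | 1,600,000 | 5,000,000 | 8,000,000 | 20,000,000 | 2,400,000 | 12,000,000 | 2,400,000 | 12,000,000 |
| Maximum number of nodes in a cluster | 1 | 2 | 8 | 16 | 12 | 1,000 (9) | 1,000 (9) | 1,000 (9) | 1,000 (9) |
| Multi-Node maximum objects per node | N/A | 6,000 | 17,000 | 36,000 | 88,000 | N/A | N/A | N/A | N/A |
| Multi-Node maximum metrics per node | N/A | 1,400,000 | 4,000,000 | 6,000,000 | 15,000,000 | N/A | N/A | N/A | N/A |
| Maximum number of objects in a cluster | 700 | 12,000 | 136,000 | 576,000 | 1,056,000 | N/A | N/A | N/A | N/A |
| Maximum number of metrics in a cluster | 140,000 | 2,800,000 | 32,000,000 | 81,600,000 | 126,000,000 | N/A | N/A | N/A | N/A |
| Maximum number of objects in an extended cluster (2) | 840 | 13,200 | 149,600 | 633,600 | 1,161,600 | N/A | N/A | N/A | N/A |
| Maximum number of metrics in an extended cluster (2) | 168,000 | 3,080,000 | 35,200,000 | 89,760,000 | 138,600,000 | N/A | N/A | N/A | N/A |

Configuration

| | Extra Small | Small | Medium | Large | Extra Large | CP Small | CP Standard | UCP Small | UCP Standard |
|---|---|---|---|---|---|---|---|---|---|
| vCPU | 2 | 4 | 8 | 16 | 24 | 2 | 4 | 4 | 8 |
| Default memory (GB) | 8 | 16 | 32 | 48 | 128 | 8 | 32 | 16 | 48 |
| Maximum memory (GB) (2) | 16 | 32 | 64 | 96 | 256 | N/A | N/A | N/A | N/A |
| Network bandwidth (Mbps) (6) | N/A | N/A | N/A | N/A | N/A | 15 | 60 | 80 | 200 |

| Setting | Value |
|---|---|
| vCPU: physical core ratio for nodes (3) | 1 vCPU to 1 physical core at scale maximums |
| Network latency (5) | < 5 ms for VCF Operations nodes; < 500 ms for Cloud Proxies |
| Network latency for agents to nodes/Cloud Proxies (5) | < 20 ms |
| Network latency between nodes/Cloud Proxies and endpoints | < 50 ms |
| Datastore latency | < 10 ms, with possible occasional peaks up to 15 ms |
| IOPS | See the Sizing Guide Worksheet for details |
| Disk space | See the Sizing Guide Worksheet for details |

Log Forwarder

| | Extra Small | Small | Medium | Large | Extra Large | CP Small | CP Standard | UCP Small | UCP Standard |
|---|---|---|---|---|---|---|---|---|---|
| Maximum logs per second traffic (eps) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 20,000 | 40,000 |
| Maximum number of connections | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 300 | 600 |

Other Maximums

| | Extra Small | Small | Medium | Large | Extra Large | CP Small | CP Standard | UCP Small | UCP Standard |
|---|---|---|---|---|---|---|---|---|---|
| Maximum number of Telegraf agents per node | N/A | N/A | N/A | N/A | N/A | 500 | 3,000 | 500 | 3,000 |
| Maximum number of vCenter adapter instances on a single collector | 5 | 25 | 50 | 100 | 120 | 25 | 100 | 25 | 100 |
| Maximum number of Service Discovery objects | N/A | N/A | N/A | N/A | N/A | 3,000 | 3,000 | 3,000 | 3,000 |
| Maximum number of concurrent users per node (7) | 10 | 10 | 10 | 10 | 10 | N/A | N/A | N/A | N/A |

| Setting | Value |
|---|---|
| Maximum certified number of concurrent users (8) | 300 |
| Maximum number of concurrent API calls per client | 50 |
| Maximum number of concurrent API calls per node | 300 |
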

  1. This is the total number of metrics from all adapter instances. To get this number, go to the Administration page and open the Audit page.
  2. With the maximum memory configuration (extended cluster), the maximum number of objects and metrics is 20% higher for a single-node cluster and 10% higher for a multi-node cluster.
  3. It is critical to allocate enough CPU for environments running at scale maximums to avoid performance degradation. Refer to the VCF Operations Cluster Node Best Practices.
  4. Based on the VMware vCenter Adapter.
  5. The latency limits are given as Round Trip Time (RTT) between nodes, and between nodes and Cloud Proxies.
  6. The network bandwidth requirements apply between nodes and Cloud Proxies working at their respective maximum sizings.
  7. The maximum number of concurrent users is 10 per node with objects or metrics at maximum levels (for example, 16 Large nodes with 200K objects can support 160 concurrent users).
  8. The maximum certified number of concurrent users is achieved on a system configured with the objects and metrics at 50% of supported maximums (for example, 4 Large nodes with 32K objects).
  9. The maximum number of Cloud Proxies and Unified Cloud Proxies per cluster node is 100 for Small and Medium configurations, and 200 for Large and Extra Large configurations.
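
Note 2 can be applied as simple arithmetic. The following Python sketch is illustrative only: the function name `extended_limit` is an assumption made for this example and is not part of any VCF Operations API. It derives the extended-cluster rows of the table from the base cluster maximums.

```python
def extended_limit(base_limit: int, node_count: int) -> int:
    """Apply note 2: +20% for a single-node cluster, +10% for a multi-node cluster."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    percent = 120 if node_count == 1 else 110
    # Integer arithmetic avoids floating-point rounding on large metric counts.
    return base_limit * percent // 100


# Base cluster maximums (objects) and node counts taken from the table above.
print(extended_limit(700, 1))       # Extra Small, single node -> 840
print(extended_limit(576_000, 16))  # Large, 16 nodes -> 633600
```

The same function works for the metric rows, for example `extended_limit(2_800_000, 2)` for a two-node Small cluster.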



Sizing Guidelines for VCF Operations Continuous Availability


Continuous Availability (CA) stretches the cluster nodes across two fault domains so that the cluster can survive the failure of one fault domain and recover without downtime. CA requires an equal number of nodes in each fault domain, plus a witness node in a third site to monitor for split-brain scenarios.


 

| VCF Operations Node | Small | Medium | Large | Extra Large |
|---|---|---|---|---|
| Maximum number of nodes in each Continuous Availability fault domain (*) | 1 | 4 | 8 | 6 |


* Each Continuous Availability cluster must have one witness node, which requires 2 vCPUs and 8 GB of memory.
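
The CA node rules above can be checked with a few lines of code. This is a minimal sketch under stated assumptions: `validate_ca_layout` and the dictionary name are hypothetical helpers written for this article, not a product interface. The limits are copied from the table above.

```python
# Maximum nodes per fault domain, from the table above.
MAX_NODES_PER_FAULT_DOMAIN = {"Small": 1, "Medium": 4, "Large": 8, "Extra Large": 6}


def validate_ca_layout(size: str, domain_a_nodes: int, domain_b_nodes: int) -> int:
    """Return the total VM count (data nodes plus one witness) for a valid CA layout."""
    limit = MAX_NODES_PER_FAULT_DOMAIN[size]
    if domain_a_nodes != domain_b_nodes:
        raise ValueError("CA requires an equal number of nodes in each fault domain")
    if not 1 <= domain_a_nodes <= limit:
        raise ValueError(f"{size} supports 1 to {limit} nodes per fault domain")
    # One witness node (2 vCPUs, 8 GB of memory) is always added in a third site.
    return domain_a_nodes + domain_b_nodes + 1


print(validate_ca_layout("Large", 8, 8))  # 16 data nodes + 1 witness -> 17
```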


 

| | Between fault domains | Between witness node and fault domains |
|---|---|---|
| Latency | < 10 ms, with peaks up to 20 ms during 20-second intervals | < 30 ms, with peaks up to 60 ms during 20-second intervals |
| Packet loss | Peaks up to 2% during 20-second intervals | Peaks up to 2% during 20-second intervals |
| Bandwidth | 10 Gbit/s | 10 Mbit/s |

  

 

VDI use case

  • A large node can collect up to 20,000 VCF Operations for Horizon objects when a dedicated Cloud Proxy is used.
  • A large node can collect up to 20,000 VCF Operations for Published Apps objects when a dedicated Cloud Proxy is used. 



Constraints

  • If the cluster has more than one node, all nodes must be the same size. Do not mix nodes of different sizes.
  • Snapshots slow down disk I/O and raise CPU co-stop values, which degrades the performance of VCF Operations.
  • With an HA configuration, each object is replicated to another node of the cluster, so the object limit for an HA-based instance is half that of a non-HA configuration.
  • VCF Operations HA supports only one node failure. Avoid a single point of failure by placing each node on a different host in the vSphere cluster.
  • With a CA configuration, each object is replicated to a pair of nodes of the cluster, so the object limit for a CA-based instance is half that of a non-CA configuration.
  • VCF Operations CA supports up to one fault domain failure. Avoid a single point of failure by placing fault domains across a stretched vSphere cluster.
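
The effect of HA or CA replication on capacity can be expressed as a one-line calculation. The helper below is a hypothetical sketch written for this article (the name `effective_object_limit` is an assumption); it only restates the halving rule from the constraints above.

```python
def effective_object_limit(cluster_limit: int, replicated: bool) -> int:
    """Return the usable object limit; HA or CA replication halves it."""
    return cluster_limit // 2 if replicated else cluster_limit


# A 16-node Large cluster supports 576,000 objects without HA or CA.
print(effective_object_limit(576_000, replicated=True))   # 288000
print(effective_object_limit(576_000, replicated=False))  # 576000
```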




Scaling Tips

  • Use the configuration that has the fewest nodes but can still accommodate the number of objects to be monitored.
    Example: to monitor 352,000 objects, deploy 4 Extra Large nodes instead of 10 Large nodes. You will save almost half the CPU.
  • You can increase RAM size instead of increasing both RAM and CPU. This is useful if the number of objects is close to the upper limit. Check that there is enough RAM on the underlying hardware.
    Example: a Large node requires 48 GB of RAM. If the number of objects to be monitored is close to 44,000, the upper limit for a Large node, you can increase node memory to 96 GB. This assumes the underlying ESXi host has more than 96 GB of memory per socket.
  • Scale down the CPU configuration. The cluster performs better if the nodes stay within a single socket (do not cross NUMA boundaries).
    Example: up to 4 vCPUs may be reclaimed from Large and Extra Large nodes if the cluster is not running at the upper limits and the CPU usage of the node VMs is under 60%.
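
The first tip can be sketched as a small search over the node sizes. This is an illustration, not an official sizing algorithm: `candidate_clusters` and `SIZES` are names invented for this example, and the sketch conservatively applies the multi-node per-node object limit even when one node would be enough. Use the Operations Sizing Tool or the attached spreadsheet for actual sizing.

```python
# Per-size limits copied from the sizing table: (objects per node in a
# multi-node cluster, maximum nodes in a cluster, vCPUs per node).
SIZES = {
    "Small": (6_000, 2, 4),
    "Medium": (17_000, 8, 8),
    "Large": (36_000, 16, 16),
    "Extra Large": (88_000, 12, 24),
}


def candidate_clusters(objects: int) -> list[tuple[str, int, int]]:
    """Return (size, node_count, total_vcpus) for every size that can hold
    `objects`, ordered by node count and then by total vCPUs."""
    options = []
    for size, (per_node, max_nodes, vcpus) in SIZES.items():
        nodes = max(1, -(-objects // per_node))  # ceiling division
        if nodes <= max_nodes:
            options.append((size, nodes, nodes * vcpus))
    return sorted(options, key=lambda option: (option[1], option[2]))


for size, nodes, vcpus in candidate_clusters(352_000):
    print(f"{size}: {nodes} nodes, {vcpus} vCPUs in total")
```

For 352,000 objects the sketch lists 4 Extra Large nodes (96 vCPUs) ahead of 10 Large nodes (160 vCPUs), which matches the example in the first tip.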

 

Collectors

The collection process on a node supports adapter instances whose total number of objects does not exceed 700, 10,000, 30,000, 44,000, and 100,000 on Extra Small, Small, Medium, Large, and Extra Large multi-node VCF Operations clusters respectively. For example, a 2-node cluster of Small nodes supports 2 x 6,000 objects, for a total of 12,000 objects. However, if a single adapter instance needs to collect 15,000 objects, the collector running on a Small node cannot support it, because the maximum object count supported by a Small node is 6,000. The solution is either to use a Cloud Proxy and pin the adapter instance to it, or to scale up the cluster to a configuration that supports more objects.
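
The example above comes down to comparing one adapter instance against the per-node limit rather than the cluster total. The following sketch shows that check; `placement_for_adapter` is a hypothetical helper, and the limits are the multi-node per-node values from the sizing table.

```python
# Multi-node maximum objects per node, from the sizing table.
MULTI_NODE_OBJECTS_PER_NODE = {
    "Small": 6_000,
    "Medium": 17_000,
    "Large": 36_000,
    "Extra Large": 88_000,
}


def placement_for_adapter(adapter_objects: int, node_size: str) -> str:
    """Say whether a single adapter instance fits on one node's collector."""
    if adapter_objects <= MULTI_NODE_OBJECTS_PER_NODE[node_size]:
        return "node collector"
    # The cluster total does not help: one adapter instance runs on one collector.
    return "cloud proxy or larger node size"


print(placement_for_adapter(15_000, "Small"))   # cloud proxy or larger node size
print(placement_for_adapter(15_000, "Medium"))  # node collector
```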

 

Cloud Proxy

  • Collect from larger vCenter Servers with up to 65,000 objects by scaling up a large Cloud Proxy to 8 vCPUs and 32 GB of RAM.
  • CPU and memory configurations can both be doubled to collect more objects and metrics.
    • A vCenter Server may exceed the objects or metrics supported by a Cloud Proxy, in which case the CPU and memory configuration can be doubled to cover the demand.
    • Example: If a vCenter Server has 12,000 objects, then a Small Cloud Proxy can have its CPU and memory doubled to 4 vCPUs and 16 GB of RAM to support those objects.
  • The number of objects and metrics collected by the VCF Operations Application Monitoring (Telegraf) agents should be within the supported maximum limits.
    • Each Telegraf agent may collect many objects and metrics directly, depending on the services running on the operating system.
    • Example: 100 Linux virtual machines that each run Apache Tomcat bring 10 additional objects each, adding 1,000 objects to the overall count alongside the vCenter objects.
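
The Telegraf example above can be folded into a simple object budget for a Cloud Proxy. The sketch below is an assumption-laden illustration: the function names are invented for this article, the limits are read from the Cloud Proxy columns of the sizing table, and it assumes that agent-discovered objects count against the same object limit as vCenter objects.

```python
# Limits read from the Cloud Proxy columns of the sizing table.
CLOUD_PROXY_LIMITS = {
    "Small": {"objects": 16_000, "telegraf_agents": 500},
    "Standard": {"objects": 80_000, "telegraf_agents": 3_000},
}


def total_objects(vcenter_objects: int, agents: int, objects_per_agent: int) -> int:
    """Objects the proxy must handle: vCenter objects plus agent-discovered objects."""
    return vcenter_objects + agents * objects_per_agent


def fits_cloud_proxy(size: str, vcenter_objects: int, agents: int, objects_per_agent: int) -> bool:
    """Check both the agent count and the combined object count against the limits."""
    limits = CLOUD_PROXY_LIMITS[size]
    return (
        agents <= limits["telegraf_agents"]
        and total_objects(vcenter_objects, agents, objects_per_agent) <= limits["objects"]
    )


# 100 agents with 10 objects each add 1,000 objects on top of the vCenter objects.
print(total_objects(15_500, 100, 10))                # 16500
print(fits_cloud_proxy("Small", 15_500, 100, 10))    # False
print(fits_cloud_proxy("Standard", 15_500, 100, 10)) # True
```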

 

 


VCF Operations for logs

By default, the VCF Operations for logs virtual appliance uses preset values for all configuration settings.

Standalone Deployment

During deployment, you can change the appliance settings to meet the needs of the environment from which you intend to collect logs.
VCF Operations for logs provides preset VM (virtual machine) sizes that you can select to meet the ingestion requirements of your environment. These presets are certified size combinations of compute and disk resources, though you can add extra resources afterward. A Small configuration is suitable only for demos.

To size virtual appliances to XL, XXL, and XXXL, see Vertical scaling in Aria Operations for Logs (Formerly vRealize Log Insight) 8.2 And Newer.

 

Node type

| Preset Size | Small | Medium | Large |
|---|---|---|---|
| Log Ingestion Rate | 30 GB/day | 75 GB/day | 225 GB/day |
| Virtual CPUs | 4 | 8 | 16 |
| Memory | 8 GB | 16 GB | 32 GB |
| IOPS | 500 | 1,000 | 1,500 |
| Syslog Connections (Active TCP Connections) | 100 | 250 | 750 |
| Events per Second | 2,000 | 5,000 | 15,000 |

 

You can use a syslog aggregator to increase the number of syslog connections through which events are sent to VCF Operations for logs. However, the maximum number of events per second is fixed and does not depend on the use of a syslog aggregator. A VCF Operations for logs instance cannot be used as a syslog aggregator.

The sizing is based on the following assumptions:

  • Each virtual CPU is at least 2 GHz.

  • Each ESXi host sends up to 10 messages per second with an average message size of 170 bytes/message, which is roughly equivalent to 150 MB per day, per host.
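
The per-host figure in the second assumption follows from multiplying the message rate, the message size, and the number of seconds in a day. The sketch below reproduces that arithmetic; the function names are hypothetical, and the host estimate considers only the events-per-second limit, not syslog connections or storage.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes_per_host(messages_per_second: int = 10, bytes_per_message: int = 170) -> int:
    """Daily log volume for one ESXi host under the stated assumptions."""
    return messages_per_second * bytes_per_message * SECONDS_PER_DAY


def hosts_supported_by_eps(preset_eps: int, messages_per_second: int = 10) -> int:
    """Rough host count a preset can absorb, looking at events per second only."""
    return preset_eps // messages_per_second


print(daily_bytes_per_host() / 1_000_000)  # 146.88 MB per day, roughly 150 MB
print(hosts_supported_by_eps(2_000))       # Small preset -> 200 hosts
```

As a cross-check, 200 hosts at about 147 MB per day is about 29 GB per day, in line with the 30 GB/day ingestion rate listed for the Small preset.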


NOTE

For large installations, you must upgrade the virtual hardware version of the VCF Operations for logs virtual machine. VCF Operations for logs supports virtual hardware version 7 or later. Virtual hardware version 7 can support up to 8 virtual CPUs. Therefore, if you plan to provision 16 virtual CPUs, you must upgrade to virtual hardware version 8 or later for ESXi 7.x. You use the vSphere Client to upgrade the virtual hardware. If you want to upgrade the virtual hardware to the latest version, read and understand the information in the VMware knowledge base article Upgrading a virtual machine to the latest hardware version.


Cluster Deployment

Use the Medium configuration, or larger, for the primary and worker nodes in a VCF Operations for logs cluster. A cluster must have at least three nodes. The number of events per second increases linearly with the number of nodes. For example, a cluster of 18 Large nodes ingests 18 x 15,000 = 270,000 events per second (EPS), or about 4 TB of events per day.
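
The cluster example can be verified with a short calculation. This is a sketch, not a sizing tool: the function names are assumptions, and the daily volume reuses the 170-byte average message size from the sizing assumptions earlier in this article.

```python
EPS_PER_LARGE_NODE = 15_000   # events per second for one Large node
AVG_MESSAGE_BYTES = 170       # average message size from the sizing assumptions
SECONDS_PER_DAY = 86_400


def cluster_eps(node_count: int) -> int:
    """Events per second for a cluster of Large nodes (3 to 18 nodes)."""
    if not 3 <= node_count <= 18:
        raise ValueError("a cluster has between 3 and 18 nodes")
    return node_count * EPS_PER_LARGE_NODE


def daily_terabytes(eps: int) -> float:
    """Approximate daily volume in decimal terabytes."""
    return eps * AVG_MESSAGE_BYTES * SECONDS_PER_DAY / 1e12


eps = cluster_eps(18)
print(eps)                             # 270000
print(round(daily_terabytes(eps), 2))  # 3.97, about 4 TB per day
```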


Reducing the Memory Size

Use the Small configuration of the appliance in a proof-of-concept or test environment, but not in a production environment.


VCF Operations for logs Sizing Calculator

An estimator to help you determine sizing for VCF Operations for logs including calculation for network bandwidth and storage utilization is also available. This sizing estimator is intended for guidance only. Many environment inputs are site-specific, so the calculator necessarily uses estimations in some areas. See https://vrlisizer.broadcom.com.

NOTE

The overall performance of VCF Operations for logs might degrade if forwarders are defined against the text field with complex or multiple conditions involving regular expressions, for example, text=~"Deleting the machine". In such cases, especially when the overall load on the cluster is high, processing might be delayed and disk blocks might accumulate on each node of the cluster.

 


Configuration Maximums

| Item | Maximum |
|---|---|
| Node Configuration | |
| CPU | 16 vCPUs |
| Memory | 32 GB |
| Storage device (vmdk) | 2 TB - 512 bytes |
| Total addressable storage | 6 TB (+ OS drive). A maximum of 6 TB of addressable log storage on Virtual Machine Disks (VMDKs), each with a maximum size of 2 TB. (1) |
| Number of syslog connections per node | 750 |
| Cluster Configuration | |
| Nodes | 18 (Primary + 17 Workers) |
| Virtual IP addresses | 60 |
| Ingestion | |
| Events per second | 15,000 eps per node |
| Syslog message length | 10 KB (text field) per log |
| Ingestion API HTTP POST request | 16 KB (text field); 4 MB per HTTP POST request |
| Integrations | |
| VCF Operations | 1 |
| vCenter | 15 per node |
| VMware SSO | 1 |
| Active Directory domains | 1 |
| Email servers | 1 |
| DNS servers | 2 |
| NTP servers | 4 |
| Forwarders | 10 |
| Index Partition Configuration | |
| Index partitions | 10 |

  1. You can have two 2 TB VMDKs or four 1 TB VMDKs, and so on. When you reach the maximum, you must scale out to a larger cluster instead of adding more disks to existing VMs.
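
The storage limits in note 1 can be checked for a proposed disk layout. The helper below is a hypothetical sketch: `total_log_storage_tb` is not a product function, and it works in whole terabytes, ignoring the 512 bytes subtracted from the per-disk maximum in the table.

```python
MAX_VMDK_TB = 2.0   # per-disk maximum (the table lists 2 TB minus 512 bytes)
MAX_TOTAL_TB = 6.0  # total addressable log storage per node


def total_log_storage_tb(vmdk_sizes_tb: list[float]) -> float:
    """Validate a list of log VMDK sizes (in TB) and return their total."""
    for size in vmdk_sizes_tb:
        if size <= 0 or size > MAX_VMDK_TB:
            raise ValueError(f"each VMDK must be larger than 0 and at most {MAX_VMDK_TB} TB")
    total = sum(vmdk_sizes_tb)
    if total > MAX_TOTAL_TB:
        raise ValueError("total exceeds 6 TB; scale out the cluster instead of adding disks")
    return total


print(total_log_storage_tb([2, 2]))        # 4
print(total_log_storage_tb([1, 1, 1, 1]))  # 4
```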

Attachments

VCFOperationsSizing_9.0.xlsx