Platform Capacity alert seen in VCF Operations for Networks

Products

VCF Operations for Networks

Issue/Introduction

You are receiving a Platform Capacity alert in the system and would like to know whether you should scale up or scale out your environment.

One or more of the following parameters displays an alert icon (red triangle with exclamation point) beside its usage number, and the usage number is higher than the capacity:

VMs
Active Flows
All Flows
Metric Points Per Day
Network Rule Count
Application Discovery VMs


NOTE: VCF Operations for Networks was formerly named Aria Operations for Networks (AON), and prior to that was named vRealize Network Insight (vRNI).

Environment

VCF Operations for Networks

Cause

Your deployed Platform node(s) have exceeded their maximum architectural processing limits for VMs, active flows, all flows, metric points per day, network rule count, and/or application discovery VMs.
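As a rough illustration of the condition behind the alert, the sketch below flags any parameter whose usage is higher than its capacity. The function name and all numbers are hypothetical and do not come from the product or its API; actual limits depend on your version and brick size.

```python
from typing import Dict, List


def exceeded_parameters(usage: Dict[str, int], capacity: Dict[str, int]) -> List[str]:
    """Return the names of parameters whose usage is higher than capacity."""
    return [
        name
        for name, used in usage.items()
        if name in capacity and used > capacity[name]
    ]


# Illustrative values only (assumptions, not published limits).
usage = {"VMs": 12_000, "Active Flows": 2_500_000}
capacity = {"VMs": 10_000, "Active Flows": 4_000_000}

print(exceeded_parameters(usage, capacity))
```

With these sample values, only the VMs parameter is reported as over capacity.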

The VCF Operations for Networks documentation defines the hard capacity limits for each version. Refer to the technical documentation for your version for details.

Resolution

Execute either a Scale-Up or Scale-Out operation for the Platform node(s) based on your version and anticipated growth. Collector brick size(s) should be aligned to platform brick size(s).

Considerations:

Capacity Planning: Evaluate the expected growth of the environment. Do you expect the number of flows to expand in the near to mid-term future? Planning for the capacity needed over the next year helps you determine whether it is prudent to Scale-Out rather than Scale-Up to reduce future risk and maintenance.

Scale-Up (Vertical Scaling, XL Brick Size single Platform Node)

Vertical scaling involves expanding the appliance brick size of a single standalone Platform node (e.g., modifying an Extra Large node to a 2 Extra Large node).

- Infrastructure Requirements: The underlying ESXi host must possess sufficient contiguous compute resources to back 100% CPU and 100% Memory reservations for the expanded form factor. Failure to meet reservation limits will prevent the Platform VM from booting or lead to severe performance degradation.
- Operational Impact: Transitioning brick sizes requires a graceful shutdown of the Platform OS. This results in temporary unavailability of the user interface, API endpoints, and real-time flow analytics processing. Collectors will queue data locally until the Platform is restored.
- Capacity Ceiling: Scaling up is constrained by the maximum published single-node limits. The largest supported single node (2 Extra Large) is capped at 3 million active flows and 12 million total flows per day.
- Architectural Simplicity: Retains a single control plane. It avoids the deployment of external load balancers and complex distributed database management.
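The capacity-planning question can be sketched as a simple projection against the single-node ceiling quoted above (3 million active flows and 12 million total flows for 2 Extra Large). This is a minimal sketch, not a sizing tool: the compound-growth model, the function names, and the sample inputs are assumptions, and it considers flows only, not the other capacity parameters.

```python
import math

# Single-node (2 Extra Large) ceiling as stated in the Capacity Ceiling bullet.
TWO_XL_MAX_ACTIVE_FLOWS = 3_000_000
TWO_XL_MAX_TOTAL_FLOWS = 12_000_000


def project(current: int, annual_growth_rate: float, years: float = 1.0) -> int:
    """Project a count forward, assuming compound annual growth (an assumption)."""
    return math.ceil(current * (1 + annual_growth_rate) ** years)


def recommend(active_flows: int, total_flows: int,
              annual_growth_rate: float, years: float = 1.0) -> str:
    """Suggest 'scale-up' if projected flows fit one 2XL node, else 'scale-out'."""
    projected_active = project(active_flows, annual_growth_rate, years)
    projected_total = project(total_flows, annual_growth_rate, years)
    fits_single_node = (
        projected_active <= TWO_XL_MAX_ACTIVE_FLOWS
        and projected_total <= TWO_XL_MAX_TOTAL_FLOWS
    )
    return "scale-up" if fits_single_node else "scale-out"


# Hypothetical environment: 2M active flows, 8M total flows, 25% yearly growth.
print(recommend(2_000_000, 8_000_000, 0.25))
```

If the projection lands close to the ceiling, treating the result as "scale-out" may be the safer reading, since a second resize would repeat the downtime described above.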
Scale-Out (Horizontal Scaling, 3 Node XL Platform Cluster)

Horizontal scaling involves deploying additional Platform VMs to form a clustered analytics engine (minimum of 3 nodes).
- Aggregate Capacity: Distributes the internal Postgres, Cassandra, and analytics microservices across multiple nodes, raising the total flow ingestion and processing thresholds beyond the physical limits of a single host.
- Node Symmetry: All Platform nodes participating in a cluster must be identical in brick size (e.g., all Extra Large) and run the exact same software build version. It is recommended that Collector nodes be the same brick size as the Platform nodes.
- Network Prerequisites: Distributed clustered databases demand rigorous network health. Connectivity between all Platform nodes requires a round-trip time (RTT) of less than 3 ms. Time must be synchronized (NTP) across all nodes to prevent database split-brain conditions or data corruption.
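The cluster prerequisites above lend themselves to a pre-deployment checklist. The sketch below is one hypothetical way to express them: the `PlatformNode` type, the `cluster_precheck` function, and the sample values are illustrative assumptions, and the RTT figures are expected to come from your own measurements rather than from any product API. NTP synchronization is not checked here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_RTT_MS = 3.0        # RTT between Platform nodes must be below this value
MIN_CLUSTER_NODES = 3   # minimum cluster size mentioned above


@dataclass
class PlatformNode:
    name: str
    brick_size: str   # e.g. "XL"
    build: str        # software build version string


def cluster_precheck(nodes: List[PlatformNode],
                     rtt_ms: Dict[Tuple[str, str], float]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    if len(nodes) < MIN_CLUSTER_NODES:
        problems.append(f"need at least {MIN_CLUSTER_NODES} nodes, got {len(nodes)}")
    if len({n.brick_size for n in nodes}) > 1:
        problems.append("brick sizes differ between nodes")
    if len({n.build for n in nodes}) > 1:
        problems.append("build versions differ between nodes")
    # rtt_ms maps a pair of node names to a measured round-trip time.
    for (a, b), rtt in sorted(rtt_ms.items()):
        if rtt >= MAX_RTT_MS:
            problems.append(f"RTT {a}<->{b} is {rtt} ms (must be < {MAX_RTT_MS} ms)")
    return problems


# Hypothetical inventory and measurements.
nodes = [
    PlatformNode("p1", "XL", "build-A"),
    PlatformNode("p2", "XL", "build-A"),
    PlatformNode("p3", "XL", "build-A"),
]
rtt = {("p1", "p2"): 0.8, ("p1", "p3"): 1.2, ("p2", "p3"): 4.5}

for problem in cluster_precheck(nodes, rtt):
    print(problem)
```

In this sample, the nodes are symmetric but the p2 to p3 link would be flagged because its measured RTT is not below 3 ms.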

Option 1: Scale-Up (Vertical Scaling to 2 Extra Large)

Increase the single Platform VM specifications to match a larger brick size based on the System Recommendations and Requirements for your version.

- For v6.14: see technical documentation to Increase the Brick Size of Your Setup and Increase the Disk Size based on System Recommendations and Requirements

- For v9.0: see technical documentation to Increase the Brick Size of Your Setup and Increase the Disk Size based on System Recommendations and Requirements
  
  Note: Never expand an existing disk. Follow the documentation above and add a new disk to increase the available space to the recommended size.

Option 2: Scale-Out (Horizontal Scaling to 3-Node Cluster)

Deploy additional VMs to form a (larger) cluster based on the System Recommendations and Requirements for your version.

See technical documentation: Create Clusters

- For v6.14: see technical documentation to Create Clusters based on System Recommendations and Requirements
- For v9.0: see technical documentation to Create Clusters based on System Recommendations and Requirements