This article focuses on the vDefend Security Platform's Distributed Intrusion Detection and Prevention (D-IDPS) component. The guidance documented here provides recommended practices for deploying D-IDPS in a scalable and performant manner.
NSX-T
What is IDPS?
IDS fundamentally matches traffic against patterns that indicate known attacks. vDefend IDPS inspects only traffic that is explicitly allowed IN or OUT by a Distributed Firewall policy and then sent to the IDPS engine by an IDPS policy. vDefend Distributed IDPS is a control point situated logically at every virtual NIC (vNIC) of every Virtual Machine (VM).
Why is it essential to tune IDPS?
A vital aspect of any successful IDPS deployment is ensuring that the appropriate Profile and Policy align with the workload’s security requirements. As with any Security Control, a finite capacity is available for this traffic inspection. Care should be taken to ensure that the traffic sent to the IDPS engine is meaningful and will increase the overall security posture. Tuning should also assist in reducing the number of false positives in the environment.
The vDefend IDPS uses the following architecture:
Resource Requirements
Before enabling D-IDPS on a cluster, evaluate the cluster’s resource utilization using metrics such as Host CPU usage and CPU RDY% time for virtual workloads. Ensuring enough resources to support the IDPS Engine is a crucial first step to ensuring success. Enabling IDPS on a host will consume 2 GB of Host Memory and up to 5 CPU cores for inspecting the traffic. There is also an upper limit of approximately 150K PPS on the dvFilter channel per ESXi host that will limit the amount of traffic that can be effectively sent to the IDPS Engine.
Several alerts and alarms are designed to raise awareness of resource usage on a per-host (ESXi) basis:
| Alarms | Trigger | Impact |
| --- | --- | --- |
| IDPS Engine CPU Oversubscription High / IDPS Engine CPU Oversubscription Very High | 75% / 90% | This indicates that IDPS's overall CPU requirement is becoming constrained, and it may have to start dropping or bypassing traffic. |
| NSX IDPS Engine Memory Usage High / NSX IDPS Engine Memory Usage Medium High | 75% / 85% | This indicates that the overall memory required by IDPS is becoming constrained and may have to start dropping or bypassing traffic. |
| IDPS Engine Network Oversubscription High / IDPS Engine Network Oversubscription Very High | 75% / 90% | This indicates that the dvFilter Channel is becoming congested and may have to start dropping or bypassing traffic. |
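As a rough planning aid, the dvFilter figure and the network alarm thresholds above can be combined into a quick headroom check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that utilization is simply the inspected packet rate divided by the approximate 150K PPS limit and that an alarm fires at or above its threshold. The product may calculate oversubscription differently.

```python
from typing import Optional

# Approximate per-ESXi-host dvFilter channel limit quoted in this article.
DVFILTER_PPS_LIMIT = 150_000

# (threshold in percent, alarm name) from the alarm table, highest first.
NETWORK_ALARMS = [
    (90.0, "IDPS Engine Network Oversubscription Very High"),
    (75.0, "IDPS Engine Network Oversubscription High"),
]


def network_utilization_pct(inspected_pps: float) -> float:
    """Estimate dvFilter channel utilization for a given inspected packet rate.

    Assumption: utilization = inspected PPS / approximate channel limit.
    """
    return 100.0 * inspected_pps / DVFILTER_PPS_LIMIT


def expected_alarm(utilization_pct: float) -> Optional[str]:
    """Return the most severe alarm whose threshold is reached, if any.

    Assumption: an alarm triggers at or above its threshold.
    """
    for threshold, name in NETWORK_ALARMS:
        if utilization_pct >= threshold:
            return name
    return None


if __name__ == "__main__":
    for pps in (75_000, 120_000, 140_000):
        pct = network_utilization_pct(pps)
        print(f"{pps} PPS -> {pct:.0f}% -> {expected_alarm(pct)}")
```

If the estimate for a host already sits near the 75% threshold before IDPS is enabled broadly, that is a signal to narrow the IDPS rules before rollout rather than after the alarms fire.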
Other alerts indicate that the system has entered Bypass Mode or Dropping Mode due to resource restrictions. The default action is to bypass when possible (see the chart below that details the NSX release versions and their capabilities).
What are the general recommended practices for a scalable and performant D-IDPS implementation?
When building a D-IDPS policy, you should limit the attack surface using the Distributed Firewall (DFW). Once a base security posture has been established, D-IDPS can be used to enhance it further. Consider the specifically allowed traffic in the DFW policy and choose from this traffic what can and should be further inspected by the IDPS engine.
Below is an example of a DFW ruleset that limits traffic to the Production HR Application:
Next, we send to the IDPS engine the allowed traffic:
This only sends the HTTP and MySQL Traffic to the IDPS engine for analysis.
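The selection step described above can be sketched as a simple filter: start from the rules that the DFW allows, then pick only the services worth inspecting. The data model, rule names, and the SSH and default-drop rules below are hypothetical and are not taken from the NSX API or from the original screenshots. Only the HTTP and MySQL selection comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class DfwRule:
    """Hypothetical, simplified view of a DFW rule (not the NSX schema)."""
    name: str
    service: str
    action: str  # "ALLOW" or "DROP"


def idps_candidates(dfw_rules: Iterable[DfwRule],
                    services_to_inspect: Set[str]) -> List[DfwRule]:
    """Return allowed rules whose service has been chosen for IDPS inspection.

    Traffic that the DFW drops never reaches the IDPS engine, so only
    ALLOW rules are considered.
    """
    return [
        rule for rule in dfw_rules
        if rule.action == "ALLOW" and rule.service in services_to_inspect
    ]


# Illustrative ruleset for a Production HR Application (names are made up).
HR_RULES = [
    DfwRule("hr-web-http", "HTTP", "ALLOW"),
    DfwRule("hr-db-mysql", "MySQL", "ALLOW"),
    DfwRule("hr-admin-ssh", "SSH", "ALLOW"),
    DfwRule("hr-default", "ANY", "DROP"),
]

if __name__ == "__main__":
    for rule in idps_candidates(HR_RULES, {"HTTP", "MySQL"}):
        print(rule.name)
```

Keeping the inspected set explicit, rather than inspecting everything that is allowed, is what keeps the load on the IDPS engine proportional to the traffic that actually matters.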
Bypass of Traffic when Resource Contention Exists
Newer releases of the IDPS solution add specific capabilities to address resource contention. These capabilities determine what happens when a resource becomes overloaded. The resources are the CPU, memory, and the dvFilter channel (network capacity). The UI groups all three resources together, allowing users to choose Bypass or Drop. This selection is available at a global level with a per-rule override. Please note that the default is Bypass.
The following chart shows the Bypass capabilities of each release:
| Version | CPU | Memory | Network Capacity |
| --- | --- | --- | --- |
| 3.2.x | No | No | No |
| 4.0.x | Yes | No | No |
| 4.1.x | Yes | Yes | No |
| 4.2.0 | Yes | Yes | Yes * |
*The version of ESXi must also be at least 8.0u3 for the network bypass to work. This bypass works on a per-packet basis.
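For scripted pre-checks, the chart above can be encoded as a small lookup. This is a minimal sketch and not a product API: the function name and version-tuple format are hypothetical, and it assumes that versions are matched on their major.minor prefix (so any 4.2.x release is treated like the 4.2.0 row, which the chart does not confirm) and that ESXi 8.0u3 is written as `(8, 0, 3)`.

```python
from typing import Tuple

# Bypass support per NSX release line, transcribed from the chart above.
BYPASS_SUPPORT = {
    "3.2": {"cpu": False, "memory": False, "network": False},
    "4.0": {"cpu": True, "memory": False, "network": False},
    "4.1": {"cpu": True, "memory": True, "network": False},
    "4.2": {"cpu": True, "memory": True, "network": True},
}

# Network bypass additionally requires at least ESXi 8.0u3.
MIN_ESXI_FOR_NETWORK_BYPASS = (8, 0, 3)


def bypass_supported(nsx_version: str, resource: str,
                     esxi_version: Tuple[int, int, int] = (0, 0, 0)) -> bool:
    """Report whether bypass is available for a resource on a given release.

    nsx_version: e.g. "4.1.2"; only the major.minor prefix is used (assumption).
    resource: "cpu", "memory", or "network".
    esxi_version: (major, minor, update), e.g. (8, 0, 3) for 8.0u3.
    """
    release_line = ".".join(nsx_version.split(".")[:2])
    if release_line not in BYPASS_SUPPORT:
        raise ValueError(f"No data for NSX version {nsx_version!r}")
    if resource not in BYPASS_SUPPORT[release_line]:
        raise ValueError(f"Unknown resource {resource!r}")
    supported = BYPASS_SUPPORT[release_line][resource]
    if resource == "network":
        # Tuple comparison is lexicographic, so (8, 0, 2) < (8, 0, 3).
        supported = supported and esxi_version >= MIN_ESXI_FOR_NETWORK_BYPASS
    return supported
```

For example, `bypass_supported("4.2.0", "network", (8, 0, 2))` evaluates to `False` because the ESXi prerequisite is not met, even though the NSX release itself supports network bypass.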
Changing the global setting to Drop has an impact: in the event of oversubscription, the system will drop packets destined for the IDPS engine. There are no log messages or errors that report this condition.
Best Practices for IDPS Performance
1 Rules
When developing the IDPS rules or policy, it is often best to start with a specific use case in mind. Avoid enabling IDPS on all traffic at the outset, as this could cause resource starvation, which shows up as increased latency and higher rates of dropped packets.
2 Directionality
If both the source and destination workloads are virtualized, the same flow can be scanned twice: once as it leaves the source and again when it arrives at the destination. If throughput is a concern, consider using a single direction (IN or OUT) to limit the traffic redirected to the IDPS engine. This is done on a per-IDPS-rule basis.
For Server workloads:
– IN direction can be sufficient if a proper segmentation policy has been applied to block outbound (external) flows and if all communication is inbound to workloads protected by D-IDPS.
– To allow outbound (external) communication, IDPS in the OUT or IN/OUT direction can be applied in a granular rule.
For VDI workloads:
– OUT direction can be sufficient if a segmentation policy has been applied that blocks inbound service access to VDI desktops
Note: Use directionality with care. Directionality is stateful and effectively determines when a flow entry (for redirection to IDPS) is created.
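The effect of directionality on inspection load can be illustrated with a small counting model. It is a simplification under stated assumptions: the function name is hypothetical, the same IDPS rule is assumed to apply at both endpoints, and an endpoint counts as "protected" only if it is a virtualized workload with D-IDPS enabled at its vNIC.

```python
VALID_DIRECTIONS = {"IN", "OUT", "IN/OUT"}


def inspection_points(src_protected: bool, dst_protected: bool,
                      direction: str) -> int:
    """Count how many times one flow is redirected to an IDPS engine.

    src_protected / dst_protected: whether D-IDPS is enforced at that
    endpoint's vNIC (False for physical or external endpoints).
    direction: the IDPS rule direction, "IN", "OUT", or "IN/OUT".
    """
    direction = direction.upper()
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"Unknown direction {direction!r}")

    count = 0
    # The flow leaves the source vNIC, so OUT (or IN/OUT) applies there.
    if src_protected and direction in {"OUT", "IN/OUT"}:
        count += 1
    # The flow arrives at the destination vNIC, so IN (or IN/OUT) applies there.
    if dst_protected and direction in {"IN", "IN/OUT"}:
        count += 1
    return count


if __name__ == "__main__":
    # VM-to-VM flow: IN/OUT scans it twice, IN alone scans it once.
    print(inspection_points(True, True, "IN/OUT"))
    print(inspection_points(True, True, "IN"))
```

In this model, an IN-only rule on server workloads still inspects every inbound flow once, while an external client reaching a protected server is inspected once regardless of whether the rule is IN or IN/OUT.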
3 Profiles
Profiles are a collection of IDPS signatures applied to an IDPS Policy or Rule. The best practices here are:
What alerts and alarms are available that highlight any issues?
Depending on the release of vDefend that is running, several alerts and alarms will be raised to highlight resource capacity issues. The following chart details each NSX version and its associated Bypass or Drop alarm capabilities:
| Version | CPU | Memory | Network Capacity |
| --- | --- | --- | --- |
| 3.2.x | Yes | No | No |
| 4.0.x | Yes | Yes | No |
| 4.1.x | Yes | Yes | No (Alarm Only) |
| 4.2.0 | Yes | Yes | Yes |
General recommendations for ideal IDS/IPS performance: