NSX Network Detection and Response

Products

VMware

Issue/Introduction

The Data Node appliance performs two main functions: it indexes NTA records produced by a sensor so that they are available for searching, and it processes NTA records to generate anomaly events. The indexing function is performed by the Elasticsearch search engine, while the processing function is implemented by an in-house analysis framework.

In general, adding Data Node appliances to an installation helps it handle larger volumes of NTA records across both the indexing and processing functions, since the load is balanced across all available Data Nodes. This KB discusses deployment and sizing considerations for Data Nodes, focusing on their indexing function.

Resolution

Deployment considerations and recommendations

All available Data Node appliances form a single Elasticsearch cluster, which stores NTA records and makes them available for search. The cluster automatically balances the load across the nodes by redistributing data and routing queries.
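
As a rough illustration of checking that all Data Nodes have joined the cluster, the sketch below interprets a response shaped like the output of the Elasticsearch cluster health API (GET _cluster/health). The helper name summarize_health and the sample values are hypothetical, and this article does not describe whether or how the appliance exposes that API, so treat this as an assumption-laden sketch rather than a supported procedure.

```python
def summarize_health(health: dict, expected_nodes: int) -> list:
    """Return human-readable warnings for a cluster-health style response.

    `health` is assumed to carry the fields "status", "number_of_nodes"
    and "unassigned_shards", as returned by GET _cluster/health.
    """
    warnings = []

    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status!r}, expected 'green'")

    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        warnings.append(f"only {nodes} of {expected_nodes} nodes have joined")

    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        warnings.append(f"{unassigned} shard(s) are unassigned")

    return warnings


# Hypothetical response from a three-node deployment with one node down.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
for warning in summarize_health(sample, expected_nodes=3):
    print(warning)
```

An empty list means that the response reports a green status, the expected number of nodes, and no unassigned shards.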

About cluster size:

One-node cluster

If the cluster consists of one node, that single node must do everything.
A single-node cluster is not resilient.
If the node fails, the cluster will stop working.
Because there are no replicas in a one-node cluster, you cannot store your data redundantly.
Because they are not resilient to any failures, we do not recommend using one-node clusters in production.

Two-node cluster

The client requests will be balanced across both nodes in the cluster.
Two nodes are required for a master election. However, the election will fail if either node is unavailable, so the cluster cannot reliably tolerate the loss of either node.

Because a two-node cluster is not resilient to failures, and the failure of one node can lead to a "split brain" situation, we do not recommend deploying one in production.

You might expect that if either node fails then Elasticsearch can elect the remaining node as the master, but it is impossible to tell the difference between the failure of a remote node and a mere loss of connectivity between the nodes. If both nodes were capable of running independent elections, a loss of connectivity would lead to a split-brain problem and therefore data loss. Elasticsearch avoids this and protects the data by electing neither node as master until that node can be sure that it has the latest cluster state and that there is no other master in the cluster. This could result in the cluster having no master until connectivity is restored.
Without a master, the cluster is non-operational.

Three-node cluster

Each node is master-eligible so that any two of them can hold a master election without needing to communicate with the third node.
This cluster will be resilient to the loss of any single node.

For this reason, we recommend that production deployments default to 3 Data Nodes.
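
The reasoning behind the three cluster sizes can be sketched with a simplified majority-vote model. This is an illustration only, not Elasticsearch's actual election implementation, and the function names are hypothetical. It assumes that a master can be elected only when a strict majority of the master-eligible nodes can reach each other.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Nodes that can be lost while a majority can still elect a master."""
    return master_eligible - quorum(master_eligible)


def can_elect_master(master_eligible: int, reachable: int) -> bool:
    """True if the reachable nodes form a majority (simplified model)."""
    return reachable >= quorum(master_eligible)


for n in (1, 2, 3):
    print(f"{n} node(s): quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under this model, one-node and two-node clusters tolerate no failures, while a three-node cluster still has a majority of two after losing any single node.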

About hardware specifications:

For virtualized appliances in VMware ESXi we recommend 2 x 1TB disks:

https://user.lastline.com/install-manuals/Data_Node_Installation_Manual.html#esxiinstallation

For physical appliances we recommend adding 4 x 2TB disks:

https://user.lastline.com/install-manuals/Data_Node_Installation_Manual.html#hardware

We recommend installing 2 or 3 Data Nodes to balance the storage load and provide resiliency, according to our deployment considerations:

https://user.lastline.com/install-manuals/Data_Node_Installation_Manual.html#aboutdatnode
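
For a rough sense of what these disk recommendations add up to, the sketch below computes raw cluster capacity from the node and disk counts given above. The replica count is a hypothetical parameter that this article does not specify, and the arithmetic ignores operating system, filesystem, and Elasticsearch overhead, so the results are upper bounds rather than sizing guidance.

```python
def raw_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Total raw disk capacity across the cluster, in TB."""
    return nodes * disks_per_node * disk_tb


def capacity_with_replicas_tb(raw_tb: float, replicas: int) -> float:
    """Upper bound on unique data when every record is stored 1 + replicas times.

    `replicas` is an assumed value; the article does not state the
    replica count used by the Data Node.
    """
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return raw_tb / (1 + replicas)


# Three-node deployments using the disk layouts recommended above.
virtual_raw = raw_capacity_tb(nodes=3, disks_per_node=2, disk_tb=1.0)
physical_raw = raw_capacity_tb(nodes=3, disks_per_node=4, disk_tb=2.0)

print(f"virtual:  {virtual_raw} TB raw, "
      f"{capacity_with_replicas_tb(virtual_raw, replicas=1)} TB with 1 replica")
print(f"physical: {physical_raw} TB raw, "
      f"{capacity_with_replicas_tb(physical_raw, replicas=1)} TB with 1 replica")
```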

Additional references:
https://www.elastic.co/guide/en/elasticsearch/reference/current/high-availability-cluster-small-clusters.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html

Additional Information

Note: This article is applicable to the standalone NSX Network Detection and Response product (formerly Lastline) and is not intended to be applied to the NSX NDR feature of NSX-T.