Description:
Root Cause:
Issue with 3rd Party Software - Vertica Database
When a data partition fails on a single node in a clustered Data Repository, the Vertica process on that node keeps running instead of shutting down. The Data Aggregator continues to send transactions to this node and receives a large number of transaction failures. These failures prevent the Data Aggregator from loading data into any node in the cluster, which results in data loss.
Impact:
This issue affects our high availability solution. We moved to a clustered Data Repository model so that when one node fails, we can continue to interact with the remaining nodes and prevent any data loss. When the problem described above occurs, this smooth failover does not take place.
To date, no customers have run into this issue. The current GA version (IM 2.2) uses the same version of Vertica (6.0.2).
Symptoms:
ERROR: Insufficient resources to execute plan on pool general [Timedout waiting for resource request: Request exceeds limits: Memory(KB) Exceeded: Requested = 7274517, Free = 6705162 (Limit = 58273960, Used = 51568798)]
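The figures in this message are related: Free equals Limit minus Used (58273960 - 51568798 = 6705162 KB), and the request fails because Requested is larger than Free. As a rough aid for spotting this symptom in logs, the sketch below parses the message and reports the shortfall. The function name parse_pool_error and the returned dictionary layout are illustrative assumptions and are not part of any Vertica or Data Aggregator tooling.

```python
import re

# Matches the resource-pool error quoted above. The group names simply mirror
# the labels that appear in the message text.
POOL_ERROR = re.compile(
    r"Insufficient resources to execute plan on pool (?P<pool>\S+) .*?"
    r"Requested = (?P<requested>\d+), Free = (?P<free>\d+) "
    r"\(Limit = (?P<limit>\d+), Used = (?P<used>\d+)\)"
)


def parse_pool_error(line):
    """Return the memory figures (in KB) from the error line, or None if it does not match."""
    match = POOL_ERROR.search(line)
    if match is None:
        return None
    info = {key: int(value) for key, value in match.groupdict().items() if key != "pool"}
    info["pool"] = match.group("pool")
    # How far the request exceeds the memory that was free in the pool.
    info["shortfall"] = info["requested"] - info["free"]
    return info
```

For the message shown above, the request exceeds the free memory in the general pool by 569355 KB.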
Solution:
Workaround:
Vertica is aware of this issue and has provided a script that identifies a partition problem and shuts down the affected Vertica node.
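The script supplied by Vertica is not reproduced here. To illustrate the general idea only, the sketch below probes a data directory with a small write and calls a shutdown action when the write fails. It is an assumption about how such a check could work, not the Vertica script: the names partition_is_writable and check_node, the data_dir path, and the shutdown_node callback are all hypothetical, and the actual command used to stop the node is deliberately left to the caller.

```python
import os
import tempfile


def partition_is_writable(data_dir):
    """Return True if a small probe file can be created, synced, and removed in data_dir."""
    try:
        with tempfile.NamedTemporaryFile(dir=data_dir) as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())  # force the write through to the disk
        return True
    except OSError:
        # Covers a missing mount point, a read-only file system, and I/O errors.
        return False


def check_node(data_dir, shutdown_node):
    """Call shutdown_node() when the partition probe fails; return True if the partition is healthy.

    shutdown_node is a hypothetical callback supplied by the caller; how the
    Vertica node is actually stopped is outside the scope of this sketch.
    """
    if partition_is_writable(data_dir):
        return True
    shutdown_node()
    return False
```

A check like this would be run periodically on each node, so that a node with a failed data partition is taken out of the cluster and the Data Aggregator fails over to the remaining nodes.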
Instructions: