Data Repository host reboots when a down node is restarted

Article ID: 5317

Products

CA Infrastructure Management
CA Performance Management - Usage and Administration
CA Performance Management - Data Polling

Issue/Introduction

One of the nodes in a 3-node Data Repository cluster will not start. When the node is restarted, it comes up and begins to initialize, but the Linux server eventually reboots and the following error appears in the vertica.log file:

 EEThread:0x7f07dc130930-a0000000b17641 <PANIC> @v_drdata_node0003: VX001/4064: NewPool::addChunk talloc calloc() error: 'Cannot allocate memory'; size 134217728 
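
To confirm that a node is hitting this same failure, the log can be searched for the three fixed strings in the message above. The following is a minimal sketch, not part of the original article: the function name find_alloc_panic is hypothetical, and the path to vertica.log depends on your installation, so it is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print every memory-allocation PANIC line in a Vertica log.
# $1 is the path to vertica.log (the location varies by installation).
find_alloc_panic() {
    # -F matches fixed strings, so characters such as '(' and ':' are not
    # interpreted as regular-expression syntax.
    grep -F '<PANIC>' "$1" \
        | grep -F 'NewPool::addChunk' \
        | grep -F 'Cannot allocate memory'
}
```

Calling find_alloc_panic with the path to the problem node's vertica.log prints any matching lines; no output means the log does not contain this particular error.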

Cause

This usually occurs when a node has been down for a long time. It appears that the recovery tries to use more memory than the system can provide.

Environment

Release: IMDAGG99000-2.8-Infrastructure Management-Data Aggregator
Component:

Resolution

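To get the node to start, you can set the system to do a full rebuild of the node instead of trying to recover only the changes made since the node went down. Calling make_ahm_now(true) advances the Ancient History Mark (AHM) even though a node is down, so the down node then has to recover all of its data from scratch. To do this, log in to one of the running nodes and execute the following vsql command: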

/opt/vertica/bin/vsql -U dauser -w dapass -c "select make_ahm_now(true);" 

Then restart the problem node.
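
Before and after the restart, it may help to check where the AHM sits and what state each node is in. The following is a sketch under stated assumptions rather than an official procedure: it reuses the vsql path and the dauser/dapass credentials from the command above, and the function names run_sql, show_ahm, and show_node_states are hypothetical. The queries use the Vertica functions get_ahm_epoch() and get_current_epoch() and the nodes system table.

```shell
#!/bin/sh
# Assumed default taken from the command above; set VSQL beforehand to override it.
VSQL="${VSQL:-/opt/vertica/bin/vsql}"

# Run one SQL statement with unaligned, tuples-only output (-A -t).
# The user name and password are the example values used in this article.
run_sql() {
    "$VSQL" -U dauser -w dapass -A -t -c "$1"
}

# Print the AHM epoch next to the current epoch.
show_ahm() {
    run_sql "select get_ahm_epoch(), get_current_epoch();"
}

# Print the name and state of every node in the cluster.
show_node_states() {
    run_sql "select node_name, node_state from nodes order by node_name;"
}
```

Running show_ahm before and after make_ahm_now(true) should show whether the AHM epoch moved forward. Running show_node_states after the restart should show the problem node leaving the DOWN state; a full rebuild can take a long time, so the node would be expected to remain in a recovering state for a while before it reports UP.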