RabbitMQ crash with "Process rabbit_mgmt_external_stats with 0 neighbours crashed with reason: bad argument in call to erlang"

Article ID: 297356

Products

Support Only for OpenSource RabbitMQ

Issue/Introduction

RabbitMQ crashed with the following error message:
Process rabbit_mgmt_external_stats with 0 neighbours crashed with reason: bad argument in call to erlang
2019-02-12 11:22:31.330 [error] <0.625.0> ** Generic server rabbit_mgmt_external_stats terminating 
** Last message in was emit_update
** When Server state == {state,8192,[{{io_file_handle_open_attempt,count},48837},{{io_file_handle_open_attempt,time},47000},{{io_read,bytes},1},{{io_read,count},3},{{io_read,time},0},{{io_reopen,count},0},{{io_seek,count},1405},{{io_seek,time},0},{{io_sync,count},11039},{{io_sync,time},625999},{{io_write,bytes},12043233},{{io_write,count},11039},{{io_write,time},125000},{{mnesia_disk_tx,count},16},{{mnesia_ram_tx,count},1063},{{msg_store_read,count},0},{{msg_store_write,count},37},{{queue_index_journal_write,count},36303},{{queue_index_read,count},0},{{queue_index_write,count},2}],{set,2,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[{'rabbit@HQ-PCM-APPMQ03',<0.158.0>}],[],[],[],[],[{'rabbit@HQ-PCM-APPMQ02',<0.134.0>}],[],[],[],[],[]}}},undefined,5000}
** Reason for termination == 
** {badarg,[{erlang,port_command,[#Port<0.118746>,[]],[{file,"erlang.erl"},{line,3042}]},{os,cmd,1,[{file,"os.erl"},{line,242}]},{rabbit_mgmt_external_stats,get_used_fd,1,[{file,"src/rabbit_mgmt_external_stats.erl"},{line,137}]},{rabbit_mgmt_external_stats,get_used_fd,0,[{file,"src/rabbit_mgmt_external_stats.erl"},{line,65}]},{rabbit_mgmt_external_stats,'-infos/2-lc$^0/1-0-',2,[{file,"src/rabbit_mgmt_external_stats.erl"},{line,179}]},{rabbit_mgmt_external_stats,emit_update,1,[{file,"src/rabbit_mgmt_external_stats.erl"},{line,368}]},{rabbit_mgmt_external_stats,handle_info,2,[{file,"src/rabbit_mgmt_external_stats.erl"},{line,355}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]}]}
2019-02-12 11:22:31.330 [error] <0.625.0> CRASH REPORT Process rabbit_mgmt_external_stats with 0 neighbours crashed with reason: bad argument in call to erlang:port_command(#Port<0.118746>, []) line 3042 in os:cmd/1 line 242
2019-02-12 11:22:31.330 [error] <0.612.0> Supervisor rabbit_mgmt_agent_sup had child rabbit_mgmt_external_stats started with rabbit_mgmt_external_stats:start_link() at <0.625.0> exit with reason bad argument in call to erlang:port_command(#Port<0.118746>, []) line 3042 in os:cmd/1 line 242 in context child_terminated
2019-02-12 11:22:31.330 [warning] <0.23308.1> Could not find handle.exe, please install from sysinternals
2019-02-12 11:41:29.388 [error] <0.24375.1> CRASH REPORT Process <0.24375.1> with 0 neighbours crashed with reason: bad argument in call to erlang:port_get_data(#Port<0.120533>) in prim_file:sendfile/8 line 590


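`rabbit_mgmt_external_stats:get_used_fd/1` determines the number of file descriptors used by the node by starting a sub-process (on Windows, `handle.exe` from Sysinternals). As the stack trace shows, that call goes through `os:cmd/1`, which fails with a `badarg` in `erlang:port_command/2`.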

Reference: https://erldocs.com/current/erts/erlang.html?i=0&search=erlang:open#open_port/2 -
"Failure: if the port cannot be opened, the exit reason is badarg, system_limit, or the POSIX error code that most closely describes the error"

A common cause of this failure is that the node has run out of file descriptors (Erlang ports).

Reference: http://www.rabbitmq.com/networking.html#open-file-handle-limit - 
"On Windows, the limit for the Erlang runtime is controlled using the ERL_MAX_PORTS environment variable."

The management UI displays the file descriptor limit on the Overview and Node information pages.

If `handle.exe` is not available, the node cannot find out how many file descriptors it is using, so it will never go into an alarmed state because of the number of file descriptors in use.

An example of this issue is discussed in the rabbitmq-users Google Group archives: https://groups.google.com/d/msg/rabbitmq-users/xyRned4t4gk/cGpEitHaDQAJ
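
To check whether `handle.exe` can be found on a Windows host, a minimal sketch such as the following may help. It assumes that the tool is looked up through the executable search path (PATH), which this article does not confirm. The helper name `locate_tool` is hypothetical and is not part of RabbitMQ.

```python
import shutil


def locate_tool(name):
    """Return the full path of `name` if it is on the search path, else None.

    Assumption: the node finds handle.exe via PATH. Verify this for your
    RabbitMQ and Erlang versions before relying on the result.
    """
    return shutil.which(name)


location = locate_tool("handle.exe")
if location is None:
    print("handle.exe not found on PATH; install it from Sysinternals")
else:
    print("handle.exe found at " + location)
```

Run the check under the same account and environment as the RabbitMQ service, because the search path can differ between users.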


Environment

Product Version: 3.7
OS: Windows

Resolution

When optimizing for the number of concurrent connections, make sure your system has enough file descriptors to support not only client connections, but also files the node may use.

To calculate a limit, multiply the number of connections per node by 1.5. For example, to support 100,000 connections, set the limit to 150,000. Increasing the limit slightly increases the amount of RAM an idle machine uses, but this is a reasonable trade-off.
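
The sizing rule above can be sketched as a small calculation. The function name `recommended_fd_limit` and the `headroom` parameter are illustrative assumptions. Only the 1.5 multiplier and the 100,000 connection example come from this article.

```python
import math


def recommended_fd_limit(connections_per_node, headroom=1.5):
    """Apply the rule of thumb: limit = connections per node * 1.5.

    The result is rounded up so that the limit is never below the target.
    """
    if connections_per_node < 0:
        raise ValueError("connections_per_node must be non-negative")
    return math.ceil(connections_per_node * headroom)


# Example from this article: 100,000 connections per node.
print(recommended_fd_limit(100_000))  # prints 150000
```

On Windows, according to the networking guide quoted earlier, the resulting value would be applied to the Erlang runtime through the ERL_MAX_PORTS environment variable.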