When you are unable to create a new Pivotal Cloud Foundry (PCF) RabbitMQ service instance, the following error is displayed:
$ cf create-service p-rabbitmq standard deleteme
Creating service instance deleteme in org cpeswatorg / space Swat-227 as admin...
Service broker error: Put http://rmq.example.com:15672/api/vhosts/c18883d1-d259-48f4-acd1-f8c80cbf7fc8: dial tcp: lookup rmq.example.com on 10.2.3.4:53: dial udp 10.2.3.4:53: socket: too many open files
After upgrading to PCF RabbitMQ tile version 1.15.x, a connection (file descriptor) leak is observed in the service broker.
The multi-tenant broker was rewritten, and on the Xenial stemcell it runs with a lower file descriptor limit of 1024.
When many create-service-key requests are made, these connections stay open and eventually use up all 1024 file descriptors.
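To see how close a process is to its limit, the open descriptors can be compared with the soft limit. The following is a minimal sketch, assuming a Linux VM with the /proc filesystem; the function names are hypothetical, and reading another user's process typically requires root.

```shell
#!/usr/bin/env bash
# Report how many file descriptors a process holds versus its soft limit.
# Linux-only: relies on the /proc filesystem.

# Count the entries in /proc/<pid>/fd (one entry per open descriptor).
count_open_fds() {
  local pid=$1
  ls "/proc/${pid}/fd" | wc -l
}

# Read the soft "Max open files" value from /proc/<pid>/limits.
soft_fd_limit() {
  local pid=$1
  awk '/^Max open files/ {print $4}' "/proc/${pid}/limits"
}

# Inspect the PID given as the first argument, or this shell by default.
pid=${1:-$$}
echo "pid ${pid}: $(count_open_fds "$pid") open of $(soft_fd_limit "$pid") allowed"
```

On the broker VM, pass the PID of the process running the rabbitmq-broker job as the first argument. A count that keeps growing toward the limit is consistent with the leak described above.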
The following workarounds are available as of May 21st, 2019.
1. Restart the RabbitMQ broker job with the following command: monit restart rabbitmq-broker
2. Increase the File Descriptor limit for the process used for the broker job. Follow these steps:
a. On the rabbitmq-broker VM, run the command:
b. Run the following command to increase file descriptor limit to 4096:
Note: Use the <pid_of_process> running the rabbitmq-broker job.
3. The following is an alternative way to increase the File Descriptor ulimit on the broker VM.
a. ssh on to the broker VM.
b. Log in as the vcap user using this command:
sudo su - vcap
c. Find and record the soft and hard File Descriptor ulimit for the vcap user:
ulimit -Sn
ulimit -Hn
Then find the global file descriptor limit:
cat /proc/sys/fs/file-max
d. Gain root privileges:
sudo -i
e. Open /etc/security/limits.conf
f. Add the following lines to limits.conf:
vcap soft nofile 20000
vcap hard nofile 20000
Both of these limits have to be less than the maximum global limit or they could eventually crash the VM.
g. Run reboot. This will terminate the ssh shell.
h. Verify that the File Descriptor ulimit increased by logging in as vcap and checking the limit:
sudo su - vcap
ulimit -n
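The limits.conf change in workaround 3 can be checked before it is applied. The following is a minimal sketch, assuming a Linux VM; the helper names nofile_lines and below_global_max are hypothetical. It prints the two entries only when the proposed value is below the global maximum.

```shell
#!/usr/bin/env bash
# Sketch: build limits.conf entries and check them against the global maximum.

# Print the soft and hard nofile lines for a user.
nofile_lines() {
  local user=$1 limit=$2
  printf '%s soft nofile %s\n' "$user" "$limit"
  printf '%s hard nofile %s\n' "$user" "$limit"
}

# Succeed only if the proposed limit is below the given global maximum.
below_global_max() {
  local limit=$1 global_max=$2
  [ "$limit" -lt "$global_max" ]
}

global_max=$(cat /proc/sys/fs/file-max)
if below_global_max 20000 "$global_max"; then
  # Review the output, then add it to /etc/security/limits.conf as root.
  nofile_lines vcap 20000
else
  echo "20000 is not below fs.file-max (${global_max}); choose a smaller value" >&2
fi
```

The script only prints the entries and does not edit limits.conf itself, so the change can be reviewed before the reboot.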
Note: These three procedures do not result in a permanent fix.
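For workaround 2, one possible way to raise the limit of a running process is the util-linux prlimit tool. The following is a minimal sketch and not necessarily the exact command the procedure intends. It assumes prlimit and pgrep are installed on the VM, that the caller is root, and that the process can be matched by the pattern rabbitmq-broker; the function name raise_fd_limit and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: raise the open-file limit of an already running process.
# Assumes util-linux prlimit is installed and the caller is root.

# Set both the soft and hard nofile limit of <pid> to <limit>.
# When DRY_RUN is set, the command is printed instead of executed.
raise_fd_limit() {
  local pid=$1 limit=$2
  ${DRY_RUN:+echo} prlimit --pid "$pid" --nofile="${limit}:${limit}"
}

# Hypothetical usage on the broker VM; the pattern "rabbitmq-broker" is an
# assumption about how the process appears in the process list.
#   pid=$(pgrep -f rabbitmq-broker | head -n 1)
#   raise_fd_limit "$pid" 4096
#   prlimit --pid "$pid" --nofile    # display the limit now in effect
```

A limit changed this way applies only to the running process and is lost when the job restarts.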
This issue is permanently resolved in PCF RabbitMQ tile version 1.15.11 and above. Please monitor the Release Notes for PCF RabbitMQ for the latest updates on this issue.