ARP cache limit reached on large PCF foundation

Article ID: 293703

Products

Operations Manager

Issue/Introduction

The default ARP cache limit on Ubuntu Xenial stemcells may be insufficient on large PCF foundations (600+ VMs).

Symptoms:
  • 'monit summary' fails with the error "monit: error connecting to the monit daemon"
  • monit.log contains: Cannot open a connection to the mailserver 'localhost:2825' -- Connection timed out
  • 'bosh vms' intermittently reports failing/unresponsive VMs.
  • 'dig doppler.service.cf.internal' cannot send its queries and reports 'Invalid argument':
    ../../../../lib/isc/unix/socket.c:2104: internal_send: 169.254.0.2#53: Invalid argument
    ../../../../lib/isc/unix/socket.c:2104: internal_send: 19.13.0.246#53: Invalid argument
    ; <<>> DiG 9.10.3-P4-Ubuntu <<>> doppler.service.cf.internal
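
To check whether these symptoms coincide with a full ARP cache, you can compare the number of neighbour entries on an affected VM with the configured hard limit. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, and the 80% warning level is an arbitrary assumption chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the IPv4 neighbour (ARP) table size with gc_thresh3.
# Function names and the 80% warning level are illustrative assumptions.

# Count non-empty lines on stdin (one neighbour entry per line).
count_neigh_entries() {
  grep -c . || true
}

# Classify table usage: $1 = current entries, $2 = hard limit (gc_thresh3).
arp_usage_status() {
  local entries=$1 limit=$2
  if [ "$entries" -ge "$limit" ]; then
    echo "FULL"
  elif [ "$entries" -ge $(( limit * 80 / 100 )) ]; then
    echo "NEAR_LIMIT"
  else
    echo "OK"
  fi
}

# Only query the live system when the tools and /proc entries are available.
limit_file=/proc/sys/net/ipv4/neigh/default/gc_thresh3
if command -v ip >/dev/null 2>&1 && [ -r "$limit_file" ]; then
  entries=$(ip -4 neigh show | count_neigh_entries)
  limit=$(cat "$limit_file")
  echo "ARP entries: $entries, gc_thresh3: $limit, status: $(arp_usage_status "$entries" "$limit")"
fi
```

A status of FULL or NEAR_LIMIT on a VM that shows the errors above suggests that the ARP cache limit is the likely cause.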


Environment

Product Version: 2.8

Resolution

Ubuntu Xenial stemcells set the default ARP cache limit (gc_thresh3) to 1024 entries. This may be insufficient in large environments with 600+ instances.

You can double the ARP cache limit with commands such as:
sysctl -w net.ipv4.neigh.default.gc_thresh2=1024
sysctl -w net.ipv4.neigh.default.gc_thresh3=2048
Note: Settings changed by sysctl will be overwritten when the VM is recreated.

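gc_thresh2 is the soft threshold above which the kernel's garbage collector purges ARP entries more aggressively, while gc_thresh3 is the hard maximum number of entries. Raising gc_thresh3 to 2048 increases the overall limit.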
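
The values 1024 and 2048 in the commands above are double the usual Linux kernel defaults of 512 (gc_thresh2) and 1024 (gc_thresh3). As a sketch, assuming the VM still has its default values, the following dry run reads the current thresholds and prints the matching sysctl commands without applying them. The function name double_thresh_cmds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) sysctl commands that double the current
# gc_thresh2 and gc_thresh3 values. The function name is illustrative.

# $1 = current gc_thresh2, $2 = current gc_thresh3
double_thresh_cmds() {
  local t2=$1 t3=$2
  echo "sysctl -w net.ipv4.neigh.default.gc_thresh2=$(( t2 * 2 ))"
  echo "sysctl -w net.ipv4.neigh.default.gc_thresh3=$(( t3 * 2 ))"
}

# Read the live values when available; applying the output requires root.
base=/proc/sys/net/ipv4/neigh/default
if [ -r "$base/gc_thresh2" ] && [ -r "$base/gc_thresh3" ]; then
  double_thresh_cmds "$(cat "$base/gc_thresh2")" "$(cat "$base/gc_thresh3")"
fi
```

Review the printed commands before running them as root, since a VM that has already been tuned would be doubled again.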

os-conf is the mechanism for arbitrary Linux tuning and for persisting these settings on BOSH-deployed VMs. See:
https://github.com/cloudfoundry/os-conf-release
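
As an illustration of how the settings might be persisted, the sketch below writes a BOSH runtime config that applies them through the os-conf sysctl job. It assumes that the job accepts a list of key=value strings under a sysctl property, so check the job spec of the os-conf release you have uploaded. The addon name, output file name, and version placeholder are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate a runtime config that persists the ARP cache settings.
# Assumes the os-conf "sysctl" job takes a list of key=value strings;
# the addon name, file name and version placeholder are illustrative only.

out="${RUNTIME_CONFIG_PATH:-./arp-cache-runtime-config.yml}"

cat > "$out" <<'EOF'
releases:
- name: os-conf
  version: "OS_CONF_VERSION"  # placeholder: use the version you uploaded
addons:
- name: arp-cache-tuning
  jobs:
  - name: sysctl
    release: os-conf
    properties:
      sysctl:
      - net.ipv4.neigh.default.gc_thresh2=1024
      - net.ipv4.neigh.default.gc_thresh3=2048
EOF

echo "Wrote $out"
# After review, the file could be applied with the BOSH CLI, for example:
#   bosh update-runtime-config --name=arp-cache-tuning "$out"
```

A runtime config only takes effect on a VM once its deployment is redeployed, so the sysctl -w commands are still useful as an immediate, temporary fix.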

For an example of tuning a kernel parameter with os-conf, please see: 
https://community.pivotal.io/s/article/how-to-update-tcp-keepalive-parameters-in-vms-in-pivotal-cloud-foundry?language=en_US