Java Apps crash with too many open files after upgrading to PAS 2.2 or 2.1.8

Article ID: 297545

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Symptoms:
A JVM thread dump shows thousands of file-watcher threads. To dump the threads, SSH into the container and run "/home/vcap/app/.java-buildpack/open_jdk_jre/bin/jcmd $PID_OF_APP Thread.Print". Each leaked thread appears as:
"threadName": "file-watcher-/etc/ssl/certs/ca-certificates.crt"
Running "lsof -p $PID_OF_APP" from inside the container shows thousands of open sockets, for example:
java       19 101 vcap  202u     unix 0x0000000000000000      0t0   9643879 socket
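
The two checks above can be reduced to counts, which makes it easier to see whether the numbers keep growing between samples. The following is a minimal sketch and not part of the original diagnosis steps: the function names count_file_watchers and count_unix_sockets are hypothetical, and both simply read command output from standard input.

```shell
# Hypothetical helpers: both read text from stdin and print a count.

# Count lines of a thread dump that name a file-watcher thread.
count_file_watchers() {
  grep -c 'file-watcher-' || true   # grep exits non-zero when the count is 0
}

# Count lsof rows that have a column equal to "unix" (a unix domain socket).
count_unix_sockets() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "unix") { n++; break } }
       END { print n + 0 }'
}

# Usage inside the container (commands taken from the symptoms above):
#   /home/vcap/app/.java-buildpack/open_jdk_jre/bin/jcmd "$PID_OF_APP" Thread.Print | count_file_watchers
#   lsof -p "$PID_OF_APP" | count_unix_sockets
```

Running both counts a few minutes apart while the app is serving traffic should show whether the thread and socket numbers are climbing steadily.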

Environment


Cause

This is a known issue with the Security Provider plugin of the Java Buildpack, described in https://github.com/cloudfoundry/java-buildpack/issues/486. The problem appears when the application makes an HTTP request to any remote endpoint. For each request, the Security Provider plugin opens a new file handle to watch the $CF_INSTANCE_CERT and $CF_INSTANCE_KEY files.

The issue appears only in PAS 2.2 and 2.1.8 because those releases changed the kernel parameter "max_user_watches". Previously, the watch limit was lower than the process's open file limit, so the Java app could not leak more sockets than it had file handles available. Now the watch limit is higher than the maximum number of open files for an application, so the leak continues until the app runs out of file handles.
  • PAS < 2.2, 2.1.8
    • /proc/sys/fs/inotify/max_user_watches = 8192
    • Process Max Open Files = 16384
  • PAS >= 2.2, 2.1.8
    • /proc/sys/fs/inotify/max_user_watches = 2147483647
    • Process Max Open Files = 16384
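
The comparison between the two limits can be sketched as a small shell function. This is only an illustration of the reasoning above, assuming one leaked file handle per watch; the function name first_limit_hit is hypothetical, and it expects two plain integers (it does not handle an "unlimited" value from ulimit).

```shell
# Hypothetical helper: given the inotify watch limit and the process open
# file limit, print which one a leak of one handle per watch reaches first.
first_limit_hit() {
  watches=$1
  nofile=$2
  if [ "$watches" -lt "$nofile" ]; then
    echo "inotify watch limit"
  else
    echo "open file limit"
  fi
}

# Usage inside the container:
#   first_limit_hit "$(cat /proc/sys/fs/inotify/max_user_watches)" "$(ulimit -n)"
```

With the values listed above, 8192 and 16384 yield "inotify watch limit", while 2147483647 and 16384 yield "open file limit", which corresponds to the "too many open files" crash.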
The problem is most often seen in apps that use the Apache HTTP client libraries. The default Spring HTTP clients generally do not have this issue.

Resolution

The fix for this bug is in Java Buildpack version 4.5.1 and later. Pivotal recommends upgrading to the latest buildpack. The fix handles the case where the HTTP client library creates a new security context for each request.
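
To check whether a given buildpack version already contains the fix, a dotted version string can be compared against 4.5.1. The sketch below is an assumption-laden convenience, not an official tool: the function name version_at_least is hypothetical, it expects three-part numeric versions, and the version itself has to be supplied by the reader (for example, read from the filename column of "cf buildpacks").

```shell
# Hypothetical helper: succeed if CANDIDATE >= MINIMUM.
# Both arguments are dotted numeric versions with three parts, e.g. 4.5.1.
version_at_least() {
  candidate=$1
  minimum=$2
  # Sort the two versions numerically field by field; if the smaller one
  # is the minimum, the candidate is at least that version.
  lowest=$(printf '%s\n%s\n' "$minimum" "$candidate" |
    sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
  [ "$lowest" = "$minimum" ]
}

# Usage:
#   if version_at_least 4.8.0 4.5.1; then echo "fix included"; fi
```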
If you are experiencing this issue and cannot upgrade to the latest buildpack, two workarounds are available. One option is to modify your app to use the Spring HTTP client library. The other is to set $CF_INSTANCE_CERT and $CF_INSTANCE_KEY to empty values, so that the Java app does not start new file-watcher threads for the Diego-issued client certificate and key.
The second workaround helps only if the app does not require mutual TLS authentication, because this change breaks mutual TLS. Diego generates a new client certificate and key for the app instances every day, and without the file watcher reloading them the Java app would be using an expired certificate.
  • Using the cf CLI, set two environment variables for the given app:
    • cf set-env MYAPP CF_INSTANCE_CERT ""
    • cf set-env MYAPP CF_INSTANCE_KEY ""
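
The two commands above can be wrapped in a small function, sketched here under some assumptions. The function name disable_instance_cert_watch and the CF_CMD override are hypothetical and exist only for illustration. The final "cf restage" step is not part of the steps listed above; it is included because the cf CLI advises restaging for environment variable changes to take effect.

```shell
# Hypothetical helper: blank out the instance identity variables for an app,
# then restage it. Set CF_CMD=echo for a dry run that only prints arguments.
disable_instance_cert_watch() {
  app=$1
  cf_cmd=${CF_CMD:-cf}   # CF_CMD is an illustrative override, defaults to cf
  "$cf_cmd" set-env "$app" CF_INSTANCE_CERT "" &&
    "$cf_cmd" set-env "$app" CF_INSTANCE_KEY "" &&
    "$cf_cmd" restage "$app"
}

# Usage:
#   CF_CMD=echo disable_instance_cert_watch MYAPP   # dry run
#   disable_instance_cert_watch MYAPP               # apply to MYAPP
```

As noted above, this should only be applied to apps that do not rely on mutual TLS.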