Failed to make HTTP request to '/actuator/health' on port 8080: timed out after 1 seconds

Article ID: 298328

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

The app crashes and is restarted because the health check to /actuator/health times out, even though there does not seem to be anything wrong with the app logic.

Environment

Product Version: 2.11

Resolution

There are two main reasons this can happen.

  1. The /actuator/health endpoint performs a composite check that takes several resources, or health contributors, into account, such as bound service availability, disk space, LDAP server availability, and so on. If any of these services is down, the endpoint can fail to respond within one second, so the health check times out.

  2. Having many health contributors (for example, many bound services) can make the overall system health take more than one second to determine, even if all of them are actually up and running.
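To tell the two scenarios apart, you can inspect the JSON returned by /actuator/health. The following Python sketch is not part of the original article and the function names are hypothetical. It assumes the Spring Boot 2.2+ response shape, where contributors are listed under "components" (Spring Boot 2.0/2.1 used "details"), and it treats any status other than "UP" as failing. Spring Boot only includes per-component entries when `management.endpoint.health.show-details` (or, in newer versions, `show-components`) exposes them, so a response without them cannot be classified.

```python
from typing import Any


def failing_components(health: dict[str, Any]) -> list[str]:
    """Names of top-level health contributors whose status is not "UP"."""
    # Spring Boot 2.2+ nests contributors under "components"; 2.0/2.1 used "details".
    components = health.get("components") or health.get("details") or {}
    return sorted(
        name
        for name, info in components.items()
        if isinstance(info, dict) and info.get("status") != "UP"
    )


def classify(health: dict[str, Any]) -> str:
    """Map an /actuator/health response to one of the two scenarios."""
    if not (health.get("components") or health.get("details")):
        # Details are hidden (e.g. show-details=never), so the scenarios
        # cannot be told apart from this response alone.
        return "unknown: no component details in the response"
    failing = failing_components(health)
    if failing:
        return "scenario 1: failing contributors: " + ", ".join(failing)
    return "scenario 2: all contributors UP; check how long the endpoint takes"
```

For example, a response of `{"status": "DOWN", "components": {"db": {"status": "DOWN"}, "diskSpace": {"status": "UP"}}}` is classified as scenario 1 with `db` as the failing contributor.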

In both scenarios, Diego considers that app instance unhealthy. As a result, Diego stops and deletes the app instance and then schedules a new one.

To resolve this, check the response of /actuator/health. If it is difficult to get a response from that endpoint because the app keeps crashing and restarting, you can temporarily change the health check endpoint to / or any other working endpoint, or switch the health check type to port, so that the app can start.
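When checking /actuator/health, it helps to record how long it takes as well as what it returns. The following Python sketch is not from the original article; `probe_health` is a hypothetical helper that uses only the standard library. It assumes the endpoint returns JSON and relies on Spring Boot answering with HTTP 503 when the aggregate status is DOWN, so it reads the body of that error response instead of failing.

```python
import json
import time
import urllib.error
import urllib.request


def probe_health(url: str, timeout: float = 1.0) -> tuple[float, int, dict]:
    """Fetch an actuator health endpoint and time the call.

    Returns (elapsed_seconds, http_status, parsed_json_body). If no response
    arrives within `timeout` seconds, an exception such as TimeoutError or
    urllib.error.URLError is raised, which corresponds to a failed check.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Actuator answers 503 when the overall status is DOWN; the JSON body
        # still lists the contributors, so keep it rather than failing.
        status, body = err.code, err.read()
    elapsed = time.monotonic() - start
    return elapsed, status, json.loads(body or b"{}")
```

For example, `probe_health("https://my-app.example.com/actuator/health", timeout=10.0)` (the URL is a placeholder) uses a generous timeout so you can see how long the endpoint really takes, not only whether it answers within one second. Requests made through the app route also include router latency, whereas Diego runs the check from inside the container, so treat the measured time as an approximation.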

If any of the health contributors (for example, a bound service) shows as "DOWN", you are in the first scenario and need to check why that service is failing. If your app does not fully depend on the failing service, you can temporarily disable that particular health check so that at least the app is accessible. To do so, review the "management.health.*" properties and set the relevant one to false. For example, if the failing service is shown under the "db" component, set "management.health.db.enabled" to false.
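Spring Boot can also read these properties from environment variables through relaxed binding (dots become underscores, dashes are removed, and the name is uppercased), which is convenient on TAS because they can be set with `cf set-env` without rebuilding the app. The following Python sketch is not from the original article and the helper names are hypothetical. It assumes the indicator key is the component name from /actuator/health in lowercase, which holds for common built-in indicators such as `db`, `diskSpace`, and `ldap`; check the Spring Boot documentation for any other indicator.

```python
def health_property(component: str) -> str:
    """Property that disables the health indicator shown as `component`.

    Assumption: the indicator key is the lowercased component name
    (e.g. "diskSpace" -> management.health.diskspace.enabled).
    """
    return f"management.health.{component.lower()}.enabled"


def env_var_for(prop: str) -> str:
    """Environment-variable form of a property under Spring Boot relaxed binding."""
    return prop.replace(".", "_").replace("-", "").upper()
```

For example, `env_var_for(health_property("db"))` returns `MANAGEMENT_HEALTH_DB_ENABLED`, which could be set with `cf set-env my-app MANAGEMENT_HEALTH_DB_ENABLED false` (`my-app` is a placeholder) followed by `cf restage my-app`, as the cf CLI suggests after changing environment variables.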

If all of the health contributors show as "UP", you are in the second scenario, and you need to increase the value of the --invocation-timeout flag of "cf set-health-check" so that the health check has time to verify all health contributors.
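One way to choose a new --invocation-timeout value is to measure how long /actuator/health actually takes and add a safety margin. The following Python sketch is not from the original article; `suggest_invocation_timeout` is a hypothetical helper, and the default 2x headroom is an arbitrary assumption, not a platform recommendation. The result is rounded up to whole seconds because the flag takes a number of seconds.

```python
import math


def suggest_invocation_timeout(samples: list[float], headroom: float = 2.0) -> int:
    """Suggest an --invocation-timeout value (whole seconds).

    `samples` are measured /actuator/health response times in seconds.
    The slowest sample is multiplied by `headroom` and rounded up.
    """
    if not samples:
        raise ValueError("need at least one measured response time")
    return max(1, math.ceil(max(samples) * headroom))
```

For example, `suggest_invocation_timeout([0.8, 1.4, 1.1])` returns 3, which could be applied with `cf set-health-check my-app http --endpoint /actuator/health --invocation-timeout 3` (`my-app` is a placeholder), restarting the app afterwards so that the change takes effect.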