After upgrading Tanzu Platform for Cloud Foundry Windows to version 10.2.9, or following a Windows stemcell upgrade, applications fail to start.
The following error is observed in the application logs:
[CELL/0] ERR Failed after xx.xx.xx: startup health check never passed.
[HEALTH/0] [ERR] instance proxy failed to start: Timed out after 2m0s (60 attempts) waiting for startup check to succeed: failed to make TCP connection to <IP>:<PORT>: timed out after 1.00 seconds
This issue typically occurs during a cf push, application restage, or when a stemcell upgrade triggers a container restart.
The error indicates that the startup health check performed by the Diego cell could not open a TCP connection to the application's port within the configured timeout period.
In version 10.2.9, changes to the Diego cell or underlying networking logic may result in stricter port-probing. If an application takes longer to bind to its port than the default 60-second timeout, or if the platform attempts to probe multiple ports that are not actively listening, the instance is marked as unhealthy and terminated.
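To make the failure mode concrete, the following is a minimal sketch of a port-style startup check: a retry loop that attempts a TCP connection with a 1-second connect timeout and gives up after a fixed number of attempts. This is not Diego's implementation. The function name probe_port, its arguments, and the 2-second interval (inferred from "2m0s (60 attempts)" in the log above) are assumptions for illustration, and the sketch assumes bash and the coreutils timeout command are available on the machine where it runs.

```shell
# Illustrative only: approximates a TCP startup check with a retry loop.
# probe_port and its defaults are hypothetical; this is not Diego's code.
probe_port() {
  local host="$1" port="$2" attempts="${3:-60}" interval="${4:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Try to open a TCP connection, giving up after 1 second.
    if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "attempt ${i}: connected to ${host}:${port}"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  # No attempt succeeded: the instance would be treated as unhealthy.
  echo "timed out after ${attempts} attempts waiting for ${host}:${port}" >&2
  return 1
}

# Usage (HOST and PORT are placeholders):
# probe_port HOST PORT
```

If the application has not bound to its port by the last attempt, the loop returns a failure, which corresponds to the "startup health check never passed" message above.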
To resolve this, change the health check type from port (the default) to process and increase the startup timeout. This allows the platform to verify that the Windows process is running without waiting for a specific network response that may be delayed during initialization.
Add or update the following properties in your manifest.yml:
applications:
- name: my-windows-app
  health-check-type: process
  timeout: 180 # Increase timeout to 180 seconds
If you prefer to apply the change to an existing application without a manifest update:
cf set-health-check APP_NAME process
cf push APP_NAME -t 180
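As a sketch of how the CLI route could be checked end to end, the helper below displays the current health check, switches it to process, and restarts the application so the change takes effect. The function name use_process_health_check is hypothetical. The sketch assumes the cf CLI is installed and already targeted at the correct org and space. It does not change the startup timeout, which is still set through the manifest or cf push -t.

```shell
# Hypothetical helper: switch an existing app to the process health check
# and restart it. Assumes the cf CLI is installed and logged in.
use_process_health_check() {
  app_name="$1"
  # Show the current health check configuration before changing it.
  cf get-health-check "$app_name" || return 1
  # Switch from the default port check to the process check.
  cf set-health-check "$app_name" process || return 1
  # Restart so the new health check type is applied to the running instances.
  cf restart "$app_name"
}

# Usage (APP_NAME is a placeholder):
# use_process_health_check APP_NAME
```

Running cf get-health-check APP_NAME again afterwards should report process as the health check type.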
For background, review the following section of the breaking changes reference:
https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/elastic-application-runtime/10-2/eart/breaking-changes.html
Section: "Change: Apps are no longer accessible via the Diego Cell IP and Diego Cell host port by default"
If changing the health check type from port (the default) to process is not acceptable, and the application continues to fail even after the timeout is increased to 180 seconds, check whether cybersecurity software such as CrowdStrike is installed.
In one instance, uninstalling CrowdStrike resolved the port health check failure.