Missing crash event

Article ID: 298238

Products

VMware Tanzu Application Service for VMs

Environment

Product Version: 2.10

Resolution

Checklist:

When listing app events, the most recent crash is missing: the instance uptime is more recent than the last crash in the event list. The behavior occurs across different orgs and applications, and it appears in both Apps Manager and the cf CLI.

Diego Restart / Update

This can happen when the app did not actually crash but was cleanly evicted while a Diego Cell was being updated. To check this, follow these steps:
 
  1. Search for exit status 143. This status corresponds to SIGTERM (128 + signal number 15), which Diego sends to shut the app down cleanly before it is moved to a different Diego Cell.
  2. Check when the Diego Cell was updated. Some configuration changes on BOSH-deployed VMs do not restart or recreate the VM; only the packages or jobs inside it are updated. One way to check this is to run the following bosh command:
    bosh -d <cf-deployment> ssh diego_cell -c "ps -eo pid,etime,cmd | grep gdn"

    Among the output, you'll see lines like the following:

    diego_cell/86d8d37c-e622-4b6d-92f3-2610528f57bf: stdout |  13438 7-14:46:16 /var/vcap/packages/guardian/bin/gdn --config /var/vcap/jobs/garden/config/config.ini server --uid-map-start=1 --uid-map-length=4294967293 --gid-map-start=1 --gi

    The process uptime follows the process ID. In this case it is "7-14:46:16", which means 7 days, 14 hours, 46 minutes, and 16 seconds. Compare the app uptime with this value to see whether it is close to the gdn process uptime.
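
When many Diego Cells are involved, reading the uptime by eye becomes tedious. The following is a minimal sketch, in Python, of how the uptime column could be converted into a duration. It assumes the output layout shown above (`<instance>: stdout | <pid> <etime> <cmd>`) and the standard `ps` elapsed-time format `[[DD-]HH:]MM:SS`. The function names `parse_etime` and `gdn_uptime_from_line` are hypothetical helpers, not part of any BOSH or cf tooling, and the sample line is a shortened copy of the one above.

```python
import re
from datetime import timedelta

# ps "etime" format: [[DD-]HH:]MM:SS
ETIME_RE = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def parse_etime(etime: str) -> timedelta:
    """Convert a ps etime string such as '7-14:46:16' into a timedelta."""
    match = ETIME_RE.match(etime.strip())
    if match is None:
        raise ValueError(f"unrecognized etime value: {etime!r}")
    # Missing day/hour groups are None; treat them as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def gdn_uptime_from_line(line: str) -> timedelta:
    """Extract the process uptime from one line of the bosh ssh output.

    Assumes the layout '<instance>: stdout | <pid> <etime> <cmd>'.
    """
    _, _, ps_output = line.partition("|")
    fields = ps_output.split()
    if len(fields) < 3:
        raise ValueError("line does not contain pid, etime and cmd")
    return parse_etime(fields[1])


# Shortened copy of the sample line from the article.
sample = (
    "diego_cell/86d8d37c-e622-4b6d-92f3-2610528f57bf: stdout |  13438 7-14:46:16 "
    "/var/vcap/packages/guardian/bin/gdn --config /var/vcap/jobs/garden/config/config.ini server"
)
print(gdn_uptime_from_line(sample))  # 7 days, 14:46:16
```

The resulting `timedelta` can be subtracted from the current time to estimate when the gdn process, and therefore the Diego Cell job, was last restarted.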


If both checks are positive, the app did not crash. It was restarted as part of the eviction process that runs when a Diego Cell is updated.
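
The two checks can be combined into a single decision, sketched below in Python. The sketch relies on the common convention that a process killed by a signal reports an exit status of 128 plus the signal number, so 143 maps to signal 15 (SIGTERM). The function names and the 30-minute tolerance are assumptions made for illustration only; the article does not define how close the two uptimes need to be, so adjust the tolerance to the duration of your own deployments.

```python
import signal
from datetime import timedelta
from typing import Optional


def terminating_signal(exit_status: int) -> Optional[str]:
    """Return the signal name for a '128 + signal number' exit status, else None."""
    if exit_status <= 128:
        return None
    try:
        return signal.Signals(exit_status - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


def looks_like_eviction(
    exit_status: int,
    app_uptime: timedelta,
    gdn_uptime: timedelta,
    tolerance: timedelta = timedelta(minutes=30),  # arbitrary placeholder value
) -> bool:
    """True when the exit status is SIGTERM and the two uptimes are close."""
    clean_shutdown = terminating_signal(exit_status) == "SIGTERM"
    uptimes_close = abs(app_uptime - gdn_uptime) <= tolerance
    return clean_shutdown and uptimes_close


print(terminating_signal(143))  # SIGTERM
print(
    looks_like_eviction(
        143,
        app_uptime=timedelta(days=7, hours=14, minutes=50),
        gdn_uptime=timedelta(days=7, hours=14, minutes=46, seconds=16),
    )
)  # True
```

A `False` result does not prove that the app crashed; it only means that the eviction explanation is not supported by these two checks.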
 

Cloud Controller & Diego Out of Sync

Another common cause is the Cloud Controller and Diego getting out of sync. This condition is covered by a documented KPI (https://docs.pivotal.io/application-service/2-10/overview/monitoring/kpi.html#cc-diego-sync). When crash events are missing, check the status of this KPI. If it indicates that the two are out of sync, that can explain why crash records are not showing up.

If the KPI shows that they are out of sync, follow the instructions from the KPI documentation:
  1. Check the BBS and Clock Global (Cloud Controller clock) logs.
  2. If the problem continues, pull the BBS logs and Clock Global (Cloud Controller clock) logs and contact VMware Tanzu Support. Indicate that the cf-apps domain is not being kept fresh. Support can help to investigate and resolve the issue.