How to terminate all running tasks for an application in Tanzu Application Service for VMs

Article ID: 298102

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

This article outlines steps for terminating all running tasks for an application.

Note: There are only a few situations in which you would want to terminate all running tasks for an application.

For example, one such scenario is where too many tasks overwhelm the auctioneer's ability to schedule new containers because of a bug in the org task quota limit. In this scenario, a few apps within an org had scheduled jobs that collectively exceeded this limit by a large amount. This affected the platform as a whole: application restarts and restages were impacted, and new applications could not be pushed. The workaround was to terminate all running tasks for these applications so that the foundation could become stable again.

This bug has been patched, and the fix is available in the following Tanzu Application Service for VMs (TAS for VMs) versions with the listed CAPI releases:

  • TAS v2.7.36+ with CAPI release v1.84.16+
  • TAS v2.9.24+ with CAPI release v1.90.12+
  • TAS v2.10.16+ with CAPI release v1.95.6+
  • TAS v2.11.4+ with CAPI release v1.109.1+

For more information, refer to Modify quota validation for tasks that exceed the quota limit.
 

"[Bug Fix] Ensure Cloud Controller organization and space quota validations include limit for tasks run against an app that has been exceeded"


Important: The bug does not have to be present for the auctioneer to be impacted. It is mentioned only as an example.

There is no built-in limit on the number of tasks that can be run in each organization. If an organization's quota does not impose a task limit and many orgs collectively try to start many tasks, it is possible to hit the auctioneer's max-inflight container limit and cause a similar situation.

By default, the auctioneer's max-inflight container limit (starting_container_count_maximum) is 200.

For example: 

a. In the cf manifest:
- name: diego_brain
  jobs:
  - name: auctioneer
    release: diego
    properties:
      diego:
        auctioneer:
          starting_container_count_maximum: 200

b. In the Operations Manager UI:

[Screenshot: the corresponding setting in the Operations Manager UI]

If you have more than 200 containers in flight, you will see the following error in the auctioneer logs:
{"timestamp":"1620229866.015850067","source":"auctioneer","message":"auctioneer.auction.exceeded-max-inflight-container-creation","log_level":1,"data":{"max-inflight":200,"session":"261536","task-guid":"7069f732-bd9f-4edb-9f9c-56d515857ac2"}}
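
To gauge how often the limit is being hit, the log lines can be counted. The following is a minimal sketch, not an official tool: the function names count_max_inflight_errors and max_inflight_task_guids are hypothetical, and the log path shown in the usage note is an assumption based on the usual BOSH job log layout.

```shell
# Count log lines reporting that the max-inflight limit was exceeded.
# Reads auctioneer log lines on stdin and prints the number of matches.
count_max_inflight_errors() {
  # -F treats the message as a fixed string; "|| true" keeps the exit
  # status at 0 when grep finds no matching lines (it still prints 0).
  grep -cF 'auctioneer.auction.exceeded-max-inflight-container-creation' || true
}

# Print the unique task GUIDs that were rejected because of the limit.
max_inflight_task_guids() {
  grep -F 'auctioneer.auction.exceeded-max-inflight-container-creation' |
    grep -o '"task-guid":"[^"]*"' |
    cut -d'"' -f4 |
    sort -u
}
```

Assuming the auctioneer logs are at /var/vcap/sys/log/auctioneer/auctioneer.stdout.log on the diego_brain VM, usage would be: count_max_inflight_errors < /var/vcap/sys/log/auctioneer/auctioneer.stdout.log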

If app tasks are the reason this max-inflight-container-creation limit is being hit, the following method can be used to identify the org, space, and app names that have running tasks and to terminate those tasks.

Note: This method forcibly terminates the tasks, so they will not be able to complete; however, foundation stability should be restored.

Environment

Product Version: 2.9

Resolution

1. Identify the applications that have tasks running. Run the following command on a VM where cfdot is available (such as any Diego VM). The output is CSV in the format OrgName,SpaceName,ApplicationName, with one line per task:

cfdot tasks | jq -r '.env[] | select(.name=="VCAP_APPLICATION") | .value' | jq -r '[ .organization_name, .space_name, .application_name] | @csv' | sort

 

2. Using the cf CLI, target each org and space identified in the output above, then run the following to terminate all running tasks associated with an application. Replace YOUR_APPLICATION_NAME with the name of the app whose tasks you want to terminate:

for task in `cf tasks YOUR_APPLICATION_NAME | grep RUNNING | awk '{print $1}'`; do cf terminate-task YOUR_APPLICATION_NAME $task; done


For example:
 

diego_brain/734e9294-f703-44f5-aa9b-e16ad2fc611b:~# cfdot tasks | jq -r '.env[] | select(.name=="VCAP_APPLICATION") | .value' | jq -r '[ .organization_name, .space_name, .application_name] | @csv' | sort
"jgainey","playarea","logger_app"
"jgainey","playarea","logger_app"

$ cf t
api endpoint:   https://api.run-06.slot-59.pez.vmware.com
api version:    2.145.0
user:           admin
org:            jgainey
space:          playarea

jgainey@jgainey-a01 : ~/Documents/
$ for task in `cf tasks logger_app | grep RUNNING | awk '{print $1}'`; do cf terminate-task logger_app $task; done
Terminating task 344 of app logger_app in org jgainey / space playarea as admin...
OK

Terminating task 342 of app logger_app in org jgainey / space playarea as admin...
OK
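
When many apps are involved, the two steps above can be wrapped in small helper functions. The following is a sketch under stated assumptions rather than a supported tool: the function names are hypothetical, and running_task_ids assumes that `cf tasks` prints the columns id, name, state in that order and that task names contain no whitespace. Unlike `grep RUNNING`, it matches the state column exactly, so a task whose command merely contains the word RUNNING is not selected.

```shell
# Step 1 helper: collapse the CSV from the cfdot/jq pipeline (one line per
# task) into one line per app, prefixed with its task count, busiest first.
count_tasks_per_app() {
  awk '{ count[$0]++ } END { for (app in count) print count[app] "," app }' |
    sort -t, -k1,1nr
}

# Step 2 helper: print the IDs of RUNNING tasks from `cf tasks` output.
# Assumption: columns are id, name, state, ... and names have no whitespace.
running_task_ids() {
  awk '$1 ~ /^[0-9]+$/ && $3 == "RUNNING" { print $1 }'
}

# Terminate every RUNNING task of one app in the currently targeted
# org and space.
terminate_running_tasks() {
  app="$1"
  cf tasks "$app" | running_task_ids | while read -r task_id; do
    # </dev/null keeps cf from consuming the remaining IDs on stdin.
    cf terminate-task "$app" "$task_id" </dev/null
  done
}
```

To use these, pipe the output of the command from step 1 into count_tasks_per_app to see which apps hold the most tasks, then target the relevant org and space and run terminate_running_tasks YOUR_APPLICATION_NAME.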