Application running on TAS is slow, performing poorly, experiencing high latency and/or decreased throughput

Article ID: 297470

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

An application deployed to Tanzu Application Service is running slowly or performing poorly: requests have higher than expected latency, or request throughput is lower than expected. This may be because the application is underprovisioned or because one or more of the Foundation's Diego Cells are overloaded.

Resolution

When application performance issues such as increased latency, decreased throughput, or slow requests crop up, the first thing to try is running the CPU Entitlement cf CLI plugin against the app while the app is under load.

This plugin works on foundations running PAS/TAS 2.5+. It will look at metrics emitted from the foundation and tell you what percent of your application’s entitled or guaranteed CPU is currently being used and what percent is being used on average.

Background

Entitled CPU is based on the CPU shares assigned to an app, which in turn are based on its memory limit. A value of 0-100% indicates you’re using no more than the CPU you are entitled to. If you see 100%+, it means you’re using more CPU than you’re entitled to, that is, the app is bursting.
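To make the percentage concrete, here is a minimal sketch of the arithmetic. The function names and the core counts are hypothetical and chosen only for illustration; the plugin derives its figures from foundation metrics, and this sketch is not its implementation.

```python
def entitlement_usage_pct(cpu_used: float, cpu_entitled: float) -> float:
    """Return CPU usage as a percentage of the entitled CPU.

    Both arguments are in the same unit (for example, CPU cores).
    The values used below are illustrative assumptions, not plugin output.
    """
    if cpu_entitled <= 0:
        raise ValueError("cpu_entitled must be positive")
    return 100.0 * cpu_used / cpu_entitled


def is_bursting(usage_pct: float) -> bool:
    """An app is bursting when it uses more than 100% of its entitlement."""
    return usage_pct > 100.0


# Hypothetical app entitled to half a core:
within = entitlement_usage_pct(cpu_used=0.25, cpu_entitled=0.5)   # 50.0
over = entitlement_usage_pct(cpu_used=0.75, cpu_entitled=0.5)     # 150.0
print(within, is_bursting(within))
print(over, is_bursting(over))
```

In this sketch the second reading (150%) corresponds to the "100%+" case described above: the app is consuming CPU it is not guaranteed to get.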

Bursting is OK if it happens occasionally. It’s not OK when it happens often. If you see an average reported as 80-100%, that’s cause for concern. If it’s reporting as 100%+, that’s a definite sign that remediation is necessary or the application’s performance could suffer.
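The thresholds above can be expressed as a small check over a series of readings. This is a sketch only: the function name, the status labels, and the sample values are assumptions made for illustration, and the readings are presumed to be entitlement percentages collected while the app is under load.

```python
from statistics import mean


def assess_entitlement(samples: list[float]) -> dict:
    """Classify entitlement-usage readings (in percent) using the
    80% and 100% thresholds described in this article.

    The labels "ok", "concern", and "remediate" are hypothetical names.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    average = mean(samples)
    # Fraction of readings in which the app was bursting (above 100%).
    burst_fraction = sum(1 for s in samples if s > 100.0) / len(samples)
    if average > 100.0:
        status = "remediate"
    elif average >= 80.0:
        status = "concern"
    else:
        status = "ok"
    return {"average": average, "burst_fraction": burst_fraction, "status": status}


# Occasional burst, low average: acceptable.
print(assess_entitlement([60.0, 70.0, 120.0, 65.0]))
# Average in the 80-100% band: cause for concern.
print(assess_entitlement([90.0, 95.0, 110.0, 85.0]))
# Average above 100%: remediation is needed.
print(assess_entitlement([130.0, 140.0, 120.0]))
```

Reporting the burst fraction alongside the average helps separate an app that bursts occasionally from one that bursts most of the time.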

This situation can impact performance because it means the app depends on being able to burst above its entitled CPU shares. In the best case, when it can burst, the app runs fine. If there is CPU contention from other apps, for example because a Cell is busy, the app cannot burst and gets less CPU time, which means performance suffers.

Please be aware of this possibility, as it’s difficult to detect. There is often a faulty assumption that Cell CPU usage must reach 100%, or that the Cell must show an unacceptable CPU load average, before application performance is impacted. That’s not true. Performance can suffer before those conditions manifest if an application depends on being able to burst above its CPU limits.

As an example, if you have two apps with equal CPU shares running on a Cell that has four CPUs, one app may use all four CPUs, but only if the other app is doing nothing. If both apps need the CPU at the same time, each ends up with two CPUs. In the latter case the app's CPU access is cut in half because it can no longer burst and consume all four CPUs. The effect gets worse as more apps land on a Cell and apps outnumber CPUs. The busier a Cell is, the less likely an individual app will be able to burst and consume spare CPU cycles, and the more likely an app that depends on bursting will see a performance impact.
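The four-CPU example can be reproduced with a simplified model of share-based scheduling. This is a sketch under stated assumptions, not how Diego or the Linux kernel is implemented: it assumes idle CPU is handed to whichever apps still want it, and that contended CPU is divided in proportion to shares. The function name, app names, and share values are hypothetical.

```python
def allocate_cpu(total_cpus: float, shares: dict, demand: dict) -> dict:
    """Divide total_cpus among apps in proportion to their shares,
    never giving an app more than it demands (progressive filling).

    shares: app name -> relative CPU shares (hypothetical values)
    demand: app name -> CPUs the app would use if unconstrained
    """
    alloc = {app: 0.0 for app in shares}
    active = {app for app in shares if demand[app] > 0}
    remaining = float(total_cpus)

    while active and remaining > 1e-9:
        total_shares = sum(shares[app] for app in active)
        # Apps whose demand fits inside their proportional slice are satisfied.
        satisfied = {
            app for app in active
            if demand[app] <= remaining * shares[app] / total_shares
        }
        if not satisfied:
            # Everyone wants more than their slice: split by shares and stop.
            for app in active:
                alloc[app] = remaining * shares[app] / total_shares
            break
        for app in satisfied:
            alloc[app] = float(demand[app])
            remaining -= demand[app]
        active -= satisfied
    return alloc


equal = {"app-a": 1024, "app-b": 1024}
# app-b is idle, so app-a can burst to all four CPUs.
print(allocate_cpu(4, equal, {"app-a": 4, "app-b": 0}))
# Both apps are busy, so each is held to two CPUs.
print(allocate_cpu(4, equal, {"app-a": 4, "app-b": 4}))
# Eight busy apps with equal shares on the same Cell get half a CPU each.
many = {f"app-{i}": 1024 for i in range(8)}
print(allocate_cpu(4, many, {name: 4 for name in many}))
```

In this model the app's own demand never changes between the runs; only its neighbors' activity does, which is why the slowdown is hard to spot from the app's side.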

Please keep in mind that the behavior documented in this article isn't a bad thing. It's the expected behavior. Diego and the Linux kernel are doing what was asked of them and preventing one application from consuming more than its fair share of CPU time. In addition, your application will always be guaranteed at least its minimum entitled amount of CPU, so for proper performance tuning, you need to tune based on the entitled amount of CPU, not what it can burst to.

Remediations

The underlying issue in this scenario is that the application's CPU demand is higher than what its CPU limits can service. Fortunately, there are a handful of remediations.
  1. Reduce CPU consumption in the app, i.e. make the app more efficient.
  2. Increase CPU shares, i.e. increase the memory limit.
  3. Add more application instances, i.e. spread out the work.
  4. Restart the app, although this requires some luck to land on a less busy Cell and may only be temporary, as app workloads can shift at any time.
  5. Add more CPUs to your Cells. This won't lower the CPU usage reported by the plugin; it just provides more CPU capacity for bursting above CPU limits.
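
For remediations 2 and 3, a rough sizing estimate can be sketched as follows. The sketch rests on assumptions not stated in this article: that load spreads evenly across instances, that CPU shares scale linearly with the memory limit (the platform may cap or round this), and that 80% is an acceptable target because averages of 80-100% are described above as cause for concern. The function names are hypothetical. Treat the output as a starting point to validate under load, not as a guarantee.

```python
import math


def instances_needed(current_instances: int, avg_usage_pct: float,
                     target_pct: float = 80.0) -> int:
    """Estimate the instance count that brings average entitlement usage
    down to target_pct, assuming work spreads evenly across instances."""
    if current_instances < 1 or target_pct <= 0:
        raise ValueError("need at least one instance and a positive target")
    total_demand = current_instances * avg_usage_pct
    # Never suggest fewer instances than are currently running.
    return max(current_instances, math.ceil(total_demand / target_pct))


def memory_limit_needed_mb(current_memory_mb: int, avg_usage_pct: float,
                           target_pct: float = 80.0) -> int:
    """Estimate the memory limit that yields enough CPU shares, ASSUMING
    shares grow linearly with the memory limit."""
    if current_memory_mb < 1 or target_pct <= 0:
        raise ValueError("need a positive memory limit and target")
    scaled = math.ceil(current_memory_mb * avg_usage_pct / target_pct)
    return max(current_memory_mb, scaled)


# Hypothetical app: 4 instances with 1024 MB each, averaging 150% of entitlement.
print(instances_needed(4, 150.0))           # 8
print(memory_limit_needed_mb(1024, 150.0))  # 1920
```

After applying either change, re-run the plugin under load to confirm that the reported average has actually dropped.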