The Cloud Controller v3 API has been available in TAS for an extended period and has been exercised by the v7 cf CLI since that CLI was released in TAS 2.10. Scale tests have demonstrated that v3 API calls can be up to three times faster than the equivalent v2 API calls in TAS 2.11.
Reason 1 for TAS MySQL CPU spike
The top-level roles resource (/v3/roles) is new in the v3 API, so it initially lacked the real-world validation that endpoints with direct v2 API equivalents had already received. Although this endpoint went through several query optimizations, which first appeared in TAS 2.13, it was still found to degrade performance on foundations with large role tables in the Cloud Controller database (CCDB). The CCDB tables involved are:
- spaces_auditors
- spaces_developers
- spaces_managers
- spaces_supporters
- organizations_auditors
- organizations_billing_managers
- organizations_managers
- organizations_users
This has since been patched, and the fix is Generally Available starting in CAPI release v1.127.11, which is packaged with TAS v2.13.16+.
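Each v3 role type corresponds to one of the CCDB tables listed above. The Python sketch below maps role types to tables and totals the rows a roles listing would cover, to show why foundations with large tables are the ones affected. It is illustrative only: the row counts are hypothetical, and the rule that an unfiltered listing spans all eight tables while a types filter narrows it to the matching tables is a simplifying assumption, not a description of the Cloud Controller's actual queries.

```python
from typing import Dict, Iterable, Optional

# Each v3 role type and the CCDB join table (from the list above) that stores it.
ROLE_TABLES: Dict[str, str] = {
    "space_auditor": "spaces_auditors",
    "space_developer": "spaces_developers",
    "space_manager": "spaces_managers",
    "space_supporter": "spaces_supporters",
    "organization_auditor": "organizations_auditors",
    "organization_billing_manager": "organizations_billing_managers",
    "organization_manager": "organizations_managers",
    "organization_user": "organizations_users",
}


def rows_in_scope(row_counts: Dict[str, int], types: Optional[Iterable[str]] = None) -> int:
    """Sum the rows of the role tables a listing would cover.

    Simplifying assumption: with no types filter every role table is covered;
    with a filter only the tables for the requested role types are covered.
    """
    role_types = ROLE_TABLES.keys() if types is None else types
    return sum(row_counts.get(ROLE_TABLES[t], 0) for t in role_types)


# Hypothetical row counts for a large foundation (tables not listed count as empty).
counts = {
    "spaces_developers": 120_000,
    "spaces_managers": 40_000,
    "organizations_users": 60_000,
}
print(rows_in_scope(counts))                       # 220000
print(rows_in_scope(counts, ["space_developer"]))  # 120000
```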
Reason 2 for TAS MySQL CPU spike
Apps Manager did not fully adopt the v3 API until TAS 2.13. To provide a comprehensive view of a user's roles, Apps Manager fetches the roles that the user has in a given TAS foundation, and to keep this data up to date it re-fetches the user's roles every thirty seconds. When a user views roles in an organization or space, Apps Manager fetches all roles for all users in that organization or space. These unfiltered queries can produce large response payloads in environments where users have many roles.
The polling logic in Apps Manager did not account for cases where fetching roles took longer than the polling interval. In those cases, a second poll started before the first had completed, further increasing API load. Because of the large number of user roles, each individual poll returned hundreds of pages of results, and paging through them took longer than the polling interval. A single Apps Manager instance therefore ran multiple simultaneous fetches, creating a positive feedback loop: the extra queries saturated the CPU of the Cloud Controller database, which slowed queries down further, which in turn increased the number of simultaneous requests from Apps Manager.
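The feedback loop can be illustrated with a toy model. This sketch assumes that every page takes a fixed base time multiplied by the number of overlapping polls (a hypothetical stand-in for database contention) and that a new poll starts every thirty seconds whether or not the previous one has finished. The page counts and per-page latencies are made-up numbers, not measurements, and simulate_overlap is a hypothetical helper.

```python
from typing import List

POLL_INTERVAL_MS = 30_000  # Apps Manager re-fetches roles every thirty seconds


def simulate_overlap(pages: int, base_ms_per_page: int, rounds: int = 5,
                     interval_ms: int = POLL_INTERVAL_MS) -> List[int]:
    """Toy model of overlapping polls with no guard against concurrent fetches.

    Assumption: each page costs base_ms_per_page multiplied by the number of
    polls in flight, so overlapping polls slow each other down.
    """
    in_flight = 1
    history = []
    for _ in range(rounds):
        fetch_ms = pages * base_ms_per_page * in_flight
        # Polls started during one fetch are all still running: ceil(fetch / interval).
        in_flight = max(1, -(-fetch_ms // interval_ms))
        history.append(in_flight)
    return history


print(simulate_overlap(300, 100))  # [1, 1, 1, 1, 1]: each poll fits in the interval
print(simulate_overlap(400, 100))  # [2, 3, 4, 6, 8]: overlap feeds on itself
```

As long as a poll finishes within the interval, the model stays at one poll in flight; once it does not, each extra poll slows the others and the count keeps climbing.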
This was patched via:
- Applying filters to reduce the total number of roles fetched by Apps Manager, which speeds up requests and reduces API load.
- Preventing Apps Manager from issuing multiple concurrent requests to the API for the same data. This circuit breaker helps avoid the positive feedback loop described above.
This patch is Generally Available starting in push-apps-manager-release v676.0.7, which is packaged with TAS v2.13.13+.
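Both fixes can be sketched in a few lines of Python. This is not Apps Manager's code, which is not shown here; roles_query and SingleFlightPoller are hypothetical names. roles_query builds a /v3/roles request restricted by filters such as space_guids or types, both of which the v3 roles endpoint accepts, and SingleFlightPoller skips a poll while the previous fetch is still running, using a non-blocking lock acquire as a simple form of the circuit breaker described above.

```python
import threading
from typing import Callable, List, Optional
from urllib.parse import urlencode


def roles_query(**filters: List[str]) -> str:
    """Build a filtered /v3/roles path; each filter is sent as a comma-separated list."""
    params = {name: ",".join(values) for name, values in filters.items()}
    return "/v3/roles?" + urlencode(params, safe=",")


class SingleFlightPoller:
    """Calls fetch() on each tick unless the previous fetch is still running."""

    def __init__(self, fetch: Callable[[], object]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self.skipped = 0

    def tick(self) -> Optional[object]:
        # Non-blocking acquire: if a fetch is already in flight, skip this tick
        # instead of starting a second, overlapping request.
        if not self._lock.acquire(blocking=False):
            with self._stats_lock:
                self.skipped += 1
            return None
        try:
            return self._fetch()
        finally:
            self._lock.release()


print(roles_query(space_guids=["space-1"], types=["space_developer", "space_manager"]))
# /v3/roles?space_guids=space-1&types=space_developer,space_manager
```

Skipping rather than queueing is the key design choice: a queued poll would still run against an already slow database, whereas a skipped poll simply waits for the next interval.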
Reason 3 for TAS MySQL CPU spike
As mentioned previously, Apps Manager did not fully adopt the v3 API until TAS 2.13. This also applies to other applications in push-apps-manager-release, such as search-server. The search-server application fetches data from TAS MySQL, through the Cloud Controller API, on behalf of Apps Manager. When a user clicks the search bar in Apps Manager, the following takes place:
- Apps Manager asks search-server to fetch all organizations, spaces, apps, and service instances.
- search-server begins a series of CAPI API calls to fetch this data (the following log snippets are taken from a TAS v2.13 environment):
GET /v3/organizations?page=1
GET /v3/organizations?page=2
<continued organizations calls until final page>
GET /v3/spaces?page=1
GET /v3/spaces?page=2
<continued spaces calls until final page>
GET /v3/apps?page=1
GET /v3/apps?page=2
<continued apps calls until final page>
GET /v3/service_instances?page=1
GET /v3/service_instances?page=2
<continued service_instances calls until final page>
- The data is loaded into search-server and made available to Apps Manager's search functionality to improve the user experience.
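The paging pattern in these log snippets can be sketched as a loop that follows the pagination.next link of each v3 list response until it is null. In this sketch, get_json is a hypothetical caller-supplied function that performs an authenticated GET and returns the decoded JSON body (the real API returns absolute URLs in next.href, which a real get_json would need to accept), and fake_cloud_controller is an in-memory stand-in used only to make the example runnable. search-server's actual implementation may differ.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def iter_resources(get_json: Callable[[str], dict], path: str, per_page: int = 50) -> Iterator[dict]:
    """Yield every resource from a paginated v3 list endpoint by following pagination.next."""
    url: Optional[str] = f"{path}?page=1&per_page={per_page}"
    while url:
        body = get_json(url)
        yield from body["resources"]
        next_link = body["pagination"]["next"]  # null on the last page
        url = next_link["href"] if next_link else None


def fake_cloud_controller(total: int) -> Tuple[Callable[[str], dict], List[str]]:
    """In-memory stand-in for a v3 list endpoint (demo only); records each requested URL."""
    calls: List[str] = []

    def get_json(url: str) -> dict:
        calls.append(url)
        parsed = urlparse(url)
        query = parse_qs(parsed.query)
        page, per_page = int(query["page"][0]), int(query["per_page"][0])
        start = (page - 1) * per_page
        resources = [{"guid": f"si-{i}"} for i in range(start, min(start + per_page, total))]
        has_next = start + per_page < total
        next_link = {"href": f"{parsed.path}?page={page + 1}&per_page={per_page}"} if has_next else None
        return {"pagination": {"next": next_link}, "resources": resources}

    return get_json, calls


get_json, calls = fake_cloud_controller(total=120)
instances = list(iter_resources(get_json, "/v3/service_instances"))
print(len(instances), len(calls))  # 120 3
```

With 120 fake service instances, the default page size of 50 needs three sequential requests; passing per_page=500 reduces that to one.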
Prior to TAS v2.13, search-server used the v3 CAPI API endpoints for all objects except service instances, for which it used the v2 CAPI API:
GET /v2/service_instances?page=1
GET /v2/service_instances?page=2
Starting in TAS v2.13, search-server began using the v3 CAPI API for service instance objects. The /v3/service_instances CAPI API endpoint by itself is typically very fast. However, it has been observed that many /v3/service_instances calls in rapid succession can degrade performance and cause slow queries. This is exactly what happens when a user clicks the search bar in Apps Manager: search-server tries to fetch all service instances from CAPI, and because it requests only 50 items per page, this produces many simultaneous /v3/service_instances?page=X calls. On foundations with a large number of service instances (several thousand), this may spike TAS MySQL CPU load. This GitHub issue may be related.
This has since been patched by allowing search-server to request more than 50 items per page, which greatly reduces the number of concurrent API calls to CAPI. This patch is Generally Available starting in push-apps-manager-release v676.0.11+, which is packaged with TAS v2.13.20+. The page size is controlled by the API_PER_PAGE environment variable on the search-server and apps-manager applications. API_PER_PAGE still defaults to 50 but can now be set as high as 5000. The property is not currently exposed as configurable in the tile or in platform automation; the plan is to raise its default value in future TAS releases rather than expose it as a property. Until then, the recommendation is to update the environment variable manually on the search-server and apps-manager applications after each run of the push-apps-manager errand.
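The effect of API_PER_PAGE is simple arithmetic: paging through N objects takes ceil(N / per_page) requests. The sketch below works this out for a hypothetical foundation with 8,000 service instances; list_calls is an illustrative helper, and its 1 to 5000 bounds come from the limits described above.

```python
MAX_PER_PAGE = 5000  # highest value API_PER_PAGE now accepts, per the text above


def list_calls(total_items: int, per_page: int = 50) -> int:
    """Requests needed to page through total_items at per_page items per page (at least one)."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    return max(1, -(-total_items // per_page))  # integer ceiling division


# Hypothetical foundation with 8,000 service instances.
for per_page in (50, 500, 5000):
    print(per_page, list_calls(8_000, per_page))
# 50 160
# 500 16
# 5000 2
```

For the manual workaround, the standard cf CLI way to change an app's environment variable is cf set-env APP_NAME API_PER_PAGE VALUE followed by cf restage APP_NAME, after targeting the org and space where the app runs; the exact app names and a suitable value depend on the foundation.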
Conclusion
TAS v2.13.20+ contains vital patches that improve Apps Manager and CAPI performance on foundations with many users, roles, and service instances.
The following metrics for the TAS MySQL VMs are helpful for tracking performance:
#mysql
Origin: mysql - Name: /mysql/performance/slow_queries
#system
Origin: system_metrics_agent - Name: system_cpu_user
Origin: system_metrics_agent - Name: system_mem_percent
Origin: system_metrics_agent - Name: system_load_1m
Origin: system_metrics_agent - Name: system_load_5m
Origin: system_metrics_agent - Name: system_load_15m