Rally - WSAPI - Best Practices on filtering and queries.
Updated On:07-04-2020 14:46
CA Agile Central On Premise (Rally), CA Agile Central SaaS (Rally)
This document explains some considerations you may want to take into account when developing your WSAPI programs.
Component: ACSAAS
This article lists the factors in play when making WSAPI queries that should be considered when designing your WSAPI programs.
WSAPI vs LBAPI
1. Exporting all data for certain artifact types, or for an entire workspace or subscription, is quite different from exporting incremental updates of those objects. Not only will a full export take longer to perform, but if it is used to overwrite previously exported data then only the latest snapshot is ever preserved, so there is no way to determine modifications, or modification dates, between snapshots.
One of the major benefits of our analytics service is that it creates snapshots for every modification.
Lookback API (LBAPI) is a mechanism that lets you look back into these snapshots and learn of changes that happened over time. For example, with LBAPI you can find all changes that happened between certain dates or certain snapshots. None of that is possible with Web Services API (WSAPI), which holds only the current data. Nor will it be possible for a customer who chooses to always export all data and use it to overwrite their previous export.
2. With regard to exporting all data at any point: to export everything that currently exists, we can either use WSAPI, or use LBAPI and filter all objects on their 'current' snapshot.
WSAPI's advantage is that it allows you to query and export object types beyond artifacts, whereas LBAPI records only artifacts.
LBAPI, however, is better designed for massive queries: it supports much larger page sizes (up to 20,000 objects, 10 times larger than WSAPI's maximum). If only artifacts are being exported, LBAPI is probably the better choice.
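As a hedged illustration, a Lookback API request that returns only the current snapshot of each artifact might be assembled like this. The workspace OID and field list are placeholders; the `find`/`fields`/`__At` keys follow the LBAPI request conventions:

```python
import json

def lbapi_current_query(workspace_oid, artifact_type):
    """Build a Lookback API URL and request body that return only the
    'current' snapshot of each artifact of the given type."""
    body = {
        # MongoDB-style selector: match the artifact type, and restrict
        # results to the latest snapshot with "__At": "current".
        "find": {"_TypeHierarchy": artifact_type, "__At": "current"},
        "fields": ["ObjectID", "Name", "FormattedID"],
        "pagesize": 20000,  # LBAPI supports up to 20,000 objects per page
    }
    url = ("https://rally1.rallydev.com/analytics/v2.0/service/rally/"
           f"workspace/{workspace_oid}/artifact/snapshot/query.js")
    return url, json.dumps(body)

url, body = lbapi_current_query(12345, "HierarchicalRequirement")
```

Note the page size of 20,000, versus WSAPI's maximum of 2,000, which is what makes LBAPI better suited to massive exports.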
Both mechanisms will eventually run into thresholds where the result count is too large to export. This may result in timeouts, or perhaps partial results being returned. Large subscriptions need to consider their requirements and processes, and avoid this practice of exporting all data, certainly on a continuous (let alone frequent) basis. We do limit the maximum connections per user per server to 16.
There is no way to predict what the data threshold is: too many parameters are involved, including the total number of artifacts, the size of each artifact's data, the network, the clients/applications used, etc. Even large attachments may play a part, even if they are not queried for.
However, as subscriptions grow they will start to experience longer execution times for their queries as they near this moving 'threshold'.
When latency increases, requesting less data will help. The first place to look is the 'Fetch' argument.
Do not use 'Fetch=true' in your WSAPI query: 'true' stands for all fields, so this value exports every field of every requested artifact, which can increase the data volume substantially. Avoiding it will help.
So, use: 'Fetch=<comma-separated list of fields>' (for example: 'Fetch=FormattedID,Name,ObjectID,CustomField_1,CustomField_2'). The point of specifying fields directly is to:
Avoid unnecessary fields. In particular, avoid requesting ALL fields, since the result may contain collections, which are separate sets of data. These add extra queries, complexity, and data to any API request. ('Fetch=true' pulls all fields, returning only a reference for collection fields.)
Collection fields are best avoided where possible, as they add data to the request result. Collection fields are usually links between artifacts; sometimes the relationship is 1-to-1 and sometimes 1-to-many. 1-to-many relationships increase query latency, so avoiding them where possible will help. Refer to the WSAPI documentation to identify which fields are collections.
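Putting this together, a minimal sketch of building a WSAPI query URL with an explicit field list in Python. The helper name and field list are illustrative; only the documented WSAPI parameters ('fetch', 'query', 'pagesize', 'start') are assumed:

```python
from urllib.parse import urlencode

BASE = "https://rally1.rallydev.com/slm/webservice/v2.0"

def build_query_url(endpoint, fields, query=None, pagesize=200, start=1):
    """Build a WSAPI GET URL with an explicit Fetch list instead of Fetch=true."""
    params = {
        "fetch": ",".join(fields),  # only the fields actually needed
        "pagesize": pagesize,
        "start": start,
    }
    if query:
        params["query"] = query
    return f"{BASE}/{endpoint}?{urlencode(params)}"

url = build_query_url("defect", ["FormattedID", "Name", "ObjectID"])
```

Keeping the field list short, and free of collection fields, is what keeps the response volume down.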
Paging
Smaller page sizes perform faster, as less data is prepared by the server for delivery to the client. WSAPI's maximum page size is 2000. When experiencing latencies or timeouts, decreasing the page size will help.
There isn't a way to predict by how much to reduce the page size; it depends on the amount of data, the subscription size, and other factors for which there are no exact benchmarks in the first place.
Reducing the page size has a cost, though: it requires many more runs of the query. Reducing the page size from 2000 to 200 means the query needs to run 10 times to retrieve the same amount of data. The best practice here is trial and error: simply try different page sizes and see if they return reliably. For example, try a page size of 1500; if that is still not reliable, try 1000, and so on until you reach a page size that performs reliably.
The 'pagesize' argument goes alongside with the 'start' argument that indicates which page is requested, for example:
start=1, pagesize=200 - returns the first 200 query results (page #1).
start=201, pagesize=200 - returns the second set of 200 query results (page #2).
start=1601, pagesize=200 - returns the 9th set of 200 query results (page #9).
start=5501, pagesize=500 - returns the 12th set of 500 query results (page #12).
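The worked examples above follow a simple formula: page k of size p starts at index (k-1)*p + 1. A minimal sketch (the helper names are illustrative, not part of WSAPI):

```python
def page_start(page_number, pagesize):
    """1-based 'start' index for a given page number, per WSAPI paging."""
    return (page_number - 1) * pagesize + 1

# Matches the worked examples above:
assert page_start(2, 200) == 201
assert page_start(9, 200) == 1601
assert page_start(12, 500) == 5501

def page_params(total_results, pagesize):
    """Yield (start, pagesize) pairs covering all results."""
    for start in range(1, total_results + 1, pagesize):
        yield start, pagesize
```

Iterating `page_params` and issuing one query per pair retrieves the full result set in chunks.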
Filtering
Another technique to retrieve all results in chunks is to use sets of filters. The idea, again, is to reduce the amount of requested data so the query won't time out. For example, first use: ((Name >= "A") and (Name <= "Z")), then use: ((Name >= "a") and (Name <= "z")), and so on.
This example runs a first query that returns all objects of a certain artifact type whose names start with an uppercase letter, then a second query that returns all objects of the same type whose names start with a lowercase letter. The idea is to use filters that do not overlap, retrieving data sets that can be joined together to form the full result set. It is similar to paging: each query returns a chunk of the results, and the chunks are combined on the client.
The advantage of this method over paging, though, is that it eases server load: instead of computing all matching artifacts and then slicing them into pages, the server filters the results immediately. This method will most likely run faster.
The caveat with this method is that you need to be familiar with your data. Asking for all objects that start with an uppercase letter, then all that start with a lowercase letter, may work for one endpoint (say, Defect) but not for another (say, UserStory), because user stories may follow a common naming convention where most names start with 'US'. If most or all results start with a specific prefix, you will need to break your queries accordingly, for example:
((Name >= "USA") and (Name <= "USZ")) , then later: ((Name >= "USa") and (Name <= "USz"))
It can take some effort (possibly for each exported object type) to figure out how to break the queries into chunks.
One way to reduce this effort is to use the 'Artifact' endpoint (instead of each specific artifact type), for example:
https://rally1.rallydev.com/slm/webservice/v2.0/artifact?query=((Name >= "A") and (Name <= "Z"))
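A minimal sketch of generating such non-overlapping range filters and combining them with the 'artifact' endpoint. The helper is hypothetical, and in practice the query string should also be URL-encoded before sending:

```python
def name_range_query(lo, hi):
    """WSAPI filter for Names within a closed alphabetical range."""
    return f'((Name >= "{lo}") and (Name <= "{hi}"))'

# Non-overlapping chunks covering uppercase and lowercase names:
chunks = [name_range_query("A", "Z"), name_range_query("a", "z")]

# Each chunk becomes one query against the 'artifact' endpoint:
base = "https://rally1.rallydev.com/slm/webservice/v2.0"
urls = [f"{base}/artifact?query={q}" for q in chunks]
```

Because the ranges do not overlap, the result sets of the individual queries can simply be concatenated on the client.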
Another useful filtering method is to use the LastUpdateDate attribute, which holds the last date an object was updated; it is automatically set when an object is created or updated. This is useful when pulling incremental changes since a previous export, rather than exporting all data each time.
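For example, a hedged sketch of building an incremental filter on LastUpdateDate (the helper name is illustrative; the timestamp follows the ISO-8601 style that WSAPI query literals use):

```python
def incremental_query(since_iso):
    """WSAPI filter returning objects updated after the given timestamp,
    for incremental pulls instead of full exports."""
    return f'(LastUpdateDate > "{since_iso}")'

q = incremental_query("2020-07-01T00:00:00Z")
```

Recording the timestamp of each export and passing it to the next run retrieves only the objects changed in between.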