Rally - WSAPI: Data Extract Recommendations




Article ID: 127874


Updated On: 10-10-2023

Products

Rally On-Premise, Rally SaaS

Issue/Introduction

Several Rally (Agile Central) customers would like to use the API to extract data from Rally into other tools. Customers sometimes want to use a third-party analytics tool to create custom reports or other output.

Rally's API does allow for extraction of data, but there are several caveats, tips and tricks for doing so effectively.

Environment

Release:
Component: ACSAAS

Resolution

WSAPI vs LBAPI

1. Exporting all data for certain artifacts, or for an entire workspace or subscription, is quite different from exporting incremental updates of those objects. A full export takes longer to run, and if it is used to overwrite previously exported data then only the last snapshot is ever preserved, so there is no way to determine what changed between snapshots or when.

One of the major benefits of our analytics service is that it creates snapshots for every modification.

Lookback API (LBAPI) is a mechanism that lets you look back into these snapshots and learn of changes that happened over time. For example, with LBAPI you could find all changes that happened between certain dates or certain snapshots. None of that is possible with Web Services API (WSAPI), which holds only the current data. Nor is it possible for a customer who always exports all data and uses it to overwrite their previous export.

2. With regard to exporting all data at a given point in time:
To export everything that currently exists, use either WSAPI, or LBAPI filtered to the 'current' snapshot of each object.

WSAPI's advantage is that it can query and export objects beyond artifacts, whereas LBAPI only records artifacts.

LBAPI, however, is better designed for massive queries: it supports much larger page sizes (up to 20000 objects, 10 times larger than WSAPI). If only artifacts are being exported, LBAPI is probably the better choice.

See Web Services API documentation at https://rally1.rallydev.com/slm/doc/webservice/
See Lookback API documentation at https://rally1.rallydev.com/analytics/doc/#/manual
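As a rough illustration of the 'current snapshot' approach, the sketch below builds a Lookback API request body in Python. The `find`, `fields`, `start` and `pagesize` keys and the `_TypeHierarchy` and `__At` operators reflect the Lookback API manual as best understood here, and the 0-based `start` is an assumption; verify all of them against the documentation linked above. The function name and the field list are hypothetical.

```python
import json

LBAPI_MAX_PAGESIZE = 20000  # upper bound quoted in this article


def build_current_snapshot_query(artifact_type, fields, start=0,
                                 pagesize=LBAPI_MAX_PAGESIZE):
    """Return a JSON body that selects only the current snapshot of each artifact."""
    if not 1 <= pagesize <= LBAPI_MAX_PAGESIZE:
        raise ValueError("pagesize must be between 1 and 20000")
    body = {
        # "__At": "current" is assumed to restrict results to current snapshots.
        "find": {"_TypeHierarchy": artifact_type, "__At": "current"},
        "fields": list(fields),
        "start": start,  # assumed to be 0-based in LBAPI
        "pagesize": pagesize,
    }
    return json.dumps(body)


print(build_current_snapshot_query("HierarchicalRequirement", ["ObjectID", "Name"]))
```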

Both mechanisms will hit thresholds at some point where the result count is too large to export. This may result in timeouts or in partial results being returned. Large subscriptions should review their requirements and processes and look to avoid exporting all data, certainly on a continuous (let alone frequent) basis.

There is no way to predict what the data threshold is. Too many parameters are involved, including the total number of artifacts, the size of the artifact data, the network, and the clients/applications used. Large attachments may even play a part when they are not queried for.

However, as subscriptions grow, their queries will take longer to execute as they near this moving 'threshold'.

Recommendations

Refer to the Web Services API documentation.

When latency is increasing, requesting less data will help. Specifically:

Fetch argument

Do not use 'fetch=true' in your WSAPI query. 'true' stands for all fields, so this value exports every field of every requested artifact and can increase the data volume considerably. Avoiding it will help.
Instead, use 'fetch=<comma-separated list of fields>' (for example: 'fetch=FormattedID,Name,ObjectID,CustomField_1,CustomField_2' and so on). The point of specifying fields directly is to:
Avoid unnecessary fields. Requesting all fields also pulls in collection fields, which are separate sets of data. 'fetch=true' returns them only as references, and resolving each reference adds further queries, complexity, and data to the export.
Avoid collection fields where possible, as they add data to the result. Collection fields are usually links between artifacts; some are 1-1 and some are 1-many. 1-many relationships increase query latency, so leaving them out helps. Refer to the WSAPI documentation to identify collections.
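A minimal Python sketch of building a request URL with an explicit field list is shown here. The helper name `build_wsapi_url` is hypothetical, and the field names are only examples; the base URL and the 'fetch', 'start' and 'pagesize' parameters are the ones used in this article.

```python
from urllib.parse import urlencode

BASE_URL = "https://rally1.rallydev.com/slm/webservice/v2.0"


def build_wsapi_url(endpoint, fetch_fields, start=1, pagesize=200):
    """Build a WSAPI query URL that names each field instead of using fetch=true."""
    if not fetch_fields or any(f.lower() == "true" for f in fetch_fields):
        raise ValueError("list the needed fields explicitly instead of fetch=true")
    params = {
        "fetch": ",".join(fetch_fields),
        "start": start,
        "pagesize": pagesize,
    }
    # safe="," keeps the comma-separated field list readable in the URL.
    return f"{BASE_URL}/{endpoint}?{urlencode(params, safe=',')}"


print(build_wsapi_url("defect", ["FormattedID", "Name", "ObjectID"]))
```

Rejecting an empty list or a literal 'true' is a design choice of this sketch, so that an all-fields export cannot happen by accident.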

Paging

Smaller page sizes perform faster because the server prepares less data for delivery to the client. The WSAPI maximum page size is '2000'. When you experience latency or timeouts, decreasing the page size will help.

There is no way to predict how far to reduce the page size. It depends on the amount of data, the subscription size, and so on, factors for which there are no exact benchmarks in the first place.

Reducing the page size has a cost, though: the query has to run many more times. Reducing the page size from '2000' to '200' means the query needs to run 10 times to retrieve the same amount of data. The best practice here is trial and error: try different page sizes and see whether they return reliably. Try a page size of '1500'; if that is still not reliable, try '1000', and so on, until you reach a page size that performs reliably.
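The trial-and-error approach could be sketched in Python as follows. `fetch_page` is a hypothetical callable, supplied by you, that issues one request at the given page size and raises `TimeoutError` when the request times out; the candidate sizes are only examples.

```python
def requests_needed(total_results, pagesize):
    """Number of requests required to retrieve total_results at this page size."""
    return (total_results + pagesize - 1) // pagesize  # integer ceiling


def find_reliable_pagesize(fetch_page, candidates=(2000, 1500, 1000, 500, 200)):
    """Return the first (largest) candidate page size that does not time out."""
    for pagesize in candidates:
        try:
            fetch_page(pagesize)
        except TimeoutError:
            continue  # too large for this data set, try the next size down
        return pagesize
    raise RuntimeError("no candidate page size returned reliably")


print(requests_needed(2000, 2000), requests_needed(2000, 200))
```

A single successful request does not prove a page size is reliable, so in practice each candidate would be tried several times before settling on it.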

The 'pagesize' argument goes together with the 'start' argument, which gives the 1-based index of the first result to return, for example:

start=1&pagesize=200 - returns the first 200 query results.
start=201&pagesize=200 - returns the second set of 200 query results, page #2.
start=1601&pagesize=200 - returns the 9th set of 200 query results, page #9.
start=5501&pagesize=500 - returns the 12th set of 500 query results, page #12.
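The relationship between 'start', 'pagesize' and the page number in the examples above is simple arithmetic. The Python sketch below spells it out; the function names are hypothetical.

```python
def start_for_page(page, pagesize):
    """1-based 'start' value for a 1-based page number."""
    return (page - 1) * pagesize + 1


def page_for_start(start, pagesize):
    """1-based page number that a given 'start' value falls on."""
    return (start - 1) // pagesize + 1


for page, pagesize in [(1, 200), (2, 200), (9, 200), (12, 500)]:
    print(f"page {page}: start={start_for_page(page, pagesize)}&pagesize={pagesize}")
```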

Such an example could be:

https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?fetch=ObjectID,FormattedID,Name,CreationDate,CreatedBy,LastUpdateDate,Milestones,Owner,Project,FlowState,FlowStateChangedDate,LastBuild,LastRun,PassingTestCaseCount,ScheduleState,ScheduleStatePrefix,TestCaseCount,Package,AcceptedDate,Blocked,Blocker,DefectStatus,Defects,DirectChildrenCount,HasParent,InProgressDate,Iteration,Parent,PlanEstimate,Release,Risks,TestCaseStatus,TestCases,c_BusinessPriority,c_BusinessValue,c_ClassofService,c_Component,Feature,c_MerchantCustomer,c_MPGKanbanState,c_PriorityField,c_ServersRequested,c_StoryType&start=1&pagesize=200

or

https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?fetch=ObjectID,FormattedID,Name,CreationDate,CreatedBy,LastUpdateDate,Milestones,Owner,Project,FlowState,FlowStateChangedDate,LastBuild,LastRun,PassingTestCaseCount,ScheduleState,ScheduleStatePrefix,TestCaseCount,Package,AcceptedDate,Blocked,Blocker,DefectStatus,Defects,DirectChildrenCount,HasParent,InProgressDate,Iteration,Parent,PlanEstimate,Release,Risks,TestCaseStatus,TestCases,c_BusinessPriority,c_BusinessValue,c_ClassofService,c_Component,Feature,c_MerchantCustomer,c_MPGKanbanState,c_PriorityField,c_ServersRequested,c_StoryType&start=201&pagesize=200

Filtering

Another technique for retrieving all results in chunks is to use sets of filters. The idea, again, is to reduce the data requested by each query so that it does not time out. For example, use ((Name >= "A") and (Name <= "Z")), then use ((Name >= "a") and (Name <= "z")), and so on.

This example runs a first query that returns all objects of a certain artifact type whose names start with an uppercase letter, then a second query that returns all objects of the same type whose names start with a lowercase letter. The idea is to use filters that do not overlap, so that the retrieved data sets can be joined to form the full result set. This is similar to paging, where each page is requested separately and all pages are combined on the client.

The advantage of this method is that it reduces server processing time: instead of selecting all artifacts and then splitting them into pages, the server filters the results immediately. This method will most likely run faster.

The caveat is that you need to be familiar with your data. Asking for all objects that start with an uppercase letter and then all that start with a lowercase letter may work for one endpoint (say, Defect) but not for another (say, UserStory), for instance because user stories follow a naming convention where most names start with 'US'. If most or all results start with a specific prefix, you will need to split your queries accordingly, for example:

((Name >= "USA") and (Name <= "USZ")) , then later:
((Name >= "USa") and (Name <= "USz"))

It can take some effort (possibly for each exported object type) to work out how to break the queries into chunks.
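A small Python sketch of the chunking idea follows: one helper builds the Name-range filter in the form shown above, and another joins the per-filter results on the client. Both function names are hypothetical. Whether a name that merely begins with the upper bound falls inside a range depends on how the server compares strings, so it is worth checking the combined count against an unfiltered result count.

```python
def name_range_query(low, high):
    """Build a filter string in the form used by the examples in this article."""
    return f'((Name >= "{low}") and (Name <= "{high}"))'


def merge_chunks(chunks):
    """Join per-filter result lists, keeping one entry per ObjectID."""
    merged = {}
    for chunk in chunks:
        for item in chunk:
            # setdefault keeps the first copy if two filters happen to overlap.
            merged.setdefault(item["ObjectID"], item)
    return list(merged.values())


print(name_range_query("USA", "USZ"))
print(name_range_query("USa", "USz"))
```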

One way to help here could be to use the 'Artifact' endpoint (instead of each specific artifact endpoint), for example:

https://rally1.rallydev.com/slm/webservice/v2.0/artifact?query=((Name >= "A") and (Name <= "Z"))

Another useful filter is the LastUpdateDate attribute, the date an object was last updated. It is assigned automatically when an object is created or updated. This is useful when pulling only the objects changed since a given date, for example:

https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?pagesize=2000&fetch=ObjectID,FormattedID,Name&query=(LastUpdateDate > "2018-06-19")
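A LastUpdateDate query like the one above can be generated from a stored date, for instance the date of the previous export. The Python sketch below builds a percent-encoded version of the URL; the function name `incremental_url` is hypothetical.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "https://rally1.rallydev.com/slm/webservice/v2.0"


def incremental_url(endpoint, since, fetch_fields, pagesize=2000):
    """Build a URL for objects whose LastUpdateDate is later than 'since'."""
    params = {
        "pagesize": pagesize,
        "fetch": ",".join(fetch_fields),
        "query": f'(LastUpdateDate > "{since.isoformat()}")',
    }
    # quote_via=quote encodes spaces as %20; safe="," keeps the field list readable.
    return f"{BASE_URL}/{endpoint}?{urlencode(params, safe=',', quote_via=quote)}"


print(incremental_url("hierarchicalrequirement", date(2018, 6, 19),
                      ["ObjectID", "FormattedID", "Name"]))
```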

Summary

Assuming WSAPI must be used to export all data, the recommendations are:

First, use specific fetch fields and do not request anything unnecessary.
Second, use paging. It will take trial and error to find a page size that performs reliably, and then you need to work out how many times the query must run and whether that is at all feasible.
Third, add filters to the queries and design a 'pull' mechanism that combines filters and paging to assemble all of the data needed.
