In general, requesting less data will help alleviate performance problems. There are many ways to accomplish this.
Restricting requested fields
One way to reduce the requested data is proper use of the fetch argument. Avoid 'fetch=true' in your WSAPI query: a 'true' value requests data for all fields, so it exports every field of every requested artifact, which can greatly increase the data volume. Instead, explicitly request only the fields that contain the data you need: 'fetch=<comma separated list of fields>'.
The point of specifying fields directly is to avoid unnecessary fields, particularly collection fields, which are separate sets of related data. Each collection adds queries, complexity, and data to an API request. 'fetch=true' pulls all fields, with only a reference to each collection field. Collection fields are best avoided where possible because they add data to the request result. They are usually links between artifacts; sometimes the relationship is one-to-one and sometimes one-to-many. One-to-many relationships increase query latency, so avoiding them helps. Refer to the WSAPI documentation to identify which fields are collections.
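As a sketch of this advice, the helper below builds a WSAPI URL with an explicit fetch list instead of 'fetch=true'. The endpoint, field names, and query used here are illustrative, not prescribed by the API:

```python
from urllib.parse import urlencode

# Standard WSAPI base URL, as used in the examples in this article.
BASE = "https://rally1.rallydev.com/slm/webservice/v2.0"

def build_query_url(endpoint, fields, query):
    """Build a WSAPI URL that fetches only the named fields,
    rather than everything via fetch=true."""
    params = {"fetch": ",".join(fields), "query": query}
    return f"{BASE}/{endpoint}?{urlencode(params)}"

# Example: only three fields of open defects, instead of every field.
url = build_query_url("defect", ["FormattedID", "Name", "State"], '(State = "Open")')
```

Keeping the fetch list short (and free of collection fields) is what keeps the response small.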
Another technique is to retrieve all results 'by chunks' using sets of filters. The idea, again, is to reduce the data requested per query so it won't time out. For example, first use: ((Name >= "A") and (Name <= "Z")), then use: ((Name >= "a") and (Name <= "z")), and so on.
This example runs a first query that returns all objects of a given artifact type whose names start with an uppercase letter, then a second query that returns all objects whose names start with a lowercase letter. The idea is to use filters that do not overlap, retrieving data sets that can be joined together to form the full result set. It is similar to paging: each query returns a chunk of the results, and the client combines the chunks into the full result set.
The advantage of this method over paging is that it eases load on the server: rather than computing all matching artifacts and then splitting them into pages, the server filters the results immediately. This method will most likely run faster.
The caveat is that you must be familiar with your data. Splitting by uppercase versus lowercase first letters may work for one endpoint (say, Defect) but not for another (say, UserStory), because user stories may follow a naming convention where most names start with 'US'. If most or all results share a common prefix, you will need to break your queries accordingly, for example:
((Name >= "USA") and (Name <= "USZ")) , then later:
((Name >= "USa") and (Name <= "USz"))
It can take some effort (possibly for each exported object type) to figure out how to break the queries into chunks.
One way to help here is to use the 'Artifact' endpoint (instead of each specific artifact type), for example:
https://rally1.rallydev.com/slm/webservice/v2.0/artifact?query=((Name >= "A") and (Name <= "Z"))
Another useful filtering method is to use the LastUpdateDate attribute, which holds the last update date of an object. It is automatically assigned when an object is created or updated. This is useful when pulling only objects created or updated after a given date, for example:
https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?pagesize=2000&fetch=ObjectID,FormattedID,Name&query=(LastUpdateDate > "2018-06-19")
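A small helper for this incremental pattern might look like the following; how the cursor date is persisted between runs is up to your integration:

```python
from datetime import date

def updated_since_query(since):
    """LastUpdateDate filter matching the URL example above.
    WSAPI also accepts full ISO-8601 timestamps for finer granularity."""
    return f'(LastUpdateDate > "{since.isoformat()}")'

# Matches the query in the example URL above.
q = updated_since_query(date(2018, 6, 19))
# q == '(LastUpdateDate > "2018-06-19")'
```

On each run, the integration queries only what changed since the stored date, then advances the date.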
Smaller page sizes transfer faster over slow Internet connections because the server sends less data per response. The WSAPI maximum page size is 2000. When experiencing latencies or timeouts, decreasing the page size may help, provided the query itself is not too expensive to execute on the database. It is important to understand that the database still needs to compute the total potential record set: while delivery of the data from our datacenter to your application may be faster with a smaller page size, the database query may be just as slow with a page size of 20 as with 2000. In that case, requesting 2000 records per page may actually reduce overall load on the server.
There is no way to predict by how much to reduce the page size; it depends on the amount of data, the subscription size, and other factors for which there are no exact benchmarks.
Reducing the page size has a cost, though: it requires many more runs of the query. Reducing the page size from 2000 to 200 means the query must run 10 times to retrieve the same amount of data. The best practice here is trial and error: try different page sizes and see whether they return reliably. For example, try a page size of 1500; if that is still not reliable, try 1000, and so on, until you reach a page size that performs consistently.
The 'pagesize' argument goes alongside the 'start' argument, which indicates the first record of the requested page, for example:
- start=1,pagesize=200 - returns the first 200 query results.
- start=201,pagesize=200 - returns the second set of 200 query results, page #2.
- start=1601,pagesize=200 - returns the 9th set of 200 query results, page #9.
- start=5501,pagesize=500 - returns the 12th set of 500 query results, page #12.
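The start values follow directly from the page size: page n (1-based) begins at (n - 1) * pagesize + 1. A small helper, assuming the total result count is already known (WSAPI reports it as TotalResultCount in each query response):

```python
def page_starts(total_results, pagesize):
    """Return the 1-based 'start' value for every page needed
    to cover total_results records."""
    pages = (total_results + pagesize - 1) // pagesize  # ceiling division
    return [n * pagesize + 1 for n in range(pages)]

page_starts(1000, 200)  # [1, 201, 401, 601, 801]
```

Looping over these start values and issuing one request per value retrieves the full result set page by page.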
For normal users assigned permissions at the project level (project viewer, editor, or admin), a permission check must be performed for each work item the user looks up. This check isn't noticeable when using the UI to view small amounts of data, but for larger data requests it can impact query performance if the user is granted permissions across a large number of projects.
Because of this, we recommend granting integration accounts workspace admin rights: this skips the permission-lookup routine, since the account is known to have access to all projects in the workspace. To prevent data from being altered within Rally when using an account with this permission level, create a read-only API key for the integration to use.
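Rally reads the API key from the ZSESSIONID request header. A minimal authenticated request might be built like this (the key value and URL are placeholders):

```python
import urllib.request

def wsapi_request(url, api_key):
    """Build an authenticated WSAPI request; Rally reads the
    API key from the ZSESSIONID header."""
    return urllib.request.Request(url, headers={"ZSESSIONID": api_key})

req = wsapi_request(
    "https://rally1.rallydev.com/slm/webservice/v2.0/defect?fetch=Name",
    "_abc123",  # placeholder; use a read-only API key generated in Rally
)
# response = urllib.request.urlopen(req)  # performs the actual call
```

With a read-only key, the integration can export data freely without any risk of modifying it.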
Assuming the user must use WSAPI to export all data, the recommendations are:
- First, use specific fetch fields and don't request what's unnecessary.
- Second, use paging. It will take trial and error to find a page size that performs reliably; then figure out how many times the query needs to run and whether that is at all doable. If yes - great.
- Third, add filters to the queries and design a 'pull' mechanism that combines filtering and paging to assemble the full data set.
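Combining the last two recommendations, a pull plan can be sketched as the cross product of the non-overlapping filters and the page starts each one needs. This sketch assumes the page count per filter is already known; in practice you would derive it from TotalResultCount on the first page of each filter:

```python
def plan_requests(filters, pagesize, pages_per_filter):
    """Return (query, start) pairs covering every page of every filter chunk."""
    return [
        (query, page * pagesize + 1)
        for query in filters
        for page in range(pages_per_filter)
    ]

plan = plan_requests(
    ['((Name >= "A") and (Name <= "Z"))', '((Name >= "a") and (Name <= "z"))'],
    pagesize=2000,
    pages_per_filter=2,
)
# Four requests: each filter fetched at start=1 and at start=2001.
```

Issuing each (query, start) pair with an explicit fetch list, then concatenating the responses, yields the full export.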