1. Exporting all data of certain artifacts, or of an entire workspace or subscription, is quite different from exporting incremental updates of those objects. Not only does exporting all data take longer to perform, but if the export is used to overwrite previously exported data then only the 'last snapshot' is ever preserved, and there is no way to reconstruct the modifications between snapshots or their dates.
One of the major benefits of our analytics service is that it creates snapshots for every modification.
Lookback API (LBAPI) is a mechanism that allows you to look back into these snapshots and learn of changes that happened over time. For example, with LBAPI you could find all changes that happened between certain dates or certain snapshots. None of that is possible with Web Services API (WSAPI), which holds only the current data, and none of it will be possible for a customer who chooses to always export all data and overwrite their previous export with it.
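For illustration only, here is a minimal LBAPI request sketch in Python. The workspace OID is a placeholder, it assumes a Rally API key exported in a RALLY_API_KEY environment variable, and the 'find' syntax follows the Lookback API manual linked below:

import json
import os
import requests

# Placeholder workspace OID; substitute your own subscription's value.
WORKSPACE_OID = 12345
LBAPI_URL = ("https://rally1.rallydev.com/analytics/v2.0/service/rally/"
             f"workspace/{WORKSPACE_OID}/artifact/snapshot/query.js")

params = {
    # MongoDB-style filter: user-story snapshots that became valid in June 2018.
    "find": json.dumps({
        "_TypeHierarchy": "HierarchicalRequirement",
        "_ValidFrom": {"$gte": "2018-06-01T00:00:00Z",
                       "$lt": "2018-07-01T00:00:00Z"},
    }),
    "fields": json.dumps(["ObjectID", "Name", "_ValidFrom", "_ValidTo"]),
    "start": 0,          # LBAPI paging is 0-based
    "pagesize": 20000,
}

resp = requests.get(LBAPI_URL, params=params,
                    headers={"ZSESSIONID": os.environ["RALLY_API_KEY"]})
resp.raise_for_status()
print(resp.json()["TotalResultCount"])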
2. With regard to exporting all data, at any point:
To export everything that currently exists, we can either use WSAPI, or use LBAPI and filter all objects on their 'current' snapshot.
WSAPI's advantage is that it can query/export object types beyond artifacts, whereas LBAPI records artifacts only.
LBAPI, however, is better designed for massive queries: it supports much larger page sizes (up to 20000 objects, 10 times larger than WSAPI's). If only artifacts are exported, LBAPI is probably the better choice.
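As a sketch of the 'current snapshot' idea (same placeholders as above: a hypothetical workspace OID and an API key in RALLY_API_KEY), the special __At clause described in the Lookback API manual restricts results to each artifact's latest snapshot:

import json
import os
import requests

WORKSPACE_OID = 12345  # placeholder; substitute your own
LBAPI_URL = ("https://rally1.rallydev.com/analytics/v2.0/service/rally/"
             f"workspace/{WORKSPACE_OID}/artifact/snapshot/query.js")

params = {
    # '__At': 'current' filters each user story to its current snapshot.
    "find": json.dumps({"_TypeHierarchy": "HierarchicalRequirement",
                        "__At": "current"}),
    "fields": json.dumps(["ObjectID", "FormattedID", "Name"]),
    "start": 0,
    "pagesize": 20000,  # LBAPI's larger page size
}

resp = requests.get(LBAPI_URL, params=params,
                    headers={"ZSESSIONID": os.environ["RALLY_API_KEY"]})
resp.raise_for_status()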
See Web Services API documentation at https://rally1.rallydev.com/slm/doc/webservice/
See Lookback API documentation at https://rally1.rallydev.com/analytics/doc/#/manual
Both mechanisms will run into thresholds at some point, where the result count is too large to export; this may result in timeouts, or in partial results being returned. Large subscriptions need to consider their requirements and processes, and should avoid the practice of exporting all data, certainly on a continuous (let alone frequent) basis.
There is no way to predict what the data threshold is; too many parameters are involved, including the total number of artifacts, the size of each artifact's data, the network, the clients/applications used, and so on. Even large attachments may play a part, even when they are not queried for.
However, as subscriptions grow they will start experiencing longer execution times for their queries as they near this moving 'threshold'.
Refer to the Web Services API documentation linked above.
When latency increases, requesting less data helps. Specifically:
Smaller page sizes perform faster because less data is prepared by the server for delivery to the client. WSAPI's maximum page size is '2000'. When experiencing latencies or timeouts, decreasing the page size will help.
There isn't a way to predict by how much to reduce the page size; it depends on the amount of data, the subscription size, and so on, and there are no exact benchmarks for those parameters in the first place.
Reducing the page size has a cost, though: it requires many more runs of that query. Reducing the page size from '2000' to '200' means the query needs to run 10 times to retrieve the same amount of data. The best practice here is trial and error: simply try out different page sizes and see if they return reliably. So, try a page size of '1500'; if that is still not reliable, try '1000', and so on, until you reach a page size that performs reliably.
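A rough Python sketch of that trial-and-error loop (the fetch list is trimmed for brevity, and it again assumes an API key in RALLY_API_KEY) steps the page size down until a request completes reliably:

import os
import requests

URL = "https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement"
HEADERS = {"ZSESSIONID": os.environ["RALLY_API_KEY"]}  # assumed API-key env var

def first_page(pagesizes=(2000, 1500, 1000, 500, 200), timeout=60):
    """Try progressively smaller page sizes until one returns reliably."""
    for size in pagesizes:
        try:
            resp = requests.get(
                URL,
                params={"fetch": "ObjectID,FormattedID,Name",
                        "start": 1, "pagesize": size},
                headers=HEADERS,
                timeout=timeout)
            resp.raise_for_status()
            return size, resp.json()["QueryResult"]["Results"]
        except requests.RequestException:
            continue  # timed out or errored; retry with a smaller page
    raise RuntimeError("no page size returned reliably")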
The 'pagesize' argument works together with the 'start' argument, which indicates the index of the first result to return, for example (a paging-loop sketch follows the example URLs below):
start=1,pagesize=200 - returns the first 200 query results.
start=201,pagesize=200 - returns the second set of 200 query results, page #2.
start=1601,pagesize=200 - returns the 9th set of 200 query results, page #9.
start=5501,pagesize=500 - returns the 12th set of 500 query results, page #12.
For example:
https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?fetch=ObjectID,FormattedID,Name,CreationDate,CreatedBy,LastUpdateDate,Milestones,Owner,Project,FlowState,FlowStateChangedDate,LastBuild,LastRun,PassingTestCaseCount,ScheduleState,ScheduleStatePrefix,TestCaseCount,Package,AcceptedDate,Blocked,Blocker,DefectStatus,Defects,DirectChildrenCount,HasParent,InProgressDate,Iteration,Parent,PlanEstimate,Release,Risks,TestCaseStatus,TestCases,c_BusinessPriority,c_BusinessValue,c_ClassofService,c_Component,Feature,c_MerchantCustomer,c_MPGKanbanState,c_PriorityField,c_ServersRequested,c_StoryType&start=1&pagesize=200
or
https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?fetch=ObjectID,FormattedID,Name,CreationDate,CreatedBy,LastUpdateDate,Milestones,Owner,Project,FlowState,FlowStateChangedDate,LastBuild,LastRun,PassingTestCaseCount,ScheduleState,ScheduleStatePrefix,TestCaseCount,Package,AcceptedDate,Blocked,Blocker,DefectStatus,Defects,DirectChildrenCount,HasParent,InProgressDate,Iteration,Parent,PlanEstimate,Release,Risks,TestCaseStatus,TestCases,c_BusinessPriority,c_BusinessValue,c_ClassofService,c_Component,Feature,c_MerchantCustomer,c_MPGKanbanState,c_PriorityField,c_ServersRequested,c_StoryType&start=201&pagesize=200
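Putting 'start' and 'pagesize' together, a minimal paging-loop sketch (same API-key assumption as above, fetch list trimmed) walks the pages until TotalResultCount is exhausted and combines them on the client:

import os
import requests

URL = "https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement"
HEADERS = {"ZSESSIONID": os.environ["RALLY_API_KEY"]}  # assumed API-key env var

def export_all(pagesize=200):
    """Fetch every page and combine the results client-side."""
    results, start = [], 1   # WSAPI paging is 1-based
    while True:
        resp = requests.get(URL, params={
            "fetch": "ObjectID,FormattedID,Name",
            "start": start, "pagesize": pagesize}, headers=HEADERS)
        resp.raise_for_status()
        qr = resp.json()["QueryResult"]
        results.extend(qr["Results"])
        if start + pagesize > qr["TotalResultCount"]:
            return results
        start += pagesize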
Another technique to retrieve all results 'by chunks' is to use sets of filters. The idea here, too, is to reduce the requested data so the query won't time out. For example, first use ((Name >= "A") and (Name <= "Z")), then use ((Name >= "a") and (Name <= "z")), and so on.
This example runs a first query that returns all objects of a certain artifact type whose names start with an uppercase letter, then a second query that returns all objects of the same type whose names start with a lowercase letter. The idea is to use filters that do not overlap, retrieving data-sets that can be joined together to form the full result-set. It is similar to paging: each query returns a chunk of the results, and the chunks are combined on the client to form the full result-set.
The advantage of this method, though, is that it reduces the server's work: instead of preparing all artifacts and then splitting them into pages, the server filters the results immediately. This method will most likely run faster.
The caveat with this method is that you must be familiar with your data. Splitting on uppercase versus lowercase first letters may work for one endpoint (say, Defect) but not for another (say, UserStory), because user stories may follow a common naming convention where most names start with 'US'. If most or all results start with a specific prefix, you will need to break your queries accordingly, for example:
((Name >= "USA") and (Name <= "USZ")) , then later:
((Name >= "USa") and (Name <= "USz"))
It can take some effort (possibly for each exported object type) to figure out how to break the queries into chunks.
One way to help here could be to use the 'Artifact' endpoint (instead of each specific artifact type), for example (a sketch combining this with the name-range chunks follows the URL below):
https://rally1.rallydev.com/slm/webservice/v2.0/artifact?query=((Name >= "A") and (Name <= "Z"))
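A sketch of the chunking idea against the 'Artifact' endpoint (the name ranges are only an example and must be adapted to your own naming conventions; same API-key assumption as above):

import os
import requests

URL = "https://rally1.rallydev.com/slm/webservice/v2.0/artifact"
HEADERS = {"ZSESSIONID": os.environ["RALLY_API_KEY"]}  # assumed API-key env var

# Non-overlapping name ranges; together they should cover the full result-set.
RANGES = [("A", "Z"), ("a", "z")]

def export_by_chunks():
    results = []
    for lo, hi in RANGES:
        resp = requests.get(URL, params={
            "query": f'((Name >= "{lo}") and (Name <= "{hi}"))',
            "fetch": "ObjectID,FormattedID,Name",
            "start": 1, "pagesize": 2000}, headers=HEADERS)
        resp.raise_for_status()
        # Each chunk is filtered on the server, then joined on the client.
        results.extend(resp.json()["QueryResult"]["Results"])
    return results

In practice each chunk may itself span multiple pages, so this is usually combined with the paging loop shown earlier.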
Another useful filtering method is the LastUpdateDate attribute, which holds the last update date of an object and is assigned automatically whenever an object is created or updated. This is useful when pulling only the objects that changed since a previous export, for example:
https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement?pagesize=2000&fetch=ObjectID,FormattedID,Name&query=(LastUpdateDate > "2018-06-19")
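As a sketch, an incremental exporter can persist the time of its last run in a local bookmark file (the file name here is arbitrary) and query only what changed since:

import os
import pathlib
from datetime import datetime, timezone
import requests

URL = "https://rally1.rallydev.com/slm/webservice/v2.0/hierarchicalrequirement"
HEADERS = {"ZSESSIONID": os.environ["RALLY_API_KEY"]}  # assumed API-key env var
STAMP = pathlib.Path("last_export.txt")  # arbitrary local bookmark file

def export_incremental():
    # Fall back to a fixed date on the first run.
    since = STAMP.read_text().strip() if STAMP.exists() else "2018-06-19"
    resp = requests.get(URL, params={
        "query": f'(LastUpdateDate > "{since}")',
        "fetch": "ObjectID,FormattedID,Name,LastUpdateDate",
        "start": 1, "pagesize": 2000}, headers=HEADERS)
    resp.raise_for_status()
    # Remember this run's time so the next export only pulls newer changes.
    STAMP.write_text(datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    return resp.json()["QueryResult"]["Results"]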
Assuming the user must use WSAPI to export all data, the recommendations are to: