Symptoms:
- No alarms or metrics from integration products
- No tickets being created in ServiceNow
- Cannot access Kubernetes / OpenShift / DX Console / OI Console
- Pods crashing
- NFS filesystem full
DX Platform 2x
The following is a high-level list of techniques and suggestions for reducing Elastic data retention:
A) Check Elastic Stats
B) Change data retention for all tenants
C) Change data retention to a specific tenant
D) Change data retention to specific Elastic indices
E) Disable or reduce Elastic snapshots
F) How to delete specific old indices immediately?
AIOPs Data Stores and Flow Interactions
A) Check Elastic Stats
NOTE: Replace http(s)://{es_endpoint} with your own Elastic endpoint.
a) Check Elastic indices by size:
http(s)://{es_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
b) Check Elastic health:
http(s)://{es_endpoint}/_cluster/health?pretty&human
Recommendations:
-Increase disk space in NFS server
-Reduce data retention as described in the following sections
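The two checks in this section can also be scripted. The following is a minimal Python sketch, not an official tool: it builds the same `_cat/indices` and `_cluster/health` URLs and fetches them with the standard library. The `ES_ENDPOINT` value and the helper names are placeholders chosen for illustration, and the sketch assumes the endpoint is reachable without authentication.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: replace with your own Elastic endpoint.
ES_ENDPOINT = "http://es.example.com:9200"

# Columns to display; cds is the index creation date as a string.
COLUMNS = "health,index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size,cds"


def cat_indices_url(endpoint, sort="ss:desc", pattern=""):
    """Build a _cat/indices URL sorted by the given column (ss = store.size)."""
    query = urlencode({"v": "true", "s": sort, "h": COLUMNS}, safe=":,")
    return f"{endpoint}/_cat/indices/{pattern}?{query}"


def cluster_health_url(endpoint):
    """Build the cluster health URL used in step b)."""
    return f"{endpoint}/_cluster/health?pretty&human"


def fetch_text(url, timeout=30):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `print(fetch_text(cat_indices_url(ES_ENDPOINT)))` prints the indices sorted by size, and `print(fetch_text(cluster_health_url(ES_ENDPOINT)))` prints the cluster health.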
B) Change data retention for all tenants
The default retention period is 45 days.
"In the OpenShift Web Console, go to the Digital Operational Intelligence project.
Go to Applications, Deployments, doireadserver.
Select Environment to view the environment variables.
Set the value of JARVIS_TENANT_RETENTION_PERIOD as needed.
Click Save."
C) Change data retention to a specific tenant
a) Obtain the tenant_id from Settings > Connector Parameters > Cohort ID
b) Go to Jarvis API onboarding
http(s)://<jarvis-api-endpoint>
c) Change data retention, for example reduce data retention from default 45 to 10 days
Execute: PATCH /onboarding/tenants
Body:
{
"product_id":"ao",
"retention_period":<# of days>,
"tenant_id":"<tenant_id>"
}
Click Execute, expected Code Result = 204
Verify the change, execute: GET /onboarding/tenants(product_id='{product_id}',tenant_id='{tenant_id}')
Product_id = ao
Enter the tenant id
Click Execute, expected Code Result = 200
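If you prefer to script the tenant-level change instead of using the API page, the request can be sketched in Python as follows. This is only an illustration under assumptions: `JARVIS_API` is a placeholder, the helper names are hypothetical, the `Content-Type` header is assumed, and any authentication your environment requires is not handled.

```python
import json
from urllib.request import Request, urlopen

# Placeholder: replace with your Jarvis API endpoint.
JARVIS_API = "http://jarvis-api.example.com"


def tenant_retention_body(tenant_id, retention_days, product_id="ao"):
    """Return the JSON body for PATCH /onboarding/tenants."""
    return {
        "product_id": product_id,
        "retention_period": retention_days,
        "tenant_id": tenant_id,
    }


def patch_tenant_retention(api, body, timeout=30):
    """Send the PATCH request and return the HTTP status code (204 expected)."""
    request = Request(
        f"{api}/onboarding/tenants",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="PATCH",
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `patch_tenant_retention(JARVIS_API, tenant_retention_body("<tenant_id>", 10))` would send the same body as in step c) with a retention of 10 days.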
D) Change data retention to specific Elastic indices, for example: metrics_uim to 2 days AND metrics_anomaly to 30 days
1) Identify which integrations or features are causing the high ingestion of data (UIM, spectrum, capm, caapm, log, anomalies, etc)
To list all indices by creation date:
http(s)://{es_endpoint}/_cat/indices/?v&s=cds:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
To list all indices by size:
http(s)://{es_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
You can narrow your search by filtering only specific indices, for example to list UIM indices only:
http(s)://{es_endpoint}/_cat/indices/*uim*?v&s=cds:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
To list anomaly indices:
http(s)://{es_endpoint}/_cat/indices/*anomaly*?v&s=cds:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
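To see which integration consumes the most disk, it can help to add up the sizes of all indices that belong to the same family. The Python sketch below assumes the rows come from `{es_endpoint}/_cat/indices?format=json&bytes=b&h=index,store.size` and that index names end in `_<number>`, as in `ao_itoa_metrics_uim_1_9`. The sample rows and sizes are hypothetical and only illustrate the shape of the data.

```python
import re
from collections import defaultdict

# Hypothetical rows, shaped like the output of
# {es_endpoint}/_cat/indices?format=json&bytes=b&h=index,store.size
SAMPLE_ROWS = [
    {"index": "ao_itoa_metrics_uim_1_8", "store.size": "5000"},
    {"index": "ao_itoa_metrics_uim_1_9", "store.size": "7000"},
    {"index": "ao_itoa_metrics_anomaly_1_9", "store.size": "3000"},
]


def size_by_family(rows):
    """Sum store.size per index family (index name without its trailing _<number>)."""
    totals = defaultdict(int)
    for row in rows:
        family = re.sub(r"_\d+$", "", row["index"])
        # store.size can be missing (for example on closed indices); count it as 0.
        totals[family] += int(row["store.size"] or 0)
    # Largest consumers first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for family, size in size_by_family(SAMPLE_ROWS):
    print(f"{family}: {size} bytes")
```

The families at the top of the output are the first candidates for a shorter retention period.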
2) Reduce data retention using PATCH /onboarding/tenants
Body syntax:
{
"product_id":"ao",
"retention_period": <retention_days>,
"tenant_id":"<tenant_id>",
"tenant_doc_type_details":[
{
"doc_type_id":"<doc_type#1>",
"doc_type_version":"<doc_type_version#1>",
"retention_period":<doc_type_retention_days>
},
{
"doc_type_id":"<doc_type#2>",
"doc_type_version":"<doc_type_version#2>",
"retention_period":<doc_type_retention_days>
}
...
]
}
How do you obtain the doc_type and doc_type_version for specific indices?
In this example, we are looking for the doc_type definition of the UIM metric index:
Execute: GET /onboarding/doc_type(product_id='{product_id}')
Click Try it out
product_id = ao
Click Execute
Use the browser search to locate the doc_type definition in the response, in this example "itoa_metrics_uim".
We can now change the retention at tenant and doc type level, for example: tenant retention = 20, metrics_uim = 2 and metrics_anomaly = 15
{
"product_id":"ao",
"retention_period":20,
"tenant_id":"<your_tenant_id>",
"tenant_doc_type_details":[
{
"doc_type_id":"itoa_metrics_uim",
"doc_type_version":"1",
"retention_period":2
},
{
"doc_type_id":"itoa_metrics_anomaly",
"doc_type_version":"1",
"retention_period":15
}
]
}
Click Execute, expected Code Result = 204
Verify the change, execute: GET /onboarding/tenants(product_id='{product_id}',tenant_id='{tenant_id}')
Expected Code Result = 200
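When several doc types need different retention periods, the request body can be generated rather than typed by hand. The following Python sketch builds the body shown above from a mapping of doc_type_id to days. The function name is hypothetical, and it assumes every doc type uses the same doc_type_version ("1", as in the example); check the versions returned by GET /onboarding/doc_type before relying on that.

```python
import json


def retention_body(tenant_id, tenant_days, doc_type_days, product_id="ao", version="1"):
    """Build the PATCH /onboarding/tenants body with per-doc-type retention.

    doc_type_days maps doc_type_id -> retention period in days.
    version is applied to every doc type (an assumption of this sketch).
    """
    return {
        "product_id": product_id,
        "retention_period": tenant_days,
        "tenant_id": tenant_id,
        "tenant_doc_type_details": [
            {
                "doc_type_id": doc_type_id,
                "doc_type_version": version,
                "retention_period": days,
            }
            for doc_type_id, days in doc_type_days.items()
        ],
    }


body = retention_body(
    "<your_tenant_id>",
    20,
    {"itoa_metrics_uim": 2, "itoa_metrics_anomaly": 15},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Body field of PATCH /onboarding/tenants.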
E) How to delete specific old indices immediately?
If you need to free up space as soon as possible, you can delete one or more of the problematic indices using:
curl -X DELETE http(s)://{es_endpoint}/<index_name>
In the example below, we have found that UIM and Anomaly are the problematic indices:
http(s)://{es_endpoint}/_cat/indices/*metrics*?s=index,cds&h=index,ss,cds
First, we identify the oldest indices, in this example:
ao_itoa_metrics_anomaly_1_9
ao_itoa_metrics_anomaly_1_10
ao_itoa_metrics_uim_1_8
ao_itoa_metrics_uim_1_9
Then we execute curl -X DELETE http(s)://{es_endpoint}/<index_name> for each of them:
curl -X DELETE http(s)://{es_endpoint}/ao_itoa_metrics_anomaly_1_9
curl -X DELETE http(s)://{es_endpoint}/ao_itoa_metrics_anomaly_1_10
curl -X DELETE http(s)://{es_endpoint}/ao_itoa_metrics_uim_1_8
curl -X DELETE http(s)://{es_endpoint}/ao_itoa_metrics_uim_1_9
IMPORTANT: always delete the oldest indices first
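Picking the oldest indices by eye is error-prone because names sort alphabetically (for example, `_10` sorts before `_9`). The Python sketch below selects the oldest indices of one family by creation date and prints the matching curl commands without running them. It assumes the rows come from `{es_endpoint}/_cat/indices/*metrics*?format=json&h=index,creation.date`; the sample rows, their dates, and the function names are hypothetical.

```python
import re

# Hypothetical rows, shaped like the output of
# {es_endpoint}/_cat/indices/*metrics*?format=json&h=index,creation.date
SAMPLE_ROWS = [
    {"index": "ao_itoa_metrics_uim_1_8", "creation.date": "1700000000000"},
    {"index": "ao_itoa_metrics_uim_1_9", "creation.date": "1700086400000"},
    {"index": "ao_itoa_metrics_uim_1_10", "creation.date": "1700172800000"},
    {"index": "ao_itoa_metrics_anomaly_1_9", "creation.date": "1700086400000"},
    {"index": "ao_itoa_metrics_anomaly_1_10", "creation.date": "1700172800000"},
]


def oldest_indices(rows, family_prefix, count):
    """Return the `count` oldest index names of the form <family_prefix>_<number>."""
    pattern = re.compile(re.escape(family_prefix) + r"_\d+$")
    matching = [row for row in rows if pattern.match(row["index"])]
    # creation.date is epoch milliseconds; sort numerically, oldest first.
    matching.sort(key=lambda row: int(row["creation.date"]))
    return [row["index"] for row in matching[:count]]


def delete_commands(endpoint, index_names):
    """Build (but do not run) one curl DELETE command per index."""
    return [f"curl -X DELETE {endpoint}/{name}" for name in index_names]


candidates = oldest_indices(SAMPLE_ROWS, "ao_itoa_metrics_uim_1", 2)
for command in delete_commands("http(s)://{es_endpoint}", candidates):
    print(command)
```

Review the printed commands before executing them, since deleting an index cannot be undone.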