The following error event is being generated:
CAPM_EventDesc: Batch process job UsageTelemetryETLJob failed.
CAPM_EventCategory: ACCOUNTING
What is its cause and effect?
DX NetOps CAPM 23.3.x
This occurs due to the insufficient memory on the Data Repository (DR) Vertica nodes and results in us not getting accurate license usage metrics when this occurs, but it doesn't impact the product or its functionality in any way. The system will retry once per day.
The issue is highlighted in the following error seen in the DA karat.log:
ERROR | nstance_Worker-2 | 2024-09-17T00:00:03,747 | ExceptionLog | .ca.im.core.util.ExceptionLogger 104 | m.ca.im.common.core.util | | An existing application exception RECURRED (Key=f1d7e3b4077b1b496693d3dc5b99347ce23ec069), Recurrence
count=2 : Exception encountered while performing usage telemetry batch job. : StatementCallback; SQL [INSERT /*+label(standard_device_telemetry_load)*/ INTO dim_item_telemetry(tstamp, resolution, tenant_item_id,item_type_qname, item_count, max_p
ort_range_value, insert_time) SELECT tstamp, resolution, tenant_item_id, item_type_qname, COUNT(1) as tenant_count, max_port_range_value, CLOCK_TIMESTAMP() FROM ( SELECT 1726545600 AS tstamp, 43200 AS resolution, t.string_value as tenant_item_id
, pi.device_item_id, 'Standard Device' AS item_type_qname,COUNT(1) AS item_count, CASE WHEN COUNT(1) BETWEEN 0 AND 200 THEN 200 WHEN COUNT(1) BETWEEN 201 AND 400 THEN 400 ELSE 401 END AS max_port_range_value FROM poll_item pi JOIN v_interface if
ON (if.item_id = pi.item_id) JOIN item i ON (pi.item_id = i.item_id) JOIN v_item_facet v ON ((pi.device_item_id=v.item_id) AND v.facet_qname='{http://im.ca.com/inventory}ManageableDevice') JOIN (select ai.item_id, ai.string_value from v_attribu
te_instance AS ai where ai.attr_qname = '{http://im.ca.com/core}Tenant.Name') AS t ON (t.item_id=i.tenant_item_id) WHERE NOT EXISTS ( SELECT 1 FROM v_item_facet v1 WHERE v1.item_id = pi.device_item_id AND ( v1.facet_qname IN ('{http://im.ca.com/
inventory}WirelessAccessPointDevice'))) AND NOT EXISTS ( SELECT 1 FROM v_telemetry_expired_retired_devices AS vrd WHERE vrd.device_item_id=pi.device_item_id ) GROUP BY t.string_value,pi.device_item_id UNION SELECT 1726545600 AS tstamp, 43200 AS
resolution, t.string_value as tenant_item_id, d.item_id as device_item_id, 'Standard Device' AS item_type_qname,COUNT(1) AS item_count, CASE WHEN COUNT(1) BETWEEN 0 AND 200 THEN 200 WHEN COUNT(1) BETWEEN 201 AND 400 THEN 400 ELSE 401 END AS max_
port_range_value FROM device d JOIN item i ON (d.item_id=i.item_id) JOIN v_item_facet v ON ((d.item_id = v.item_id) AND v.facet_qname = '{http://im.ca.com/inventory}ManageableDevice')JOIN ( SELECT ai.item_id, ai.string_value FROM v_attribute_ins
tance AS ai WHERE ai.attr_qname = '{http://im.ca.com/core}Tenant.Name') AS t ON (t.item_id = i.tenant_item_id) WHERE NOT EXISTS ( SELECT 1 FROM device d2 JOIN poll_item pi ON (d2.item_id=pi.device_item_id) JOIN v_interface v ON (pi.item_id=v.ite
m_id) WHERE d2.item_id = d.item_id)AND NOT EXISTS ( SELECT 1 FROM v_item_facet v1 WHERE v1.item_id = d.item_id AND v1.facet_qname IN ('{http://im.ca.com/inventory}WirelessAccessPointDevice'))AND NOT EXISTS ( SELECT 1 FROM v_telemetry_expired_ret
ired_devices AS vrd2 WHERE vrd2.device_item_id=d.item_id ) GROUP BY t.string_value, d.item_id) AS v1 GROUP BY v1.tstamp, v1.resolution, v1.tenant_item_id, v1.item_type_qname, v1.max_port_range_value ]; [Vertica][VJDBC](3587) ERROR: Insufficient
resources to execute plan on pool etl_processing [Request Too Large:Memory(KB) Exceeded: Requested = 12752856, Free = 12295118 (Limit = 12295118, Used = 0)]; nested exception is java.sql.SQLTransientException: [Vertica][VJDBC](3587)
In the above is the SQL query (in yellow) being run on Vertica. The query is so large, Vertica doesn't have enough memory to run it.
Ensure that you have sufficient memory allocated for each node. Broadcom supports a minimum of 64GB per node however, you may need more depending on the usage profile of the system. You can check the requirements using our Online Sizer:
Broadcom : DX NetOps Sizing Tool
The usage telemetry collection is being updated in 24.3, and these errors will be resolved when the feature is delivered in that release even if the memory is insufficient and does not meet sizing requirements.