Archive Manager consumed all the memory and swap, causing the SpectroSERVER to shut down


Article ID: 7587


CA Spectrum


We had a production outage with our MLS SpectroSERVER. The SpectroSERVER shut down because the Archive Manager consumed all the available memory and swap space.

This had an impact on the other 7 SpectroSERVERs, and I had to fail over all 8 SpectroSERVERs and recycle Spectrum to recover from the issue.


The stack pulled from the core file shows the process aborting in its out-of-memory handler after a failed allocation while copying event data:

0000000100c846f1`_lwp_kill+8(6, 0, 1001af388, ffffffffffffffff, ffffffff77c3e000, 0) 
0000000100c847a1`abort+0x118(1, 1d8, ffffffff77aca32c, 1f1efc, 0, 0) 
0000000100c84881`__1cQCs_out_of_memory6F_v_+0x3c(ffffffff7cd06918, 88, ffffffff7cd05bf8, 0, ffffffff78100200, 0) 
0000000100c84931`__1cOCs_new_handler6F_v_+0x3c(ffffffff7cd068a0, 40, ffffffff7cd05bf8, 0, ffffffff77c3e000, 10019bac0) 
0000000100c849e1`__1c2n6FL_pv_+0x44(2b, 1, 0, ffffffff7d79ff20, 0, ffffffff77e0f4a0) 
0000000100c84a91`__1cHattrdup6FpkvnKCsAttrDescMCsAttrType_e__pv_+0x30(1629a7860, 1100c84c41, 1629a7860, ffffffff77ad17dc, ffffffff77c3e000, 2b)
0000000100c84b51`__1cJCsVarData2t5B6Mr0_v_+0x38(100b108c0, 1f326a9e0, 11, 1, 100b108c0, ffffffff77e0f4a0) 
0000000100c84c11`__1cNCsVarDataListEcopy6Mrk0_nHCsErrorJCsError_e__+0x2c(6e3843510, 1629aae60, 1f326a9e0, 0, 100b07670, 
0000000100c84cc1`__1cNCsVarDataList2G6Mrk0_r0_+0x1c(6e3843510, 1629aae60, 100b07670, 6e3843448, 0, 6e3843530) 
0000000100c84d71`__1cOCsEventMessageEcopy6Mrk0i_v_+0x124(6e3843360, 1629aacb0, 0, 0, ffffffff77c3e000, ffffffff7d79ff20) 
0000000100c84e51`__1cOCsEventMessage2t5B6Mrk0i_v_+0x90(6e3843360, 1629aacb0, 1001a2868, 6e3843530, ffffffff7d79ff20, 0) 
0000000100c84f01`__1cLCsEventListEcopy6Mrk0_v_+0x6c(100b0bd80, ec0, c00, 7, ffffffff7d79ff20, 2400) 
0000000100c84fb1`__1cLCsEventList2t5B6Mrk0_v_+0x1c(100b0bd80, 259c1e1d0, 1000, 100b0bd88, ffffffff7d79ff20, 
0000000100c85061`__1cQCsScrollResponseEcopy6kM_pnOCsVnmParmBlock__+0x40(259c1e1b0, 1000, 1dab68, ffffffff7d79ff20, ffffffff77c3e000, 100b0bd60)
0000000100c85111`__1cICsVnmMsg2t5B6Mrk0_v_+0x54(100b07640, 1f325f510, 0, 0, 100b07640, ffffffff77e0f4a0) 
0000000100c851c1`__1cQCsNMgrClientListGnotify6MpnICsVnmMsg__nHCsErrorJCsError_e__+0xb8(100beb5d8, 1f325f510, 1014937b0, 101541310, 101541318, ffffffff7ed1fdf0)
0000000100c85271 __1cRCsEventAttachListGnotify6MrknOCsEventMessage__nHCsErrorJCsError_e__+0x138(100beb5d8, 1001b0, 1629aac98, 259c1e1b0, 100000, 100e57c80)
0000000100c85321 __1cMCsLogManagerNprocess_event6MrnOCsEventMessage_rknNCsLscpeHandle__nHCsErrorJCsError_e__+0x5c(100beb5c0, 1f326c620, 1f326c830, b1d2a, b1d29, 1001a3078)
0000000100c853d1 __1cRCsEventLogManagerNprocess_event6MrnOCsEventMessage_rknNCsLscpeHandle__nHCsErrorJCsError_e__+0xc(100beb5c0, 1f326c620, 1f326c830, 1f325f5a0, 10003e71c, 1001a3078)
0000000100c85481 __1cMCsLogManagerJqueue_log6M_v_+0x6c(100beb5c0, 1f326c5a0, 1001af000, 1001afde0, 1001af, 100000) 
0000000100c85531 __1cMCsLogManagerUqueue_thread_wrapper6Fp0_v_+4(100beb5c0, 100c0a680, 0, 0, 1e, 0) 
0000000100c855e1`__1cGThreadUcall_thread_function6M_v_+8(100c3f340, 1e, 0, 100040ef8, 0, 0) 
0000000100c85691`moot_thread_start+0x30(1e, 70, 0, ffffffff7d917328, ffffffff78100200, 0) 
0000000100c85741`__makecontext_v2+0x108(0, 0, 0, 0, 0, 0) 


Component: SPCAEM


Sustaining Engineering developed the Spectrum_10.01.01.D229 debug patch, which adds debugging code to help understand the memory growth in ArchMgr.

With the patch, the Archive Manager logs the following statistics at a regular interval in ARCHMGR.OUT:

Jul 25 10:38:37 : ArchMgr performance statistics:
                        Total   Change  Rate(/sec)
Seconds counted:        1261    61
Queued messages:        3201    3012    49.4    (now 0, avg. 467, max 2296)
Dequeued messages:      3201    3012    49.4
Notifications:          3201    3012    49.4
Inserted messages:      2982    2914    47.8
Failed Inserts:         0       0       0.0
Requests:               164     41      0.7     (now 0, avg. 1, max 1)
        (avg. dur. 0.0 sec, max dur. 0.0 sec)
Responses:              164     41      0.7
No responses:           0       0       0.0
Sent messages:          0       0       0.0
Attachments:            0       0       0.0     (now 0, avg. 0, max 0)
Detachments:            0       0       0.0
Memory Usage: 30.7MB real / 134.8MB virtual
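
To track memory growth across many reporting intervals, the "Memory Usage" lines can be pulled out of ARCHMGR.OUT with a small script. The following is a minimal sketch, not part of the patch: the function names are hypothetical, and the line formats are assumed to match the sample output above exactly. In the sample, the Rate(/sec) column appears to equal the Change column divided by the change in "Seconds counted" (3012 / 61 is about 49.4); this is inferred from the sample only.

```python
import re

# Matches the memory line as it appears in the sample output (assumed format).
MEMORY_RE = re.compile(
    r"Memory Usage:\s*([\d.]+)MB real\s*/\s*([\d.]+)MB virtual"
)
# Matches the timestamped header that opens each statistics block (assumed format).
HEADER_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) : ArchMgr performance statistics:"
)


def memory_samples(lines):
    """Return a list of (timestamp, real_mb, virtual_mb), one per statistics block."""
    samples = []
    timestamp = None
    for line in lines:
        header = HEADER_RE.match(line.strip())
        if header:
            timestamp = header.group(1)
            continue
        memory = MEMORY_RE.search(line)
        if memory:
            samples.append(
                (timestamp, float(memory.group(1)), float(memory.group(2)))
            )
            timestamp = None  # the next block must supply its own header
    return samples


def rate_per_second(change, interval_seconds):
    """Recompute a Rate(/sec) value from the Change column (inferred relation)."""
    return round(change / interval_seconds, 1)
```

Passing the lines of ARCHMGR.OUT to memory_samples yields one tuple per logged block, so steadily rising real or virtual values across blocks are easy to spot.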


The debug output is disabled by default. To enable it, add the statement report_performance=TRUE to the .configrc file. By default the performance data is written to ARCHMGR.OUT at a fixed interval, given as 10 minutes (3600 seconds); note that 10 minutes is 600 seconds, so if the exact period matters, set it explicitly by adding report_perf_interval=<number_seconds> to the .configrc file.
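
The two settings can also be applied with a script. This is a minimal sketch under stated assumptions: it treats .configrc as a plain text file of key=value lines, the helper name set_configrc_options is hypothetical, and the path to the file must be supplied by the caller. Only the two option names come from the description above.

```python
from pathlib import Path


def set_configrc_options(configrc_path, options):
    """Set key=value options in a .configrc-style file.

    Existing lines for the same key are replaced in place; new keys are
    appended. All other lines are left untouched. Assumes one key=value
    statement per line.
    """
    path = Path(configrc_path)
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = dict(options)
    updated = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            updated.append(f"{key}={remaining.pop(key)}")
        else:
            updated.append(line)
    # Append any options that were not already present in the file.
    updated.extend(f"{key}={value}" for key, value in remaining.items())
    path.write_text("\n".join(updated) + "\n")


# Example call (the path and the interval value are illustrative only):
# set_configrc_options(".configrc",
#                      {"report_performance": "TRUE",
#                       "report_perf_interval": 300})
```

Back up the .configrc file before changing it, whether by hand or by script.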


The patch also ships a second ArchMgr binary, ArchMgr_memc, which provides an additional option for gathering memory debug data. Use it only at the request of Sustaining Engineering. To use ArchMgr_memc, make a backup copy of the regular ArchMgr binary, then make a copy of ArchMgr_memc and name it ArchMgr. Run 'touch checkMemory' in the $SPECROOT/SS/DDM directory before starting Archive Manager.
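
The binary swap described above can be sketched as follows. This is an illustration only, to be run while Archive Manager is stopped and only when Sustaining Engineering has requested it. It assumes that both binaries are in the directory passed in (for example $SPECROOT/SS/DDM); the function name and the backup file name ArchMgr.orig are hypothetical.

```python
import shutil
from pathlib import Path


def enable_memc_binary(ddm_dir):
    """Swap in ArchMgr_memc as ArchMgr and create the checkMemory marker file."""
    ddm = Path(ddm_dir)
    regular = ddm / "ArchMgr"
    memc = ddm / "ArchMgr_memc"
    backup = ddm / "ArchMgr.orig"  # hypothetical backup name

    # Step 1: back up the regular binary, without overwriting an older backup.
    if not backup.exists():
        shutil.copy2(regular, backup)

    # Step 2: copy ArchMgr_memc over ArchMgr (copy2 keeps permission bits).
    shutil.copy2(memc, regular)

    # Step 3: equivalent of running 'touch checkMemory' in the directory.
    (ddm / "checkMemory").touch()
    return backup
```

To revert, copy the backup over ArchMgr again and remove the checkMemory file.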


Provide a copy of the ARCHMGR.OUT file to Sustaining Engineering for review.