Solaris Agent 12.2.10 generates core dumps when it submits a File Transfer. The FT jobs remain in status "Connecting".
During the test period in which traces were collected, the agent generated 81 cores.
Job trace:
MAIN-THREAD 20211223/082607.848 send_IPC_internal(type=CHANNEL_CLOSE,msg(100a05e50,msgID=15148,addr=100a061f0,len=2024,pos=0,flag=00000000)) -->
MAIN-THREAD 20211223/082607.848 U0009909 TRACE: (internal IPC message) 100a061f0 02024
00000000 4348414E 4E454C5F 434C4F53 45000000 >CHANNEL_CLOSE...<
00000010 2343434D 534F434B 30303030 30383835 >#CCMSOCK00000885<
00000020 00000000 00000000 00000007 00000000 >................<
00000030 0000004C 00000375 00000000 00000000 >...L...u........<
00000040 2A495043 28414745 4E542900 00000000 >*IPC(AGENT).....<
00000050= 00000000 00000000 00000000 00000000 >................<
000001A0 00000000 61C4248A 00000000 00000000 >....a.$.........<
000001B0= 00000000 00000000 00000000 00000000 >................<
000005C0 00000000 00000000 00040000 00004D58 >..............MX<
000005D0 00000000 00000000 00000000 00000000 >................<
000005E0 00000000 00000000 00000001 01D35FF0 >.............._.<
000005F0= 00000000 00000000 00000000 00000000 >................<
000007E0 00000000 00000000 >........<
MAIN-THREAD 20211223/082607.849 send_IPC_internal <-- (OK)
MAIN-THREAD 20211223/082607.849 ccm_channel_destroy: closing socket = 76
MAIN-THREAD 20211223/082607.849 ccm_channel_destroy: destroy queue lock = 10199b2f8
MAIN-THREAD 20211223/082607.849 ccm_channel_destroy <-- (destroyed)
MAIN-THREAD 20211223/082607.849 ccm_close(ccm=1007a1210) --
Examining a core in the debugger (mdb in this session) returns the following or similar output:
### Solaris modular debugger
[email protected]:/var/cores/gf0zsxas169t,# file core_gf0zsxas169t_ucxju64_72836_72836_1640244490_20141
core_gf0zsxas169t_ucxju64_72836_72836_1640244490_20141: ELF 64-bit MSB core file SPARCV9 Version 1, from 'ucxju64'
[email protected]:/var/cores/gf0zsxas169t,# mdb core_gf0zsxas169t_ucxju64_72836_72836_1640244490_20141
Loading modules: [ libc.so.1 ld.so.1 ]
ucxju64:core> ::state
mdb: invalid command 'state': unknown dcmd name
ucxju64:core> ::status
debugging core file of ucxju64 (64-bit) from gf0zsxas169t
file: /opt/zones/gf0zsxas169t/root/opt/automic/Agents12.2.10/bin/ucxju64
initial argv: /opt/automic/ServiceManager12.2.10/bin/../../agent/bin/ucxju64 -i/opt/automic/S
threading model: raw lwps
status: process terminated by SIGBUS (Bus Error), addr=ffffffffffffffff
ucxju64:core> ::quit
[email protected]:/var/cores/gf0zsxas169t,#
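With many cores to triage, it can help to extract the terminating signal from saved `::status` output rather than reading each session by hand. The following is a minimal sketch, assuming the mdb output has been captured as text. The function name `parse_mdb_status` is hypothetical, and the pattern only covers the `status:` line format seen in the session above.

```python
import re

# Matches a line such as:
#   status: process terminated by SIGBUS (Bus Error), addr=ffffffffffffffff
STATUS_RE = re.compile(
    r"status: process terminated by (\w+) \(([^)]*)\), addr=([0-9a-fA-F]+)"
)

def parse_mdb_status(text):
    """Return signal name, description and fault address, or None if absent."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    return {
        "signal": match.group(1),
        "description": match.group(2),
        "addr": match.group(3),
    }

sample = (
    "threading model: raw lwps\n"
    "status: process terminated by SIGBUS (Bus Error), addr=ffffffffffffffff\n"
)
result = parse_mdb_status(sample)
print(result)
```

Running this over the saved output of each core would show whether all of them ended with the same signal and address.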
Message sequence from a specific RunID
U00063085 FT '1142116769': File Transfer with partner 'GF0ZSXDB084T' started - sending.
U02003069 Connection 'GF0ZSXDB084T,(s=77,ID=860)' renamed to '*FTX(GF0ZSXDB084T,1142116769)'.
U00063094 FT '1142116769': Agent process 'FTX(1142116769)' with PID='27518' has been initiated.
U00063094 FT '1142116769': Agent process 'FTX(1142116769)' with PID='27518' has been initiated.
U00063095 FT '1142116769': Agent process 'FTX(1142116769)' with PID='27518' is up and running.
U00063095 FT '1142116769': Agent process 'FTX(1142116769)' with PID='27518' is up and running.
U00063087 FT '1142116769': Selection started with filter '/XFERT/home/xfer/CommonReportEngine/DATA/transfer/EHW/21RPTCE895QUENU20211214.XML' ...
U00063016 FT '1142116769': The file '/XFERT/home/xfer/CommonReportEngine/DATA/transfer/EHW/21RPTCE895QUENU20211214.XML' does not exist.
U00063089 FT '1142116769': Files selected: '1'.
U00063018 FT '1142116769': Cannot open file '/XFERT/home/xfer/CommonReportEngine/DATA/transfer/EHW/21RPTCE895QUENU20211214.XML'. Error: 'errno 2, No such file or directory'.
U00011409 FT '1142116769': File Transfer ended abnormally.
U02000007 Connection to Agent '*FTX(GF0ZSXDB084T,1142116769)(s=77,ID=860)' terminated.
U02002040 Disconnected from '*IPC(FTX,1142116769)' (socket handle = 's=79,ID=861').
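To follow the message sequence of a single RunID in a longer agent log, the `FT '<RunID>':` prefix can be used as a filter. This is only a sketch, assuming the log holds one message per line and that each file transfer message starts with a `U` number followed by `FT '<RunID>':` as in the excerpt above. The helper name `messages_for_run` is hypothetical.

```python
import re

# Message number, RunID in quotes, then the message text.
FT_MSG_RE = re.compile(r"^(U\d{8}) FT '(\d+)': (.*)$")

def messages_for_run(lines, run_id):
    """Return (message number, text) pairs for one file transfer RunID."""
    selected = []
    for line in lines:
        match = FT_MSG_RE.match(line.strip())
        if match and match.group(2) == run_id:
            selected.append((match.group(1), match.group(3)))
    return selected

log = [
    "U00063085 FT '1142116769': File Transfer with partner 'GF0ZSXDB084T' started - sending.",
    "U02003069 Connection 'GF0ZSXDB084T,(s=77,ID=860)' renamed to '*FTX(GF0ZSXDB084T,1142116769)'.",
    "U00011409 FT '1142116769': File Transfer ended abnormally.",
]
selected = messages_for_run(log, "1142116769")
for number, text in selected:
    print(number, text)
```

Connection messages such as U02003069 do not carry the `FT '<RunID>':` prefix, so this filter skips them. They would need a separate pattern.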
Component: Solaris System Agent, versions 12.3.9 and 21.0.3 and earlier service packs
This is a bug in the Solaris Agent. Fix description: An issue has been solved where the Solaris Agent occasionally threw core dumps during file transfers.
This bug is fixed in the following releases:
Hotfix: 12.3.9 HF1 of the Solaris Agent - available.
Service Pack : 21.0.4 - pending.
A workaround is possible: activate trace mode on the affected agent. This brings the agent into a stable state. The risk is that the large volume of trace data this mode produces can fill up the file system.
For example, the agent process ran fine with these trace flags set:
- tcp/ip=4
- ft_debug=3
- memory=1
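Because the trace workaround can fill the file system, it may be worth checking free space on the volume that holds the trace files while the flags are active. The following is a minimal sketch, not part of the product. The function names and the 10% threshold are assumptions chosen for illustration, and the path passed in must be replaced with the agent's actual trace directory.

```python
import shutil

def has_enough_free(total_bytes, free_bytes, min_free_fraction=0.10):
    """Return True if the free share of the volume is at or above the threshold."""
    if total_bytes <= 0:
        return False
    return free_bytes / total_bytes >= min_free_fraction

def trace_space_ok(path, min_free_fraction=0.10):
    """Check the file system that contains `path` (assumed trace directory)."""
    usage = shutil.disk_usage(path)
    return has_enough_free(usage.total, usage.free, min_free_fraction)

# "." is a placeholder; point this at the directory where traces are written.
ok = trace_space_ok(".")
print("enough free space" if ok else "trace volume is running low")
```

Run periodically, for example from cron, such a check would give an early warning before the trace output exhausts the file system.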