Description
When a node is restarted, the CDJ uses a lot of CPU for a long time and shows inaccurate job run statuses in the UniViewer Console. The u_fmsb60.dta file of the affected area is quite large and contains many lines with this string: "SSSSSSS99999999999999"
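As a quick check of the symptom described above, the marker lines can be counted with grep. This is a minimal sketch: the function name count_sbsess_lines is made up for illustration, and the -a option is used on the assumption that the data file may contain binary bytes.

```shell
#!/bin/sh
# Count the lines of a file that contain the SBSESS marker string.
# -a makes grep treat the file as text even if it holds binary bytes.
count_sbsess_lines() {
    grep -a -c 'SSSSSSS99999999999999' "$1"
}

# Hypothetical usage on the EXP area:
#   count_sbsess_lines "$UNI_DIR_DATA/exp/u_fmsb60.dta"
```

A large count on a large file is consistent with the issue, but the simulate mode of uxpursbsess described in the Resolution remains the reference check.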
To measure the startup time of the CDJ, compare the timestamps of the following two log lines.
In my case, with a 100 MB u_fmsb60.dta file, the synchronization takes about 6 minutes 30 seconds:
##################
| 2016-12-06 14:43:16 |INFO |X|CDJ|pid=12356.140102397495040| u_cdj_main | CDJ starting (multithread version) ...
| 2016-12-06 14:49:47 |INFO |X|CDJ|pid=12356.140102366459648| u_cdj_thread_receive_msg_ | Synchronization request OK: successfully connected to EURPRD IO server X on local
##################
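The elapsed time between the two log lines above can be computed rather than read off by eye. The sketch below assumes a date command that accepts -d (as in GNU coreutils); the helper name elapsed_seconds is made up for illustration.

```shell
#!/bin/sh
# Print the number of seconds between two "YYYY-MM-DD HH:MM:SS" timestamps.
# Assumes a date(1) that supports -d; -u avoids daylight-saving jumps.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Timestamps copied from the "CDJ starting" and "Synchronization request OK" lines.
secs=$(elapsed_seconds "2016-12-06 14:43:16" "2016-12-06 14:49:47")
echo "CDJ sync took $(( secs / 60 )) min $(( secs % 60 )) s"
```

Running the same calculation before and after the purge gives a simple way to confirm that the startup time has improved.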
Cause
Cause type: Defect
Root Cause: The SBSESS records (lines containing SSSSSSS99999999999999) in the u_fmsb60.dta file were not purged correctly in particular circumstances.
Environment
OS: All
Resolution
To fix the issue, first upgrade the impacted node to 6.7.41, then manually purge the u_fmsb60.dta file with the uxpursbsess tool.
Procedure on Unix / Linux for the EXP area:
1. Stop the node: unistop
2. Back up the target u_fmsb60.dta (cp $UNI_DIR_DATA/exp/u_fmsb60.dta $UNI_DIR_DATA/exp/u_fmsb60.bak)
3. Simulate the purge of the orphan SBSESS records to see whether the issue is present on the target area:
uxpursbsess exp simulate any
4. If the issue is present (the value of "sbsess deleted" is not 0), run the tool again without the simulate parameter so that the purge is actually performed:
uxpursbsess exp any
5. Launch an offline reorganization so that the file size is reduced: unireorg
6. Start the node again: unistart
7. Check the size of the file before and after; the difference should be significant.
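The steps above can be sketched as a small script split into two phases, because the simulation result has to be reviewed by hand before the real purge. This is only an illustration: the helper names run, purge_prepare and purge_apply are hypothetical, and the DRY_RUN switch (on by default here) only prints the commands instead of executing them. The commands themselves are the ones listed in the procedure.

```shell
#!/bin/sh
# DRY_RUN=1 (default in this sketch) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-3: stop the node, back up the file, note its size, simulate the purge.
# $1 is the path to u_fmsb60.dta, e.g. $UNI_DIR_DATA/exp/u_fmsb60.dta
purge_prepare() {
    run unistop
    run cp "$1" "${1%.dta}.bak"
    run wc -c "$1"
    run uxpursbsess exp simulate any
}

# Steps 4-7: only call this if the simulation reported a non-zero
# number of deleted SBSESS records.
purge_apply() {
    run uxpursbsess exp any
    run unireorg
    run unistart
    run wc -c "$1"
}
```

Call purge_prepare with the path of the data file, read the simulation output, and call purge_apply with the same path only if orphan records were reported. Set DRY_RUN=0 to execute the commands for real.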