A customer has about 160,000 jobs in AutoSys. The initial collection of jobs in WCC (or the collection after running wcc_monitor -d ALL and restarting WCC) collects all 160,000 jobs properly.
After some time, they restart WCC without changing anything in WCC. The job collection count in WCC then starts dropping, from 160,000 all the way down to 30,000. It does not drop much further from that point onwards.
WCC 11.4.7/12.x, when working with AutoSys, collects data from AutoSys in chunks of 30,000 records each.
When WCC requests the next 30,000 records but AutoSys, for some reason, returns fewer than 30,000 records, WCC treats that as a potential change in the jobs between WCC and AutoSys. It deletes all of the collected records except the last 30,000-record chunk it obtained.
This was seen in situations where the AutoSys database has multiple records for the same job in the ujo_job table that have is_active=1 and are also marked as the latest job version (is_currver=1). Normally, for a given job, only one record with is_active=1 exists in the ujo_job table.
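The effect described above can be illustrated with a toy model. This is only a sketch of the behavior as the article describes it, not WCC's actual code: the function name `simulate_collection` and the list-of-chunk-sizes input are hypothetical, and how duplicate ujo_job rows lead to a short chunk is not modelled here.

```python
CHUNK_SIZE = 30_000  # chunk size stated in the article


def simulate_collection(chunk_sizes, chunk_size=CHUNK_SIZE):
    """Toy model: return how many job records remain after one collection pass.

    chunk_sizes lists how many records each successive chunk request returned.
    Assumption (hypothetical, based only on the article's description): a chunk
    that comes back short while more chunks are still expected is treated as a
    change, and only the last chunk obtained before it is kept.
    """
    collected = []  # sizes of the chunks accepted so far
    for index, returned in enumerate(chunk_sizes):
        is_last_request = index == len(chunk_sizes) - 1
        if returned < chunk_size and not is_last_request:
            # Short chunk in the middle of the collection: drop everything
            # except the last chunk obtained.
            return collected[-1] if collected else returned
        collected.append(returned)
    return sum(collected)


# Healthy pass: five full chunks plus a final partial one.
healthy = simulate_collection([30_000] * 5 + [10_000])

# Faulty pass: the third request returns slightly fewer than 30,000 records.
faulty = simulate_collection([30_000, 30_000, 29_990, 30_000, 30_000, 10_010])

print(healthy, faulty)
```

Under these assumptions the healthy pass keeps all 160,000 records, while the faulty pass ends with a single 30,000-record chunk, which matches the counts reported in the problem description.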
Release : 12.0
The solution is to ensure that only one is_active=1 record exists for a given job.
A query of the following type can be used to identify which joid values in ujo_job have multiple records with is_active=1 and is_currver=1.
SELECT joid, COUNT(*) FROM ujo_job
WHERE joid>0 AND is_active=1 AND is_currver=1 AND
COALESCE(job_name,' ') LIKE '%' GROUP BY joid HAVING COUNT(*) > 1;
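To see what the query above reports before running it against a real AutoSys database, it can be tried on a small in-memory SQLite table. This is a sketch only: the demo table is a simplified, hypothetical stand-in for ujo_job (the real table has many more columns, and the `job_ver` column and all sample rows here are assumptions made for illustration).

```python
import sqlite3

# The query from the article, unchanged apart from the trailing semicolon.
DUPLICATE_QUERY = """
SELECT joid, COUNT(*) FROM ujo_job
WHERE joid>0 AND is_active=1 AND is_currver=1 AND
COALESCE(job_name,' ') LIKE '%' GROUP BY joid HAVING COUNT(*) > 1
"""


def build_demo_db():
    """Create an in-memory table that loosely imitates ujo_job (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ujo_job ("
        "joid INTEGER, job_ver INTEGER, job_name TEXT, "
        "is_active INTEGER, is_currver INTEGER)"
    )
    rows = [
        (101, 1, "nightly_load", 0, 0),  # older version, correctly inactive
        (101, 2, "nightly_load", 1, 1),  # single active current version
        (102, 1, "report_gen", 1, 1),    # older version left active: the problem case
        (102, 2, "report_gen", 1, 1),
        (103, 1, "cleanup", 1, 1),       # only one version, healthy
    ]
    conn.executemany("INSERT INTO ujo_job VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_duplicate_joids(conn):
    """Return (joid, count) pairs for jobs with more than one active current record."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


duplicates = find_duplicate_joids(build_demo_db())
print(duplicates)  # [(102, 2)]
```

Only joid 102 is reported, because it is the only job with two rows that are both is_active=1 and is_currver=1. The demo does not modify any rows; correcting real data should still go through support as described below.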
Work with the Broadcom support team to get the affected records fixed (usually the older job version record is set to is_active=0 and a commit is issued).
After all such duplicate records are fixed, perform a fresh collection from WCC (wcc_monitor -u user -p password -d ALL). A subsequent restart of WCC should then no longer cause any deletion of collected jobs.