When attempting to publish schema changes, an error occurs during the startup attempt: "error 1067: The process terminated unexpectedly". A subsequent startup attempt will cause the schema changes to fail, relegating the changes into several ver_old files:
C:\PROGRA~2\CA\SERVIC~1\site
ddict.sch.ver_old
tblobj.cfg.ver_old
C:\PROGRA~2\CA\SERVIC~1\site\mods
server_secondary_custom.ver.ver_old
C:\PROGRA~2\CA\SERVIC~1\site\mods\majic
wsp.mods.pub.ver_old
wsp.mods.ver_old
The following scenario leads to the problem:
CA Service Desk Manager (all versions) running Advanced Availability.
A checksum failure is taking place during startup. The stdlogs across all servers involved may contain the following messages during startup attempts.
Server1 may report:
04/24 14:12:29.24 SERVER1 pdm_rfbroker_nxd 372 ERROR rfbroker.c 1422 Register server SERVER1 (24267) failed: checksum mismatch
04/24 14:12:29.25 SERVER1 pdm_rfbroker_nxd 372 SIGNIFICANT api_misc.c 585 Requesting shutdown of system. Reason (Server checksum mismatch)
Server2 may in turn report:
04/24 14:12:29.20 SERVER2 pdm_rfbroker_nxd 6988 ERROR ServerStatusMonitor. 1956 Unable to register server SERVER1 (24267) as checksum count 3 would exceed maximum of 2 (2 servers registered)
In the given scenario, there is also a Server3/App server in play, which has its own checksum values that are causing the checksum tests to fail.
During startup of the given servers in Advanced Availability, a checksum test is performed on the ddict.sch, majic, and wsp.mods files to ensure consistency across these servers. Server1, Server2, and Server3 each have differing content in some or all of these files, causing the checksum test to fail when Server1/SB tries to start under such conditions.
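CA SDM's internal checksum mechanism is proprietary, but the consistency test it performs can be illustrated with a short sketch: hash the relevant files under each server's site directory and compare the digests, where any mismatch means the servers are out of sync. The hash algorithm and directory layout below are illustrative assumptions, not the product's actual implementation.

```python
import hashlib
from pathlib import Path

def file_checksum(path):
    """Return an MD5 hex digest of a file's contents (illustrative choice)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def servers_in_sync(server_dirs, filenames):
    """True only if every server directory holds identical copies of each file."""
    for name in filenames:
        digests = {file_checksum(Path(d) / name) for d in server_dirs}
        if len(digests) > 1:  # more than one distinct digest => mismatch
            return False
    return True
```

In the failing scenario described above, a check like `servers_in_sync([server1_site, server2_site, server3_site], ["ddict.sch"])` would return False because each server carries its own version of the file.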
A common scenario that gives rise to this situation: all three servers each have their own unique version of the ddict.sch file, which causes the fault.
In a scenario where schema changes are being made, the Server3/App server may not have been regularly recycled, which can cause its own ddict.sch, majic, and wsp.mods files to fall out of sync with both Server1 and Server2. When a custom change is rolled out, the general procedure is to establish the changes first between the BG and SB servers, then perform a recycle to disseminate the changes to the other constituent App servers in the environment.
Preventive action is to perform a recycle of Services on the SB and app servers before any schema updates are performed on Server1.
To recover once the problem has occurred:
If the schema updates appear to have been lost on Server1 (that is, services were started again after receiving the "error 1067" message), you will need to manually restore the ver_old files as the active files on Server1/SB.
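Restoring the ver_old files amounts to renaming each one back to its active name by stripping the ".ver_old" suffix (for example, ddict.sch.ver_old becomes ddict.sch), as suggested by the file list at the top of this article. A minimal sketch of that rename, keeping a .bak copy of any active file it would overwrite; the function name and backup convention are illustrative, and you should run it (or the equivalent manual renames) against the site directories listed above with services stopped:

```python
from pathlib import Path
import shutil

def restore_ver_old(directory):
    """Rename each *.ver_old file in directory back to its active name."""
    for f in Path(directory).glob("*.ver_old"):
        active = f.with_name(f.name[: -len(".ver_old")])  # ddict.sch.ver_old -> ddict.sch
        if active.exists():
            # keep a backup of the file being overwritten
            shutil.copy2(active, active.with_name(active.name + ".bak"))
        f.rename(active)
```

For example, `restore_ver_old(r"C:\PROGRA~2\CA\SERVIC~1\site")` would reinstate ddict.sch and tblobj.cfg from their ver_old copies; repeat for the mods and mods\majic directories.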
Server1/SB services should then start up successfully. You can then run "pdm_server_control -b" on Server1/SB to fail over, making Server1 the BG and Server2 the SB, then recycle Server2 and the app servers to have them synchronise with Server1/BG.