CAHCHECK is a CA Common Services address space that hosts health checks on behalf of CA products and components.
Many CA product health checks are hosted directly by the owning CA product address space. Some CA products do not have a permanent address space in which to host the health checks. Some CA products that have a permanent address space want some or all of the health checks to be hosted in a more permanent address space. The CAHCHECK address space was created to host CA product health checks that have these requirements. CA product health checks provide the following value for your site:
- Improve product availability by eliminating outages due to configuration option errors
- Point out product features that are activated to get maximum benefit from a product
- Point out settings that can optimize the performance of a product

This is similar to having a product expert constantly reviewing the product settings in your environment.
The CAHCHECK address space is optional, but we strongly recommend that you use it to receive the full benefit of CA product health checks. It requires the CAMASTER common service system address space.
Sometimes the CAHCHECK task starts using a lot of CPU, possibly at the same time every week, for a period of several hours and only on a single LPAR in the sysplex. This can be a problem because it causes the system to hit its softcap value.
Are there any known issues with CAHCHECK? If not, how can the cause of the problem be found?
Environment
z/OS - CA 1 - Common Services and CA Health Checker started
Resolution
To approach this situation correctly, follow these steps:
1. Apply all the GA PTFs published for the Common Services CA Health Checker component, plus all the Hyper PTFs for Common Services.
2. Identify all the installed CA products that require Health Checker Interface alerts, and apply all the Hyper GA PTFs for those products as well.
In the past we have seen this problem caused by CA 1. The cause was identified by analyzing the output of the CA 1 TMSDIAG utility from a high CPU usage time frame and comparing it with a system dump of the CA Health Checker task taken during the same time frame.
The CA1_USED_DSNB_FREE_CHAIN health check is designed to run once a week and on only one LPAR in a sysplex environment. It runs the entire free chain (one record at a time, very slowly), and the TMSDIAG output reports the last time this check ran. For example, with a large TMC that has roughly 10,000,000 DSNBs defined and only about 4.3 million DSNBs used, the free chain contains about 5.7 million DSNBs, and reading that chain one record at a time takes a lot of CPU cycles and a lot of time.
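The free-chain arithmetic above can be checked with a short Python sketch. It only reproduces the subtraction from the example figures. The scan rate used for the time estimate is a purely hypothetical value chosen for illustration, not a measured CA 1 number, and both function names are invented for this sketch.

```python
def free_chain_length(defined_dsnbs: int, used_dsnbs: int) -> int:
    """Return the number of DSNBs on the free chain (defined minus used)."""
    if used_dsnbs > defined_dsnbs:
        raise ValueError("used DSNBs cannot exceed defined DSNBs")
    return defined_dsnbs - used_dsnbs


def estimated_scan_hours(free_dsnbs: int, records_per_second: float) -> float:
    """Estimate the scan duration in hours for a one-record-at-a-time read.

    records_per_second is an assumption supplied by the caller; the
    article does not state the real scan rate of the health check.
    """
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return free_dsnbs / records_per_second / 3600


# Figures from the example: about 10,000,000 defined, about 4.3 million used.
free = free_chain_length(10_000_000, 4_300_000)
print(f"Free-chain DSNBs: {free:,}")

# Hypothetical rate of 500 records per second, for illustration only.
print(f"Estimated scan time: {estimated_scan_hours(free, 500):.1f} hours")
```

The point of the estimate is only that the scan time grows linearly with the number of free DSNBs, so a TMC with a large unused DSNB area makes this check proportionally more expensive.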
This health check can therefore be disabled using the HZSPRMxx member in the z/OS system parmlib. If you disable it, run the CA 1 utility TMSPTRS on a regular basis (weekly would be fine) to keep the free DSNB situation under control.
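As a rough illustration of what such an HZSPRMxx entry might look like, the Python sketch below renders an IBM Health Checker policy statement that marks the check inactive. The statement layout follows the general `ADDREPLACE POLICY ... UPDATE CHECK(owner,name)` form. The check owner is not given in this article, so `CA_OWNER` is a placeholder, and the statement name `CA1DSNB`, the reason text, and the date are likewise hypothetical. Confirm the real owner name on your system (for example, in the SDSF CK panel) and verify the result against the IBM Health Checker documentation before using it.

```python
from datetime import date


def build_policy_statement(owner: str, check: str, reason: str,
                           stmt_name: str, when: date) -> str:
    """Render a Health Checker policy statement that deactivates a check.

    All argument values are supplied by the caller; nothing here is
    looked up from a real system.
    """
    if "'" in reason:
        raise ValueError("reason must not contain single quotes")
    lines = [
        f"ADDREPLACE POLICY STMT({stmt_name})",
        f"  UPDATE CHECK({owner},{check})",
        "  INACTIVE",
        f"  REASON('{reason}')",
        f"  DATE({when:%Y%m%d})",
    ]
    # Keep each line within columns 1-71, as is usual for parmlib members.
    for line in lines:
        if len(line) > 71:
            raise ValueError(f"line too long for parmlib: {line!r}")
    return "\n".join(lines)


# CA_OWNER is a placeholder: replace it with the owner shown for the
# check on your own system.
print(build_policy_statement(
    owner="CA_OWNER",
    check="CA1_USED_DSNB_FREE_CHAIN",
    reason="Free chain verified weekly by TMSPTRS",
    stmt_name="CA1DSNB",
    when=date(2024, 1, 15),
))
```

Generating the statement from a script keeps the reason and date consistent across LPARs, but the same text can just as well be typed directly into the parmlib member.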
In summary: the high CPU usage occurs because one LPAR in the sysplex is "assigned" to run this health check. Which LPAR is chosen is rather random and is decided by the health check system, which only knows that this health check needs to run on one LPAR in the sysplex. The CPU usage is high simply because there are many free DSNBs to analyze.
In any other case, collect a system dump of the CA Health Checker task during the high CPU usage time frame and open a case with Broadcom Support.