When HSM servers are used with SSL Visibility (SSLV) appliances, the SSLV performs health checks against the HSM server or HSM agent. If a health check fails, log messages like the following appear in the syslog_unfiltered.log file:
May 29 14:52:08 SSLv_01 ssldata[16138]: [E] HSM resigning CA [63ead84212088677, appliance: x.x.x.x, key alias: SSLVFR] has failed.
May 16 15:44:11 SSLV_02 ssldata[16138]: [E] HSM resigning CA [63ead7931200456, appliance: y.y.y.y, key alias: SSLVFR2] has failed.
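When many of these entries accumulate, it can help to pull out the CA identifier, appliance address, and key alias from each one. The following is a minimal sketch in Python; the regular expression is modelled only on the two sample lines above, so it is an assumption that other software versions use the same format, and the names `FAILURE_RE` and `parse_hsm_failure` are hypothetical.

```python
import re

# Pattern derived from the two sample log lines only (assumption: the
# message format is the same on other appliances and versions).
FAILURE_RE = re.compile(
    r"HSM resigning CA \[(?P<ca_id>[0-9a-fA-F]+), "
    r"appliance: (?P<appliance>[^,]+), "
    r"key alias: (?P<key_alias>[^\]]+)\] has failed"
)


def parse_hsm_failure(line):
    """Return the CA id, appliance and key alias as a dict, or None if no match."""
    match = FAILURE_RE.search(line)
    return match.groupdict() if match else None


sample = (
    "May 29 14:52:08 SSLv_01 ssldata[16138]: [E] HSM resigning CA "
    "[63ead84212088677, appliance: x.x.x.x, key alias: SSLVFR] has failed."
)
print(parse_hsm_failure(sample))
```

Grouping the parsed results by appliance or key alias can show whether the failures are limited to a single HSM or key.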
The SSLV performs an initial health check when an HSM is added to the SSLV configuration. If the HSM in question is not receiving re-signing requests (inactive or unused), periodic health checks are performed.
For HSMs that are actively re-signing certificates, each re-sign request itself serves as the health check.
When the SSLV does not receive a re-signing response within 5 seconds, it marks the re-sign attempt as failed and marks that specific connection to the HSM as unhealthy. After the flows on that connection are flushed, the SSLV performs a health check; as long as the HSM responds to it, the HSM is not marked as "failed".
If the health check fails, the SSLV marks the HSM in question as unhealthy, and all flows waiting for a re-sign response are flushed and dropped. The SSLV then repeats the health check, starting at one-second intervals and increasing the interval up to 30 seconds, until a health check succeeds. These checks use the same connection, which stays open for up to 5 seconds after the health check is sent.
When an HSM resigning CA goes down and the SSLV is sending a health check every second, the health check connection stays up until a good response is received or until the health check interval exceeds 5 seconds. At that point the SSLV closes the connection and opens a new one.
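The retry behaviour described above can be illustrated with a short Python sketch. It is not the appliance's implementation: the 30-second cap and the 5-second connection-reuse limit come from the text, while the one-second growth step (`step_s`) and the function name `health_check_plan` are assumptions made for illustration.

```python
MAX_INTERVAL_S = 30  # from the text: the interval increases up to 30 seconds
REUSE_LIMIT_S = 5    # from the text: the connection is kept until the interval exceeds 5 seconds


def health_check_plan(attempts, step_s=1):
    """Return (interval_seconds, reuse_connection) for each retry.

    step_s is an assumption: the text does not state how quickly the
    interval grows between one second and 30 seconds.
    """
    plan = []
    interval = 1
    for _ in range(attempts):
        # The existing connection is reused only while the interval is
        # within the reuse limit; beyond it a new connection is opened.
        plan.append((interval, interval <= REUSE_LIMIT_S))
        interval = min(interval + step_s, MAX_INTERVAL_S)
    return plan


for interval, reuse in health_check_plan(8):
    action = "reuse connection" if reuse else "open new connection"
    print(f"wait {interval}s, then health check ({action})")
```

Under these assumptions, the first five retries share one connection, and later retries each open a new connection.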
While troubleshooting a health check failure with the HSM server or HSM agent, the following steps can help to localize the issue:
- request the server.xml file for the HSM server or HSM agent.
- request the system logs from the HSM server or HSM agent. These should be located in the /root/hsm/logs/ directory.
- request the access logs. They will be in the /root/hsm/logs/ directory and will have a naming convention of "localhost_access_log.YYYY-MM-DD.txt".
- request a pcap from the MGMT interface filtering on "port 8443 or port 53".
- request the stats from the https://<Agent IP address>:8082/hsm page.
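When asking for the access logs, it helps to name the exact files that cover the failure window. The sketch below builds the expected file names from the naming convention and directory given in the list above, plus the stats URL for a given agent. The helper names, the example dates (the sample log lines carry no year), and the example IP address are hypothetical.

```python
from datetime import date, timedelta

LOG_DIR = "/root/hsm/logs/"  # directory named in the steps above


def access_log_paths(start, end):
    """List the access log paths for every day from start to end, inclusive."""
    days = (end - start).days
    return [
        f"{LOG_DIR}localhost_access_log.{(start + timedelta(days=i)).isoformat()}.txt"
        for i in range(days + 1)
    ]


def stats_url(agent_ip):
    """Build the stats page URL for an HSM agent."""
    return f"https://{agent_ip}:8082/hsm"


# Example dates and address are placeholders, not values from a real system.
for path in access_log_paths(date(2024, 5, 28), date(2024, 5, 30)):
    print(path)
print(stats_url("192.0.2.10"))
```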