CAS Management Portal randomly becoming unresponsive

Article ID: 439212

Updated On:

Products

CAS-VA

Issue/Introduction

An administrator reports that one of their Content Analysis Server (CAS) devices becomes unresponsive when accessed through the Management GUI or SSH.

Scanning of objects sent via the Proxy continued to work as expected.

Console access connects and the login page partly appears, but the console is largely unresponsive (no commands can be typed). Ping to the interface is successful.

The portal and log files show "CASHostName has failed to respond xxxx consecutive times", while the ProxySG ICAP health checks indicate that everything is working well.
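
If the logs can be exported as plain text, the consecutive-failure counter in that message can be tracked to see how long a device has been unresponsive. The following is a minimal sketch in Python, not a product tool. The function name and sample lines are hypothetical, and it assumes the message appears on a single line with a host name that contains no whitespace.

```python
import re

# Pattern for the message quoted above. Assumption: the host name is a single
# whitespace-free token and the count is a plain integer.
FAILED_RE = re.compile(r"(\S+) has failed to respond (\d+) consecutive times")


def max_consecutive_failures(lines):
    """Return {host: highest consecutive-failure count seen} for the given log lines."""
    worst = {}
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            host, count = match.group(1), int(match.group(2))
            worst[host] = max(worst.get(host, 0), count)
    return worst


if __name__ == "__main__":
    # Hypothetical sample lines; real log entries will carry additional fields.
    sample = [
        "CAS2 has failed to respond 3 consecutive times",
        "CAS2 has failed to respond 4 consecutive times",
        "unrelated entry",
    ]
    print(max_consecutive_failures(sample))
```

A count that keeps climbing on one device while ICAP health checks stay green matches the pattern described here: scanning continues, but the management plane does not answer.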

Two virtual CAS devices running 3.2.2 are installed on an ESXi platform, but only one CAS device is impacted.

The problem CAS recovered after a reboot from the hypervisor level, but the issue happened again one month later, and again six weeks after that, on the same host.

Memory allocation and CPU utilisation are the same on the working and non-working devices, and the ESXi host was not overloaded, nor was a host migration in progress, at the time.

The only difference between the two CAS devices is that CAS2 (the problem CAS) has a route towards the Management Center (SMC) out of the management interface. This is because the SMC is on the app interface subnet and needs to communicate with the CAS on its management subnet/interface.

Environment

Content Analysis Server (CAS) 3.2.2.

ProxySG.

Cause

A memory management issue, addressed in CAS 4.1.1.1.

Resolution

Upgrade the CAS device to the latest 4.1 codebase.

Additional Information

Thousands of "link is not ready" messages appeared in the logs. The Malware Analysis (MA) support included with CAS has some special networking requirements for handling malware network access. Although Malware Analysis was not enabled in the situation above, it still runs regular checks on its status, and some of those checks were failing due to network issues. This does not affect Content Analysis directly, but the repeated failures slowly use up resources until the appliance runs out of them.

The failing components, which exhaust the supply of available file handles, are all MA processes. CAS 4.1 does not exhibit the same issue.
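
To gauge whether a device is affected, the number of "link is not ready" messages in an exported log can be counted. This is a minimal sketch under stated assumptions: the log is available as plain text lines, the message text appears as quoted in this article, and the function name and sample lines are hypothetical.

```python
# Message text as quoted in this article; matched case-insensitively.
MESSAGE = "link is not ready"


def count_link_not_ready(lines):
    """Count log lines that contain the message, ignoring letter case."""
    return sum(1 for line in lines if MESSAGE in line.lower())


if __name__ == "__main__":
    # Hypothetical sample lines; real log entries will look different.
    sample = [
        "interface check: link is not ready",
        "health check passed",
        "interface check: link is not ready",
    ]
    print(count_link_not_ready(sample))
```

A count in the thousands on a 3.2.2 device would be consistent with the behaviour described above, although the count alone does not confirm the cause.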