Transaction timeout errors in BrightmailLog.log when managing remote SMG scanners

Article ID: 159351

Products

Messaging Gateway

Issue/Introduction

When managing some Symantec Messaging Gateway (SMG) scanners from the Control Center, you observe that certain scheduled tasks or audit log queries fail and that a timeout error is written to BrightmailLog.log.

Example from BrightmailLog.log:

May 13 2014 06:08:22 [RuleAgentTask_5] [ConduitRuleHelper] INFO - Heuristic and URL Spam Filter ruleset updated for host 10.11.12.13.
May 13 2014 06:25:31 [RuleAgentTask_5] [AgentEvent] WARN - Connection timed out : 10.11.12.13
May 13 2014 06:25:31 [RuleAgentTask_5] [AgentHelper] ERROR - An agent error has occurred in the following code path:
 java.lang.Exception
        at com.symantec.smg.controlcenter.agent.AgentHelper.logError(AgentHelper.java:458)
        at com.symantec.smg.controlcenter.agent.ScriptHelper.isRebootLocked(ScriptHelper.java:1326)
        at com.symantec.smg.controlcenter.disasterrecovery.VersionManager.isHostUpdating(VersionManager.java:630)
        at com.symantec.smg.controlcenter.monitoring.ruleupdate.RuleAgentTask.run(RuleAgentTask.java:90)
        at java.lang.Thread.run(Unknown Source)
May 13 2014 06:25:31 [RuleAgentTask_5] [AgentHelper] ERROR - --- Host Name: 10.11.12.13
May 13 2014 06:25:31 [RuleAgentTask_5] [AgentHelper] ERROR - --- Agent Port Number: 41002
May 13 2014 06:25:31 [RuleAgentTask_5] [AgentHelper] ERROR - The response object is null.
May 13 2014 06:25:31 [RuleAgentTask_5] [ScriptHelper] ERROR - com.symantec.smg.controlcenter.BrightmailException: The Agent running on 10.11.12.13 is temporarily unreachable. Please check the specified host. ; nested exception is:
         java.net.SocketException: Connection timed out
May 13 2014 06:25:31 [RuleAgentTask_5] [RuleAgentTask] INFO - agent task not executed, host is updating, or DB is restoring: RuleAgentTask_5
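To find which scanners are affected, a short script can tally the "Connection timed out" warnings per host. The following is a minimal Python sketch. It assumes that every log entry follows the layout shown in the excerpt above (timestamp, [thread], [class], level, " - ", message) and that continuation lines such as stack frames start with whitespace. The function name timeout_hosts is hypothetical and is not part of SMG.

```python
import re
from collections import Counter

# Assumed entry layout, inferred from the BrightmailLog.log excerpt:
#   "<Mon> <DD> <YYYY> <HH:MM:SS> [thread] [class] LEVEL - message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<thread>[^\]]+)\] \[(?P<cls>[^\]]+)\] (?P<level>\w+) - (?P<msg>.*)$"
)
# Matches the AgentEvent warning, e.g. "Connection timed out : 10.11.12.13"
TIMEOUT_RE = re.compile(r"Connection timed out : (?P<host>\S+)")


def timeout_hosts(lines):
    """Hypothetical helper: count 'Connection timed out' warnings per scanner host."""
    counts = Counter()
    for line in lines:
        entry = LINE_RE.match(line)
        if not entry:
            continue  # skip continuation lines such as stack frames
        hit = TIMEOUT_RE.search(entry.group("msg"))
        if hit:
            counts[hit.group("host")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 13 2014 06:08:22 [RuleAgentTask_5] [ConduitRuleHelper] INFO - "
        "Heuristic and URL Spam Filter ruleset updated for host 10.11.12.13.",
        "May 13 2014 06:25:31 [RuleAgentTask_5] [AgentEvent] WARN - "
        "Connection timed out : 10.11.12.13",
        " java.lang.Exception",
    ]
    print(dict(timeout_hosts(sample)))  # {'10.11.12.13': 1}
```

To run it against a real log, pass an open file object instead of the sample list, for example timeout_hosts(open("BrightmailLog.log")). Hosts that appear repeatedly are candidates for the network path check described below.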

Cause

The issue arises from an interaction between some network hardware and the way SMG distributes traffic across the connections between the Control Center and the Agent.

The Control Center opens two persistent connections to the agent on each SMG scanner and distributes transactions across them. This distribution does not appear to be well balanced: all traffic may be handled by one connection while the other stays idle for thirty minutes, and sometimes for up to an hour. Network devices that maintain an internal table of active connections, such as firewalls and load balancers, may then silently drop the idle connection from that table or reset it. When the Control Center later reuses that connection, the transaction assigned to it fails with a timeout if the connection was silently dropped, or with a network error if it was reset.
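The failure mode can be illustrated with a toy model of a stateful network device. The Python sketch below is illustrative only: the StatefulMiddlebox class, its idle_timeout_min parameter, and the 45-minute skewed schedule are assumptions and do not describe how any particular firewall or the Control Center is implemented. One connection carries all the traffic while the other idles; when the idle connection is finally used, the device has already forgotten it.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulMiddlebox:
    """Hypothetical firewall/load balancer that forgets flows idle too long."""
    idle_timeout_min: float
    last_seen: dict = field(default_factory=dict)  # connection id -> minute of last packet

    def open(self, conn_id, now):
        self.last_seen[conn_id] = now

    def forward(self, conn_id, now):
        """Return True if the packet passes; an expired flow is silently forgotten."""
        seen = self.last_seen.get(conn_id)
        if seen is None or now - seen > self.idle_timeout_min:
            self.last_seen.pop(conn_id, None)
            return False  # the sender sees a timeout (or a reset on some devices)
        self.last_seen[conn_id] = now
        return True


def run(idle_timeout_min):
    box = StatefulMiddlebox(idle_timeout_min)
    box.open("conn-A", 0)
    box.open("conn-B", 0)
    # Assumed skewed schedule: conn-A carries every transaction for 45 minutes...
    for minute in range(5, 50, 5):
        box.forward("conn-A", minute)
    # ...then a transaction is finally assigned to the idle conn-B.
    return box.forward("conn-B", 45)


print(run(30), run(60))  # False True
```

Here run(30) returns False because conn-B was idle for 45 minutes, longer than the device's 30-minute timeout, while run(60) returns True because the flow is still in the device's table.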

This appears to affect primarily connections to scanners in remote data centers, because those connections are more likely to traverse firewalls and similar network hardware. It may, however, also occur in other network environments.

Resolution

Configure firewalls, load balancers, and other network hardware between the Control Center and its scanners so that idle TCP connections to the agent port, 41002, are not dropped for at least 60 minutes. This reduces how often a connection stays idle long enough to be timed out.
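Raising the idle timeout reduces the number of failures rather than eliminating them, because any idle gap longer than the new timeout still fails. The sketch below counts, for a hypothetical list of idle gaps on the less-used connection, how many reuses would fail under 30-minute and 60-minute device timeouts. The gap values and the failing_reuses helper are illustrative assumptions, not measured data.

```python
def failing_reuses(idle_gaps_min, device_idle_timeout_min):
    """Hypothetical helper: count reuses that follow an idle gap longer than the device timeout."""
    return sum(1 for gap in idle_gaps_min if gap > device_idle_timeout_min)


# Hypothetical idle gaps (minutes) before each reuse of the less-used connection.
gaps = [5, 12, 31, 35, 42, 58, 64]

for timeout in (30, 60):
    print(timeout, failing_reuses(gaps, timeout))
# 30 5
# 60 1
```

With these assumed gaps, moving the device timeout from 30 to 60 minutes cuts the failing reuses from five to one; the 64-minute gap still fails.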

This issue is being investigated by Symantec product engineering and may be addressed via changes to the software in a later release.


Applies To

Symantec Messaging Gateway