Persistent Connections try to connect directly in CEM mode and WebSocket connections keep timing out



Article ID: 175823



Products

IT Management Suite

Issue/Introduction

Persistent Connections try to connect directly to the SMP Server when the agent is in CEM mode (Internet), if the SMP Server's IP address can be resolved.

The agent is unable to establish persistent communication via the CEM Gateway. The agent log shows an error similar to the following:

source='SMAIO.SSLProxy.Socket' module='AeXNetComms.dll' process='AeXNSAgent.exe'
<![CDATA[[1A:OUT_SRV: D20 -> 12F4, CONN: 48992C18] Connect[12F4] failed, error: The semaphore timeout period has expired (0x00000079)]]>

One of the scenarios reported is as follows:

Using Persistent Connections on a Split Tunnel VPN. Due to the number of people working remotely at the same time, we have enabled Split Tunnel on our VPN appliances and added Altiris to the split. This forces laptops to go to CEM directly over the Internet rather than using bandwidth on the VPN devices. We have blocked the SMP and each of the Site Servers configured in the CEM policy. The Persistent Connection to the SMP used with CEM is working. However, the agents are not getting a Persistent Connection to the Task Server; they are getting a legacy connection instead.

Environment

Symantec Management Platform 8.5 prior to 8.5 RU4

Cause

Known Issue.

The agent logs show that neither the SMP nor the Task Server (TS) Persistent Connection could be established. Both the SMP and TS FQDNs resolved to IP addresses successfully, so the agent assumed it should connect to them directly and ignored the Internet Gateway, even though the direct connection then failed.

The HTTP transport works differently: it tries the CEM connection if the direct connection fails. The WebSocket implementation lacked this fallback, a gap that had affected customers for some time.

Broadcom development implemented a simpler interim fix (until a more permanent implementation can be done) that works as follows:

  • The agent, in a CEM network, tries to establish a persistent connection to the NS or TS.
  • The agent resolves the server FQDN and receives a valid IP address.
  • The agent tries to connect to that IP address.
  • The connection times out a minute later (the default timeout is 60 seconds).
  • Previously, the agent would give up and would not try the CEM connection, even on the next retry.
  • Now, the FQDN of the timed-out server is registered in the "bad server" list.
  • The persistent connection fails.
  • The connection retry starts a minute later (the default interval is 1 minute).
  • The agent resolves the FQDN successfully but finds that the FQDN is in the "bad server" list.
  • The agent does not try to connect directly; it connects via CEM instead and succeeds.
  • If the persistent connection is broken, the reconnection occurs through CEM again.
  • Once an hour, the "bad server" list gets cleaned up. This is needed to cover cases where the server is temporarily unavailable because of maintenance or some
    other network problem and the agent can in fact connect directly to the server.
  • The "bad server" list also gets cleaned up when network interfaces change on the client side, for example when connecting to or disconnecting from a VPN.
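
The steps above can be modeled with a small sketch. This is not the agent's actual implementation; it only illustrates the routing decision under stated assumptions. The names `BadServerList`, `choose_route`, and `attempt`, the host name `ts.example.com`, and the 3600-second cleanup interval are all hypothetical, and the sketch assumes the CEM gateway itself is reachable.

```python
import time


class BadServerList:
    """Hypothetical model of the agent's "bad server" list."""

    def __init__(self, cleanup_interval_s=3600.0, clock=time.monotonic):
        self._clock = clock
        # Assumed periodic cleanup interval; the real value is not confirmed here.
        self._cleanup_interval_s = cleanup_interval_s
        self._fqdns = set()
        self._last_cleanup = clock()

    def clear(self):
        # Also meant to be called when network interfaces change
        # (for example, on VPN connect or disconnect).
        self._fqdns.clear()
        self._last_cleanup = self._clock()

    def add(self, fqdn):
        self._fqdns.add(fqdn.lower())

    def __contains__(self, fqdn):
        # Periodic cleanup, so a temporarily unavailable server
        # is eventually tried directly again.
        if self._clock() - self._last_cleanup >= self._cleanup_interval_s:
            self.clear()
        return fqdn.lower() in self._fqdns


def choose_route(fqdn, resolves, bad_servers):
    """Return "direct" or "cem" for the next persistent connection attempt."""
    if resolves and fqdn not in bad_servers:
        return "direct"
    return "cem"


def attempt(fqdn, resolves, direct_reachable, bad_servers):
    """Simulate one connection attempt; returns (route, succeeded)."""
    route = choose_route(fqdn, resolves, bad_servers)
    if route == "direct":
        if direct_reachable:
            return route, True
        # The direct connection timed out: remember the server for the retry.
        bad_servers.add(fqdn)
        return route, False
    return route, True  # assumes the CEM gateway is reachable


bad = BadServerList()
# The FQDN resolves, but the server is blocked from the Internet side.
print(attempt("ts.example.com", True, False, bad))  # ('direct', False)
print(attempt("ts.example.com", True, False, bad))  # ('cem', True)
```

The first attempt goes direct and fails, which registers the FQDN; the retry then goes through CEM. Calling `bad.clear()` models the cleanup that lets the agent try a direct connection again.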

The disadvantage of this fix was that, in such a scenario, the first Persistent Connection took a couple of minutes to establish (depending on the WebSocket timeouts and intervals).

With the ideal fix, the delay on the first connection attempt would equal the WebSocket connection timeout (1 minute), because the agent would connect through the gateway as soon as the direct attempt timed out. However, that fix would require much more retesting.
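
As a rough illustration of the two delays, the arithmetic can be sketched as follows. The function names are hypothetical, the values are the defaults quoted above, and the sketch ignores DNS resolution time and the time needed to set up the CEM connection itself.

```python
CONNECT_TIMEOUT_S = 60  # default WebSocket connection timeout quoted above
RETRY_INTERVAL_S = 60   # default retry interval quoted above


def delay_interim_fix(timeout_s=CONNECT_TIMEOUT_S, retry_s=RETRY_INTERVAL_S):
    # The direct attempt times out, then the agent waits for the next retry
    # before it connects via CEM.
    return timeout_s + retry_s


def delay_ideal_fix(timeout_s=CONNECT_TIMEOUT_S):
    # The direct attempt times out, then the agent falls back to CEM at once.
    return timeout_s


print(delay_interim_fix())  # 120
print(delay_ideal_fix())    # 60
```

With the defaults, the interim fix costs about two minutes before the first Persistent Connection, versus about one minute for the ideal fix.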

Resolution

This was a known issue in which WebSockets (Persistent Connections) tried to connect to the SMP Server directly whenever the SMP Server's IP address could be resolved.
The Symantec Management Agent assumed a LAN connection. This was addressed in the ITMS 8.5 RU4 release.