Investigating and resolving Failover issue with both appliances showing up as "Master"

Products

ProxySG Software - SGOS

Issue/Introduction

Resolution

For the issue reported, please refer to the below.

Note 1

Both appliances assume the role of Master when the multicast packets sent from one unit are not reaching the other. One possible cause is an intermediate network device, such as a switch, blocking the multicast packets. Refer to the technical article at the URL below.

https://support.symantec.com/en_US/article.TECH242572.html

Take a PCAP with the filter "ip host 224.0.1.1" concurrently on both units and confirm that the multicast from each unit is reaching the other. If the multicast packets are visible on both units, review the implementation to see whether any part of it is broken. If this is a new deployment, look for possible gaps in the configuration.
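
Once the two captures are exported, one way to read them is to list the source addresses seen on each unit and look for the missing direction. The following is a minimal sketch in Python, not an SGOS feature: the function name, the input format (a list of source IPs per unit, extracted with whatever capture tool you use) and the addresses are all assumptions made for illustration.

```python
def missing_advertisements(seen_sources, unit_ips):
    """Return (receiver, sender) pairs where the receiver's capture
    contains no multicast packet from the sender.

    seen_sources: dict mapping each unit's IP to the source IPs found in
                  the capture taken on that unit (hypothetical format).
    unit_ips:     IPs of all units in the failover group.
    """
    gaps = []
    for receiver in unit_ips:
        seen = set(seen_sources.get(receiver, []))
        for sender in unit_ips:
            if sender != receiver and sender not in seen:
                gaps.append((receiver, sender))
    return gaps


# Made-up addresses: the capture on .11 never shows multicast from .12.
captures = {
    "10.0.0.11": ["10.0.0.11"],
    "10.0.0.12": ["10.0.0.12", "10.0.0.11"],
}
print(missing_advertisements(captures, ["10.0.0.11", "10.0.0.12"]))
```

An empty result means each unit saw the other's multicast, in which case the configuration review described above is the next step.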

Note 2

When configuring failover, customers often run into problems with the multicast address used. Some multicast-aware switches expect to see traffic at lower-numbered multicast addresses rather than higher-numbered ones. The recommendation is to configure an address in the 224.0.0.0/24 range to avoid these kinds of issues. The following documentation uses 224.1.2.3 as an example, but configuring the address in the previously mentioned subnet is still recommended. See the following document for further information:

Also, the advertisement interval for the hellos sent between the failover pair should be set to 1 second rather than the default of 40 seconds. This allows failover to occur much faster.

Also ensure that IGMP snooping on the switch is turned off, either globally or on the port the SG is connected to, or that some other workaround is applied so that IGMP snooping does not interfere with the operation of failover. Please consult the following documentation for further insights into troubleshooting failover and dealing with common issues.
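
The address range recommendation in Note 2 can be checked before an address is configured. This is a small sketch using Python's standard ipaddress module; the helper name is hypothetical and the check only reflects the 224.0.0.0/24 recommendation given above.

```python
import ipaddress

# Range recommended in Note 2 for the failover multicast address.
RECOMMENDED_RANGE = ipaddress.ip_network("224.0.0.0/24")


def in_recommended_range(address):
    """Return True if the multicast address falls inside 224.0.0.0/24."""
    return ipaddress.ip_address(address) in RECOMMENDED_RANGE


print(in_recommended_range("224.0.0.1"))  # True
print(in_recommended_range("224.1.2.3"))  # False, the documentation example
```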

After validating your configuration, another cause of trouble may be a device on your network that routes traffic from one appliance to the others. It's important to make sure that multicast traffic is permitted to travel between appliances. A simultaneous packet capture from all appliances in the failover group, taken with a capture filter on the multicast address (e.g. “ip host 224.0.0.1” without the quotes), will show the multicast traffic from all appliances. You should see multicast packets being sent only from the active master appliance's source IP address. If two appliances are sending multicast packets at the same time, this indicates that the switch or router is not passing the multicast packets.
When testing failover, the next available passive appliance must miss 3 consecutive multicast packets from the active master appliance before it becomes authoritative for the shared VIP and starts to intercept and manage traffic. This means that, with the default Advertisement Interval of 40 seconds, you need to wait about 2 minutes before the next passive appliance takes over.
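
The waiting time follows directly from the interval and the three missed packets. As a rough sketch in Python (the function name is hypothetical, and the result is an approximation of the takeover delay, not an exact SGOS timer):

```python
def takeover_delay_seconds(advertisement_interval, missed_packets=3):
    """Approximate time before a passive appliance takes over:
    it must miss `missed_packets` consecutive advertisements."""
    return advertisement_interval * missed_packets


print(takeover_delay_seconds(40))  # 120 seconds with the default interval
print(takeover_delay_seconds(1))   # 3 seconds with a 1-second interval
```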

All appliances must be on the same network (i.e. same subnet, same broadcast domain).
The Virtual IP (VIP) must be the same on all members of the failover group. This ensures that if the active appliance goes offline, the next available passive appliance becomes the authority for the shared VIP address.
The multicast address must be the same on each appliance. This is how the appliances communicate active/passive state information with one another.
Only one appliance in the failover group should have the Master setting enabled. Keep the Relative Priority value at the default of 100.
The advertisement interval should be the same on each appliance to avoid delays in switching the master in the event of a failure.
When defining a Virtual IP address to use for your failover group, choose an IP that is not already assigned to a network adaptor, but is on the same subnet as the other appliance IP addresses.
The Shared Secret must be the same across all members of the failover group. If you suspect this to be the cause of the issue, define a new password and enter it on each appliance in the failover group, one after the other and apply the changes.
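
The checks in this list can be applied mechanically if the settings of each appliance are written down side by side. The sketch below is illustrative Python only: the dictionary keys (ip, prefix_len, vip, multicast, interval, secret, master) are hypothetical names chosen for this example and do not correspond to SGOS configuration fields, and the addresses are made up.

```python
import ipaddress


def check_failover_group(members):
    """Return a list of problems found in a failover group description.

    Each member is a dict with the hypothetical keys: ip, prefix_len,
    vip, multicast, interval, secret, master.
    """
    problems = []

    # These settings must be identical on every member.
    for key in ("vip", "multicast", "interval", "secret"):
        if len({m[key] for m in members}) > 1:
            problems.append(f"{key} differs between members")

    # Only one member should have the Master setting enabled.
    if sum(1 for m in members if m["master"]) > 1:
        problems.append("more than one member has Master enabled")

    for m in members:
        network = ipaddress.ip_interface(f"{m['ip']}/{m['prefix_len']}").network
        vip = ipaddress.ip_address(m["vip"])
        # The VIP must be on the member's subnet but not be its own address.
        if vip not in network:
            problems.append(f"VIP {vip} is outside the subnet of {m['ip']}")
        if vip == ipaddress.ip_address(m["ip"]):
            problems.append(f"VIP {vip} is already assigned to {m['ip']}")

    return problems


group = [
    {"ip": "10.0.0.11", "prefix_len": 24, "vip": "10.0.0.10",
     "multicast": "224.0.0.1", "interval": 1, "secret": "example",
     "master": True},
    {"ip": "10.0.0.12", "prefix_len": 24, "vip": "10.0.0.10",
     "multicast": "224.0.0.1", "interval": 40, "secret": "example",
     "master": False},
]
print(check_failover_group(group))
```

In this example the only reported problem is the mismatched advertisement interval. The sketch does not cover every item above; for instance, it cannot tell whether the VIP is already in use by another device on the network.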

Once more, please note the below.

ProxySG requires multicast traffic to flow from one unit to the other. Many networks don't propagate multicast traffic across different networks.
The Virtual IP (VIP) used for Failover must belong to an active network. If the VIP has been configured for a network that these proxies are not members of, traffic from one proxy to the other will not be routed correctly.