Issues with failover: Primary & Backup devices show up as "Master"

Products

ISG Proxy ProxySG Software - SGOS

Issue/Introduction

When both the primary and backup devices report the "Master" role, the issue lies with the delivery of multicast packets between the units and not with the appliance, provided the failover implementation has been done correctly.

Resolution

This article also shares detailed, best-practice troubleshooting steps that will help you validate your configuration. Refer to the linked Tech. Articles for the detailed troubleshooting steps.

Note 1:

In a known case that was investigated, both appliances assumed the Master role because the multicast packets sent from one device were not reaching the other unit. One possible cause is an intermediate network device, such as a switch, blocking the multicast packets. Refer to the Tech. Article at the URL below.

https://knowledge.broadcom.com/external/article?legacyId=TECH242572

Take a PCAP with the filter "ip host 224.0.1.1" concurrently on both units and confirm that the multicast traffic from each unit reaches the other. If the multicast packets are visible on both units, review the implementation to see whether any part of it is broken. If this is a new deployment, look for possible gaps in the configuration.
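The decision logic described above can be sketched in Python. This is a minimal illustration only: `CaptureSummary` and `diagnose` are hypothetical names, and the set of source IPs is assumed to have been extracted by hand from each unit's PCAP (for example, by listing the source addresses of the captured multicast packets).

```python
from dataclasses import dataclass, field


@dataclass
class CaptureSummary:
    """Hypothetical summary of one unit's PCAP.

    local_ip:     the unit's own interface IP address.
    sources_seen: source IPs of the multicast packets found in that capture.
    """
    local_ip: str
    sources_seen: set = field(default_factory=set)


def diagnose(unit_a: CaptureSummary, unit_b: CaptureSummary) -> str:
    """Report whether each unit sees the other's multicast packets."""
    a_hears_b = unit_b.local_ip in unit_a.sources_seen
    b_hears_a = unit_a.local_ip in unit_b.sources_seen

    if a_hears_b and b_hears_a:
        # Multicast is flowing both ways, so the network is not the cause.
        return "multicast OK in both directions: review the failover implementation"
    if not a_hears_b and not b_hears_a:
        return "multicast blocked in both directions: check intermediate devices"
    # Exactly one direction is failing; name it.
    blocked = "B -> A" if not a_hears_b else "A -> B"
    return f"multicast blocked {blocked}: check intermediate devices"


if __name__ == "__main__":
    # Example addresses are placeholders, not values from a real deployment.
    a = CaptureSummary("10.0.0.1", {"10.0.0.1", "10.0.0.2"})
    b = CaptureSummary("10.0.0.2", {"10.0.0.2"})
    print(diagnose(a, b))
```

In this example, unit A sees packets from unit B, but unit B only sees its own, so the sketch reports the A -> B direction as blocked.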

Note 2:

When configuring failover, customers often run into problems with the multicast address used. Some multicast-aware switches expect to see traffic at lower-numbered multicast addresses rather than higher-numbered ones. The recommendation is to configure an address in the 224.0.0.0/24 range to avoid these kinds of issues. The following documentation uses 224.1.2.3 as an example, but configuring an address in the previously mentioned subnet is still recommended. See the following document for further information:

Also, the advertisement interval for the hellos sent between the failover pair should be set to 1 second rather than the default of 40 seconds. This allows failover to occur much faster.

Also ensure that IGMP snooping does not interfere with the operation of failover: turn it off on the switch, either globally or on the port the SG is connected to, or apply some other workaround.
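As a quick sanity check for the multicast address recommendation in Note 2, the following Python sketch tests whether a candidate address falls in 224.0.0.0/24. It uses only the standard `ipaddress` module; the function name `check_failover_multicast` and the returned messages are illustrative assumptions, not part of any ProxySG tooling.

```python
import ipaddress

# Range recommended in Note 2 for the failover multicast address.
RECOMMENDED_RANGE = ipaddress.ip_network("224.0.0.0/24")


def check_failover_multicast(address: str) -> str:
    """Classify a candidate failover multicast address (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        return "not a multicast address"
    if ip in RECOMMENDED_RANGE:
        return "in recommended range"
    return "multicast, but outside 224.0.0.0/24"


if __name__ == "__main__":
    for candidate in ("224.1.2.3", "224.0.0.50", "10.0.0.1"):
        print(candidate, "->", check_failover_multicast(candidate))
```

Note that 224.0.0.50 is only a placeholder inside the recommended range; several addresses in 224.0.0.0/24 are reserved for well-known protocols, so pick one that is not already in use on your network.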

Please follow the Tech. Article at the URL below for further insight into troubleshooting failover and dealing with common issues.

https://knowledge.broadcom.com/external/article/228051/investigating-and-resolving-failover-iss.html

Once more, please note the following points.

ProxySG failover requires multicast traffic to flow from one unit to the other. Many environments do not propagate multicast traffic from one network to another.

The Virtual IP (VIP) used for Failover must belong to an active network. If the VIP has been configured for a network that these proxies are not members of, traffic from one proxy to the other will not be routed correctly.
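The VIP requirement above can be checked with a short Python sketch. It assumes you have noted the proxy's interface addresses in CIDR form; the function name `vip_on_local_network` and all addresses shown are hypothetical examples.

```python
import ipaddress
from typing import Iterable


def vip_on_local_network(vip: str, interfaces: Iterable[str]) -> bool:
    """Return True if the VIP falls inside the network of any listed interface.

    interfaces: interface addresses in CIDR notation, e.g. "10.1.1.5/24".
    """
    vip_addr = ipaddress.ip_address(vip)
    return any(
        vip_addr in ipaddress.ip_interface(iface).network
        for iface in interfaces
    )


if __name__ == "__main__":
    proxy_interfaces = ["10.1.1.5/24", "192.168.50.2/24"]  # placeholder values
    print(vip_on_local_network("10.1.1.100", proxy_interfaces))  # VIP on a member network
    print(vip_on_local_network("172.16.0.10", proxy_interfaces))  # VIP on a foreign network
```

Run the check against the interface list of each proxy in the failover pair; the VIP should be on a member network of both.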

Additional documentation:

https://knowledge.broadcom.com/external/article/166471/how-to-setup-multiple-proxysgs-to-provid.html

https://knowledge.broadcom.com/external/article/170518/failover-configuration-operation-and-tro.html

In one case study, it was confirmed that the VM network was dropping the multicast traffic. To resolve this, the following steps were performed in the vSphere Client.

On the vSphere Client Home page, click Networking and navigate to the distributed switch.
From the Actions menu, select Settings > Edit Settings.
In the dialog box that displays the settings of the switch, click Advanced.
From the Multicast filtering mode drop-down menu, select IGMP/MLD snooping, and click OK.