Symantec PAM Cluster - Miscellaneous or partial networking errors from a single node in the cluster

Article ID: 233729

Products

CA Privileged Access Manager (PAM)

Issue/Introduction

Several clients who configured a cluster VIP outside the subnet of the PAM appliances have encountered a variety of problems, depending on the exact nature of the misconfiguration.

1. All cluster nodes in a single-site cluster are set to one subnet and the VIP is set to an IP in another subnet.

Possibility 1: Error: 406: PAM-CMN-5084: Turning the cluster on failed

Possibility 2: The cluster starts fine but has errors when trying to reach machines in the VIP subnet.

2. Some cluster nodes in a single-site cluster are set to one subnet and the VIP is set to an IP in the same subnet as the primary cluster node.

Possibility 1: The cluster starts fine, but after a failover event the cluster is not stable and the VIP is not accessible on the new primary node.

Possibility 2: The cluster starts fine, but after a failover and failback the node that became the primary cannot access certain network segments and may appear to be out of the cluster.
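
In all of these scenarios, a useful first check is whether the VIP and every node IP actually belong to the same network. As a minimal sketch, assuming the ipcalc utility is available on a Linux workstation and using hypothetical addresses:

# Hypothetical addresses: node base IP 10.10.1.21/24, proposed VIP 10.10.2.50
ipcalc 10.10.1.21/24 | grep -i network    # Network: 10.10.1.0/24
ipcalc 10.10.2.50/24 | grep -i network    # Network: 10.10.2.0/24  <-- a different
                                          # network, so this VIP would trigger the
                                          # symptoms listed above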

Environment

Release: 3.x, 4.0

Component:

Cause

The root cause of these errors is that either the VIP cannot be used in the alternate subnet, or the routing table in the PAM appliance is set improperly for the subnet in question.

The PAM appliance is a Debian-based Linux server. A load-balanced VIP is configured by binding the VIP address as a second IP on the network card defined in the cluster configuration (see the cluster network configuration below). This means that eth0 holds the base IP of the appliance while the VIP address is bound to the alias eth0:1. Because both IPs sit on the same physical network card, they should be in the same subnet for several reasons. When two addresses from different subnets are assigned to the same interface, advanced routing rules must be applied for traffic to route properly. Since Symantec PAM does not provide any ability to apply advanced routing rules, general networking problems can occur if the two IPs are not in the same subnet.
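
Conceptually, the binding that PAM performs internally is equivalent to adding an alias address with standard Linux tooling. The following is an illustrative sketch only, with hypothetical addresses; do not run these commands on a PAM appliance:

# Base IP already on eth0: 10.10.1.21/24 (hypothetical)
# Correct: VIP in the same /24 as the base IP -- one connected route on eth0
ifconfig eth0:1 10.10.1.200 netmask 255.255.255.0 up

# Incorrect: VIP in a different subnet -- adds a second connected route on eth0
# and, without advanced (policy) routing rules, traffic can route asymmetrically
ifconfig eth0:1 10.10.2.50 netmask 255.255.255.0 up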

Additionally, when the cluster fails over the primary role to another cluster node, the VIP address is moved to eth0:1 on the new primary server. The VIP must again be fully reachable by all other cluster nodes, so all of these addresses need to exist in the same subnet.
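
As a minimal check after a failover, assuming shell access to the appliances and a hypothetical VIP of 10.10.1.200, the new primary and its peers can be verified like this:

# On the new primary: confirm the VIP is now bound to the eth0:1 alias
ifconfig eth0:1

# From each other cluster node: confirm the VIP is reachable at its new home
ping -c 3 10.10.1.200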

A properly configured PAM primary appliance will look like this:

root@PAMPrimary:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.xxx.xxx.xxx     0.0.0.0         UG    0      0        0 eth0
xxx.xxx.xxx.xxx    0.0.0.0         255.xxx.xxx.xxx   U     0      0        0 eth0
localhost       0.0.0.0         255.xxx.xxx.xxx UH    0      0        0 lo
172.17.x.x      0.0.0.0         255.255.0.0     U     0      0        0 docker0

root@PAMPrimary:~# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.xxx.xxx.xxx  netmask 255.xxx.xxx.xxx  broadcast 10.xxx.xxx.xxx
        inet6 xxxx::xxxx:xx:xxxx:xxxx  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:0f:8d:6c  txqueuelen 1000  (Ethernet)
        RX packets 309258785  bytes 96522743679 (89.8 GiB)
        RX errors 15278743  dropped 35  overruns 0  frame 15278743
        TX packets 297646202  bytes 74715208774 (69.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


root@PAMPrimary:~# ifconfig eth0:1
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.xxx.xxx.xxx  netmask 255.xxx.xxx.xxx  broadcast 10.46.129.255
        ether 52:54:00:0f:8d:6c  txqueuelen 1000  (Ethernet)

An improperly configured PAM appliance will look like this (shown here from a PAM 4.0.1 appliance). Note the two connected routes on eth0, one for each subnet:

root@PAMPrimary:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.xxx.xxx.xxx     0.0.0.0         UG    0      0        0 eth0
10.xxx.xxx.xxx    0.0.0.0         255.xxx.xxx.0   U     0      0        0 eth0
10.xxx.xxx.xxx    0.0.0.0         255.xxx.xxx.0   U     0      0        0 eth0
localhost       0.0.0.0         255.xxx.xxx.xxx UH    0      0        0 lo
172.17.x.x      0.0.0.0         255.255.0.0     U     0      0        0 docker0

root@PAMPrimary:~# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.xxx.xxx.xxx  netmask 255.xxx.xxx.xxx  broadcast 10.xxx.xxx.xxx
        inet6 xxxx::xxxx:xx:xxxx:xxxx  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:0f:8d:6c  txqueuelen 1000  (Ethernet)
        RX packets 309258785  bytes 96522743679 (89.8 GiB)
        RX errors 15278743  dropped 35  overruns 0  frame 15278743
        TX packets 297646202  bytes 74715208774 (69.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


root@PAMPrimary:~# ifconfig eth0:1
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.xxx.xxx.xxx  netmask 255.xxx.xxx.xxx  broadcast 10.xxx.xxx.xxx
        ether 52:54:00:0f:8d:6c  txqueuelen 1000  (Ethernet)
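
A quick way to spot this condition is to count the connected routes on eth0; more than one indicates that addresses from two different subnets are bound to the interface. A minimal sketch using standard Linux tools (the awk filter assumes the column layout of route -n, which matches the route output shown above):

# Connected routes have a gateway of 0.0.0.0 in `route -n` output;
# two or more such entries on eth0 means two subnets share the interface
route -n | awk '$2 == "0.0.0.0" && $8 == "eth0"'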

Resolution

Using IPs that all belong to the same subnet, for each node and for the VIP at each cluster site, allows proper and stable communication.
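
For illustration, a supported single-site addressing plan might look like the following (all addresses hypothetical):

# Node 1 (primary): 10.10.1.21/24
# Node 2:           10.10.1.22/24
# Node 3:           10.10.1.23/24
# Cluster VIP:      10.10.1.200/24  <-- same 10.10.1.0/24 network as every node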

Note: The requirement that all nodes and the VIP be in the same subnet is defined in the PAM manuals for all releases. With Symantec PAM 4.1 and higher, this requirement can change, specifically when an external load balancer is used.

Additional Information

The simple fact that the cluster can start with some misconfigurations in which IPs are in different subnets does not mean that the configuration is correct or supported. Advanced routers, or switch-like network behavior resulting from VLAN or ESXi network management, can mask the problem configuration and allow communication in some cases, but such setups become problematic to troubleshoot. Network changes or a vMotion of the VMware appliance may break communications that had only been working due to the internal network switching at the VMware host.