When running Symantec Encryption Management Server (PGP Server) in a cluster, the following checks can help you identify cluster-related issues:
To check whether replication is working, add a test administrator on each PGP Server cluster node and confirm that each one replicates to all of the other nodes. This is typically the easiest way to confirm that replication is working.
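After you add one test administrator per node, you compare the administrator list that each node's console shows. With more than a few nodes, doing this by hand gets tedious. The following is a minimal Python sketch. It assumes you have already collected those lists manually. The node and administrator names are hypothetical placeholders, and the function does not connect to the PGP Server itself.

```python
def missing_admins(seen_by_node):
    """Return, for each node, the test administrators it has NOT received.

    seen_by_node maps a node name to the set of administrator names shown
    in that node's console (collected by hand; names are hypothetical).
    """
    # Every administrator added anywhere in the cluster should appear everywhere.
    expected = set().union(*seen_by_node.values())
    return {
        node: expected - seen
        for node, seen in seen_by_node.items()
        if expected - seen
    }


if __name__ == "__main__":
    observed = {
        "node-a": {"test-admin-a", "test-admin-b", "test-admin-c"},
        "node-b": {"test-admin-a", "test-admin-b", "test-admin-c"},
        "node-c": {"test-admin-c"},  # has not received the admins from a and b
    }
    for node, missing in sorted(missing_admins(observed).items()):
        print(f"{node} is missing: {', '.join(sorted(missing))}")
```

An empty result means every test administrator reached every node. A node that appears in the result is a good place to start looking at replication.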
NTP or VMware time sync (not both)
All PGP Server cluster nodes must have the same time. If you have configured "Local time", clock skew between the nodes is possible, which can appear as a cluster log entry such as the following:
Using an NTP server is a good way to ensure that all PGP Server nodes keep the same time.
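To quantify skew, you can read each node's clock at roughly the same moment and compare the results, for example by running the standard Linux `date +%s` command on each node over SSH. The Python sketch below assumes you have already collected those readings. The node names, sample values, and 2-second tolerance are hypothetical and are not documented product limits. The readings are not taken at exactly the same instant, so small differences may reflect collection delay rather than real skew.

```python
import statistics

# Hypothetical example: clock readings (Unix epoch seconds) collected from
# each node at roughly the same moment, e.g. with `date +%s` over SSH.
readings = {
    "node-a": 1700000000.0,
    "node-b": 1700000000.4,
    "node-c": 1700000007.0,
}


def clock_skew(readings):
    """Largest difference, in seconds, between any two node clocks."""
    return max(readings.values()) - min(readings.values())


def skewed_nodes(readings, tolerance_s=2.0):
    """Nodes whose clock differs from the median reading by more than tolerance_s.

    The 2-second default is an illustrative threshold, not a product limit.
    """
    reference = statistics.median(readings.values())
    return sorted(
        node for node, t in readings.items() if abs(t - reference) > tolerance_s
    )


if __name__ == "__main__":
    print(f"max skew: {clock_skew(readings):.1f} s")
    print("check time settings on:", skewed_nodes(readings))
```

Comparing against the median rather than a single node keeps one badly skewed node from making every other node look wrong.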
Note: If you are running in VMware, do not use both NTP and VMware Time Sync.
Either disable VMware Time Sync or do not use NTP. See the following article for information on time as it relates to Symantec Encryption Management Server:
If you still cannot get clustering to work, do not attempt to break the cluster. A node that leaves the cluster must rejoin it, which can be very time-consuming if the database is large, and breaking the cluster typically introduces other complexities. In most cases, Support can fix a clustering issue without breaking the cluster, so it is better to work with Support. You may need to configure SSH access to the server (for example, with PuTTY) for additional troubleshooting. See the following article for more information on setting up SSH access to the PGP Server:
153592 - Access Encryption Management Server by using SSH
In some situations, you may need to add a PGP Server node to an existing cluster for redundancy. When Web Email Protection (WEP) is in use, WEP replication can be handled in a few different ways.
Depending on how the WEP service is configured, WEP content may not be replicated to all nodes. If the WEP service is disabled on a cluster node, users may still appear on all cluster nodes, but their WEP content will not be enabled on that node. In this case, cluster nodes will not attempt to send WEP data to the other nodes. This creates multiple replication rings in which some nodes may have multiple upstream or downstream servers. This behavior is expected. For the best results and redundancy, enable the WEP service on all cluster nodes, and then set Message Replication to "All" on each node so that all WEP data is replicated around the ring:
In the screenshot above, the WEP service is enabled, but Message Replication is set to "Off".
This means that WEP can be used, but messages that originate on this server will stay on this server and will not replicate to the other nodes.
For the best redundancy, enable the WEP service on all PGP Servers and set Message Replication to "All" on every node.
This ensures that every message is stored on all nodes, so if one server is unavailable, its messages are still available on the other nodes.
Because every node holds the same data, each server should be resourced equally, with the same amount of disk space, memory, and so on.
Once Message Replication is set to "All", messages are replicated to all nodes, and each cluster node will use roughly the same amount of disk space.
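You can estimate the disk-space impact with simple arithmetic. This Python sketch uses a simplified model drawn from the behavior described above. With Message Replication set to "All", every node ends up holding the full WEP message set. With "Off", each node keeps only the messages that originated on it. Mixed configurations are not modeled, and the per-node sizes are hypothetical.

```python
def per_node_wep_storage_gb(originated_gb, replication):
    """Approximate WEP message storage on each node, in GB.

    originated_gb maps each node to the WEP data that originated on it.
    Simplified model (assumption):
      "All" -> every node holds the full WEP message set
      "Off" -> each node holds only its own messages
    """
    if replication == "All":
        total = sum(originated_gb.values())
        return {node: total for node in originated_gb}
    if replication == "Off":
        return dict(originated_gb)
    raise ValueError(f"unsupported replication setting: {replication!r}")


if __name__ == "__main__":
    # Hypothetical sizes for a three-node cluster.
    originated = {"node-a": 30.0, "node-b": 15.0, "node-c": 5.0}
    print("Off:", per_node_wep_storage_gb(originated, "Off"))
    print("All:", per_node_wep_storage_gb(originated, "All"))
```

In this example, with "All" set, each node needs room for about 50 GB of WEP data, even though node-c originated only 5 GB. This is why the nodes should be sized equally.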
Keep in mind that when you enable WEP on all nodes and set Message Replication to "All", the initial replication can take some time, depending on the amount of data.
For example, if WEP is using 50 GB of space, it could take 8 hours to replicate all of the data to the nodes. While this replication is in progress, other replication tasks may be delayed, so plan accordingly.
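The example above implies a rate of about 6.25 GB per hour (50 GB / 8 hours, roughly 1.7 MB/s). The Python sketch below scales that figure linearly to other data sizes. Linear scaling and the default rate are assumptions taken from the example, not a product specification. Real throughput depends on network bandwidth, disk performance, and cluster load, so use a rate you have measured if you have one.

```python
# Rate implied by the example above: 50 GB in about 8 hours = 6.25 GB/hour
# (roughly 1.7 MB/s in decimal units). Treat it as a placeholder, not a spec.
EXAMPLE_RATE_GB_PER_HOUR = 50 / 8


def estimate_replication_hours(data_gb, rate_gb_per_hour=EXAMPLE_RATE_GB_PER_HOUR):
    """Estimate initial WEP replication time, assuming time scales linearly with size."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate_gb_per_hour must be positive")
    return data_gb / rate_gb_per_hour


if __name__ == "__main__":
    for size_gb in (50, 120):
        hours = estimate_replication_hours(size_gb)
        print(f"{size_gb} GB -> about {hours:.1f} hours")
```

At the example rate, 120 GB of WEP data works out to about 19.2 hours, which can help you choose a maintenance window.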
Assume that the node will not be fully functional until this replication completes.
Once you have enabled the WEP service and set Message Replication to "All", it is a good idea to restart services from the System\General tab of the PGP Server. This should cause replication to recalculate and start processing all of the replication data.
The following commands can be useful during troubleshooting:
To check connectivity between cluster members:
pgprepctl topo
To verify the incoming and outgoing connections of a node:
pgprepctl debug list
To monitor the replication queues and confirm that replication is working:
pgprepctl info
For help interpreting this output, consult Symantec Encryption Support.
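To gather this output from several nodes for Support, you can use a small script that runs each command on every node over SSH. The Python sketch below makes three assumptions. SSH access is already configured as described in 153592. The SSH user can run the commands directly; if your servers require elevated privileges, adjust the command strings. The hostnames and user name shown are placeholders. By default the script does a dry run and only prints the `ssh` command lines it would execute.

```python
import shlex
import subprocess

# The three replication checks listed above.
PGPREPCTL_COMMANDS = ["pgprepctl topo", "pgprepctl debug list", "pgprepctl info"]


def build_ssh_commands(nodes, user):
    """Return (node, command, argv) tuples, one per node and per check."""
    return [
        (node, cmd, ["ssh", f"{user}@{node}", cmd])
        for node in nodes
        for cmd in PGPREPCTL_COMMANDS
    ]


def collect(nodes, user, dry_run=True, timeout_s=60):
    """Run (or, in dry-run mode, just print) every check on every node."""
    lines = []
    for node, cmd, argv in build_ssh_commands(nodes, user):
        lines.append(f"===== {node}: {cmd} =====")
        if dry_run:
            lines.append(shlex.join(argv))
            continue
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout_s
            )
            lines.append(result.stdout + result.stderr)
        except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
            # Record the failure and keep going so one unreachable node
            # does not stop collection from the rest.
            lines.append(f"ERROR: {exc}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hostnames and SSH user; replace with your own.
    nodes = ["pgp-node-1.example.com", "pgp-node-2.example.com"]
    print(collect(nodes, user="ssh-user", dry_run=True))
```

Once the dry-run output looks right, call `collect(..., dry_run=False)` and save the returned text to a file for Support.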
EPG-26105 - Global_ID value missing after failed cluster join
For more information on clustering, see the following articles:
153721 - Creating a Cluster with Symantec Encryption Management Server
153476 - How many PGP Servers are supported in a cluster (Symantec Encryption Management Server)?
153412 - Troubleshooting: Symantec Encryption Management Server Clustering
222372 - Encryption Management Server clustering and replication uses network Interface 1