API Portal Horizontal Scaling - containers not starting on docker swarm additional nodes.

Article ID: 277672

Products

CA API Developer Portal

Issue/Introduction

Using the latest Portal OVA appliance (Debian) with Portal version 5.2.2, the Developer Portal is deployed on the first node and its containers start normally.

When a second node is added to the Docker Swarm cluster as a manager, without the portal labels, the coordinator, dispatcher and zookeeper containers start successfully on that node.

Once the additional node is updated with the "portal=true" and "portal.persist=true" labels, the ingress container on this node never starts. Swarm also attempts to start Kafka on this node, but it never succeeds.

https://techdocs.broadcom.com/us/en/ca-enterprise-software/layer7-api-management/api-developer-portal/5-2/install-configure-and-upgrade/install-portal-on-docker-swarm/scale-api-portal/horizontal-scaling.html

It appears that the containers cannot communicate reliably between the two Docker Swarm nodes over the swarm networks.

Environment

Portal 5.2.x on the Debian OVA appliance

Cause

The problem is caused by a combination of the Linux kernel version and certain versions of VMware, which causes intermittent network issues on the networks created by Docker Swarm.

Known issues with VMware (portainer.io)

Resolution

Either of the following workarounds can be used for this issue.

Turn off tx-checksum-ip-generic on the ssg_eth0 interface of the appliances:

sudo ethtool -K ssg_eth0 tx-checksum-ip-generic off
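
To confirm that the change took effect, the current offload state can be read back with ethtool -k (lowercase k). The following is a minimal sketch only: the helper name offload_state is hypothetical and is not part of the appliance, and it simply parses the text that ethtool -k prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the appliance): reads the output of
# `ethtool -k <interface>` on stdin and prints the state ("on" or "off")
# reported for the tx-checksum-ip-generic feature.
offload_state() {
    awk -F': *' '/^[ \t]*tx-checksum-ip-generic:/ { split($2, a, " "); print a[1]; exit }'
}

# Usage on the appliance (assumes ethtool is installed and ssg_eth0 exists):
#   sudo ethtool -k ssg_eth0 | offload_state
```

After the workaround has been applied, the helper should print off.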

Setting tx-checksum-ip-generic to off with ethtool does not persist across reboots.

To make the setting persistent, add the following line to /etc/network/interfaces:

post-up /sbin/ethtool -K ssg_eth0 tx-checksum-ip-generic off

Alternatively, create the Docker Swarm with a data path port other than the default (4789/UDP) and update the firewall to allow this port.

The portal.sh script creates the swarm with the default configuration. Update the following section of the script to set a different data path port:

NODE=$(docker node ls -q 2> /dev/null || true)
if [[ -z $NODE ]]; then
        docker swarm init > /dev/null
fi

For example, it can be changed to:

NODE=$(docker node ls -q 2> /dev/null || true)
if [[ -z $NODE ]]; then
        docker swarm init --data-path-port=38888 > /dev/null
fi
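
Docker documents the valid range for --data-path-port as 1024 to 49151, and the port carries the overlay (VXLAN) traffic over UDP. Before editing portal.sh, the chosen value can be sanity-checked with a small helper such as the sketch below. The function name valid_data_path_port is hypothetical, and the firewall command in the comment assumes the appliance firewall is managed with iptables, which this article does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if $1 is an integer inside the range
# Docker documents for --data-path-port (1024-49151).
valid_data_path_port() {
    local port=$1
    # Digits only, then a base-10 range check (10# avoids octal parsing).
    [[ $port =~ ^[0-9]+$ ]] && (( 10#$port >= 1024 && 10#$port <= 49151 ))
}

# Example, using the port from this article:
#   valid_data_path_port 38888 && echo "38888 is usable"
#
# Assumption: if the firewall is managed with iptables, a rule of this form
# would allow the overlay traffic between the swarm nodes:
#   sudo iptables -A INPUT -p udp --dport 38888 -j ACCEPT
```

The same port must be reachable over UDP on every node in the swarm.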

Then remove the portal stack and redeploy the Docker Swarm nodes.
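
Once the swarm has been re-created, the data path port in use can be checked from the output of docker info, which on recent Docker versions includes a "Data Path Port" line in its Swarm section. The sketch below is illustrative only: data_path_port is a hypothetical helper name, and it just extracts that line from the text docker info prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker info` output on stdin and prints the
# value of the "Data Path Port" line from the Swarm section, if present.
data_path_port() {
    awk -F': *' '/^[ \t]*Data Path Port:/ { print $2; exit }'
}

# Usage on a swarm node (assumes the docker CLI is available):
#   docker info 2>/dev/null | data_path_port
```

If the modified portal.sh was used to create the swarm, the value printed should match the port passed to --data-path-port, 38888 in the example above.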