Troubleshooting Gateway application failures, performance concerns, and service outages

Article ID: 42511

Products

STARTER PACK-7, CA Rapid App Security, CA API Gateway

Issue/Introduction

Under certain circumstances, the Gateway may fail to process message traffic. This may result in a loss or degradation of availability as one or more nodes in a cluster are unable to process adequate amounts of traffic.

Environment

Component: APIGTW

Cause

This behavior may present in one or more of the following ways:

  1. A protected service is no longer receiving traffic from the Gateway.
  2. One or more nodes are reported as offline or unknown via the Enterprise Service Manager.
  3. One or more nodes are reported as down via the Layer 7 Policy Manager Dashboard.
  4. The Gateway (SSG) service is not running.
  5. The Gateway log files are not generating new log entries.
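Symptom 5 can be checked without opening the log files. The following is a minimal sketch, assuming a POSIX shell and a find command that supports -mmin (GNU findutils does). The function name logs_are_fresh is hypothetical and is not part of the Gateway product.

```shell
# Hypothetical helper (not part of the Gateway product): succeeds if any file
# under DIR was modified within the last MINUTES minutes, fails otherwise.
# Usage: logs_are_fresh DIR MINUTES
# Example, assuming the default node log directory listed later in this article:
#   logs_are_fresh /opt/SecureSpan/Gateway/node/default/var/logs 5 ||
#       echo "No new Gateway log entries in the last 5 minutes"
logs_are_fresh() {
    dir="$1"
    minutes="$2"
    # -mmin -N matches files modified less than N minutes ago.
    recent=$(find "$dir" -type f -mmin "-$minutes" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}
```

Running the check on every node makes it easy to spot which nodes have stopped logging.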

Resolution

Troubleshooting


Several pieces of diagnostic information should be collected before the Gateway appliance or the Gateway service is restarted, because a restart discards much of this data. If a restart or reboot is performed first, diagnosis may have to wait for the next occurrence of the issue. These commands can be run against live production environments without causing further downtime or availability concerns.


System statistics


Run all of the following commands from the privileged shell of the API Gateway. For more information on accessing the privileged shell, refer to the product documentation page titled "Privileged Shell for Root Commands".

  1. top -n 1 -b > /home/ssgconfig/top
  2. ps -e -o pid,args --forest > /home/ssgconfig/ps-forest
  3. ps awwx -mo pid,lwp,stime,time,c,cmd > /home/ssgconfig/ps-lwp
  4. egrep "8080|8443|9443" /proc/net/ip_conntrack > /home/ssgconfig/ip_conntrack_port
  5. cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count > /home/ssgconfig/ip_conntrack_count
  6. ethtool -S ethX > /home/ssgconfig/ethtool-ethX (Note: Replace "X" with the interface number and run the command once for each interface on the Gateway appliance, for example eth0 and eth1.)
  7. iptables -nvL > /home/ssgconfig/iptables-counter
  8. ss -o state established \( sport = :8080 or sport = :8443 or sport = :9443 \) dst 0.0.0.0/0 | egrep -v Recv-Q | wc -l
    • This command counts the number of established inbound connections. Run it on every node in the cluster.
  9. ss -o state established \( sport = :8080 or sport = :8443 or sport = :9443 \) dst 0.0.0.0/0 | grep -v ^0 | egrep -v Recv-Q | wc -l
    • This command counts the number of queued inbound connections, that is, connections with a non-zero Recv-Q value. Run it on every node in the cluster.
  10. ss -o state established \( dport = :http or dport = :https \) dst 0.0.0.0/0 | egrep -v Recv-Q | wc -l
    • This command counts the number of outbound HTTP and HTTPS connections. Run it on every node in the cluster.
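Commands 8 and 9 differ only in the grep -v ^0 filter, which drops rows whose first column (Recv-Q) is zero. The sketch below computes both counts in a single pass over the same ss output. It is a hypothetical helper, not a product tool. It locates the Recv-Q column from the header line on the assumption that some newer ss builds print an extra leading Netid column, which would break a plain ^0 match.

```shell
# Hypothetical helper: reads the output of an "ss ... state established"
# command on stdin and prints two counts on one line:
#   <established connections> <connections with a non-zero Recv-Q>
# Example (run on every node in the cluster):
#   ss -o state established \( sport = :8080 or sport = :8443 or sport = :9443 \) \
#       dst 0.0.0.0/0 | count_connections
count_connections() {
    awk '
        # Header line: remember which column holds Recv-Q.
        /Recv-Q/ {
            for (i = 1; i <= NF; i++) if ($i == "Recv-Q") col = i
            next
        }
        # Every other non-empty line is one connection.
        NF > 0 {
            total++
            if (col && $col + 0 > 0) queued++
        }
        END { printf "%d %d\n", total, queued }
    '
}
```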


Garbage collection (GC)

  1. sudo su gateway
  2. /opt/SecureSpan/JDK/bin/jstat -gcutil `cat /opt/SecureSpan/Gateway/node/default/var/ssg.pid` 10s > ~/gc_output.txt
    • This command samples garbage collection statistics every ten seconds and writes them to the gc_output.txt file. Leave it running for as long as possible (5 to 60 minutes), stop it with Ctrl+C, and then provide the file to CA Support.
    • If a CA Support Engineer asks for this data to be collected over a longer period (for example, days or weeks), complete the following steps instead of running the command above:
      1. Edit the following file: /opt/SecureSpan/Gateway/node/default/etc/conf/node.properties
      2. Add the following line to the file: node.java.opts = -verbosegc -XX:+PrintGCDetails -Xloggc:/tmp/gc.log
      3. Save the file.
      4. Restart the API Gateway service to apply the change: service ssg restart
        • After the restart, the Gateway writes garbage collection diagnostic data to /tmp/gc.log. Leave it collecting for the period prescribed by the CA Support Engineer, and submit the file to CA Support on the requested date. When instructed, disable GC logging by commenting out the line added in step two and repeating steps three and four.
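Once gc_output.txt has been collected, a short summary can show whether the old generation stays close to full while full collections keep occurring, a pattern often associated with memory pressure. This is a hypothetical sketch, not a CA tool. It assumes the standard jstat -gcutil header (which includes the O and FGC columns) and a period as the decimal separator; columns are found by name because their positions vary between JDK versions.

```shell
# Hypothetical helper: summarizes "jstat -gcutil" output read on stdin.
# Prints the number of samples, the highest old-generation occupancy
# (O column, percent) and the number of full GCs (FGC column) that occurred
# between the first and the last sample.
# Example: summarize_gcutil < ~/gc_output.txt
summarize_gcutil() {
    awk '
        # Header line (may repeat if jstat -h was used): locate the columns.
        $1 == "S0" {
            for (i = 1; i <= NF; i++) {
                if ($i == "O") ocol = i
                if ($i == "FGC") fcol = i
            }
            next
        }
        # Data line: track the maximum O value and the first/last FGC count.
        ocol && fcol && NF >= fcol {
            n++
            if (n == 1 || $ocol + 0 > maxo) maxo = $ocol + 0
            if (n == 1) firstf = $fcol + 0
            lastf = $fcol + 0
        }
        END {
            printf "samples=%d max_old_pct=%.2f full_gcs=%d\n", n, maxo, lastf - firstf
        }
    '
}
```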


Thread dump

A thread dump shows what each thread of a Java application is doing at a particular moment within its Java Virtual Machine (JVM). Run the following commands from the privileged shell of the API Gateway appliance:

  1. sudo su gateway
  2. ps awwx | grep Gateway.jar | grep -v grep | awk '{print $1}' | xargs -I{} /opt/SecureSpan/JDK/bin/jstack {} > /tmp/thread.tdump
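A thread dump of a busy Gateway can contain hundreds of threads. As a first look, the sketch below counts threads per state in /tmp/thread.tdump; a large share of BLOCKED threads, for example, may point to lock contention. The function name thread_states is hypothetical, and the sketch assumes the "java.lang.Thread.State:" lines that jstack prints for each Java thread.

```shell
# Hypothetical helper: counts Java threads per state in a jstack thread dump
# read on stdin. Prints one "<count> <state>" line per state, most frequent
# first.
# Example: thread_states < /tmp/thread.tdump
thread_states() {
    grep -o 'java\.lang\.Thread\.State: [A-Z_]*' |
        sed 's/^java\.lang\.Thread\.State: //' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```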

Heap dump

A heap dump is a snapshot of the memory state of the Java application within the Java Virtual Machine. It is useful for diagnosing how the Gateway is using its allocated memory. Note that the live option forces a full garbage collection, and the JVM pauses while the dump is written. Run the following commands from the privileged shell of the API Gateway appliance:

  1. sudo su gateway
  2. ps awwx | grep Gateway.jar | grep -v grep | awk '{print $1}' | xargs /opt/SecureSpan/JDK/bin/jmap -dump:live,format=b,file=/tmp/heap.hprof
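The heap.hprof file can be roughly as large as the live portion of the Java heap, so it may be worth confirming that /tmp has room before running jmap. The sketch below is a hypothetical check (has_free_space is not a product command); it assumes a df that supports the POSIX -P and -k options.

```shell
# Hypothetical helper: succeeds if the file system holding DIR has at least
# REQUIRED_MB megabytes available.
# Usage: has_free_space DIR REQUIRED_MB
# Example: has_free_space /tmp 4096 || echo "Less than 4 GB free in /tmp"
has_free_space() {
    # df -Pk prints one line per file system; column 4 of the data line is
    # the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}
```

Size REQUIRED_MB to at least the maximum heap configured for the Gateway.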

Environment configuration

It is important to know how the API Gateway is deployed and how it may have been configured. The following commands help establish the state of the API Gateway deployment and how it is configured:

  1. rpm -q ssg ssg-appliance
  2. rpm -q --verify ssg ssg-appliance
    • Attach the contents of any files listed in the output of this command to the support case.
  3. ls -halt /opt/SecureSpan/Gateway/runtime/modules/assertions/
  4. ls -halt /opt/SecureSpan/Gateway/runtime/lib/ext
  5. netstat -tnap
  6. ps awwx
  7. dmidecode -t 1
    • This command shows whether the appliance is physical or virtual, which helps CA Support provide a solution specific to your environment.
  8. free -m
  9. vmstat -t 1 240
    • This information is most useful while the issue is actually happening. The arguments "1 240" take one sample per second for 240 samples (four minutes); adjust them as needed.
  10. last | grep reboot
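Saving each command's output to its own file makes the results easy to attach to the support case. The sketch below shows two hypothetical helpers: save_output, which captures a command's standard output and error under a directory of your choosing, and changed_files, which extracts the file paths reported by rpm -q --verify (assuming, as is usual, that the path is the last field of each reported line and contains no spaces).

```shell
# Hypothetical helper: runs COMMAND and saves its standard output and standard
# error to OUTDIR/NAME. Returns the command's exit status.
# Usage: save_output OUTDIR NAME COMMAND [ARGS...]
# Example:
#   save_output /home/ssgconfig/env rpm-verify rpm -q --verify ssg ssg-appliance
#   save_output /home/ssgconfig/env free free -m
save_output() {
    outdir="$1"
    name="$2"
    shift 2
    mkdir -p "$outdir" || return 1
    "$@" > "$outdir/$name" 2>&1
}

# Hypothetical helper: reads "rpm -q --verify" output on stdin and prints the
# path of each reported file, one per line, so the files can be attached.
# Example: changed_files < /home/ssgconfig/env/rpm-verify
changed_files() {
    awk 'NF >= 2 && $NF ~ /^\// { print $NF }'
}
```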

Configuration and log files

The following files show Layer 7 Support how an API Gateway appliance is currently running. Provide complete, unabridged copies of the following files:

  1. /opt/SecureSpan/Gateway/node/default/var/logs/*
  2. /opt/SecureSpan/Controller/var/logs/*
  3. /opt/SecureSpan/Gateway/node/default/etc/conf/*.properties
  4. /opt/SecureSpan/Controller/etc/conf/host.properties
  5. /var/log/messages
  6. /var/log/dmesg
  7. /var/log/bash_commands.log
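The files above can be packed into one archive per node for upload. This is a minimal sketch, assuming tar with gzip support (-z); bundle_files is a hypothetical helper that skips paths missing on a given node instead of aborting, and the archive name in the example is only an illustration.

```shell
# Hypothetical helper: packs the given files and directories into a single
# gzip-compressed tar archive. Paths that do not exist are skipped with a
# warning on standard error; fails if none of the paths exist.
# Usage: bundle_files ARCHIVE PATH...
# Example (run as root on each node):
#   bundle_files /home/ssgconfig/gateway-files-$(hostname).tar.gz \
#       /opt/SecureSpan/Gateway/node/default/var/logs \
#       /opt/SecureSpan/Controller/var/logs \
#       /opt/SecureSpan/Gateway/node/default/etc/conf/*.properties \
#       /opt/SecureSpan/Controller/etc/conf/host.properties \
#       /var/log/messages /var/log/dmesg /var/log/bash_commands.log
bundle_files() {
    archive="$1"
    shift
    # Rotate through the arguments once, keeping only paths that exist.
    count=$#
    i=0
    while [ "$i" -lt "$count" ]; do
        p="$1"
        shift
        if [ -e "$p" ]; then
            set -- "$@" "$p"
        else
            echo "skipping missing path: $p" >&2
        fi
        i=$((i + 1))
    done
    [ "$#" -gt 0 ] || return 1
    tar -czf "$archive" "$@"
}
```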