Gateway servers are slowing down after a few service calls

Article ID: 129160

Products

STARTER PACK-7, CA Rapid App Security, CA API Gateway

Issue/Introduction

We have two Gateway nodes in a load-balanced cluster. Both servers slow down after a few service calls and then stop responding. We don't see high CPU or memory usage, but the services are not working and we are also unable to connect through Policy Manager, which reports the error "Connection to the gateway has been broken". If we restart the servers, everything seems fine for a few minutes, and then the issue comes back.

Cause

Console logging (the com.l7tech.server.log.console setting) causes context switching that increases under load, and we have seen instances where this context switching can overwhelm the Gateway subsystems. We are changing our Gateway to have this setting disabled, and we recommend disabling it as a preventive measure in all environments.

Environment

Gateway 9.2 CR 7

2 nodes (primary and secondary)

Only the primary node is in the load balancer

Resolution

Add the following setting to the system.properties file:
/opt/SecureSpan/Gateway/node/default/etc/conf/system.properties
com.l7tech.server.log.console=false
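To confirm that the setting is in place, the file can be checked programmatically. The following is a minimal sketch, not a Broadcom tool: the class ConsoleLogSettingCheck and its method are hypothetical names, and it assumes system.properties is a standard Java properties file readable with java.util.Properties. It reports console logging as disabled only when the key is present and set to false; it makes no assumption about the product default when the key is absent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the Gateway: reports whether
// system.properties explicitly turns console logging off.
public class ConsoleLogSettingCheck {

    static final String KEY = "com.l7tech.server.log.console";
    static final String DEFAULT_PATH =
            "/opt/SecureSpan/Gateway/node/default/etc/conf/system.properties";

    // True only when the key is present and its value is "false" (any case).
    // A missing key means the product default applies, which is not assumed here.
    static boolean isConsoleLoggingDisabled(Properties props) {
        String value = props.getProperty(KEY);
        return value != null && value.trim().equalsIgnoreCase("false");
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : DEFAULT_PATH);
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in); // the InputStream overload reads the file as ISO-8859-1
        }
        System.out.println(KEY + " = " + props.getProperty(KEY)
                + " -> console logging disabled: " + isConsoleLoggingDisabled(props));
    }
}
```

Run it with no arguments to check the default path shown above, or pass another file path as the first argument.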

Additional Information

[Analysis] 
************
Collected a good (baseline) DCT before the test.
Collected a DCT with a packet trace after the server became unresponsive.

Analysed and compared the data.
Look into disabling the log sink that sends to a remote syslog server, at least as a test.

From the thread dump:
Thread Name: SyslogMessageSender-UDP-syslogvhi.MYCOMPANY.COM/10.0.0.1:514
State: Blocked
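Blocked threads such as the syslog sender can be picked out of a full thread dump mechanically. This is a minimal sketch, assuming the dump uses the standard jstack text layout (a quoted thread name on the header line, followed by a java.lang.Thread.State line); BlockedThreadScanner is a hypothetical helper, not part of the Gateway or the DCT.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the threads that a jstack-style dump reports as BLOCKED.
public class BlockedThreadScanner {

    static List<String> blockedThreads(String dump) {
        List<String> blocked = new ArrayList<>();
        String current = null; // name of the thread whose section we are in
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Header line: the thread name sits between the first two double quotes.
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : null;
            } else if (current != null
                    && line.startsWith("java.lang.Thread.State: BLOCKED")) {
                blocked.add(current);
            }
        }
        return blocked;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a saved thread dump text file
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String name : blockedThreads(dump)) {
            System.out.println("BLOCKED: " + name);
        }
    }
}
```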

Disable Oracle Audit, because this error occurs throughout the log:
Unable to get jdbc connection datasource: Oracle_Audit_ReadOnly

Collected another DCT with a packet trace.
The second data set, with Oracle and syslog both out of the picture, shows the same pattern.

In the SSG log, thread ID 8067 locks up and consumes cycles of the Gateway process; note the gaps of more than a minute between its first three log lines:
2019-02-19T13:05:19.784-0600 WARNING 8067 com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -5: Request Headers: accept:*/*,
2019-02-19T13:06:56.389-0600 WARNING 8067 com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -5: Request Cookies: _
2019-02-19T13:07:58.392-0600 WARNING 8067 com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -5: Response Headers: 
2019-02-19T13:07:58.392-0600 WARNING 8067 com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -5: Response Cookies: 
2019-02-19T13:07:58.393-0600 WARNING 8067 com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -5: Request routed to: https://SERVER.MYCOMPANY.COM:443/private/payment/list?customerId=999999999999999
2019-02-19T13:07:58.393-0600 WARNING 8067 com.l7tech.server.message: Message processed successfully
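The stall is visible in the timestamps: the lines for this one message span more than two and a half minutes. A minimal sketch for measuring such pauses, assuming each SSG log line begins with a timestamp, a level and a thread ID separated by whitespace, as in the excerpt above; LogGapFinder and the 10-second threshold are illustrative choices, not product settings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports pauses longer than a threshold between
// consecutive log lines written by one thread.
public class LogGapFinder {

    // Timestamps look like 2019-02-19T13:05:19.784-0600 (offset without a colon),
    // so ISO_OFFSET_DATE_TIME does not apply; pattern letter Z parses "-0600".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    static List<Duration> gaps(List<String> lines, String threadId, Duration threshold) {
        List<Duration> result = new ArrayList<>();
        OffsetDateTime previous = null;
        for (String line : lines) {
            // Assumed layout: <timestamp> <LEVEL> <threadId> <rest of message>
            String[] parts = line.trim().split("\\s+", 4);
            if (parts.length < 3 || !parts[2].equals(threadId)) {
                continue; // not a line from the thread of interest
            }
            OffsetDateTime ts;
            try {
                ts = OffsetDateTime.parse(parts[0], TS);
            } catch (DateTimeParseException e) {
                continue; // not a timestamped log line
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, ts);
                if (gap.compareTo(threshold) > 0) {
                    result.add(gap);
                }
            }
            previous = ts;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: log file, args[1]: thread ID, e.g. 8067
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (Duration gap : gaps(lines, args[1], Duration.ofSeconds(10))) {
            System.out.println("pause of " + gap.toMillis() + " ms");
        }
    }
}
```

Applied to the six lines above for thread 8067 with a 10-second threshold, it reports two pauses: 96,605 ms (Request Headers to Request Cookies) and 62,003 ms (Request Cookies to Response Headers).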

The DCT takes separate backtrace dumps of the Gateway process, 20 seconds apart.

Dump 1, isolated to thread 8067:
"tomcat-exec-executor-408" #8067 daemon prio=5 os_prio=0 tid=0x00007f0530214000 nid=0x5bcf runnable [0x00007f0499b2f000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
	- locked <0x00000005248ec1e0> (a java.io.OutputStreamWriter)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
	at java.util.logging.StreamHandler.flush(StreamHandler.java:259)
	- eliminated <0x00000005248ec140> (a com.l7tech.server.log.ConsoleMessageSink$L7ConsoleHandler)
	at com.l7tech.server.log.ConsoleMessageSink$L7ConsoleHandler.publish(Unknown Source)
	- locked <0x00000005248ec140> (a com.l7tech.server.log.ConsoleMessageSink$L7ConsoleHandler)
	.
	.
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x00000007b0c2f000> (a java.util.concurrent.ThreadPoolExecutor$Worker)
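In this dump, thread 8067 is RUNNABLE but sitting in a native write (FileOutputStream.writeBytes) issued from the console log handler, and it holds the lock on the L7ConsoleHandler object <0x00000005248ec140>. In a full jstack-style dump, any other thread trying to publish through the same handler at that moment would be BLOCKED with a "- waiting to lock <0x00000005248ec140>" line. The sketch below pairs such waiters with the thread holding the lock. LockOwnerFinder is a hypothetical helper; it assumes the standard jstack layout and that each monitor address appears in only one thread's "- locked" line. The excerpt above does not include waiting threads, so this is only a way to check a complete dump for them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pairs each thread "waiting to lock <addr>" with the
// thread whose stack shows "- locked <addr>" for the same monitor.
public class LockOwnerFinder {

    private static final Pattern HEADER = Pattern.compile("^\"([^\"]*)\"");
    private static final Pattern LOCKED = Pattern.compile("^- locked <(0x[0-9a-fA-F]+)>");
    private static final Pattern WAITING = Pattern.compile("^- waiting to lock <(0x[0-9a-fA-F]+)>");

    static List<String> contention(String dump) {
        Map<String, String> owners = new HashMap<>();        // monitor address -> holder
        Map<String, String> waiters = new LinkedHashMap<>(); // waiting thread -> address
        String thread = null;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            Matcher m;
            if ((m = HEADER.matcher(line)).find()) {
                thread = m.group(1);
            } else if (thread != null && (m = LOCKED.matcher(line)).find()) {
                owners.put(m.group(1), thread); // assumes one holder per address
            } else if (thread != null && (m = WAITING.matcher(line)).find()) {
                waiters.put(thread, m.group(1));
            }
        }
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, String> e : waiters.entrySet()) {
            String owner = owners.getOrDefault(e.getValue(), "<not found in dump>");
            report.add(e.getKey() + " waits for " + e.getValue() + " held by " + owner);
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a saved thread dump text file
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        contention(dump).forEach(System.out::println);
    }
}
```

Running it on each of the dumps taken 20 seconds apart shows whether the same thread keeps holding the handler lock while others queue behind it.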