CAPM Web is available but not accepting logins.

Article ID: 208382

Products

CA Performance Management - Usage and Administration
DX NetOps

Issue/Introduction

CAPM Web is available but won't accept logins. Restarting the four web services resolves the issue temporarily.

The DM Service wrapper log shows the following errors:

INFO   | jvm 1    | 2021/02/10 05:50:20 | Caused by: java.net.SocketTimeoutException: SocketTimeoutException invoking http://localhost:8481/dm/cachecallbacks/DATA_SOURCE/https%3A%2F%2Faustx-capc-web-01%3A8182%2Fpc%2Fcenter%2Fwebservice%2Finvalcache: Read timed ou

INFO   | jvm 1    | 2021/02/08 07:27:45 | ERROR | Register Cache Callbacks Scheduler-4 | 2021-02-08 07:27:45,856 | org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler
INFO   | jvm 1    | 2021/02/08 07:27:45 |       | Unexpected error occurred in scheduled task.
INFO   | jvm 1    | 2021/02/08 07:27:45 | org.apache.cxf.jaxrs.client.ClientWebApplicationException: org.apache.cxf.interceptor.Fault: Could not send Message.
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.apache.cxf.jaxrs.client.AbstractClient.checkClientException(AbstractClient.java:485)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.apache.cxf.jaxrs.client.AbstractClient.preProcessResult(AbstractClient.java:472)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:524)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.apache.cxf.jaxrs.client.ClientProxyImpl.invoke(ClientProxyImpl.java:198)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at com.sun.proxy.$Proxy52.registerCacheCallback(Unknown Source)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at com.ca.im.portal.services.sync.InvalidateCacheRSImpl.registerCacheCallback(InvalidateCacheRSImpl.java:65)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at com.ca.im.portal.api.services.item.GroupEntryCache.registerInvalidateCacheCallback(GroupEntryCache.java:101)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at sun.reflect.GeneratedMethodAccessor149.invoke(Unknown Source)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.lang.reflect.Method.invoke(Method.java:498)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:64)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:53)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at java.lang.Thread.run(Thread.java:748)
INFO   | jvm 1    | 2021/02/08 07:27:45 | Caused by: org.apache.cxf.interceptor.Fault: Could not send Message.
INFO   | jvm 1    | 2021/02/08 07:27:45 |     at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:64)

INFO   | jvm 2    | 2021/02/08 07:40:28 | WARN  | [email protected]@7f2faf96{HTTP/1.1,[http/1.1]}{0.0.0.0:8481} | 2021-02-08 07:40:28,098 | org.eclipse.jetty.server.AbstractConnector                       
INFO   | jvm 2    | 2021/02/08 07:40:28 |       | 
INFO   | jvm 2    | 2021/02/08 07:40:28 | java.io.IOException: Too many open files
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:385)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:648)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
INFO   | jvm 2    | 2021/02/08 07:40:28 |     at java.lang.Thread.run(Thread.java:748)
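
The "Too many open files" error above suggests the process has run out of file descriptors. One way to check this on a Linux host is sketched below. The helper name count_fds and the PID 12345 are placeholders, not part of the product; substitute the PID of the DM Service JVM.

```shell
# Count the open file descriptors of a process by listing /proc/<pid>/fd.
# Linux only; prints 0 if the PID does not exist or is not readable.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Usage on a live host (12345 is a placeholder PID):
#   count_fds 12345
#   grep 'Max open files' /proc/12345/limits   # the limit the count is approaching
```

A count close to the "Max open files" limit would be consistent with the error in the log.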


Counting the connections to the DM Service port with the command below showed more than a thousand, most of them in the CLOSE_WAIT state:

netstat -an | grep 8481
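
To see the breakdown by state rather than a raw list, the netstat output can be summarized per state. This is a minimal sketch, assuming the Linux netstat column layout (local address in column 4, foreign address in column 5, state in column 6); the function name count_states is hypothetical.

```shell
# Summarize TCP connection states for one port.
# Reads `netstat -an` output on stdin; the port defaults to 8481.
count_states() {
  port="${1:-8481}"
  awk -v port="$port" '
    # Keep TCP lines whose local or foreign address ends in :<port>.
    $1 ~ /^tcp/ && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) {
      count[$6]++
    }
    END { for (s in count) print s, count[s] }
  ' | sort
}

# Usage on a live host:
#   netstat -an | count_states 8481
```

Unlike a plain grep, this matches the port only at the end of an address, so a port such as 18481 is not counted.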

Cause

A firewall is closing connections to the MySQL server. As a result, the DM Service accumulates connections in the CLOSE_WAIT state. Each of these connections still holds a file descriptor, so the service eventually hits the "Too many open files" error shown in the log and locks up.

Environment

Release : 20.2

Component : IM Reporting / Admin / Configuration

Resolution

Change the following TCP keepalive settings on all Vertica nodes and the DAs.

You can change the settings at runtime by running the following commands as root. Note that for the new settings to take effect, you must restart the process:
# echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
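
To confirm that the values were applied, they can be read back from the same /proc files. This is a small sketch; the function name show_keepalive is hypothetical, and the optional directory argument exists only so the function can be pointed at a copy of the files.

```shell
# Print the three TCP keepalive settings found under a directory.
# Defaults to the live kernel values in /proc/sys/net/ipv4.
show_keepalive() {
  dir="${1:-/proc/sys/net/ipv4}"
  for name in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
    printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
  done
}

# Usage on a live host:
#   show_keepalive
```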

To make the settings persistent across reboots, add them to /etc/sysctl.conf (running sysctl -p as root then loads the file). You must make these changes on all Vertica nodes and relevant SQL clients:
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
net.ipv4.tcp_keepalive_time = 600
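
As a rough illustration of what these values mean, the sketch below computes how long the kernel waits before dropping an idle connection whose peer no longer answers. The function name keepalive_timeout is hypothetical. The formula (idle time before the first probe, plus probe interval times probe count) is the commonly described Linux behavior and is an assumption here, not something this article states.

```shell
# Approximate seconds before an idle connection to a dead peer is dropped:
# idle time before the first probe + (probe interval * unanswered probes).
keepalive_timeout() {
  idle="$1"; intvl="$2"; probes="$3"
  echo $(( idle + intvl * probes ))
}

keepalive_timeout 600 60 20    # values from this article: prints 1800
keepalive_timeout 7200 75 9    # usual Linux defaults: prints 7875
```

With the values above, the first keepalive probe goes out after 10 minutes of idle time instead of the usual 2 hours. That traffic should keep a firewall from treating the connection as idle, provided the firewall's idle timeout is longer than 10 minutes.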