CAPM Web is available but won't accept logins. Restarting the four web services resolves the issue only temporarily.
The following errors appear in the DM Service wrapper log:
INFO | jvm 1 | 2021/02/10 05:50:20 | Caused by: java.net.SocketTimeoutException: SocketTimeoutException invoking http://localhost:8481/dm/cachecallbacks/DATA_SOURCE/https%3A%2F%2Faustx-capc-web-01%3A8182%2Fpc%2Fcenter%2Fwebservice%2Finvalcache: Read timed ou
INFO | jvm 1 | 2021/02/08 07:27:45 | ERROR | Register Cache Callbacks Scheduler-4 | 2021-02-08 07:27:45,856 | org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler
INFO | jvm 1 | 2021/02/08 07:27:45 | | Unexpected error occurred in scheduled task.
INFO | jvm 1 | 2021/02/08 07:27:45 | org.apache.cxf.jaxrs.client.ClientWebApplicationException: org.apache.cxf.interceptor.Fault: Could not send Message.
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.apache.cxf.jaxrs.client.AbstractClient.checkClientException(AbstractClient.java:485)
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.apache.cxf.jaxrs.client.AbstractClient.preProcessResult(AbstractClient.java:472)
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:524)
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.apache.cxf.jaxrs.client.ClientProxyImpl.invoke(ClientProxyImpl.java:198)
INFO | jvm 1 | 2021/02/08 07:27:45 | at com.sun.proxy.$Proxy52.registerCacheCallback(Unknown Source)
INFO | jvm 1 | 2021/02/08 07:27:45 | at com.ca.im.portal.services.sync.InvalidateCacheRSImpl.registerCacheCallback(InvalidateCacheRSImpl.java:65)
INFO | jvm 1 | 2021/02/08 07:27:45 | at com.ca.im.portal.api.services.item.GroupEntryCache.registerInvalidateCacheCallback(GroupEntryCache.java:101)
INFO | jvm 1 | 2021/02/08 07:27:45 | at sun.reflect.GeneratedMethodAccessor149.invoke(Unknown Source)
INFO | jvm 1 | 2021/02/08 07:27:45 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.lang.reflect.Method.invoke(Method.java:498)
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:64)
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:53)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO | jvm 1 | 2021/02/08 07:27:45 | at java.lang.Thread.run(Thread.java:748)
INFO | jvm 1 | 2021/02/08 07:27:45 | Caused by: org.apache.cxf.interceptor.Fault: Could not send Message.
INFO | jvm 1 | 2021/02/08 07:27:45 | at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:64)
INFO | jvm 2 | 2021/02/08 07:40:28 | WARN | …@7f2faf96{HTTP/1.1,[http/1.1]}{0.0.0.0:8481} | 2021-02-08 07:40:28,098 | org.eclipse.jetty.server.AbstractConnector
INFO | jvm 2 | 2021/02/08 07:40:28 | |
INFO | jvm 2 | 2021/02/08 07:40:28 | java.io.IOException: Too many open files
INFO | jvm 2 | 2021/02/08 07:40:28 | at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
INFO | jvm 2 | 2021/02/08 07:40:28 | at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
INFO | jvm 2 | 2021/02/08 07:40:28 | at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
INFO | jvm 2 | 2021/02/08 07:40:28 | at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:385)
INFO | jvm 2 | 2021/02/08 07:40:28 | at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:648)
INFO | jvm 2 | 2021/02/08 07:40:28 | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
INFO | jvm 2 | 2021/02/08 07:40:28 | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
INFO | jvm 2 | 2021/02/08 07:40:28 | at java.lang.Thread.run(Thread.java:748)
Checking the number of connections to the DM Service with the command below showed more than a thousand connections, most of them in the CLOSE_WAIT state:
netstat -an | grep 8481
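To see how the connections on port 8481 break down by state, the netstat output can be summarized with awk. This is a minimal sketch: the function name count_states is hypothetical, and it assumes the Linux net-tools netstat layout, in which the sixth column of a TCP line is the connection state.

```shell
#!/bin/sh
# count_states (hypothetical helper): summarize `netstat -an` output read on stdin.
# Keeps tcp/tcp6 lines whose local or foreign address ends in :PORT and prints
# "count state" pairs, most frequent first. Assumes the net-tools column layout:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_states() {
  port="$1"
  # The pattern is anchored with $ so that port 84810 does not match 8481.
  awk -v re=":${port}\$" '$1 ~ /^tcp/ && ($4 ~ re || $5 ~ re) { print $6 }' \
    | sort | uniq -c | sort -rn
}

# Example (run on the DM Service host):
#   netstat -an | count_states 8481
```

A large and growing CLOSE_WAIT count at the top of the output matches the symptom described here. On hosts without netstat, `ss -tan` reports similar information but with the state in the first column, so the helper would need adjusting.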
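A firewall is closing connections to the MySQL server. The DM Service leaves these closed connections in the CLOSE_WAIT state instead of releasing them. Because each socket holds a file descriptor, the connections accumulate until the service reaches its open-file limit ("Too many open files" in the log above) and eventually locks up.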
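Every socket the DM Service keeps open, including one stuck in CLOSE_WAIT, consumes a file descriptor. Comparing the process's descriptor count with its limit therefore shows how close it is to the "Too many open files" failure. This is a minimal Linux-only sketch that reads /proc. fd_usage is a hypothetical helper name, and you must supply the DM Service PID yourself.

```shell
#!/bin/sh
# fd_usage (hypothetical helper): report open descriptors for a PID on Linux.
# Prints "<open fds> <soft limit> <sockets>" using /proc. Run it as root or as
# the process owner, otherwise /proc/<pid>/fd is not readable.
fd_usage() {
  pid="$1"
  # Each entry in /proc/<pid>/fd is one open descriptor.
  used=$(ls "/proc/$pid/fd" | wc -l)
  # Line format: "Max open files  <soft>  <hard>  files", so field 4 is the soft limit.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  # Socket descriptors appear as symlinks to "socket:[inode]".
  # grep -c exits 1 when the count is 0, so "|| true" keeps that from being an error.
  sockets=$(ls -l "/proc/$pid/fd" | grep -c 'socket:' || true)
  echo "$used $limit $sockets"
}

# Example: fd_usage <DM Service PID>
```

If the first number is close to the second and most descriptors are sockets, the service is close to the failure shown in the log.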
Release : 20.2
Component : IM Reporting / Admin / Configuration
Change the following TCP keepalive settings on all Vertica nodes and the Data Aggregators (DAs).
To change the settings at runtime, run the following commands as root. Values written this way do not persist across a reboot, and the new settings take effect only after you restart the affected processes:
# echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
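With these values, the kernel sends the first keepalive probe after 600 seconds of idle time and then one probe every 60 seconds. It declares the peer dead after 20 unanswered probes, so a dead connection is detected after roughly 600 + 20 × 60 = 1800 seconds (30 minutes). The Linux defaults are 7200, 75, and 9. These kernel values apply only to sockets that enable SO_KEEPALIVE. The sketch below reads the current values and prints that estimate. keepalive_summary and its optional directory argument, which is there for testing, are hypothetical.

```shell
#!/bin/sh
# keepalive_summary (hypothetical helper): print the current TCP keepalive
# settings and the approximate time to detect a dead peer:
#   tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes
# The optional argument overrides the directory the values are read from.
keepalive_summary() {
  dir="${1:-/proc/sys/net/ipv4}"
  ka_time=$(cat "$dir/tcp_keepalive_time")
  ka_intvl=$(cat "$dir/tcp_keepalive_intvl")
  ka_probes=$(cat "$dir/tcp_keepalive_probes")
  echo "time=$ka_time intvl=$ka_intvl probes=$ka_probes detect_after=$((ka_time + ka_intvl * ka_probes))s"
}
```

Running `keepalive_summary` with no argument after the echo commands should report detect_after=1800s.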
To make the settings persistent, add the following lines to /etc/sysctl.conf on all Vertica nodes and relevant SQL clients, then run sysctl -p as root to load them:
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
net.ipv4.tcp_keepalive_time = 600
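After you edit /etc/sysctl.conf, check that each key is present with the intended value. A commented-out line, a typo, or a later duplicate of the same key can leave a different value in effect. This is a minimal sketch under stated assumptions: check_keepalive_conf is a hypothetical helper. It treats the last uncommented occurrence of a key as the effective one, which matches the top-to-bottom order in which sysctl -p applies the file, and it does not look at files under /etc/sysctl.d.

```shell
#!/bin/sh
# check_keepalive_conf (hypothetical helper): verify that a sysctl.conf-style
# file sets the three keepalive keys to the values used in this article.
# The last uncommented "key = value" line for a key wins.
# Files under /etc/sysctl.d are not checked.
check_keepalive_conf() {
  conf="${1:-/etc/sysctl.conf}"
  status=0
  for kv in net.ipv4.tcp_keepalive_time=600 \
            net.ipv4.tcp_keepalive_intvl=60 \
            net.ipv4.tcp_keepalive_probes=20; do
    key=${kv%%=*}
    want=${kv#*=}
    # Strip blanks around key and value. Lines starting with # never match the key.
    got=$(awk -F= -v k="$key" '
      { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
      $1 == k { v = $2 }
      END { print v }' "$conf")
    if [ "$got" != "$want" ]; then
      echo "MISMATCH $key: want $want, got ${got:-<unset>}"
      status=1
    fi
  done
  return "$status"
}
```

Once the file checks out, run `sysctl -p` as root to load it. Then run `sysctl -n net.ipv4.tcp_keepalive_time`, and likewise for the other two keys, to confirm the running values.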