We are deploying API Portal 4.4 with Docker swarm. We are having lots of problems with portal_rabbitmq service. It tries continuously attempt to start, but it seems there is some problems and not being to able to connect rabbitmq.
The logs report the following errors:
Jan 20 13:00:53 aes000a103123 2aa7df3c03dd[20872]: [QueryServer.log] [2020-01-20 12:00:53,449] [INFO ] [o.s.a.r.c.CachingConnectionFactory] [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-26649] [|] [|||] Attempting to connect to: rabbitmq:5672
Jan 20 13:00:54 aes000a103123 2aa7df3c03dd[20872]: [system-portal.log] [2020-01-20 12:00:54,051] [INFO ] [c.c.a.d.a.ApplicationTransactionInterceptor] [http-nio-9045-exec-4] [|] [||curl/7.64.0|] transaction entry
Jan 20 13:00:54 aes000a103123 2aa7df3c03dd[20872]: [system-portal.log] [2020-01-20 12:00:54,053] [INFO ] [c.c.a.d.a.ApplicationTransactionInterceptor] [http-nio-9045-exec-4] [|] [||curl/7.64.0|] Total time taken for Request URL::http://localhost:9045/v1/echo:: Time Taken=2ms
Jan 20 13:00:54 aes000a103123 2aa7df3c03dd[20872]: [system-portal.log] [2020-01-20 12:00:54,053] [INFO ] [c.c.a.d.a.ApplicationTransactionInterceptor] [http-nio-9045-exec-4] [|] [||curl/7.64.0|] transaction exit
Jan 20 13:00:54 aes000a103123 3723f0cbda7a[20872]: 127.0.0.1 - - [20/Jan/2020:12:00:54 +0000] "GET /nginx_status HTTP/1.1" 200 109 "-" "curl/7.61.1"
Jan 20 13:00:54 aes000a103123 3723f0cbda7a[20872]: 2020/01/20 12:00:54 [info] 38#38: *19793 client 127.0.0.1 closed keepalive connection
Jan 20 13:00:54 aes000a103123 b22338b8f42e[20872]: [system-portal.log] [2020-01-20 12:00:54,873] [INFO ] [c.c.a.p.t.v.s.HealthCheck] [http-nio-9001-exec-6] [|] [|||] HealthStatus : UP {}
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: [QueryServer.log] [2020-01-20 12:00:58,470] [INFO ] [o.s.a.r.l.SimpleMessageListenerContainer] [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-26649] [|] [|||] Restarting Consumer@3a988fa: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: [QueryServer.log] [2020-01-20 12:00:58,472] [INFO ] [o.s.a.r.c.CachingConnectionFactory] [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-26650] [|] [|||] Attempting to connect to: rabbitmq:5672
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: [QueryServer.log] [2020-01-20 12:00:58,472] [ERROR] [o.s.a.r.l.SimpleMessageListenerContainer] [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-26650] [|] [|||] Failed to check/redeclare auto-delete queue(s).
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: org.springframework.amqp.AmqpIOException: java.net.UnknownHostException: rabbitmq
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:530)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:702)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.createConnection(ConnectionFactoryUtils.java:214)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2076)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2050)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2030)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.core.RabbitAdmin.getQueueProperties(RabbitAdmin.java:403)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.attemptDeclarations(AbstractMessageListenerContainer.java:1777)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.redeclareElementsIfNecessary(AbstractMessageListenerContainer.java:1758)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1195)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1041)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at java.lang.Thread.run(Thread.java:748)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: Caused by: java.net.UnknownHostException: rabbitmq
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at java.net.InetAddress.getAllByName(InetAddress.java:1193)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at java.net.InetAddress.getAllByName(InetAddress.java:1127)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at com.rabbitmq.client.DnsRecordIpAddressResolver.resolveIpAddresses(DnsRecordIpAddressResolver.java:83)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at com.rabbitmq.client.DnsRecordIpAddressResolver.getAddresses(DnsRecordIpAddressResolver.java:73)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1098)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1054)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1218)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:492)
Jan 20 13:00:58 aes000a103123 2aa7df3c03dd[20872]: ... 11 common frames omitted
Release : 4.4
Component : API PORTAL
As I look at the logs, it seems the docker service "portal_rabbitmq" was not started properly.
We need to check whether the rabbitmq service was healthy or not.
We need the following command result from the customer.
Command:- "docker inspect --format='{{json .State.Health}}' <rabbitmq_containername>"
The above command will let us know whether rabbitmq conatiner was healthy or not . As the deployment is in docker swarm mode, so if the container is unhealthy docker swarm will kill and restart the container.
The problem appears to be related to message service, rabbitmq being timed out. By default it is set to 10 seconds.
10 seconds timeout should be more than sufficient; however, it seems that the portal server has a very low bandwidth connection which results in timeout.
For a workaround, you can increase the timeout which will help to keep the rabbbitmq message container healthy.
"docker service update --health-timeout=80s <rabbitmq_servicename>"
This will be set until the docker portal stack is removed