After upgrading to DX APM 22.1, pods are not starting

Article ID: 255766

Updated On:

Products

DX Operational Intelligence

Issue/Introduction

An on-premise Kubernetes cluster running DX APM 21.3 HF1 was upgraded to DX APM 22.1. The upgrade appeared successful, and it was possible to log in as both master admin and tenant admin. Later in the day the application developed problems and the Admin UI became inaccessible. The DX APM platform was stopped and started through ./dx-admin, after which many pods failed to start. Repeating the stop and start through ./dx-admin several times did not help; the issue continued to occur. The jarvis-elasticsearch pod shows the following error message:

[WARN ][o.e.d.SeedHostsResolver  ] [jarvis-elasticsearch] failed to resolve host [jarvis-elasticsearch]
java.net.UnknownHostException: jarvis-elasticsearch
    at java.net.InetAddress$CachedAddresses.get(InetAddress.java:797) ~[?:?]
    at java.net.InetAddress.getAllByName0(InetAddress.java:1519) ~[?:?]
    at java.net.InetAddress.getAllByName(InetAddress.java:1378) ~[?:?]
    at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
    at org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:597) ~[elasticsearch-7.16.3.jar:7.16.3]
    at org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:539) ~[elasticsearch-7.16.3.jar:7.16.3]
    at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:1111) ~[elasticsearch-7.16.3.jar:7.16.3]
    at org.elasticsearch.discovery.SeedHostsResolver.lambda$resolveHostsLists$0(SeedHostsResolver.java:152) ~[elasticsearch-7.16.3.jar:7.16.3]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) ~[elasticsearch-7.16.3.jar:7.16.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
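
To check whether a pod log contains the same symptom, the log can be scanned for the warning text shown above. The following is a minimal sketch; the function name `count_resolve_failures` is hypothetical, and the namespace and pod name in the usage comment are placeholders that must be replaced with values from your own cluster.

```shell
# Count log lines that report a failed seed-host lookup, matching the
# "failed to resolve host" warning shown in the trace above.
# Reads log text on stdin and prints the number of matching lines.
count_resolve_failures() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'failed to resolve host' || true
}

# Example usage (placeholders, not values from this article):
#   kubectl -n <namespace> logs <jarvis-elasticsearch-pod> | count_resolve_failures
```

A non-zero count suggests the pod is hitting the same host resolution failure described in this article.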

Environment

Release : 22.1

Resolution

Although the jarvis-elasticsearch pods were running, their logs showed errors about connecting to the host.
Communication between the three jarvis-elasticsearch nodes did not appear to be established properly.
The following command was run on each jarvis-elasticsearch node:
modprobe br_netfilter
The jarvis-elasticsearch deployment was then scaled down and back up.
This resolved the issue on two nodes, but the third node continued to show the error.
The following steps were taken on the third node:
systemctl restart docker
systemctl restart kubelet
modprobe br_netfilter
The jarvis-elasticsearch deployment was then scaled down and back up again.
This resolved the jarvis-elasticsearch node communication issue.
After a few minutes the other pods started running. A few minutes later, login to the DX platform as master admin and tenant admin succeeded.
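
The steps above can be sketched as a set of shell functions. This is an illustration under stated assumptions, not the exact commands used in this case: the namespace (`dxi`), the replica count, and the use of `kubectl scale` against a Deployment named `jarvis-elasticsearch` are assumptions, and all function names are hypothetical. Setting `DRY_RUN=1` prints each command instead of running it, which allows a review before anything is changed.

```shell
#!/bin/sh
# Assumed values; adjust to your environment before use.
NAMESPACE="${NAMESPACE:-dxi}"                     # hypothetical namespace
DEPLOYMENT="${DEPLOYMENT:-jarvis-elasticsearch}"  # assumed deployment name
REPLICAS="${REPLICAS:-1}"                         # assumed original replica count

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Return 0 if br_netfilter is listed in the module list
# (default: /proc/modules). A kernel with the feature built in
# rather than loaded as a module will not be listed there.
br_netfilter_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^br_netfilter ' "$modules_file"
}

# Run as root on each node hosting a jarvis-elasticsearch pod.
load_bridge_netfilter() {
    run modprobe br_netfilter
}

# Additional restarts that were needed on the third node,
# to be followed by load_bridge_netfilter on that node.
restart_node_services() {
    run systemctl restart docker
    run systemctl restart kubelet
}

# Run from a host with kubectl access: scale down, then back up.
bounce_elasticsearch() {
    run kubectl -n "$NAMESPACE" scale deployment "$DEPLOYMENT" --replicas=0
    run kubectl -n "$NAMESPACE" scale deployment "$DEPLOYMENT" --replicas="$REPLICAS"
}
```

For example, after sourcing the file, `DRY_RUN=1; bounce_elasticsearch` prints the two `kubectl scale` commands without touching the cluster. Note that `modprobe` loads the module only until the next reboot.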