NSX Application Platform shows Degraded status and ANALYTICS_SERVICE is down after upgrade

Article ID: 319052

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
1. After an NSX Application Platform (NAPP) upgrade to 4.1.1, the NSX UI shows the NAPP status as Degraded and ANALYTICS_SERVICE as down.


2. The nsx-config pod is stuck with status Init:4/5.
root@nsxmgr:~# napp-k get pods | grep nsx-config | grep -v metrics
NAME                        READY   STATUS     RESTARTS         AGE
nsx-config-XXXXXXXX-XXXXX   0/1     Init:4/5   33 (5m15s ago)   4h53m


3. Running napp-k logs <nsx-config pod name> -c wait-for-druid-supervisor-ready shows messages similar to:

INFO:root:Supervisor: pace2druid_policy_intent_config status: UNHEALTHY_SUPERVISOR
INFO:root:Supervisor pace2druid_policy_intent_config is not ready

or
INFO:root:Supervisor: pace2druid_manager_realization_config status: UNHEALTHY_SUPERVISOR
INFO:root:Supervisor pace2druid_manager_realization_config is not ready



4. Run napp-k get pods | grep druid-overlord to get the name of the druid-overlord pod.
Running napp-k logs <druid-overlord pod name> then shows a warning similar to:

2023-07-25T08:05:08,492 WARN [IndexTaskClient-pace2druid_manager_realization_config-0] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [https://X.X.X.X:8104/druid/worker/v1/chat/index_kafka_pace2druid_manager_realization_config_XXXXXXXXXXXX_cainaphn/status]
java.net.ConnectException: Connection timed out (Connection timed out)
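
The symptom checks above can be sketched as two small text filters. This is a minimal sketch, not part of the original article: the function names stuck_init_pods and unhealthy_supervisors are hypothetical, and both only parse text on stdin in the formats shown in the sample output above, so they never contact the cluster themselves.

```shell
# Hypothetical helper: reads "get pods" style output on stdin and prints the
# NAME and STATUS of pods whose name starts with the given prefix and whose
# STATUS column begins with "Init:" (init containers have not finished).
stuck_init_pods() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 && $3 ~ /^Init:/ { print $1, $3 }
    '
}

# Hypothetical helper: reads wait-for-druid-supervisor-ready log lines on
# stdin and prints each supervisor reported as UNHEALTHY_SUPERVISOR once.
unhealthy_supervisors() {
    sed -n 's/^.*Supervisor: \([^ ]*\) status: UNHEALTHY_SUPERVISOR.*$/\1/p' | sort -u
}

# Intended interactive use from the NSX Manager root shell (not run here):
#   napp-k get pods | grep -v metrics | stuck_init_pods nsx-config
#   napp-k logs <nsx-config pod name> -c wait-for-druid-supervisor-ready | unhealthy_supervisors
```

Keeping the helpers as pure stdin filters means they can be pasted into the root shell and combined with the napp-k commands from the steps above without changing how those commands are invoked.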


Cause

This issue can occur when upgrading to 4.1.1, or when all druid pods are restarted at the same time.

Resolution

There is currently no resolution for this issue. Follow the workaround steps below.

Workaround:
Work around this issue by running the following steps from the root shell of an NSX Manager:

Step 1. Run napp-k get pods | grep druid-overlord to get the name of the druid-overlord pod

Step 2. Restart the druid-overlord pod by running napp-k delete pod <druid-overlord pod name>
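
The two workaround steps can be combined as in the sketch below. It assumes that napp-k accepts kubectl-style arguments (including the pod resource type on delete) and that the deleted pod is recreated automatically, which is what "restart" implies here. The helper name first_pod_name is hypothetical and only extracts a pod name from text on stdin.

```shell
# Hypothetical helper: reads "get pods" style output on stdin and prints the
# name (first column) of the first pod whose name starts with the prefix.
first_pod_name() {
    awk -v prefix="$1" '
        !found && index($1, prefix) == 1 { print $1; found = 1 }
    '
}

# Intended interactive use from the NSX Manager root shell (not run here):
#   pod=$(napp-k get pods | first_pod_name druid-overlord)
#   [ -n "$pod" ] && napp-k delete pod "$pod"
#   napp-k get pods | grep -e druid-overlord -e nsx-config
```

The final napp-k get pods command is only a suggested follow-up check that the druid-overlord pod has come back and that nsx-config has moved past its Init state.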

Additional Information

Impact/Risks:
Users will not be able to use NAPP and NSX Intelligence while this issue is present.