Portal service metrics unavailable after rebooting the Docker host on API Developer Portal 4.4

Products

CA API Developer Portal

Issue/Introduction

On API Developer Portal 4.4, after the system was rebooted, API usage metrics stopped appearing in the API Portal analytics. No data is shown from the time of the reboot onward.

The Docker journal log shows the following error:

portal_coordinator.heulwy6io17vwij501ebwo5id.i72mux5oezrfmrv4gwpdfwp55 2020-09-03 03:45:07 UTC 2020-09-03T03:45:07,629 ERROR [KafkaSupervisor-apim_metrics_hour] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - SeekableStreamSupervisor[apim_metrics_hour] failed to handle notice: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor, exceptionType=class org.apache.druid.java.util.common.ISE, exceptionMessage=Previous sequenceNumber [343522] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API., noticeClass=RunNotice}

Environment

Release : 4.4

Component : API PORTAL

Resolution

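This error can occur after a reboot in which the Druid containers were not all stopped in the right sequence. As the log message indicates, the supervisor is then left with a previous sequence number that is no longer available, and it has to be reset.

To confirm the cause, check the portal_coordinator container log for one of the following errors: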

portal_coordinator.heulwy6io17vwij501ebwo5id.i72mux5oezrfmrv4gwpdfwp55 2020-09-03 03:45:07 UTC 2020-09-03T03:45:07,629 ERROR [KafkaSupervisor-apim_metrics_hour] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - SeekableStreamSupervisor[apim_metrics_hour] failed to handle notice: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor, exceptionType=class org.apache.druid.java.util.common.ISE, exceptionMessage=Previous sequenceNumber [343522] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API., noticeClass=RunNotice}

or

portal_coordinator.heulwy6io17vwij501ebwo5id.i72mux5oezrfmrv4gwpdfwp55 2020-09-03 03:45:07 UTC 2020-09-03T03:45:07,629 ERROR [KafkaSupervisor-apim_metrics] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - SeekableStreamSupervisor[apim_metrics_hour] failed to handle notice: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor, exceptionType=class org.apache.druid.java.util.common.ISE, exceptionMessage=Previous sequenceNumber [343522] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API., noticeClass=RunNotice}

Two supervisor tasks can stop processing metrics: "apim_metrics_hour" and "apim_metrics".
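To see quickly which of the two supervisors is affected, the log can be filtered for the error text. The following is a minimal sketch, not part of the product: the function name stalled_supervisors is hypothetical, and it assumes the log lines have the same shape as the examples above.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints the name of each
# supervisor that reported the "no longer available" sequence number error.
stalled_supervisors() {
  grep 'is no longer available for partition' \
    | grep -o 'SeekableStreamSupervisor\[[^]]*]' \
    | sed 's/.*\[\(.*\)]/\1/' \
    | sort -u
}
```

It could be fed from wherever the coordinator log is collected, for example `docker service logs portal_coordinator 2>&1 | stalled_supervisors`, assuming the service is named portal_coordinator as the log prefix suggests.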

Solution:

Start a shell in the portal_coordinator container:

docker exec -it $(docker ps --filter name=portal_coordinator -q) /bin/sh

Run the following command to list the supervisors that are currently running:

$ curl -X GET http://localhost:8081/druid/indexer/v1/supervisor
["apim_metrics_hour","apim_metrics"]
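If the supervisor names are needed in a loop, the JSON array above can be split into one name per line with standard tools. This is only a sketch: supervisor_names is a hypothetical helper, and it assumes the names contain no commas, quotes, or brackets, as in the output shown.

```shell
#!/bin/sh
# Hypothetical helper: turns a JSON array of plain names on stdin,
# e.g. ["apim_metrics_hour","apim_metrics"], into one name per line.
supervisor_names() {
  tr -d '[]" \n' | tr ',' '\n'
  echo   # terminate the last name with a newline
}
```

Inside the container it could be used as `curl -s http://localhost:8081/druid/indexer/v1/supervisor | supervisor_names`.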

Run the following curl command to get the current status of a supervisor, in this example "apim_metrics_hour":

curl -X GET -H 'Content-Type:application/json' http://localhost:8081/druid/indexer/v1/supervisor/apim_metrics_hour/status

{"id":"apim_metrics_hour","generationTime":"2020-09-04T12:46:04.039Z","payload":{"dataSource":"apim_metrics_hour","stream":"apim_metrics","partitions":1,"replicas":1,"durationSeconds":3600,"activeTasks":[{"id":"index_kafka_apim_metrics_hour_f9c8b2384a92f14_ehgflkec","startingOffsets":{"0":672},"startTime":"2020-09-04T12:36:12.123Z","remainingSeconds":3008,"type":"ACTIVE","currentOffsets":{"0":672},"lag":{"0":0}}],"publishingTasks":[],"latestOffsets":{"0":672},"minimumLag":{"0":0},"aggregateLag":0,"offsetsLastUpdated":"2020-09-4T12:45:49.406Z","suspended":false,"healthy":true,"state":"RUNNING","detailedState":"RUNNING","c
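The status document is long, and usually only the "state" field is of interest. The sketch below pulls that one value out without needing a JSON parser in the container. The helper name supervisor_state is hypothetical, and the approach assumes compact JSON with no spaces around the colon, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads a supervisor status document on stdin and
# prints the value of its "state" field. The leading quote in the pattern
# keeps it from matching "detailedState".
supervisor_state() {
  grep -o '"state":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}
```

For example, `curl -s http://localhost:8081/druid/indexer/v1/supervisor/apim_metrics_hour/status | supervisor_state` would print the state reported for that supervisor.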

Run the following command to reset the supervisor for "apim_metrics_hour":

curl -X POST http://localhost:8081/druid/indexer/v1/supervisor/apim_metrics_hour/reset

Run the following command to reset the supervisor for "apim_metrics":

curl -X POST http://localhost:8081/druid/indexer/v1/supervisor/apim_metrics/reset
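The two reset calls can also be issued from one small function. This is a sketch under stated assumptions rather than an official script: reset_all, DRUID_URL, and DRY_RUN are hypothetical names introduced here, and the URL and supervisor names are the ones used in the commands above.

```shell
#!/bin/sh
# Base URL of the Druid indexer API as used in this article; override if needed.
DRUID_URL="${DRUID_URL:-http://localhost:8081}"

# Hypothetical helper: resets both metrics supervisors in turn.
# With DRY_RUN set to a non-empty value it only prints the commands.
reset_all() {
  for name in apim_metrics_hour apim_metrics; do
    url="$DRUID_URL/druid/indexer/v1/supervisor/$name/reset"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "curl -X POST $url"
    else
      curl -sS -X POST "$url" || return 1
      echo   # newline after the JSON response
    fi
  done
}
```

Running it once with `DRY_RUN=1` set shows exactly which requests would be sent before anything is reset.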

Close the shell.

Check the portal_coordinator log and confirm that the error above no longer occurs.

Check the analytics dashboard. New API data should be available again.