After upgrading the PCF Metrics tile from 1.2 to 1.3.x, the smoke tests fail.
Running `curl 'localhost:9200/_cluster/health?pretty'` on the elasticsearch_master node reports the cluster health status as "red".
Error Message:
Elasticsearch flow
/tmp/build/8bf04831/metrics-app-dev-release-bumped/src/github.com/pivotal-cf/metrics-data/cmd/smoke_tests/elasticsearch_test.go:52
  Ingests logs from firehose into elasticsearch [It]
  /tmp/build/8bf04831/metrics-app-dev-release-bumped/src/github.com/pivotal-cf/metrics-data/cmd/smoke_tests/elasticsearch_test.go:51
  Never received app logs - something in the firehose -> elasticsearch flow is broken
Summarizing 1 Failure:
[Fail] Elasticsearch flow [It] Ingests logs from firehose into elasticsearch
/tmp/build/8bf04831/metrics-app-dev-release-bumped/src/github.com/pivotal-cf/metrics-data/cmd/smoke_tests/elasticsearch_test.go:50
This happens because the Elasticsearch indices had been created and were still replicating when the upgrade ran, which left some of them in a corrupt state.
The procedure in this KB is a last resort. Use it only if restarting the app and the other steps outlined in https://docs.pivotal.io/pcf-metrics/1-3/troubleshooting.html#smoke-test do not clear the "red" status on the Elasticsearch master.
DO NOT perform this procedure if many or all of the indices are in "red" status. It is meant to address the case where only a few indices are corrupt and stuck in "red" status.
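To judge whether the warning above applies, one option is to count the red indices against the total before deleting anything. The following is a minimal sketch, assuming the `_cat/indices?v` column layout shown in step 2 (health in the first column); `count_index_health` is a hypothetical helper name, and it is meant to be run on the elasticsearch_master VM after SSHing in as in step 1. It only reports the numbers; deciding what counts as "many" stays with you.

```shell
# Hypothetical helper: summarise index health from `_cat/indices` output on stdin.
# Assumes health is the first column, as in the `_cat/indices?v` output in step 2.
count_index_health() {
  awk '
    $1 == "health" { next }   # skip the header row that ?v adds (it may sort into the middle)
    NF == 0        { next }   # skip blank lines
    { total++ }               # every remaining row is one index
    $1 == "red"    { red++ }  # count the rows whose health is red
    END { printf "red=%d total=%d\n", red + 0, total + 0 }
  '
}

# Usage on the elasticsearch_master VM:
#   curl -s 'localhost:9200/_cat/indices?v' | count_index_health
```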
Perform the following steps:
1.) SSH to the elasticsearch_master node:
$ bosh ssh elasticsearch_master/0
2.) Identify the indices whose health is red:
$ curl 'localhost:9200/_cat/indices?v' | sort
green open app_logs_1504677600 1 1 209948 0 35.8mb 17.9mb
green open app_logs_1504699200 1 1 0 0 318b 159b
green open app_logs_1504785600 1 1 0 0 318b 159b
green open app_logs_1504807200 1 1 0 0 318b 159b
health status index pri rep docs.count docs.deleted store.size pri.store.size
red open app_logs_1504720800 1 1
red open app_logs_1504742400 1 1
red open app_logs_1504764000 1
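If the listing is long, the red index names can be pulled out with a filter instead of reading the table by eye. This is a minimal sketch, assuming health is the first column and the index name the third, as in the output above; `red_indices` is a hypothetical helper name.

```shell
# Hypothetical helper: print only the names of red indices from `_cat/indices` output on stdin.
# Assumes the column order health, status, index, ... shown in step 2.
red_indices() {
  awk '$1 == "red" { print $3 }'
}

# Usage on the elasticsearch_master VM:
#   curl -s 'localhost:9200/_cat/indices?v' | red_indices
```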
3.) Delete each index whose health is red, one request per index. For example, for app_logs_1504720800:
$ curl -XDELETE http://localhost:9200/app_logs_1504720800
Note: Deleting an index permanently removes any application log data it holds. Do not run this if that log data is critical.
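When more than one index is red, step 3 can be repeated in a loop. The sketch below is a dry run by default: it prints the `curl -XDELETE` request for each red index and only sends the requests when `CONFIRM=yes` is set. `delete_red_indices`, `CONFIRM`, and `ES_URL` are hypothetical names introduced for this sketch, and the same column assumptions as step 2 apply. Check the warning about many red indices, and the note on log data loss, before using it.

```shell
# Hypothetical helper: delete every red index listed in `_cat/indices` output on stdin.
# Dry run unless CONFIRM=yes; ES_URL defaults to http://localhost:9200.
# Assumes health is column 1 and the index name is column 3.
delete_red_indices() {
  local es_url="${ES_URL:-http://localhost:9200}"
  awk '$1 == "red" { print $3 }' | while IFS= read -r index; do
    if [ "${CONFIRM:-no}" = "yes" ]; then
      # Permanently removes the index and any log data it holds.
      curl -s -XDELETE "${es_url}/${index}"
      echo   # the JSON response has no trailing newline
    else
      echo "would run: curl -XDELETE ${es_url}/${index}"
    fi
  done
}
```

Review the dry run first with `curl -s 'localhost:9200/_cat/indices?v' | delete_red_indices`, then repeat with `CONFIRM=yes delete_red_indices` at the end of the pipeline to send the deletes.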