App Metrics 'log-store' keeps failing with InfluxDB error "panic: runtime error: slice bounds out of range [:-7]"

Article ID: 298286

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

A customer (running TAS 2.11.13 with App Metrics 2.1.2) reported that one of the log-store VMs keeps failing because the log-store process does not start. Running "monit summary" on the VM shows:
log-store-vms/xxxxx:~# monit summary
Process 'cf-auth-proxy' running
Process 'log-store' Does not exist
Process 'loggregator_agent' running
Process 'nozzle' running
Process 'router' running
Process 'prom_scraper' running
Process 'route_registrar' running
Process 'bosh-dns' running
Process 'bosh-dns-resolvconf' running
Process 'bosh-dns-healthcheck' running
Process 'system-metrics-agent' running
System 'system_0c547d6e-0396-4cc3-aaa8-138a0111c331' running
The log-store.stderr.log file contains the following error:
{"level":"info","timestamp":"2022-02-18T04:42:18.030432279Z","caller":"tsm1/file_store.go:544","message":"Opened file","engine":"tsm1","service":"filestore","path":"/var/vcap/store/log-store/influxdb/21/data/logs/default/1641945600000000000/000000544-000000003.tsm","id":2,"duration":"4.747535ms"}
panic: runtime error: slice bounds out of range [:-7]


Environment

Product Version: 2.11

Resolution

The error message indicates that the crash occurs inside InfluxDB. This issue has been reported upstream in the InfluxDB GitHub repository (https://github.com/influxdata/influxdb/issues/19916) and is pending investigation and a fix. For now, the only workaround is to remove the affected file.

The workaround steps are as follows:
Step 1. SSH to the problematic log-store-vms instance and run the "influx_inspect verify" command against the affected InfluxDB directory:

/var/vcap/packages/influx-inspect/influx_inspect verify -dir /var/vcap/store/log-store/influxdb/21

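Step 2. Remove the TSM file that comes immediately after the last file reported as healthy. For example, if the last healthy file in the output is the one below, remove 1641945600000000000/000000588-000000002.tsm: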

/var/vcap/store/log-store/influxdb/21/data/logs/default/1641945600000000000/000000588-000000001.tsm: healthy
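The lookup in Step 2 can be sketched as a small shell helper. This is a minimal sketch, not a supported tool: the function name next_after_last_healthy is hypothetical, and it assumes the verify output was saved to a file and that healthy files are reported in the "<path>: healthy" form shown above. It only prints the candidate file and does not delete anything, so the result can be reviewed before removal. Because TSM file names are zero-padded generation and sequence numbers, a byte-wise sort (LC_ALL=C) should list them in write order.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of App Metrics or InfluxDB).
# Usage: next_after_last_healthy VERIFY_OUTPUT_FILE
# Prints the .tsm file that sorts immediately after the last file the
# saved verify output reports as healthy, in the same shard directory.
next_after_last_healthy() {
  local report="$1" last_healthy shard_dir

  # Assumption: healthy files appear as "<path>: healthy" (format taken
  # from the sample output line in this article).
  last_healthy=$(grep -E '\.tsm: healthy$' "$report" | tail -n 1 | sed 's/: healthy$//')
  if [ -z "$last_healthy" ]; then
    return 1  # no healthy file found in the report
  fi

  shard_dir=$(dirname "$last_healthy")

  # List the shard's TSM files in byte order and print the one that
  # directly follows the last healthy file (nothing if it was the last).
  find "$shard_dir" -maxdepth 1 -type f -name '*.tsm' \
    | LC_ALL=C sort \
    | awk -v last="$last_healthy" 'found { print; exit } $0 == last { found = 1 }'
}
```

For example, after saving the Step 1 output with "/var/vcap/packages/influx-inspect/influx_inspect verify -dir /var/vcap/store/log-store/influxdb/21 > /tmp/verify.txt 2>&1", running "next_after_last_healthy /tmp/verify.txt" should print the path of the file to remove. If it prints nothing, the last healthy file was the last one in its shard directory and the affected file has to be identified manually.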

Once the file is removed, the log-store process should return to the running state.

Note: This approach minimizes data loss, because only the single affected file is removed.