log-store-vms/xxxxx:~# monit summary
Process 'cf-auth-proxy' running
Process 'log-store' Does not exist
Process 'loggregator_agent' running
Process 'nozzle' running
Process 'router' running
Process 'prom_scraper' running
Process 'route_registrar' running
Process 'bosh-dns' running
Process 'bosh-dns-resolvconf' running
Process 'bosh-dns-healthcheck' running
Process 'system-metrics-agent' running
System 'system_0c547d6e-0396-4cc3-aaa8-138a0111c331' running

The log-store.stderr.log file contains an error message like the following:
{"level":"info","timestamp":"2022-02-18T04:42:18.030432279Z","caller":"tsm1/file_store.go:544","message":"Opened file","engine":"tsm1","service":"filestore","path":"/var/vcap/store/log-store/influxdb/21/data/logs/default/1641945600000000000/000000544-000000003.tsm","id":2,"duration":"4.747535ms"}
panic: runtime error: slice bounds out of range [:-7]
The error message indicates that the crash occurs in InfluxDB. The issue has been reported upstream on the InfluxDB GitHub (https://github.com/influxdata/influxdb/issues/19916) and is pending investigation and a fix. For now, the only workaround is to remove the affected file.
The workaround steps are as follows:
Step 1. SSH to the affected log-store-vms instance and run the "influx_inspect verify" command against the affected InfluxDB directory, as below:
/var/vcap/packages/influx-inspect/influx_inspect verify -dir /var/vcap/store/log-store/influxdb/21
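Step 2. Remove the file that comes immediately after the last file reported as healthy. For example, if the last healthy entry in the output is the line shown below, the file to remove is 1641945600000000000/000000588-000000002.tsm: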
/var/vcap/store/log-store/influxdb/21/data/logs/default/1641945600000000000/000000588-000000001.tsm: healthy
Once the file is removed, the log-store process should return to the running state.
Note: This is the most effective way to keep data loss to a minimum.
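
Step 2 can be sketched as a small shell helper that reads saved "influx_inspect verify" output and prints the candidate file to remove. This is only an illustration, not an official tool: the function names last_healthy and next_tsm are hypothetical, and it assumes that healthy files are reported as "<path>: healthy" (as in the sample above) and that the affected file sorts directly after the last healthy one in the same directory. It only prints the candidate; confirm the path manually before removing anything.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of any product tooling.
# Assumption: verify reports healthy files as "<path>: healthy".

# Read verify output on stdin and print the path of the last healthy file.
last_healthy() {
  grep ': healthy$' | tail -n 1 | sed 's/: healthy$//'
}

# Print the .tsm file that sorts immediately after the given file
# within the same directory (prints nothing if there is none).
next_tsm() {
  local healthy="$1" dir
  dir="$(dirname "$healthy")"
  printf '%s\n' "$dir"/*.tsm | LC_ALL=C sort |
    awk -v h="$healthy" 'found { print; exit } $0 == h { found = 1 }'
}

# Example usage (verify.out is a hypothetical file holding the Step 1 output):
#   healthy="$(last_healthy < verify.out)"
#   next_tsm "$healthy"
```

If the last healthy file is also the last .tsm file in its directory, the helper prints nothing, and the next file has to be located by hand in the following directory.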