The prometheus-server pod is stuck in CrashLoopBackOff status.
# kubectl -n tanzu-system-monitoring get pods
NAME                    READY   STATUS             RESTARTS         AGE
prometheus-server-###   1/2     CrashLoopBackOff   30 (3m32s ago)   132m
Error message:
# kubectl -n tanzu-system-monitoring logs prometheus-server-### -c prometheus-server | grep error
ts=2025-02-27T05:31:26.411Z caller=main.go:1159 level=error err="opening storage failed: get segment range: segments are not sequential"
Tanzu Kubernetes Grid
The prometheus-server pod fails to start because of inconsistencies in the write-ahead log (WAL) under the /data/wal/
directory, which is stored on the persistent volume.
https://github.com/prometheus/prometheus/issues/5342
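On startup, Prometheus requires WAL segment file names to be consecutive numbers; a gap in the sequence produces the "segments are not sequential" error above. A minimal local sketch of that invariant (dummy file names in a temporary directory, not real WAL data):

```shell
# Create dummy WAL-style segment files (8-digit, zero-padded names);
# 00000003 is deliberately missing to simulate the broken state.
dir=$(mktemp -d)
touch "$dir/00000001" "$dir/00000002" "$dir/00000004"

prev=
msg=
for f in "$dir"/*; do
  n=$((10#${f##*/}))        # file name -> integer, leading zeros stripped
  if [ -n "$prev" ] && [ "$n" -ne $((prev + 1)) ]; then
    msg="segments are not sequential: $prev -> $n"
  fi
  prev=$n
done
echo "$msg"                 # -> segments are not sequential: 2 -> 4
rm -rf "$dir"
```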
Caution - this procedure discards recent data (metrics collected over the last few hours up to a few days, held in the WAL and the active head chunks). Data already compacted into persistent blocks is retained.
Mount the persistent volume from a temporary recovery pod and remove the inconsistent data manually.
1. Stop the prometheus-server pod
namespace=tanzu-system-monitoring
kubectl -n $namespace scale deployment/prometheus-server --replicas=0
2. Create a temporary pod that mounts the prometheus-server persistent volume
pvc=prometheus-server
image=gcr.io/google-samples/node-hello:1.0
cat > recovery-pod-for-prometheus.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: recovery-prometheus
  namespace: ${namespace}
spec:
  volumes:
    - name: prometheus-server-pv
      persistentVolumeClaim:
        claimName: ${pvc}
  containers:
    - name: hello-world
      image: ${image}
      command: [ "sleep", "365d" ]
      volumeMounts:
        - name: prometheus-server-pv
          mountPath: "/data"
EOF
kubectl apply -f recovery-pod-for-prometheus.yaml
3. Delete the contents of /data/wal/ and /data/chunks_head/
kubectl -n $namespace exec -it recovery-prometheus -- bash
rm -r /data/wal/*
rm -r /data/chunks_head/*
exit
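To see why this deletion is safe for older data, the effect of step 3 can be reproduced in a local sandbox (a temporary directory with dummy files, not the real PV; the block directory name 01HXEXAMPLEBLOCK is made up for illustration). Emptying /data/wal/ and /data/chunks_head/ leaves the compacted block directories intact:

```shell
# Local sandbox mirroring the layout of /data on the persistent volume.
data=$(mktemp -d)
mkdir -p "$data/wal" "$data/chunks_head" "$data/01HXEXAMPLEBLOCK"
touch "$data/wal/00000001" "$data/chunks_head/000001" "$data/01HXEXAMPLEBLOCK/meta.json"

# Same commands as step 3: remove the contents, keep the directories
# themselves so Prometheus can recreate the WAL on startup.
rm -r "$data/wal/"* "$data/chunks_head/"*

wal_left=$(ls -A "$data/wal" | wc -l)                  # WAL emptied
block_left=$(ls -A "$data/01HXEXAMPLEBLOCK" | wc -l)   # compacted block untouched
echo "wal entries: $wal_left, block entries: $block_left"
rm -rf "$data"
```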
4. Resume the prometheus-server pod
# Delete the temporary pod
kubectl delete -f recovery-pod-for-prometheus.yaml
# prometheus-server: 0 --> 1
kubectl -n $namespace scale deployment/prometheus-server --replicas=1
# Check that the prometheus-server pod is "Running" (READY 2/2)
kubectl -n $namespace get pods