The Log Management Health Dashboard reports that event archive failures are greater than zero. This metric indicates that one or more archive operations did not finish successfully.
Log Management stores log messages in a set of special files called indexes. Each partition has its own set of indexes. Newly received log messages are written to an active index. The active index is periodically closed to new writes and replaced by a new one.
Archiving runs in the background after the active index is closed. First, a small metadata file is written to the designated external archive location (NFS or on-premises S3-compatible object storage). Then the log store creates a snapshot of the just-closed index in the archive repository for that log partition. Failures can happen if external storage or the log store is unhealthy, if the index is not ready for a snapshot, if the metadata file cannot be written, or if the snapshot step fails (including when a snapshot grows too large).
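The archive sequence and its failure points can be modeled with a short Python sketch. This is an illustration only, not the product's implementation: the function name, the `ArchiveFailure` values, the callables, and the order of the health checks are all assumptions made for the example.

```python
from enum import Enum
from typing import Callable, Optional


class ArchiveFailure(Enum):
    """Hypothetical labels for the failure causes described above."""
    EXTERNAL_STORAGE_UNHEALTHY = "external storage unhealthy"
    LOG_STORE_UNHEALTHY = "log store unhealthy"
    INDEX_NOT_READY = "index not ready for snapshot"
    METADATA_WRITE_FAILED = "metadata file could not be written"
    SNAPSHOT_FAILED = "snapshot step failed"


def archive_closed_index(
    storage_healthy: bool,
    log_store_healthy: bool,
    index_ready: bool,
    write_metadata: Callable[[], bool],
    create_snapshot: Callable[[], bool],
) -> Optional[ArchiveFailure]:
    """Return None on success, or the first failure cause encountered.

    The check order is an assumption; the real service may differ.
    """
    if not storage_healthy:
        return ArchiveFailure.EXTERNAL_STORAGE_UNHEALTHY
    if not log_store_healthy:
        return ArchiveFailure.LOG_STORE_UNHEALTHY
    if not index_ready:
        return ArchiveFailure.INDEX_NOT_READY
    # Step 1: write the small metadata file to the external archive location.
    if not write_metadata():
        return ArchiveFailure.METADATA_WRITE_FAILED
    # Step 2: snapshot the just-closed index into the partition's repository.
    if not create_snapshot():
        return ArchiveFailure.SNAPSHOT_FAILED
    return None
```

In this model the snapshot is attempted only after the metadata file is written, so a metadata failure means no snapshot was started for that index.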
Event archive failures can stem from one or more of the following:
On the Log Management Health Dashboard, confirm the alert. Then use the logs for your deployment to tie the failures to a time range, log partition, or index name, as described below under “Logs to look for”.
Normal progress (informational)
Shard readiness and retries (often transient)
Hard failures
External storage and metadata upload
Archive operations may be retried automatically for certain short-lived conditions (for example, unassigned shards on the index). Retries are limited; ongoing failures need investigation.
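The limited-retry behavior can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the product's actual retry logic: the exception classes, the function name, the default of three attempts, and the delay are all hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class TransientArchiveError(Exception):
    """Hypothetical: a short-lived condition, e.g. unassigned shards."""


class PermanentArchiveError(Exception):
    """Hypothetical: a failure that retrying is not expected to fix."""


def archive_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,          # assumed limit, not a documented value
    delay_seconds: float = 0.0,     # assumed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, int]:
    """Run `operation`, retrying only transient errors.

    Returns (result, attempts_used). Once the attempt limit is reached,
    the last transient error is re-raised so it can be investigated.
    Any other exception propagates immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation(), attempt
        except TransientArchiveError:
            if attempt >= max_attempts:
                raise  # retries exhausted: surface the failure
            sleep(delay_seconds)
```

Bounding the number of attempts keeps a persistent problem from being hidden behind endless retries: a condition that clears quickly succeeds on a later attempt, while one that does not clear surfaces as a failure.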