Gemfire: A DiskAccessException has occurred while writing to the disk for disk store.......The cache will be closed. org.apache.geode.cache.DiskAccessException:

Article ID: 440713

Updated On:

Products

VMware Tanzu Data Suite

Issue/Introduction

After upgrading to GemFire 10.0.8, some nodes log the error below and no longer appear in the output of gfsh> list members, even though the server process is still running.

A DiskAccessException has occurred while writing to the disk for disk store [DISK_STORE_NAME]. The cache will be closed.
org.apache.geode.cache.DiskAccessException: For Region: /__PR/_B__[REGION_NAME]_[BUCKET_ID]: Failed reading from /[BASE_DIR]/[PROFILE_NAME]/data/[CLUSTER_NAME]/servers/[HOST_NAME].[SERVER_NAME]/[DISK_STORE_NAME]/BACKUP[DISK_STORE_NAME]_[OPLOG_ID].  oplogID, [OPLOG_ID] Offset being read=9038402 Current Oplog Size=10327851 Actual File Size,10327851 IS ASYNCH MODE,false IS ASYNCH WRITER ALIVE=false, caused by java.io.IOException: Input/output error
 at gemfire//org.apache.geode.internal.cache.Oplog.basicGetForCompactor(Oplog.java:5480)
 at gemfire//org.apache.geode.internal.cache.Oplog.getBytesAndBitsForCompaction(Oplog.java:4143)
 at gemfire//org.apache.geode.internal.cache.Oplog.compact(Oplog.java:5940)
 at gemfire//org.apache.geode.internal.cache.DiskStoreImpl$OplogCompactor.compact(DiskStoreImpl.java:2920)
 at gemfire//org.apache.geode.internal.cache.DiskStoreImpl$OplogCompactor.run(DiskStoreImpl.java:2980)
 at gemfire//org.apache.geode.internal.cache.DiskStoreImpl$2.run(DiskStoreImpl.java:4563)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: java.io.IOException: Input/output error
 at java.base/java.io.RandomAccessFile.readBytes(Native Method)
 at java.base/java.io.RandomAccessFile.read(RandomAccessFile.java:405)
 at java.base/java.io.RandomAccessFile.readFully(RandomAccessFile.java:469)
 at gemfire//org.apache.geode.internal.cache.persistence.UninterruptibleRandomAccessFile.readFully(UninterruptibleRandomAccessFile.java:95)
 at gemfire//org.apache.geode.internal.cache.persistence.UninterruptibleRandomAccessFile.readFully(UninterruptibleRandomAccessFile.java:89)
 at gemfire//org.apache.geode.internal.cache.Oplog.basicGetForCompactor(Oplog.java:5457)

Environment

All GemFire 10.x.x versions

Cause

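The root failure is a java.io.IOException: Input/output error, raised while the OplogCompactor was reading an operation log (oplog) file during compaction of the disk store. The exception originates in a native RandomAccessFile read, which suggests the read was rejected at the operating system or storage level.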

Specific Error: Failed reading from the operation log file: /[BASE_DIR]/[PROFILE_NAME]/data/[CLUSTER_NAME]/servers/[HOST_NAME].[SERVER_NAME]/[DISK_STORE_NAME]/BACKUP[DISK_STORE_NAME]_[OPLOG_ID]

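GemFire is designed to close the cache automatically when a DiskAccessException occurs, to prevent data corruption. In this case, closing the cache led to a cascading shutdown of all distribution managers and membership services, so the member dropped out of list members even though its process kept running.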
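For triage, the affected oplog path and read offset can be extracted from the exception text. The following is a minimal Python sketch, assuming the message uses the same wording as the example in this article. The regular expression, the function name parse_oplog_error, and the sample path and region name are illustrative assumptions and are not part of GemFire.

```python
import re
from typing import Optional

# Pattern modeled on the message quoted in this article. The exact wording
# may differ between GemFire versions, and paths containing spaces are not
# handled, so treat this as an assumption rather than a stable format.
OPLOG_ERROR = re.compile(
    r"Failed reading from (?P<path>\S+)\.\s+"
    r"oplogID,\s*(?P<oplog_id>\S+)\s+"
    r"Offset being read=(?P<offset>\d+)\s+"
    r"Current Oplog Size=(?P<oplog_size>\d+)\s+"
    r"Actual File Size,(?P<file_size>\d+)"
)


def parse_oplog_error(message: str) -> Optional[dict]:
    """Return the oplog path, ID, offset, and sizes, or None if no match."""
    match = OPLOG_ERROR.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    for key in ("offset", "oplog_size", "file_size"):
        fields[key] = int(fields[key])
    return fields


# Hypothetical sample line: the region name and path are invented, the
# numbers are the ones shown in the article's log excerpt.
sample = (
    "org.apache.geode.cache.DiskAccessException: For Region: /__PR/_B__orders_7: "
    "Failed reading from /data/server1/ds1/BACKUPds1_3.  oplogID, 3 "
    "Offset being read=9038402 Current Oplog Size=10327851 "
    "Actual File Size,10327851 IS ASYNCH MODE,false"
)
info = parse_oplog_error(sample)
```

In the message quoted in this article, the offset being read (9038402) is smaller than the reported file size (10327851), which suggests the read failed inside the existing file rather than past its end.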

Resolution

 

  • Check the underlying host for disk hardware faults or file system corruption affecting the disk store directory: /[BASE_DIR]/[PROFILE_NAME]/data/[CLUSTER_NAME]/servers/[HOST_NAME].[SERVER_NAME]/[DISK_STORE_NAME]/

  • After confirming that the file system is healthy, and before restarting the node, validate the offline disk store named in the logs by running the following command:

    • gfsh validate offline-disk-store --name=[DISK_STORE_NAME] --disk-dirs=/[BASE_DIR]/[PROFILE_NAME]/data/[CLUSTER_NAME]/servers/[HOST_NAME].[SERVER_NAME]/[DISK_STORE_NAME]/
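As a rough first check of the disk store directory, every file in it can be read from start to end to see whether the operating system reports read errors like the one in the log. The sketch below is a hypothetical helper, not a GemFire tool: the function name find_unreadable_files and the 1 MiB chunk size are assumptions. It only reads files. A clean pass does not prove the disk is healthy, since reads may be served from the operating system's page cache, so it does not replace hardware diagnostics or validate offline-disk-store.

```python
import os
from typing import List, Tuple

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (arbitrary choice)


def find_unreadable_files(disk_dir: str) -> List[Tuple[str, int, str]]:
    """Read every file under disk_dir end to end and collect read failures.

    Returns one (path, bytes read before the failure, error text) tuple for
    each file that raised OSError, for example "Input/output error".
    """
    failures = []
    for root, _dirs, files in os.walk(disk_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            offset = 0
            try:
                # Unbuffered binary reads, so the offset tracks what was
                # actually returned by the operating system.
                with open(path, "rb", buffering=0) as handle:
                    while True:
                        chunk = handle.read(CHUNK_SIZE)
                        if not chunk:
                            break
                        offset += len(chunk)
            except OSError as exc:
                failures.append((path, offset, exc.strerror or str(exc)))
    return failures
```

Calling find_unreadable_files with the disk store directory from the log returns an empty list when every file could be read. Any entry names a file to investigate, and its offset can be compared with the "Offset being read" value in the exception message.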