Writing data from a GPDB cluster to ECS with PXF fails with error "Indicates that the version ID specified in the request does not match an existing version."

Article ID: 417846

Products

VMware Tanzu Greenplum

Issue/Introduction

This issue occurs when the target data directories in the ECS bucket do not exist, either because they were deleted manually or because some process cleaned them up.

The error message looks like this:

[2025-11-09, 06:17:49 PST] {xxxxxx.py:xxxx} ERROR - Task failed with exception Traceback (most recent call last):
  File "/usr/xxx/xxx/operators/gpdb_xxx_xxxx.py", line 100, in execute
    cur.execute(sql)
psycopg2.errors.RaiseException: 08000 PXF server error : innerMkdirs on s3a://xxxx/xxx/xxx/xxxxx: com.amazonaws.services.s3.model.AmazonS3Exception: Indicates that the version ID specified in the request does not match an existing version. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchVersion; Request ID: 0ae95c87:19a64bc2054:192:3f; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null  (seg2 10.xx.xx.xx:xxxx pid=1438325)

Environment

GPDB 6.x

PXF versions earlier than 6.11.2

Cause

When Greenplum writes data to an external system through PXF, it first checks whether the target directory exists. If the directory is missing, every segment tries to create it. Normally this is harmless, because creating a directory that already exists should do nothing.

However, in S3-compatible object stores such as ECS, each directory-create request creates a new version of the directory. When multiple segments try to create the directory at the same time, its version keeps changing, which leads to intermittent failures such as:

AmazonS3Exception: Indicates that the version ID specified in the request does not match an existing version.

This can happen even after another segment has already created the directory successfully.
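To make the race concrete, here is a toy Python model. ToyVersionedStore, naive_mkdir, and run_round_robin are hypothetical names, and the store is a deliberate simplification (an assumption, not ECS's real behavior): every write creates a new version ID, and only the newest version can be referenced afterwards. It does not reproduce the real S3A or ECS request sequence; it only shows how interleaved check-then-create calls from several segments can make every segment except the last writer fail, even though the directory ends up existing.

```python
import itertools
from typing import Dict, Iterator, List, Optional


class NoSuchVersion(Exception):
    """Stands in for S3's 404 NoSuchVersion error in this toy model."""


class ToyVersionedStore:
    """Toy versioned object store (assumption, not ECS's real behavior):
    every put() of a key creates a new version ID, and only the latest
    version of a key can be addressed afterwards."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def head(self, key: str) -> Optional[str]:
        # Returns the current version ID, or None if the key does not exist.
        return self._latest.get(key)

    def put(self, key: str) -> str:
        version_id = f"v{next(self._ids)}"
        self._latest[key] = version_id
        return version_id

    def get(self, key: str, version_id: str) -> str:
        if self._latest.get(key) != version_id:
            raise NoSuchVersion(f"{key} has no version {version_id}")
        return version_id


def naive_mkdir(store: ToyVersionedStore, key: str) -> Iterator[None]:
    """Check-then-create mkdir, written as a generator so that several
    'segments' can be interleaved one step at a time."""
    exists = store.head(key) is not None
    # Other segments run here, between the existence check and the create.
    yield
    if not exists:
        version_id = store.put(key)
        # Other segments run here too, between the create and a later
        # request that carries the version ID (a hypothetical follow-up read).
        yield
        store.get(key, version_id)


def run_round_robin(tasks: List[Iterator[None]]) -> List[int]:
    """Advance each task one step per round; return indices of failed tasks."""
    failures: List[int] = []
    active = list(enumerate(tasks))
    while active:
        still_running = []
        for index, task in active:
            try:
                next(task)
                still_running.append((index, task))
            except StopIteration:
                pass  # task finished successfully
            except NoSuchVersion:
                failures.append(index)
        active = still_running
    return failures
```

With three interleaved segments, run_round_robin returns [0, 1]: all three see the directory as missing, all three create it, and only the last writer's version ID is still valid, so the first two fail although the directory exists afterwards.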

Resolution

PXF 6.11.2 and later include an enhancement that handles this situation, so upgrading PXF to 6.11.2 or later resolves the issue. The enhancement makes three main changes:

  1. Changes the logic so that only one mkdir request is sent per segment host when the intermediate directory does not already exist.
  2. Adds a random time offset so that different PXF JVMs do not send create requests at the same time.
  3. Adds retry logic: if a create request hits this failure, PXF checks again whether the directory exists instead of throwing an error.
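The three changes above can be sketched together in Python. This is a hypothetical illustration, not PXF's actual (Java) implementation: HostMkdirCoordinator, TransientStoreError, FlakyStore, and the exists/mkdir callables are invented names, and one coordinator instance is assumed to stand for one PXF JVM on a segment host. The jitter range and attempt count are arbitrary example values.

```python
import random
import threading
import time
from typing import Callable, Optional, Set


class TransientStoreError(Exception):
    """Hypothetical stand-in for a failed create, such as NoSuchVersion."""


class HostMkdirCoordinator:
    """Sketch of the three mitigations; assumes one instance per segment host."""

    def __init__(self, exists: Callable[[str], bool], mkdir: Callable[[str], None],
                 max_jitter_s: float = 0.5, max_attempts: int = 3,
                 rng: Optional[random.Random] = None,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._exists = exists
        self._mkdir = mkdir
        self._max_jitter_s = max_jitter_s
        self._max_attempts = max_attempts
        self._rng = rng or random.Random()
        self._sleep = sleep
        self._lock = threading.Lock()
        self._created: Set[str] = set()

    def ensure_dir(self, path: str) -> None:
        # 1. One mkdir per host: segments on this host take turns, and only the
        #    first one that finds the directory missing sends a create request.
        with self._lock:
            if path in self._created:
                return
            if not self._exists(path):
                # 2. Random offset, so that coordinators on different hosts are
                #    unlikely to send their create requests at the same moment.
                self._sleep(self._rng.uniform(0, self._max_jitter_s))
                self._create_with_retry(path)
            self._created.add(path)

    def _create_with_retry(self, path: str) -> None:
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._mkdir(path)
                return
            except TransientStoreError:
                # 3. Re-check instead of failing at once: another host may
                #    already have created the directory.
                if self._exists(path):
                    return
                if attempt == self._max_attempts:
                    raise


class FlakyStore:
    """Fake store whose first create lands but still reports an error."""

    def __init__(self) -> None:
        self.dirs: Set[str] = set()
        self.mkdir_calls = 0

    def exists(self, path: str) -> bool:
        return path in self.dirs

    def mkdir(self, path: str) -> None:
        self.mkdir_calls += 1
        self.dirs.add(path)
        if self.mkdir_calls == 1:
            raise TransientStoreError("NoSuchVersion")
```

In this sketch, several threads calling ensure_dir on the same coordinator send a single mkdir to FlakyStore, and the existence check in the retry path turns the reported failure into success because the directory is in fact there. Holding the per-host lock while sleeping is a deliberate choice here: the other segments on that host simply wait and then find the directory already recorded as created.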