GPSS RabbitMQ Jobs Incorrectly Marked as SUCCESS When RabbitMQ Becomes Unavailable

Article ID: 432538

Products

VMware Tanzu Data
VMware Tanzu Data Intelligence
VMware Tanzu Greenplum
VMware Tanzu Greenplum / Gemfire

Issue/Introduction

When using RabbitMQ as a source in Greenplum Stream Server (GPSS), jobs that are actively consuming data may unexpectedly transition to a SUCCESS state if the RabbitMQ broker becomes unavailable (for example, if the broker is stopped, the port is blocked, or network connectivity is lost).

Under normal operation, GPSS jobs continuously poll the RabbitMQ queue and remain in a RUNNING state even when the queue is temporarily empty. The job resumes processing automatically when new messages arrive.

However, if RabbitMQ becomes unavailable during job execution, the job may stop processing and be incorrectly marked as SUCCESS, even though the upstream source is no longer reachable and the job did not complete its intended streaming operation.

In such scenarios:

  • The job transitions from RUNNING → SUCCESS.

  • No automatic retry occurs, even if retry parameters are configured in the job YAML.

  • When RabbitMQ becomes available again, the job does not restart automatically.

  • Manual job restart is required to resume processing.

This behavior can produce a false-positive SUCCESS state: the job appears to have completed, although data ingestion has actually stopped.

Environment

Greenplum Database (GPDB) 7.x

Greenplum Stream Server (GPSS) 2.1.0

 

Cause

The issue occurs when the RabbitMQ connection is unexpectedly closed while the job is running.

When the queue reader detects that the upstream channel has been closed, the internal reader logic treats the closure as a normal termination condition rather than a failure scenario. As a result:

  1. The upstream reader exits gracefully.

  2. The current batch processing finishes.

  3. GPSS finalizes the job execution.

  4. The job status is recorded as SUCCESS.

Since the termination is not interpreted as an error, the retry logic configured in the job schedule is not triggered.
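The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch of the failure pattern, not GPSS source code: the names `JobStatus`, `ChannelClosed`, and `run_reader` are hypothetical, and the `treat_close_as_normal` flag exists purely to contrast the observed behavior with one that would report an unexpected closure as an error.

```python
from enum import Enum


class JobStatus(Enum):
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"


class ChannelClosed:
    """Hypothetical marker emitted by the source when its channel closes."""

    def __init__(self, expected):
        # True only for a deliberate shutdown, False for a lost connection.
        self.expected = expected


def run_reader(items, treat_close_as_normal):
    """Consume items until the channel closes; return (status, processed)."""
    processed = []
    for item in items:
        if isinstance(item, ChannelClosed):
            # The current batch is already in `processed`; the job is finalized here.
            if treat_close_as_normal or item.expected:
                return JobStatus.SUCCESS, processed
            return JobStatus.ERROR, processed
        processed.append(item)
    return JobStatus.SUCCESS, processed


def should_retry(status):
    """Retry logic is only consulted when a job ends in an error state."""
    return status is JobStatus.ERROR


# A broker outage closes the channel unexpectedly after two messages.
stream = ["m1", "m2", ChannelClosed(expected=False)]

observed_status, _ = run_reader(stream, treat_close_as_normal=True)
strict_status, _ = run_reader(stream, treat_close_as_normal=False)

print(observed_status.value, should_retry(observed_status))
print(strict_status.value, should_retry(strict_status))
```

In the first call the unexpected closure is indistinguishable from a clean shutdown, so the simulated job ends in SUCCESS and the retry check is never satisfied. In the second call the same closure surfaces as ERROR, which is the condition retry logic would act on.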

Log messages observed in this scenario may include entries such as:

  • The queue reader channel has been closed.

  • The upstream source has closed.

  • The job finished successfully.

 

Resolution

Workaround:

Until the fix is released, manual intervention is required to resume processing.

If RabbitMQ becomes unavailable and the GPSS job transitions to SUCCESS, administrators should:

  1. Verify that the RabbitMQ broker is reachable and functioning.

  2. Restart the GPSS job manually.

This re-establishes the connection to RabbitMQ and resumes data ingestion.
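The two workaround steps can be scripted. The following is a minimal sketch under stated assumptions: `broker_reachable` and `restart_if_broker_up` are hypothetical helper names, port 5672 is the default AMQP port and may differ in your deployment, and the actual restart action is supplied by the caller (for example, a wrapper around the `gpsscli` utility). A successful TCP connection shows only that the port accepts connections; it does not prove the broker is fully functional.

```python
import socket
from typing import Callable


def broker_reachable(host: str, port: int = 5672, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the broker port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def restart_if_broker_up(host: str, port: int, restart_job: Callable[[], None]) -> bool:
    """Call restart_job only when the broker port is reachable.

    restart_job is a caller-supplied callable (an assumption of this sketch)
    that performs the manual GPSS job restart.
    """
    if not broker_reachable(host, port):
        return False
    restart_job()
    return True
```

Passing the restart step in as a callable keeps the reachability check independent of how jobs are managed in a given environment, and avoids restarting a job that would immediately hit the same closed connection.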

Permanent Fix:

This issue is scheduled to be fixed in GPSS 2.3. Check the GPSS release notes to confirm when the fix becomes available.