SSP Deployment Stuck at 88% Due to Debezium Pod Failing to Start

Products

VMware vDefend Firewall VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:

During the Security Services Platform (SSP) deployment (version 5.0), you might observe that the deployment process stalls or freezes at 88% completion. This is commonly caused by the debezium-onprem pod failing to start successfully due to a socket connection error with PostgreSQL.

How to Identify the Issue:

Check Pod Status: SSH into the SSP Installer Appliance (SSP-I) VM using root credentials, then run:

k get pods -n nsxi-platform | grep debezium-onprem

You will likely see output similar to:

nsxi-platform debezium-onprem-5dd68cb978-7djdg 0/1 Running

This indicates that the pod is stuck in a non-ready state (0/1 containers ready).
View Pod Logs: Replace <pod-name> below with the exact name copied from the previous command:

k logs -n nsxi-platform <pod-name>

Look for errors similar to the following in the logs:

[2025-03-28 18:28:36,100] ERROR Failed testing connection for jdbc:postgresql://postgresql-ha-postgresql-0...
org.postgresql.util.PSQLException: The connection attempt failed.
Verify SecOps Logs (Optional): You may also check the SSP-I appliance logs for additional context:

less /var/log/secop/secops.log
Look for Helm chart installation errors:

ERROR install/install.go:89 Error occurred while installing chart
ERROR deployment/chart.go:135 Failed to install helm chart
WARN deployment/chart.go:137 Chart installation failed
Deployment/debezium-onprem failed to start. Current status: 0/1 available

Environment

Security Services Platform(SSP) 5.0

Cause

When the debezium-onprem pod starts, it attempts to connect to the PostgreSQL service using its internal FQDN. PostgreSQL, by default, performs a reverse IP lookup for incoming connections to log the client hostname.

However, at this stage of the SSP deployment:

Kubernetes CoreDNS may not have fully registered the debezium service yet.
The reverse DNS request is forwarded to the upstream DNS resolver.
Since debezium uses a private cluster IP, upstream DNS cannot resolve it.
Most DNS servers respond with NXDOMAIN (non-existent domain), which is handled gracefully.
But in some customer environments, the upstream DNS returns SERVFAIL instead.

If the DNS response is SERVFAIL, Debezium treats this as a critical error and fails to initialize, stalling the deployment at 88%.

This behavior depends on how your upstream DNS server handles reverse lookups for unknown internal IPs — a configuration outside SSP's direct control.

Resolution

You can work around this by disabling PostgreSQL's reverse hostname lookup feature.

When the deployment failed, retry the deployment. During the retry, when you see the PostgreSQL stateful set coming up find the stateful set for postgres & update the env POSTGRESQL_LOG_HOSTNAME to "false"

SSH into the SSP Installer Appliance (SSP-I) VM using root credentials:

1. Edit PostgreSQL StatefulSet:

2. Restart PostgreSQL StatefulSet:

This allows Debezium to start without relying on reverse DNS lookups, and the deployment should proceed beyond 88%.