SSP Deployment Stuck at 88% Due to Debezium Pod Failing to Start

Article ID: 394173


Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:

During a Security Services Platform (SSP) 5.0 deployment, the process may stall at 88% completion. This is commonly caused by the debezium-onprem pod failing to start because of a socket connection error with PostgreSQL.

How to Identify the Issue:

  1. Check Pod Status: SSH into the SSP Installer Appliance (SSP-I) VM using root credentials, then run:

    k get pods -n nsxi-platform | grep debezium-onprem

    You will likely see output similar to:

    nsxi-platform   debezium-onprem-5dd68cb978-7djdg   0/1   Running
     
    This indicates that the pod is stuck in a non-ready state (0/1 containers ready).
  2. View Pod Logs: Replace <pod-name> below with the exact name copied from the previous command:

    k logs -n nsxi-platform <pod-name>

    Look for errors similar to the following in the logs:

    [2025-03-28 18:28:36,100] ERROR Failed testing connection for jdbc:postgresql://postgresql-ha-postgresql-0...
    org.postgresql.util.PSQLException: The connection attempt failed.
     
  3. Verify SecOps Logs (Optional): You may also check the SSP-I appliance logs for additional context:

    less /var/log/secop/secops.log

    Look for Helm chart installation errors:

    ERROR install/install.go:89 Error occurred while installing chart
    ERROR deployment/chart.go:135 Failed to install helm chart
    WARN  deployment/chart.go:137 Chart installation failed
    Deployment/debezium-onprem failed to start. Current status: 0/1 available
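
The checks in steps 1 and 2 can be combined into a small helper. The following is a minimal sketch, not part of the product: it assumes that `kubectl` is on the PATH of the SSP-I appliance (the `k` command used above is taken to be an alias for it), and the function names `not_ready_pods`, `connection_errors`, and `check_debezium` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: kubectl is available; override KUBECTL if the binary lives elsewhere.
KUBECTL="${KUBECTL:-kubectl}"
NS="nsxi-platform"

# Read "get pods" output on stdin and print the names of pods whose
# READY column (m/n) shows fewer ready containers than expected.
# The pod name is taken as the field immediately before the READY column.
not_ready_pods() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[0-9]+\/[0-9]+$/) {
        split($i, r, "/")
        if (r[1] + 0 < r[2] + 0) print $(i - 1)
        break
      }
    }
  }'
}

# Read pod logs on stdin and keep only the PostgreSQL connection
# failures quoted in this article.
connection_errors() {
  grep -E 'Failed testing connection for jdbc:postgresql|PSQLException: The connection attempt failed'
}

# List debezium-onprem pods that are not ready and show matching log lines.
check_debezium() {
  "$KUBECTL" get pods -n "$NS" --no-headers | grep debezium-onprem | not_ready_pods |
    while read -r pod; do
      echo "== $pod is not ready =="
      "$KUBECTL" logs -n "$NS" "$pod" | connection_errors
    done
}
```

After sourcing the snippet, running `check_debezium` prints each non-ready debezium-onprem pod followed by any matching connection errors. If it prints nothing, the stall probably has a different cause.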
     

Environment

Security Services Platform (SSP) 5.0

Cause

When the debezium-onprem pod starts, it attempts to connect to the PostgreSQL service using its internal FQDN. PostgreSQL, by default, performs a reverse IP lookup for incoming connections to log the client hostname.

However, at this stage of the SSP deployment:

  • Kubernetes CoreDNS may not have fully registered the debezium service yet.

  • The reverse DNS request is forwarded to the upstream DNS resolver.

  • Since debezium uses a private cluster IP, upstream DNS cannot resolve it.

  • Most DNS servers respond with NXDOMAIN (non-existent domain), which is handled gracefully.

  • But in some customer environments, the upstream DNS returns SERVFAIL instead.

If the DNS response is SERVFAIL, Debezium treats this as a critical error and fails to initialize, stalling the deployment at 88%.

This behavior depends on how your upstream DNS server handles reverse lookups for unknown internal IPs, which is a configuration outside SSP's direct control.
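
To see which response your upstream resolver gives, you can issue a reverse lookup by hand and read the status from the reply header. The sketch below assumes the `dig` utility is installed on a host that can reach the resolver (it may not be present on the SSP-I appliance). The function names `dns_status` and `classify_reverse_lookup` are hypothetical.

```shell
#!/usr/bin/env bash
# Read dig output on stdin and print the response code from the
# ";; ->>HEADER<<- ... status: <RCODE>, ..." line.
dns_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Map a response code to the behavior described in this article:
# NXDOMAIN (or a successful answer) is handled gracefully,
# SERVFAIL is the response that triggers the startup failure.
classify_reverse_lookup() {
  case "$1" in
    NOERROR|NXDOMAIN) echo "ok" ;;
    SERVFAIL)         echo "affected" ;;
    *)                echo "unknown" ;;
  esac
}
```

A possible invocation, with `<pod-ip>` and `<upstream-dns>` as placeholders for the debezium-onprem pod IP and your upstream DNS server: `classify_reverse_lookup "$(dig -x <pod-ip> @<upstream-dns> | dns_status)"`. A result of `affected` suggests the environment matches the cause described above.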

Resolution

You can work around this by disabling PostgreSQL's reverse hostname lookup feature.

If the deployment fails, retry it. During the retry, as soon as the PostgreSQL StatefulSet (postgresql-ha-postgresql) comes up, set its POSTGRESQL_LOG_HOSTNAME environment variable to "false" as follows.

SSH into the SSP Installer Appliance (SSP-I) VM using root credentials:

1. Edit PostgreSQL StatefulSet:

k -n nsxi-platform edit sts postgresql-ha-postgresql

Locate the following environment variable in the YAML:
 
- name: POSTGRESQL_LOG_HOSTNAME
  value: "true"

Change "true" to "false" and save the file.

2. Restart PostgreSQL StatefulSet:

k -n nsxi-platform rollout restart statefulset postgresql-ha-postgresql
 
This allows Debezium to start without relying on reverse DNS lookups, and the deployment should proceed beyond 88%.
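
If you prefer not to edit the YAML interactively, the same change can be sketched with non-interactive commands. This is an assumption-laden alternative, not the documented procedure: it assumes `kubectl` is on the PATH (the `k` command above is taken to be an alias for it), and the function names are hypothetical. The explicit restart mirrors step 2 above.

```shell
#!/usr/bin/env bash
# Assumption: kubectl is available; override KUBECTL if needed.
KUBECTL="${KUBECTL:-kubectl}"
NS="nsxi-platform"
STS="postgresql-ha-postgresql"

# Set POSTGRESQL_LOG_HOSTNAME=false on the StatefulSet's pod template.
disable_log_hostname() {
  "$KUBECTL" set env -n "$NS" "statefulset/$STS" POSTGRESQL_LOG_HOSTNAME=false
}

# Restart the StatefulSet and wait for the rollout to finish.
restart_and_wait() {
  "$KUBECTL" rollout restart -n "$NS" "statefulset/$STS"
  "$KUBECTL" rollout status -n "$NS" "statefulset/$STS" --timeout=10m
}

# Print the value currently configured in the pod template.
show_log_hostname() {
  "$KUBECTL" get sts -n "$NS" "$STS" \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="POSTGRESQL_LOG_HOSTNAME")].value}'
}
```

After sourcing the snippet, run `disable_log_hostname`, then `restart_and_wait`, and confirm the value with `show_log_hostname`. The 10-minute timeout is an arbitrary choice for illustration.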
 
This issue is addressed in SSP 5.1, where Debezium startup no longer fails because of DNS reverse lookup behavior.