The PubSub pod may fail to start, which degrades the entire system. Symptoms include the pod reporting zero ready replicas and frequent liveness probe failures. As a result, the Intelligence UI does not load.
All NAPP environments
The root cause is that PubSub takes longer to come up than the probes allow: by the time it starts reporting health status, the readiness probe has already declared it dead.
VMware by Broadcom is aware of this issue, and the fix will be included in a future release.
To work around this issue, follow these steps:
(1) Log in to the NSX Manager via SSH as root and edit the PubSub deployment:
napp-k edit deployment pubsub -n nsxi-platform
(2) Increase initialDelaySeconds for both the liveness and readiness probes from 180 to 300 seconds. The relevant section of the deployment should read as follows:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /actuator/health/liveness
    port: http
    scheme: HTTP
  initialDelaySeconds: 300   # update from 180 to 300
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
name: pubsub
ports:
- containerPort: 8080
  name: http
  protocol: TCP
- containerPort: 8443
  name: https
  protocol: TCP
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /actuator/health/readiness
    port: http
    scheme: HTTP
  initialDelaySeconds: 300   # update from 180 to 300
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
This adjustment allows the PubSub pod additional time to become ready before the health checks are executed.
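For environments where an interactive edit is inconvenient, the same change could in principle be applied as a JSON patch. The following is a minimal sketch, not a documented procedure. It assumes that napp-k passes its arguments through to kubectl (so that the patch and rollout status sub-commands are available) and that the pubsub container is the first container (index 0) in the pod template. The function names build_probe_patch and apply_probe_delay are hypothetical helpers introduced for illustration. If napp-k is defined as a shell alias, paste the functions into the interactive root shell rather than running them from a script file.

```shell
# Sketch only: non-interactive alternative to "napp-k edit".
# Assumptions (not from the article): napp-k forwards arguments to kubectl,
# and the pubsub container sits at index 0 of the pod template.

NS="nsxi-platform"
DEPLOY="pubsub"

# Print a JSON Patch that sets initialDelaySeconds on both probes.
#   $1 = delay in seconds
#   $2 = container index (defaults to 0, which is an assumption)
build_probe_patch() {
  delay="$1"
  idx="${2:-0}"
  base="/spec/template/spec/containers/${idx}"
  printf '[{"op":"replace","path":"%s/livenessProbe/initialDelaySeconds","value":%s},' "$base" "$delay"
  printf '{"op":"replace","path":"%s/readinessProbe/initialDelaySeconds","value":%s}]\n' "$base" "$delay"
}

# Apply the patch, then wait for the new pod to roll out.
apply_probe_delay() {
  napp-k patch deployment "$DEPLOY" -n "$NS" --type=json \
    -p "$(build_probe_patch "$1" "${2:-0}")" &&
  napp-k rollout status deployment/"$DEPLOY" -n "$NS"
}
```

Running build_probe_patch 300 on its own prints the patch so it can be reviewed before anything is changed. apply_probe_delay 300 would then apply it and block until the rollout finishes or fails. Confirm the container index against the live deployment first, since a wrong index would patch a different container or be rejected.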