Failed Recommendation Spark Job Driver Pod Not Deleted

Article ID: 370444

Products

VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

The driver pod of a failed recommendation Spark job is not removed and remains in the cluster indefinitely.
The TTL (time to live) of a Spark application is 2 hours, configured through the `timeToLiveSeconds` field described in the Spark Operator API documentation. This means that a terminated Spark application is deleted 2 hours after it terminates, and deleting a Spark application removes all of its driver and executor pods. However, we have observed cases where the Spark application is deleted but its driver pod is not, so the failed recommendation Spark job driver pod remains in the cluster indefinitely.

 

To confirm that you are affected by this issue, follow the steps below.

(a) Get the driver pod
On the NSX Manager, run the following command:

napp-k get pods | grep rec-

You should find the recommendation Spark job driver pod with the prefix 'rec-' and the suffix '-driver', as in this example:

napp-k get pods | grep rec-

NAME                                              READY   STATUS   RESTARTS   AGE
rec-8db20510-1d2e-11ef-af60-4d5b491b64d2-driver   0/1     Error    0          7d3h

The AGE and STATUS columns show that this driver pod is 7 days old and in the Error state, well past the 2-hour TTL, which indicates that it has not been removed.
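When many recommendation jobs have run, this list can be long. The following is a minimal sketch, assuming bash and the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE), that filters that output down to `rec-*-driver` pods in the Error state whose AGE exceeds the 2-hour TTL. The function name `stale_rec_drivers` and the `TTL_SECONDS` variable are hypothetical. Pod age is counted from pod creation rather than from job termination, so treat the result as a shortlist to confirm in step (b), not as proof that a pod is orphaned.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `napp-k get pods` output on stdin and print
# the names of recommendation driver pods that are in the Error state and older
# than TTL_SECONDS (assumed default: 7200 s, the 2-hour Spark application TTL).
# Example, in an interactive shell on the NSX Manager:
#   napp-k get pods | stale_rec_drivers
stale_rec_drivers() {
  awk -v ttl="${TTL_SECONDS:-7200}" '
    # Convert a kubectl AGE value such as "45s", "3h10m" or "7d3h" to seconds.
    function age_to_seconds(age,    total, n, unit) {
      total = 0
      while (match(age, /^[0-9]+[smhdy]/)) {
        n = substr(age, 1, RLENGTH - 1) + 0
        unit = substr(age, RLENGTH, 1)
        if (unit == "s") total += n
        else if (unit == "m") total += n * 60
        else if (unit == "h") total += n * 3600
        else if (unit == "d") total += n * 86400
        else total += n * 31536000   # "y", approximated as 365 days
        age = substr(age, RLENGTH + 1)
      }
      return total
    }
    # NAME is $1 and STATUS is $3. AGE is read from the last field so that
    # RESTARTS values containing spaces, such as "1 (5m ago)", still work.
    $1 ~ /^rec-.*-driver$/ && $3 == "Error" && age_to_seconds($NF) > ttl { print $1 }
  '
}
```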

(b) Check the SparkApplication resource
Next, check whether a SparkApplication custom resource with the same name as the driver pod still exists:

napp-k get sparkapplication | grep <driver pod name without -driver suffix>

Example:
napp-k get sparkapplication | grep rec-8db20510-1d2e-11ef-af60-4d5b491b64d2

If the command returns no output, no Spark application exists with the same name as the recommendation Spark job driver pod (minus the '-driver' suffix), which confirms that the driver pod has been left behind.
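Running the `grep` by hand for every candidate pod gets tedious. The sketch below is a hypothetical helper, `orphaned_rec_drivers`, that compares two saved name lists and prints every `rec-*-driver` pod whose name, minus the `-driver` suffix, has no matching SparkApplication. It assumes that `napp-k` forwards standard kubectl output flags (`--no-headers`, `-o custom-columns=...`) and that the lists are captured interactively first, because `napp-k` may be defined as a shell alias that is not available inside scripts or functions.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper). Capture the name lists first, for example:
#   napp-k get pods --no-headers -o custom-columns=NAME:.metadata.name > pods.txt
#   napp-k get sparkapplication --no-headers -o custom-columns=NAME:.metadata.name > apps.txt
# then run:
#   orphaned_rec_drivers pods.txt apps.txt
orphaned_rec_drivers() {
  local pods_file=$1 apps_file=$2 pod app
  # Fail loudly instead of reporting every pod as orphaned when a list is missing.
  [ -r "$pods_file" ] && [ -r "$apps_file" ] || {
    echo "cannot read $pods_file or $apps_file" >&2
    return 1
  }
  while IFS= read -r pod; do
    # Only recommendation driver pods are of interest.
    case $pod in
      rec-*-driver) ;;
      *) continue ;;
    esac
    # The SparkApplication name is the pod name without the -driver suffix.
    app=${pod%-driver}
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex).
    if ! grep -qxF -- "$app" "$apps_file"; then
      printf '%s\n' "$pod"
    fi
  done < "$pods_file"
}
```

Whole-line, fixed-string matching avoids false positives when one application name happens to be a prefix of another.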

Resolution

Delete the recommendation Spark job driver pod with the following command:

napp-k delete pod <name of the recommendation spark job driver pod>
 
Example:
napp-k delete pod rec-8db20510-1d2e-11ef-af60-4d5b491b64d2-driver
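If several orphaned driver pods were confirmed, the deletion can be batched. The following minimal sketch uses a hypothetical helper, `print_delete_commands`, that reads pod names on stdin and prints one `napp-k delete pod` command per name instead of running it, so each command can be reviewed before it is pasted into the NSX Manager shell. Names that do not look like recommendation driver pods are skipped as a guard against deleting unrelated pods.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): turn pod names (one per line on stdin) into
# napp-k delete commands for review. Nothing is deleted by this function.
# Example:
#   printf '%s\n' rec-8db20510-1d2e-11ef-af60-4d5b491b64d2-driver | print_delete_commands
print_delete_commands() {
  local pod
  while IFS= read -r pod; do
    # Ignore blank lines.
    [ -z "$pod" ] && continue
    case $pod in
      rec-*-driver) printf 'napp-k delete pod %s\n' "$pod" ;;
      *) printf 'skipping %s: not a recommendation driver pod\n' "$pod" >&2 ;;
    esac
  done
}
```

Printing the commands rather than executing them keeps a human in the loop and sidesteps the question of whether `napp-k` can be called from a script.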