NSX Malware Prevention and Network Detection and Response upgrade to 3.2.1 or 3.2.1.1 fails with pods in ImagePullBackOff state
Article ID: 319051

Products

VMware NSX

Issue/Introduction

Customers cannot use the NSX ATP 3.2.1 or NSX ATP 3.2.1.1 builds.


The Malware Prevention and Network Detection and Response upgrade fails in the following scenarios:
     - From NSX Advanced Threat Prevention (ATP) 3.2.0 to NSX ATP 3.2.1 / 3.2.1.1
     - From NSX ATP 3.2.1 to NSX ATP 3.2.1.1
 
Other Symptoms:

  1. A Failed status for the NDR and cloud-connector pods is shown on the Upgrade UI screen.
  2. For an NDR upgrade, a few pods prefixed with "nsx-ndr" are in ImagePullBackOff state.
  3. For an MPS upgrade, a few pods prefixed with "cloud-connector" are in ImagePullBackOff state.
  4. Although the upgrade fails once the customer clicks the upgrade button, MPS and NDR functionality continues to work as before. The failure impacts only the upgrade and does NOT impact any existing functionality.
     
    Log location: NAPP support bundle

Cause

  1. In the upgrade workflow, the user specifies the Helm and Docker repositories from which the latest images should be pulled for the platform and installed features.
  2. For MPS and NDR, the pods point to an incorrect Docker registry. As a result, some pods have an ImagePullBackOff status.

Resolution

This issue is resolved in NSX Advanced Threat Prevention 4.0.1.

Workaround:

  1. SSH into the NSX Manager and elevate to the root user with the st en command.
  2. Identify the pods that are in ImagePullBackOff state.
        Command: napp-k get pods | grep "ImagePullBackOff"
        
        NDR failing pods
        NAME                                                           READY   STATUS           RESTARTS   AGE
        nsx-ndr-upload-config-5c56785b85-qv64h                         0/2     ImagePullBackOff   0          6d
        nsx-ndr-worker-file-event-processor-7f55cf97d6-d6d8p           0/2     ImagePullBackOff   0          6d
        nsx-ndr-worker-file-event-uploader-d48c7fbd-smvtz              0/2     ImagePullBackOff   0          6d
        nsx-ndr-worker-ids-event-processor-7f96d9c87f-wp929            0/2     ImagePullBackOff   0          6d
        nsx-ndr-worker-monitored-host-uploader-85d6d46fdc-nd7g4        0/2     ImagePullBackOff   0          6d
        nsx-ndr-worker-ndr-event-processor-6947fb9cb8-jj5kh            0/2     ImagePullBackOff   0          6d
        nsx-ndr-worker-ndr-event-uploader-578b5dbfb-2s9j8              0/2     ImagePullBackOff   0          6d
        
        MPS failing pods
        NAME                                                              READY   STATUS             RESTARTS   AGE
        cloud-connector-check-license-status-5dffd77ff4-9zpff             0/2     ImagePullBackOff   0          3m27s
        cloud-connector-proxy-78b7fb7857-zf5gr                            0/2     ImagePullBackOff   0          3m27s
        cloud-connector-update-license-status-795d865864-x7b52            0/2     ImagePullBackOff   0          3m27s
        reputation-service-5d498b65f8-2htvx                               0/1     ImagePullBackOff   0          24s
        reputation-service-feature-switch-watcher-notifier-dependedr2nn   0/1     ImagePullBackOff   0          76s
  3. Get the deployment name for each failing pod by matching the prefix.

    Command: napp-k get deployments

        NDR deployments
        NAME                                                              READY   UP-TO-DATE   AVAILABLE   AGE
        nsx-ndr-upload-config                                             1/1     1            1           163m
        nsx-ndr-worker-file-event-processor                               1/1     1            1           4h25m
        nsx-ndr-worker-file-event-uploader                                1/1     1            1           3h13m
        nsx-ndr-worker-ids-event-processor                                1/1     1            1           3h13m
        nsx-ndr-worker-monitored-host-uploader                            1/1     1            1           3h13m
        nsx-ndr-worker-ndr-event-processor                                1/1     1            1           3h13m
        nsx-ndr-worker-ndr-event-uploader                                 1/1     1            1           3h13m

        MPS deployments
        NAME                                                              READY   UP-TO-DATE   AVAILABLE   AGE
        cloud-connector-check-license-status                              1/1     1            1           4h25m
        cloud-connector-proxy                                             1/1     1            1           3h13m
        cloud-connector-update-license-status                             1/1     1            1           3h13m
        reputation-service                                                1/1     1            1           3h13m
        reputation-service-feature-switch-watcher-notifier                1/1     1            1           3h13m
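
        The deployment name can also be derived mechanically from a failing pod name, since Kubernetes appends a ReplicaSet hash and a pod suffix to the deployment name. A minimal sketch, using a pod name from the step 2 output:

        ```shell
        # Strip the trailing ReplicaSet hash and pod suffix that Kubernetes
        # appends to the deployment name when naming pods.
        pod="nsx-ndr-upload-config-5c56785b85-qv64h"
        deployment=$(echo "$pod" | sed -E 's/-[0-9a-z]+-[0-9a-z]+$//')
        echo "$deployment"   # nsx-ndr-upload-config
        ```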

     

  4. Edit the deployment and update the image field with the correct Docker registry (the one provided by the user during the upgrade workflow). Note that only the registry part of the image field, for example "harbor.nsbu.eng.vmware.com/nsx_intelligence_ob/clustering", should be updated.

      
        Command: napp-k edit deployment cloud-connector-check-license-status
        This opens the deployment in the vi editor. To open it in a different editor instead, first execute:
        export KUBE_EDITOR=vim.tiny

        For instance, if the Docker registry provided by the user was "projects.registry.vmware.com/nsx_application_platform/clustering", then the update below is needed.
       
    Existing value example: 
        image: harbor.nsbu.eng.vmware.com/nsx_intelligence_ob/clustering/nsx-cloud-connector-check-nsx-licensing-status-with-lastline-cloud:123-c33a1aa7.bionic
       
    Corrected value: 
        image: projects.registry.vmware.com/nsx_application_platform/clustering/nsx-cloud-connector-check-nsx-licensing-status-with-lastline-cloud:123-c33a1aa7.bionic
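
        Only the registry/project prefix changes in this edit; the trailing image name and tag are preserved. A minimal shell sketch of the substitution, using the values from the example above:

        ```shell
        # Keep the image name and tag; swap only the registry/project prefix.
        old_image="harbor.nsbu.eng.vmware.com/nsx_intelligence_ob/clustering/nsx-cloud-connector-check-nsx-licensing-status-with-lastline-cloud:123-c33a1aa7.bionic"
        new_registry="projects.registry.vmware.com/nsx_application_platform/clustering"
        new_image="${new_registry}/${old_image##*/}"
        echo "$new_image"
        ```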
       

  5. Repeat step 4 for all the deployments identified in step 3 for a given vertical/feature.
        Note that for MPS, the cloud-connector and reputation-service pods do not fail at the same time:
        a. Apply the workaround to the cloud-connector pods first.
        b. Once the cloud-connector pods upgrade successfully, the reputation-service pods appear in ImagePullBackOff state.
        c. Apply the workaround to each new ImagePullBackOff pod as it appears.
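
        If several deployments need the same change, the per-deployment edits in step 4 can be generated rather than typed by hand. The sketch below only prints the commands instead of running them; it assumes napp-k accepts the same subcommands as kubectl (including set image), and the container name "main" is a placeholder that must be checked against the actual deployment spec:

        ```shell
        # Print (do not run) a "set image" command per affected deployment.
        # The container name "main" is hypothetical; check the real name in
        # the deployment spec before running anything.
        new_registry="projects.registry.vmware.com/nsx_application_platform/clustering"
        cmds=$(
          while read -r deployment container image; do
            echo "napp-k set image deployment/${deployment} ${container}=${new_registry}/${image##*/}"
          done <<'EOF'
        cloud-connector-check-license-status main harbor.nsbu.eng.vmware.com/nsx_intelligence_ob/clustering/nsx-cloud-connector-check-nsx-licensing-status-with-lastline-cloud:123-c33a1aa7.bionic
        EOF
        )
        echo "$cmds"
        ```

        Review the printed commands, then execute them one by one; afterwards, re-check pod status with napp-k get pods as in step 2.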

  6. After executing the above steps, the upgrade succeeds and its status shows as Complete in the UI. The installed version can also be verified with the command below.
        Command: napp-h list

        Continue to monitor the backend pods after a successful upgrade. If any pod enters ImagePullBackOff state again, repeat steps 2, 3, and 4 above.

 

 

Additional Information

After the upgrade, if a user wants to uninstall the MPS or NDR feature, execute the commands below to force deletion.
    Commands:
    napp-k delete job cloud-connector-reset --grace-period=0 --force
    napp-k delete job cloud-connector-cleanup --grace-period=0 --force