CF App 502 errors / Router requests get connection refused after upgrade of Twistlock

Article ID: 432294


Products

VMware Tanzu Platform Core, VMware Tanzu Application Service

Issue/Introduction

  • The Prisma Twistlock Defender tile is installed in the foundation.
  • Twistlock Defender has recently been upgraded to version 34.04.145 or later.
    • Find the Defender version at the following location on the Diego cell:

      /var/vcap/packages/defender/version.txt

  • Users cannot connect to Apps Manager via curl or a web browser after a Diego Cell repave or reboot.
  • This impacts ALL applications deployed on Elastic Application Runtime (TAS) foundations.
  • Web browsers receive 502/503 and 404 errors; the primary failure in this scenario is 502 responses from applications.
  • GoRouter logging will report errors like:

    • /var/vcap/sys/log/gorouter/gorouter.stdout.log:


      "error":"incomplete request (dial tcp <DIEGO_CELL_IP>:61026: connect: connection refused)","retriable":true}

    • /var/vcap/sys/log/gorouter/gorouter.stderr.log:

      http: proxy error: dial tcp <DIEGO_CELL_IP>:61030: connect: connection refused

       

  • Retrieving the application's logs with cf logs <APP_NAME> returns errors like:

    x_cf_routererror:"endpoint_failure (dial tcp <DIEGO_CELL_IP>:61000: connect: connection refused)"


    The same errors also appear in the GoRouter access log, /var/vcap/sys/log/gorouter/access.log.



  • Despite these errors, cf app <APP_NAME> shows all instances of the application as "running".
  • Review the iptables mangle table on the Diego Cell to determine whether this specific problem is occurring:
    • From an SSH session on the Diego Cell, run the following to review the iptables rules:

      sudo iptables -L -t mangle


    • List the PREROUTING chain with line numbers to see whether the TWISTLOCK-NET-PRE jump rule has been added:

      Example of PROBLEM environment:

      sudo iptables -L PREROUTING -t mangle --line-numbers | head -3
      Chain PREROUTING (policy ACCEPT)
      num  target     prot opt source               destination
      1    TWISTLOCK-NET-PRE  all  --  anywhere             anywhere


      Example of HEALTHY environment:

      sudo iptables -L PREROUTING -t mangle --line-numbers | head -3
      Chain PREROUTING (policy ACCEPT)
      num  target     prot opt source               destination
      1    netin--<CONTAINER_ID>  all  --  anywhere             anywhere  



    • This can be checked across all Diego Cells in the foundation with the following command:

      bosh -d cf-<ID> ssh diego_cell -c "sudo iptables -L PREROUTING -t mangle --line-numbers | head -3"
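Each of the GoRouter errors above names the Diego Cell IP and port that refused the connection, so tallying them per cell IP is one way to see which cells are involved. This is a minimal sketch, not a supported tool: count_refused_by_cell is a hypothetical helper name, it handles IPv4 addresses only, and it assumes the error text matches the "dial tcp <IP>:<PORT>: connect: connection refused" form shown above.

```shell
#!/bin/bash
# Sketch (hypothetical helper): read GoRouter log lines on stdin and print
# "<cell IP> <count>" for every "connection refused" dial error, sorted by IP.
count_refused_by_cell() {
  # Keep only the "dial tcp IP:PORT: connect: connection refused" fragment.
  grep -oE 'dial tcp [0-9.]+:[0-9]+: connect: connection refused' \
    | awk '{
        # $3 is "IP:PORT:"; split on ":" and count by the IP part.
        split($3, hp, ":")
        count[hp[1]]++
      }
      END { for (ip in count) print ip, count[ip] }' \
    | sort
}
```

For example, on a GoRouter VM: count_refused_by_cell < /var/vcap/sys/log/gorouter/gorouter.stderr.log. Cells that dominate the output are the first ones to check with the iptables command above.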

Environment

EAR/TAS foundations using the Prisma Twistlock Defender tile with Defender version 34.04.145 or later

Cause

Starting with Twistlock Defender version 34.04.145, the Defender service inserts PREROUTING and POSTROUTING jump rules into the iptables mangle table. These jump rules are inserted at the top of each chain, ahead of the default rules the Diego Cell uses to manage the virtual network that delivers traffic to the application container's logical IP on the internal proxy port. This hijacks the virtual routing: traffic arriving from outside the Diego Cell is no longer "marked" by the default rules created for each container, so it is not routed to the container's virtual interface.

 

As a result, the Diego Cell refuses GoRouter's connections to the application port, and GoRouter returns 502 responses to clients.
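The cause comes down to rule ordering in the mangle PREROUTING chain. The sketch below checks that ordering in the output of sudo iptables -L PREROUTING -t mangle --line-numbers, read from stdin. The function name is hypothetical, and reporting any TWISTLOCK-NET-PRE jump that precedes a netin-- rule (or that exists with no netin-- rules at all) is an assumption drawn from the description above, not documented Diego behavior.

```shell
#!/bin/bash
# Sketch (hypothetical helper): read `iptables -L PREROUTING -t mangle --line-numbers`
# output on stdin and report whether a TWISTLOCK-NET-PRE jump precedes any netin-- rule.
twistlock_ahead_of_netin() {
  awk '
    # Only rule lines start with a numeric rule number; skip the header lines.
    $1 !~ /^[0-9]+$/ { next }
    # Remember the first TWISTLOCK-NET-PRE jump (rule numbers start at 1).
    $2 == "TWISTLOCK-NET-PRE" && !tw { tw = $1 + 0 }
    # Remember the last per-container netin-- rule.
    $2 ~ /^netin--/ { ni = $1 + 0 }
    END {
      # A Twistlock jump with no netin-- rules at all is also reported.
      if (tw && (!ni || tw < ni)) print "PROBLEM: TWISTLOCK-NET-PRE at rule " tw
      else print "OK"
    }'
}
```

For example: sudo iptables -L PREROUTING -t mangle --line-numbers | twistlock_ahead_of_netin. Unlike checking only the first line with head -3, this also flags a Twistlock jump that sits between container rules.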

Resolution

The Prisma release notes report that this is fixed in 34.04.156. The following is an excerpt from the Prisma release notes:

Resolved a regression in Quinn update 34.04.145 that makes the Defender set iptables/nftables entries in your environment, even when there is no policy such as Cloud Native Network Security (CNNS) or DNS monitoring requesting it.

This bug affects all 34.04 Defenders. If the Defender is deployed on Tanzu Application Service (TAS), this may lead to severe connectivity issues.

The referenced hotfix addresses this issue and ensures that iptables/nftables rules are set only when it is explicitly reflected in the policy.
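Given the versions stated in this article (regression introduced in 34.04.145, fixed in 34.04.156), a Defender version can be checked against that range. This is a minimal sketch under stated assumptions: the function names are hypothetical, it assumes /var/vcap/packages/defender/version.txt holds a single dotted version string such as 34.04.145, and it relies on GNU sort -V for version ordering, as available on Ubuntu-based stemcells. It treats versions from 34.04.145 up to, but not including, 34.04.156 as affected.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): is a Defender version in the affected range
# [34.04.145, 34.04.156) described in this article?
AFFECTED_FROM="34.04.145"
FIXED_IN="34.04.156"

# Return 0 if version $1 >= version $2, using GNU sort version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if version $1 is at or above AFFECTED_FROM and below FIXED_IN.
defender_affected() {
  local v="$1"
  version_ge "$v" "$AFFECTED_FROM" && ! version_ge "$v" "$FIXED_IN"
}
```

For example, on a Diego Cell: defender_affected "$(cat /var/vcap/packages/defender/version.txt)" && echo "affected".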

 

Until the fixed Defender version is applied to the Prisma agent, the steps below provide two workaround options to manually remove the PREROUTING rule that Twistlock created, which allows GoRouter to deliver traffic to the container application port as expected:

  1. Manually correct this on affected foundations using BOSH commands:

    • Search across all Diego Cells in a foundation for the TWISTLOCK-NET-PRE rule inserted into the PREROUTING chain of the mangle table:

      bosh -d cf-<ID> ssh diego_cell -c "sudo iptables -L PREROUTING -t mangle --line-numbers | head -3"


      Example output from an environment affected by the problematic Twistlock update:

      • bosh -d cf-<ID> ssh diego_cell -c "sudo iptables -L PREROUTING -t mangle --line-numbers | head -3"
        Using environment '10.###.###.###' as client 'ops_manager'

        Using deployment 'cf-<ID>'

        Task 3764. Done
        diego_cell/<ID>: stderr | Unauthorized use is strictly prohibited. All access and activity
        diego_cell/<ID>: stderr | is subject to logging and monitoring.
        diego_cell/<ID>: stderr | Unauthorized use is strictly prohibited. All access and activity
        diego_cell/<ID>: stderr | is subject to logging and monitoring.
        diego_cell/<ID>: stdout | Chain PREROUTING (policy ACCEPT)
        diego_cell/<ID>: stdout | num  target     prot opt source               destination
        diego_cell/<ID>: stdout | 1    TWISTLOCK-NET-PRE  all  --  anywhere             anywhere
        diego_cell/<ID>: stderr | Connection to 10.###.###.62 closed.
        diego_cell/<ID>: stdout | Chain PREROUTING (policy ACCEPT)
        diego_cell/<ID>: stdout | num  target     prot opt source               destination
        diego_cell/<ID>: stdout | 1    TWISTLOCK-NET-PRE  all  --  anywhere             anywhere
        diego_cell/<ID>: stderr | Connection to 10.###.###.63 closed.



      Example output from an environment that has not been updated (healthy output):

      • bosh -d cf-<ID> ssh diego_cell -c "sudo iptables -L PREROUTING -t mangle --line-numbers | head -3"
        Using environment '10.###.###.###' as client 'ops_manager'

        Using deployment 'cf-<ID>'

        Task 3764. Done
        diego_cell/<ID>: stderr | Unauthorized use is strictly prohibited. All access and activity
        diego_cell/<ID>: stderr | is subject to logging and monitoring.
        diego_cell/<ID>: stderr | Unauthorized use is strictly prohibited. All access and activity
        diego_cell/<ID>: stderr | is subject to logging and monitoring.
        diego_cell/<ID>: stdout | Chain PREROUTING (policy ACCEPT)
        diego_cell/<ID>: stdout | num  target     prot opt source               destination
        diego_cell/<ID>: stdout | 1    netin--<CONTAINER_ID>  all  --  anywhere             anywhere      
        diego_cell/<ID>: stderr | Connection to 10.###.###.62 closed.
        diego_cell/<ID>: stdout | Chain PREROUTING (policy ACCEPT)
        diego_cell/<ID>: stdout | num  target     prot opt source               destination
        diego_cell/<ID>: stdout | 1    netin--<CONTAINER_ID>  all  --  anywhere             anywhere  
        diego_cell/<ID>: stderr | Connection to 10.###.###.63 closed.

       

    • Diego Cells that show netin--<CONTAINER_ID> as the first entry in the PREROUTING chain are functional and are not affected by this failure; foundations where every Diego Cell shows this can be skipped.

    • Any foundation with Diego Cells that show TWISTLOCK-NET-PRE as the first entry in the PREROUTING chain needs this entry removed.

    • For affected environments, run the following command to remove the problem jump rule and restore network delivery. Because -D PREROUTING 1 deletes whichever rule is at position 1, first confirm that every Diego Cell in the deployment shows TWISTLOCK-NET-PRE at position 1:

      bosh -d cf-<ID> ssh diego_cell -c "sudo iptables -t mangle -D PREROUTING 1"

  2. The second option is to use an os-conf release post-deploy script to remove these rules after each deploy or repave. The os-conf reference KB details this operation. (NOTE: If os-conf is already uploaded in the environment, its release version will likely differ from the 23.0.0 shown below; use the version you already have installed.)

    • The following os-conf configuration adds a post-deploy-script job to Diego Cell instances. The script searches the iptables mangle table for the TWISTLOCK-NET-PRE and TWISTLOCK-NET-POST jump rules and chains. If it finds them, it removes the jump rules from the PREROUTING and POSTROUTING chains, then flushes and deletes the TWISTLOCK chains, in the mangle table only:

      releases:
      - name: os-conf
        version: 23.0.0
      addons:
      - name: postdeploy_twistlock_iptable_rule_removal
        include:
          instance_groups:
          - diego_cell
        jobs:
        - name: post-deploy-script
          release: os-conf
          properties:
            script: |
              #!/bin/bash
              # Job Name: postdeploy_twistlock_iptable_rule_removal

              LOG_FILE="/var/vcap/sys/log/twistlock-removal-postdeploy.log"
              mkdir -p "$(dirname "$LOG_FILE")"

              # Redirect stdout and stderr to the log file
              exec >> "$LOG_FILE" 2>&1

              echo "--- Started Post-Deploy Twistlock Cleanup: $(date) ---"

              TABLE="mangle"
              # Mapping of built-in chain:Twistlock target chain
              REMOVAL_LIST=("PREROUTING:TWISTLOCK-NET-PRE" "POSTROUTING:TWISTLOCK-NET-POST")

              for entry in "${REMOVAL_LIST[@]}"; do
                CHAIN="${entry%%:*}"
                TARGET="${entry#*:}"

                echo "Searching $TABLE $CHAIN for jump rules to $TARGET..."

                # 1. Delete every jump rule to $TARGET in the built-in chain.
                # -n avoids slow DNS lookups during the deploy; -w waits for the
                # xtables lock if another process is updating iptables.
                REMOVED=0
                while LINE_NUM=$(iptables -w -t "$TABLE" -L "$CHAIN" --line-numbers -n | awk -v t="$TARGET" '$2 == t {print $1; exit}'); [ -n "$LINE_NUM" ]; do
                  echo "Found rule at line $LINE_NUM in $CHAIN. Deleting..."
                  iptables -w -t "$TABLE" -D "$CHAIN" "$LINE_NUM" || break
                  REMOVED=$((REMOVED + 1))
                done
                echo "Removed $REMOVED jump rule(s) to $TARGET from $CHAIN."

                # 2. Flush (-F) and delete (-X) the custom chain.
                # Check that the chain exists first so the script stays idempotent.
                if iptables -w -t "$TABLE" -L "$TARGET" -n >/dev/null 2>&1; then
                  echo "Target chain $TARGET exists. Flushing and deleting..."
                  iptables -w -t "$TABLE" -F "$TARGET"
                  if iptables -w -t "$TABLE" -X "$TARGET"; then
                    echo "Chain $TARGET has been removed from the $TABLE table."
                  else
                    echo "Could not delete chain $TARGET; it may still be referenced."
                  fi
                else
                  echo "Custom chain $TARGET does not exist. Skipping."
                fi

                echo "------------------------------------------"
              done

              echo "--- Post-Deploy Cleanup Finished: $(date) ---"
              echo ""
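Option 1 deletes PREROUTING rule number 1 on every Diego Cell the bosh ssh command reaches, whatever that rule happens to be. A more cautious variant checks the rule's target first and deletes it only when it is TWISTLOCK-NET-PRE. The sketch below is one way to do that, not a supported procedure: remove_if_twistlock, rule_target, and the IPTABLES override variable are hypothetical names, and the function must run as root on the cell.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): delete mangle PREROUTING rule 1 only when its
# target is TWISTLOCK-NET-PRE. Run as root on a Diego Cell.
# IPTABLES is a hypothetical override so the logic can be tested against a stub.
IPTABLES="${IPTABLES:-iptables}"
TWISTLOCK_TARGET="TWISTLOCK-NET-PRE"

# Print the target of rule number $1 in the mangle PREROUTING chain (empty if none).
rule_target() {
  "$IPTABLES" -w -t mangle -L PREROUTING --line-numbers -n \
    | awk -v n="$1" '$1 == n { print $2; exit }'
}

# Delete rule 1 only if it jumps to TWISTLOCK-NET-PRE; otherwise leave the chain alone.
remove_if_twistlock() {
  local first
  first="$(rule_target 1)"
  if [ "$first" = "$TWISTLOCK_TARGET" ]; then
    "$IPTABLES" -w -t mangle -D PREROUTING 1 && echo "REMOVED"
  else
    echo "SKIPPED (rule 1 target: ${first:-none})"
  fi
}
```

For example, after copying these functions to a cell as /tmp/twistlock_guard.sh (a hypothetical path): sudo bash -c 'source /tmp/twistlock_guard.sh && remove_if_twistlock'. It prints REMOVED, or SKIPPED together with the target it found, so healthy cells are left untouched.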