Application Security Groups (ASG) and Distributed Firewall Rules (DFW) Integration

Article ID: 298134

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

This KB covers how Distributed Firewall (DFW) rules are realized in the NSX-T firewall service when Application Security Groups (ASGs) are configured in Tanzu Application Service for VMs (TAS). It also covers basic ASG troubleshooting for cases where an application fails to communicate with an IP/port outside of the container.
Note: ASGs are TAS's security control for an application's egress traffic. We assume that the reader is comfortable creating and viewing ASG rules using the CF CLI. See the documentation for an introduction to ASGs.

As part of the NSX-T and TAS integration, NCP (NSX Container Plugin) is responsible for keeping container state consistent between TAS and NSX-T. For ASGs, NCP ensures that when an ASG record is created, modified, or deleted in TAS's Cloud Controller database, that information is quickly pushed to the NSX Management Plane. NCP retrieves ASG information through two communication workflows: workflow A polls Cloud Controller API endpoints (e.g. /v2/security_groups), and workflow B listens for LRP events from BBS.

Regardless of the workflow, NCP stores the ASG information it obtains in a local cache. NCP compares incoming data against this cache to detect when the ASG information for a space or organization changes. The information in the cache is what is sent to the NSX Management Plane, and NSX uses it to create two resource types for the ASG: ContainerNetworkPolicy and Firewall Section.

Below we will highlight how each workflow ingests ASG data from TAS:

Workflow A: Polling Cloud Controller

For this workflow NCP creates a worker thread (the ASGController worker) that polls the Cloud Controller API endpoints (GET /v2/security_groups and GET /v2/spaces/SPACE_GUID/security_groups) every 10 seconds. In the Cloud Controller nginx access logs, these requests appear as coming from a Python client.

1. The worker thread starts a sync by polling the ASG endpoints, which return all of the ASGs created and bound within TAS in JSON format.
2. ASGController stores this information in its asg-cache, which is compared each time the worker's sync runs. If the ASG is not yet bound to a space or organization, no DFW rule is created; only an NSX-T ContainerNetworkPolicy resource is created.
3. If the ASG is bound to a space or applied org-wide, the DFW rule is created as an NSX-T Firewall Section resource.
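
The compare-then-realize behaviour described in the steps above can be illustrated with a minimal sketch. This is not NCP's code: the `Asg` class, the `sync_once` function, and the action tuples are hypothetical names chosen for illustration, and the sketch assumes that every ASG gets a ContainerNetworkPolicy while only a bound ASG additionally gets a Firewall Section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asg:
    """Hypothetical, simplified view of one ASG record."""
    guid: str
    name: str
    spaces: frozenset = frozenset()  # GUIDs of spaces the ASG is bound to
    org_wide: bool = False           # ASG is applied org-wide

    @property
    def is_bound(self) -> bool:
        return self.org_wide or bool(self.spaces)


def sync_once(cache: dict, polled: dict) -> list:
    """Diff one polling result against the cache and return planned actions.

    Both arguments map ASG GUID -> Asg. The cache is updated in place.
    """
    actions = []
    for guid, asg in polled.items():
        if cache.get(guid) == asg:
            continue  # unchanged since the last sync, nothing to push
        resource_types = ("ContainerNetworkPolicy",)
        if asg.is_bound:
            # Assumption: only a bound ASG is realized as a DFW rule.
            resource_types += ("FirewallSection",)
        actions.append(("realize", guid, resource_types))
        cache[guid] = asg
    for guid in list(cache):
        if guid not in polled:
            # The ASG was deleted in Cloud Controller.
            actions.append(("remove", guid, ()))
            del cache[guid]
    return actions
```

Calling `sync_once` twice with the same polled data returns an empty list the second time, which mirrors the idea that NCP only acts when the cache and the polled state differ.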


Workflow B: LRP Events

For this workflow NCP creates a worker thread (LRPWatcher) that listens for LRP events from BBS. For ASG integration, this thread monitors DesiredLRP events, in which BBS includes the ASG information related to the application. As in Workflow A, after receiving ASG information NCP checks its asg-cache to see whether the information is new or recently modified and takes action based on the state of the cache.

A few additional technical details:
1. Cloud Controller creates a unique GUID for each ASG, and NCP uses this GUID to track the ASG's state within its asg-cache.
2. NSX creates its own unique GUID for each ASG received from NCP. The NSX GUID and the ASG GUID do not match; the ASG GUID is referenced as a tag on the resources created in NSX.
3. NSX limits the number of spaces assigned to a DFW rule to 128. This limit does not exist in TAS, so you can bind too many spaces to an ASG without realizing it.
4. Each space is created with its own Logical Switches, each organization is created with its own Logical Router, and the DFW rules are applied to the space's Logical Switches.
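
Because TAS does not enforce the 128-space limit, it can help to check binding counts before they reach NSX. The sketch below is illustrative only: `asgs_over_limit` and the shape of its input (ASG name mapped to a list of bound space GUIDs, collected however you prefer) are assumptions, not part of TAS or NCP.

```python
NSX_SPACE_LIMIT = 128  # limit stated in this article


def asgs_over_limit(bindings: dict, limit: int = NSX_SPACE_LIMIT) -> dict:
    """Return {asg_name: space_count} for ASGs bound to more than `limit` spaces.

    `bindings` maps an ASG name to an iterable of space GUIDs. Duplicate
    GUIDs are counted once.
    """
    counts = {name: len(set(spaces)) for name, spaces in bindings.items()}
    return {name: count for name, count in counts.items() if count > limit}
```

An empty result means no ASG in the input exceeds the limit.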

Environment

Product Version: 3.0

Resolution

If the application cannot reach an IP/port, use the following steps to verify that the ASG has been properly realized in NSX-T.

1. Retrieve the ASG GUID from Cloud Controller by searching for the ASG name:

cf curl "/v2/security_groups?q=name:rule-to-bind" | jq '.resources[].metadata.guid'
"12f0145e-34a6-4861-a71f-e73cc35bd527"

2. Retrieve the space GUID from Cloud Controller by searching for the space name:

cf curl "/v2/spaces?q=name:rios-test-1111" | jq '.resources[].metadata.guid'
"d99d65dc-9d6f-4eea-82c5-f87c206e0a37"

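Space names are unique only within an organization, so a name filter on /v2/spaces can return more than one GUID. The following sketch parses a v2-style response and refuses to guess when the match is missing or ambiguous. The function names are hypothetical, and the input is assumed to be the JSON text printed by the cf curl commands above.

```python
import json


def guids_for_name(response_text: str, name: str) -> list:
    """Return the GUIDs of all v2 resources whose entity name equals `name`."""
    body = json.loads(response_text)
    return [
        res["metadata"]["guid"]
        for res in body.get("resources", [])
        if res.get("entity", {}).get("name") == name
    ]


def single_guid(response_text: str, name: str) -> str:
    """Return the one matching GUID, or raise if it is missing or ambiguous."""
    guids = guids_for_name(response_text, name)
    if len(guids) != 1:
        raise LookupError(
            f"expected exactly one match for {name!r}, found {len(guids)}"
        )
    return guids[0]
```

If `single_guid` raises for a space name, narrow the query to the correct organization before continuing with the next step.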

3. Identify the NCP leader and check whether the ASG GUID is in the NCP asg-cache, using nsxcli over bosh ssh. In the output of the first command, the NCP leader is the instance that reports "This instance is the NCP master". Once the leader is identified, query its asg-cache for the ASG GUID.

bosh -d cf-382ebe75a0100ffa6525 ssh diego_database -c "sudo /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status" -r

Instance   diego_database/0400c700-d138-4842-8dd2-e450710c4617
Stdout     Mon Nov 14 2022 UTC 19:23:48.312
           This instance is the NCP master
           Current NCP Master id is 3ddf17bd-b43d-4d13-a8ba-f3f90e6bd458
           Current NCP Instance id is 3ddf17bd-b43d-4d13-a8ba-f3f90e6bd458
           Last master update at Mon Nov 14 19:23:43 2022
Stderr     Unauthorized use is strictly prohibited. All access and activity
           is subject to logging and monitoring.
           Connection to 172.##.#.## closed.

Exit Code  0
Error      -

Instance   diego_database/1b5944c2-2de5-426e-a72a-aa74ca5f27c6
Stdout     Mon Nov 14 2022 UTC 19:23:48.170
           This instance is not the NCP master
           Current NCP Master id is 3ddf17bd-b43d-4d13-a8ba-f3f90e6bd458
           Current NCP Instance id is 455aeaf4-a59d-4411-8d3f-4ba2e3598d8b
           Last master update at Mon Nov 14 19:23:47 2022


Stderr     Unauthorized use is strictly prohibited. All access and activity
           is subject to logging and monitoring.
           Connection to 172.##.#.## closed.

bosh -d cf-382ebe75a0100ffa6525 ssh diego_database/0400c700-d138-4842-8dd2-e450710c4617 -c "sudo /var/vcap/jobs/ncp/bin/nsxcli -c get asg-cache 12f0145e-34a6-4861-a71f-e73cc35bd527" -r
Using environment '172.##.#.##' as client 'ops_manager'

Using deployment 'cf-382ebe75a0100ffa6525'

Task 162. Done

Instance   diego_database/0400c700-d138-4842-8dd2-e450710c4617
Stdout     Tue Nov 15 2022 UTC 00:19:37.463
               fws_id: 15a655fd-9c4d-4b03-95b7-a6003e12d026
               name: rule-to-bind
               rules:
                   code: 0
                   destinations:
                       0.0.0.0/0
                   ports:
                   protocol: icmp
                   type: 0

                   code: None
                   destinations:
                       10.0.11.0/24
                   ports:
                       80
                       443
                   protocol: tcp
                   type: None
               running_default: False
               running_spaces:
                   d99d65dc-9d6f-4eea-82c5-f87c206e0a37
               staging_default: False
               staging_spaces:

Note: A result is returned, so the ASG GUID exists in the cache, and the running_spaces field confirms that the ASG is bound to our space.
Suggestion: If the ASG is not in the cache, or the space GUID is not listed under running_spaces or staging_spaces, restart the NCP job hosted on the Diego Database VM. Restarting NCP clears the asg-cache, and NCP rebuilds it on startup. If the asg-cache still does not reflect the ASG or space GUID, open a case with Tanzu Support, as this indicates that ASG information is not being pulled by Workflow A or pushed by Workflow B.
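
When many spaces are bound, reading the asg-cache output by eye becomes error-prone. The sketch below extracts the running_spaces and staging_spaces lists from that output. It assumes the indented layout shown in the sample output in this article, which is not a documented format and may differ between NCP versions. `bound_spaces` is a hypothetical helper name.

```python
def bound_spaces(cache_output: str) -> dict:
    """Collect the GUIDs listed under running_spaces and staging_spaces.

    A list header is a line that is exactly 'running_spaces:' or
    'staging_spaces:'. Its entries are the following lines that are
    indented more deeply than the header.
    """
    result = {"running_spaces": [], "staging_spaces": []}
    current, header_indent = None, 0
    for line in cache_output.splitlines():
        text = line.strip()
        if not text:
            continue  # blank lines carry no information
        indent = len(line) - len(line.lstrip())
        if text in ("running_spaces:", "staging_spaces:"):
            current, header_indent = text[:-1], indent
        elif current is not None and indent > header_indent:
            result[current].append(text)
        else:
            current = None  # a sibling or parent key ends the list
    return result
```

With the sample output from this step, the space GUID d99d65dc-9d6f-4eea-82c5-f87c206e0a37 would appear under running_spaces and staging_spaces would be empty.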

4. If the ASG exists in the cache and is bound to a space but communication continues to fail, check whether the Firewall Section resource has been created for the ASG and is assigned to (assigned_to) the space in NSX.

5. Search for the ASG in the NSX UI using the ASG GUID.

6. Search for the ID that NSX created for the Firewall Section resource identified in the previous step. Hover your cursor over the Applied To field and check whether your space name appears in the list.


7. If the DFW rule is not applied to your space, the rule exists in the asg-cache, and no more than 128 spaces are assigned to the DFW rule, open a case with NSX-T Support, as this is outside the scope of TAS.
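
The decision flow of steps 3 through 7 can be summarized as a small function. This is only a restatement of the steps in this article, not an official tool. `next_step` and its parameter names are hypothetical, and the returned strings are shorthand for the actions described above.

```python
NSX_SPACE_LIMIT = 128  # limit stated in this article


def next_step(asg_in_cache: bool, space_in_cache: bool,
              applied_in_nsx: bool, spaces_on_rule: int) -> str:
    """Map the troubleshooting findings to the action this article suggests."""
    if not (asg_in_cache and space_in_cache):
        # Step 3: the asg-cache does not reflect the ASG or the binding.
        return "restart NCP, then open a Tanzu Support case if unchanged"
    if applied_in_nsx:
        # Step 6: the space appears in the Applied To list.
        return "ASG is realized in NSX for this space"
    if spaces_on_rule > NSX_SPACE_LIMIT:
        # Technical detail 3: NSX caps the spaces assigned to a DFW rule.
        return "DFW rule exceeds the 128-space NSX limit"
    # Step 7: cache is correct, limit is respected, rule is still missing.
    return "open an NSX-T Support case"
```

For example, a correct cache entry, a missing Applied To entry, and 40 spaces on the rule lead to the NSX-T Support case from step 7.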