Application Classification - How it works, and how to troubleshoot flows from the same application taking different paths.

Article ID: 344871

Updated On:

Products

VMware SD-WAN by VeloCloud

Issue/Introduction

This KB article explains how the application classification process works internally, its limitations, and potential workarounds.

Symptoms:
Traffic for an application that is selected as the match criterion of a business policy may not follow the configured action; instead, the rule is bypassed and a different business policy is applied.

At other times, the same application is observed to follow the configured business policy.

For example, a customer configures a business policy rule for APP_TIKTOK(3497) to use Internet backhaul.

Most of the flows of APP_TIKTOK(3497) are matching the correct policy rule, but the customer found that some flows are being steered via Direct to Cloud.

Environment

VMware SD-WAN
VMware SD-WAN by VeloCloud

Cause

Let us consider a user who has configured a business policy rule for the application TikTok to use Internet backhaul.

image.png


The Edge logs indicate that the majority of APP_TIKTOK(3497) flows correctly follow the designated policy rule. However, some flows are steered via "Direct to Cloud".

The designation "APP_TIKTOK(3497)" signifies that internally, the TikTok application is classified as APP_TIKTOK, with an associated application ID of 3497.


image.png

CLI output is available only to partners who manage the Orchestrator, Gateways, and Edges themselves. Customers using VMware-hosted services do not have CLI access but can view the same output under Remote Diagnostics > List Active Flows. Partner users who can download the diagnostic bundle can find the above output in the file "optvcbindebugpy--limitflow_dump_limit--timeout30--flow_dumpallallall.out.txt" under the COMMANDS/ directory. More information is available in the "Roles and Privileges" documentation.

Application identification involves the packet traversing through three databases documented below, each serving a distinct purpose:

1. ip_port_db (CLI command: debug.py --app_ip_port_db) - Fast Learning Database:

  • This database relies on a static configuration within the application map, where applications are mapped to destination IP addresses and ports.

Example:
image.png

 

2. proto_port_db (CLI command: debug.py --app_proto_port_db) - Fast Learning Database:

  • Similar to ip_port_db, this database draws from the static configuration in the application map, mapping applications to protocol ports.

Example:

image.png

The screenshots above were taken from the Orchestrator under Edge Image Management > Application Maps.
Operator-level users can view and edit the Application Maps. More information is available in the "Roles and Privileges" documentation.


3. ip_port_cache (CLI command: debug.py --app_ip_port_cache) - Slow Learning Database:

  • This dynamic cache includes information on whether the application ID originated from fast learning or DPI (Deep Packet Inspection).
  • The cache records recent classified flow data, associating destination IP/port with application names.
  • Each entry in this cache is retained for 10 minutes (600 s) and is cleared if it receives no new hits.
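
The retention rule can be pictured with a small Python model. This is only an illustrative sketch, not the Edge's implementation: the class name IpPortCache, its methods, and the rule that a hit restarts the 600 s timer are assumptions. The article only states that entries are kept for 10 minutes and cleared when there are no new hits.

```python
import time

CACHE_TTL_SECONDS = 600  # 10-minute retention stated in the article


class IpPortCache:
    """Toy model of ip_port_cache: (dst_ip, dst_port) -> (app_name, learned_from)."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock          # injectable so the behavior can be tested
        self._entries = {}           # key -> (app_name, learned_from, last_hit)

    def insert(self, dst_ip, dst_port, app_name, learned_from):
        # learned_from records whether the ID came from fast learning or DPI.
        self._entries[(dst_ip, dst_port)] = (app_name, learned_from, self._clock())

    def lookup(self, dst_ip, dst_port):
        key = (dst_ip, dst_port)
        entry = self._entries.get(key)
        if entry is None:
            return None
        app_name, learned_from, last_hit = entry
        if self._clock() - last_hit >= self._ttl:
            del self._entries[key]   # idle for the whole TTL: entry is cleared
            return None
        # Assumption: a hit restarts the retention timer.
        self._entries[key] = (app_name, learned_from, self._clock())
        return app_name, learned_from


if __name__ == "__main__":
    now = [0.0]
    cache = IpPortCache(clock=lambda: now[0])
    cache.insert("203.0.113.10", 443, "APP_TIKTOK", "dpi")
    now[0] = 599.0
    print(cache.lookup("203.0.113.10", 443))   # still cached
    now[0] = 599.0 + 600.0
    print(cache.lookup("203.0.113.10", 443))   # idle for 600 s, so cleared
```

Under this model, a destination that is contacted less often than once every 10 minutes is re-learned each time, so its first flow is again classified from scratch.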

Deep Packet Inspection (DPI):

The application classification process is handled by our DPI engine, known for its accuracy in identifying flows. Typically, DPI requires multiple packets to accurately classify flows. A sufficient number of packets containing the application's signature are needed for correct classification, after which the results are inserted into the ip_port_cache.

For instance, a standard TCP 443 flow undergoes three stages of classification based on the received packets (initial SYN/SYN-ACK/ACK messages during the TCP handshake). Ultimately, web traffic flows are classified as APP_SSL or more specific applications like APP_FACEBOOK, APP_LINKEDIN, and so forth.

  1. Stage 1: App-id: 205, App-name: APP_TCP
  2. Stage 2: App-id: 199, App-name: APP_SSL, App-class: VPN and tunnel
  3. Stage 3: App-id: 1448, App-name: APP_OFFICE365, App-class: Business Collaboration

By design, the first packet of the first flow is classified as APP_TCP, so that flow follows the path defined by the business policy that matches APP_TCP. When later stages change the classification, the Edge updates the flow's app ID, but the route policy and link steering for that first flow remain unchanged. Subsequent flows are looked up in the ip_port_cache and routed according to the final classified application.
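
The pinning behavior described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Edge's code: the Flow class, the BUSINESS_POLICY table, and the function names are hypothetical. Only the stage names (APP_TCP, APP_SSL, APP_OFFICE365) and the two actions come from the article.

```python
from dataclasses import dataclass

# Hypothetical policy table: application name -> configured action.
BUSINESS_POLICY = {"APP_OFFICE365": "Internet Backhaul"}
DEFAULT_ACTION = "Direct to Cloud"   # stands in for the default rule's action


@dataclass
class Flow:
    app_name: str
    route_policy: str


def open_flow(app_name_at_first_packet):
    """Choose the route policy once, from the app known at the first packet."""
    action = BUSINESS_POLICY.get(app_name_at_first_packet, DEFAULT_ACTION)
    return Flow(app_name=app_name_at_first_packet, route_policy=action)


def reclassify(flow, new_app_name):
    """Later classification stages update the app ID only."""
    flow.app_name = new_app_name     # route_policy is deliberately left unchanged


# First flow: nothing cached yet, so the first packet is only known as APP_TCP.
first = open_flow("APP_TCP")
for stage in ("APP_SSL", "APP_OFFICE365"):
    reclassify(first, stage)

# Subsequent flow: the final app is already known from the ip_port_cache.
second = open_flow("APP_OFFICE365")

print(first)    # app_name is APP_OFFICE365, route_policy is still Direct to Cloud
print(second)   # route_policy is Internet Backhaul
```

The first flow ends up labeled with the final application yet keeps the action chosen for APP_TCP, which is the mismatch described in the symptoms.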

Fast Learning Database:

  • Applications that are recognized by static entries defined in application maps by IP address range or protocol ports (ip_port_db and proto_port_db) are referred to as fast learning.
  • Once the application is recognized, the IP address and port details are stored in the cache (debug.py --app_ip_port_cache) so that later flows to the same destination IP/port do not need another static-entry lookup.

image.png
 

Slow Learning Database:

  • Applications that are recognized by the DPI engine are referred to as slow learning. 
  • Once the application is recognized, the IP address and port details are stored in the cache (debug.py --app_ip_port_cache) so that later flows to the same destination IP/port do not need another DPI lookup.

image.png
 

Application Classification Workflow

image.png
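
As a rough textual companion to the workflow, the Python sketch below walks a first packet through the three databases. The lookup order (ip_port_cache, then ip_port_db, then proto_port_db, then DPI) is an assumption, since the text does not spell it out. The table contents, the names APP_EXAMPLE_A and APP_EXAMPLE_B, and the function name are hypothetical.

```python
import ipaddress

# Hypothetical static entries standing in for the application map.
IP_PORT_DB = [
    (ipaddress.ip_network("198.51.100.0/24"), 443, "APP_EXAMPLE_A"),
]
PROTO_PORT_DB = {
    ("udp", 5000): "APP_EXAMPLE_B",
}


def classify_first_packet(cache, dst_ip, dst_port, proto):
    """Return (app_name, source) for the first packet of a flow.

    source is 'cache', 'fast', or 'pending-dpi'. cache is a plain dict
    keyed by (dst_ip, dst_port) standing in for ip_port_cache.
    """
    key = (dst_ip, dst_port)
    if key in cache:                               # assumed first: ip_port_cache
        return cache[key], "cache"

    addr = ipaddress.ip_address(dst_ip)
    for network, port, app in IP_PORT_DB:          # static IP + port entries
        if addr in network and port == dst_port:
            cache[key] = app                       # fast learning fills the cache
            return app, "fast"

    app = PROTO_PORT_DB.get((proto, dst_port))     # static protocol + port entries
    if app is not None:
        cache[key] = app
        return app, "fast"

    # No static or cached match: the flow carries a generic label until DPI
    # has inspected enough packets to identify the application.
    return ("APP_TCP" if proto == "tcp" else "APP_UDP"), "pending-dpi"


cache = {}
print(classify_first_packet(cache, "198.51.100.7", 443, "tcp"))
print(classify_first_packet(cache, "198.51.100.7", 443, "tcp"))
print(classify_first_packet(cache, "203.0.113.9", 443, "tcp"))
```

The last call shows the case behind the symptoms: with no cached or static entry, the first packet is only known as APP_TCP.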

The application map offers the following flags to regulate the classification process. You can modify these flags by downloading the application map from the Orchestrator, making the necessary edits, and then uploading it back to the Orchestrator.
 

"doNotSlowLearn" : 1

DPI will be performed, but the DPI-recognized application result will not be updated in the ip_port_cache.
 

"mustNotPerformDpi" : 1

DPI will not be performed.
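
The effect of the two flags can be sketched in Python as below. This is a minimal sketch under assumptions: the function name, the flags dictionary, and the run_dpi callable are hypothetical, and the sketch does not model where the flags sit inside the application map or which flows they are scoped to.

```python
def handle_unmatched_flow(flags, run_dpi, cache, key):
    """Apply the two application-map flags to a flow that needs DPI.

    flags   -- dict that may contain "mustNotPerformDpi" and/or "doNotSlowLearn"
    run_dpi -- callable returning the DPI-recognized app name (hypothetical)
    cache   -- dict standing in for ip_port_cache
    key     -- (dst_ip, dst_port) of the flow
    """
    if flags.get("mustNotPerformDpi") == 1:
        return None                      # DPI is not performed at all

    app_name = run_dpi()                 # DPI is performed

    if flags.get("doNotSlowLearn") != 1:
        cache[key] = app_name            # result is slow-learned into the cache
    # With doNotSlowLearn set, the result is used but never cached.
    return app_name


cache = {}
key = ("203.0.113.9", 443)
print(handle_unmatched_flow({"doNotSlowLearn": 1}, lambda: "APP_TIKTOK", cache, key))
print(cache)   # stays empty: the DPI result was not written to the cache
```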
 

If users encounter any of the symptoms described above, it may be because the ip_port_cache has no entries matching the destination IPs and/or ports. Likewise, there might be no matching entries in the static ip_port_db or proto_port_db. In such cases, DPI is expected to kick in to identify the application.

Because DPI must inspect several packets before it can identify an application accurately, the first packets of the example flow (APP_TIKTOK(3497), a TCP 443 flow) matched the default business rule "Default-Internet-Other" before DPI finished. The configured action for this rule was to send the traffic out directly.


image.png

DPI is expected to eventually classify the application and update the flow's app ID, but the flow's route_policy is not updated. Subsequent or new flows are expected to follow the configured business rule.

Resolution

This is expected behavior by design.

Workaround:
  1. Minimize differences in Route/Link Steering policies between the final classified application and the first packet (APP_TCP or APP_UDP).
  2. Configure IP addresses for business policy matching. Utilize Address Groups (Object Groups) or IP subnets defined in the application map to match the destination IP address of the application.
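
The second workaround works because a destination IP address is known from the very first packet, before DPI has run. The Python sketch below illustrates that matching logic only; it is not Orchestrator configuration. The prefixes are documentation ranges used as placeholders, and ADDRESS_GROUP, matches_address_group, and select_action are hypothetical names.

```python
import ipaddress

# Placeholder prefixes (documentation ranges). Real prefixes would come from
# the application's published address ranges or the application map.
ADDRESS_GROUP = [
    ipaddress.ip_network(prefix)
    for prefix in ("198.51.100.0/24", "203.0.113.0/25")
]


def matches_address_group(dst_ip, group=ADDRESS_GROUP):
    """True if the destination IP falls inside any prefix of the group."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in network for network in group)


def select_action(dst_ip):
    """Pick the action from the destination IP alone, with no DPI result needed."""
    if matches_address_group(dst_ip):
        return "Internet Backhaul"
    return "Direct to Cloud"


print(select_action("198.51.100.20"))   # inside 198.51.100.0/24
print(select_action("203.0.113.200"))   # outside 203.0.113.0/25
```

Because the match does not depend on the application ID, the first flow and later flows to the same prefix receive the same action.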


Additional Information

If a customer observes the symptoms mentioned above and encounters difficulties in narrowing down the issues, they can initiate a support ticket with VMware for troubleshooting assistance. Please refer to this article for guidance on filing a support ticket: https://kb.vmware.com/s/article/2006985

Impact/Risks:
Only the initial flow to a given destination might not follow the intended route policy. The impact can be larger for applications such as "Microsoft Office 365" or "Microsoft Teams" that use random IP addresses from a vast address pool.