This KB article describes how the application classification process works internally, its limitations, and potential workarounds.
Symptoms:
A business policy configured with a specific application as the selected match criterion may fail to adhere to the configured action; instead, it is bypassed, and an alternate business policy is chosen.
The same application at other times is observed to adhere to the configured business policy.
For example, a customer configures a business policy rule for APP_TIKTOK(3497) to use Internet backhaul.
Most of the flows of APP_TIKTOK(3497) are matching the correct policy rule, but the customer found that some flows are being steered via Direct to Cloud.
VMware VeloCloud SD-WAN supported versions
Consider a user who has configured a business policy rule for the application TikTok to use Internet backhaul.
The Edge logs indicate that the majority of APP_TIKTOK(3497) flows correctly follow the designated policy rule. However, some flows are steered via "Direct to Cloud" instead.
The designation "APP_TIKTOK(3497)" signifies that internally, the TikTok application is classified as APP_TIKTOK, with an associated application ID of 3497.
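When filtering flow output for a particular application, it can help to split such a label into its name and numeric ID. The following is a minimal sketch that assumes labels always follow the NAME(ID) pattern shown above; the function name parse_app_label is hypothetical and not part of any VMware tool.

```python
import re
from typing import Tuple

# Assumed label format: an APP_ name followed by the numeric ID in parentheses,
# for example "APP_TIKTOK(3497)".
_APP_LABEL = re.compile(r"^(?P<name>APP_[A-Z0-9_]+)\((?P<app_id>\d+)\)$")


def parse_app_label(label: str) -> Tuple[str, int]:
    """Split a label such as 'APP_TIKTOK(3497)' into (name, app_id)."""
    match = _APP_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not an application label: {label!r}")
    return match.group("name"), int(match.group("app_id"))


print(parse_app_label("APP_TIKTOK(3497)"))  # ('APP_TIKTOK', 3497)
```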
CLI output is accessible only to partners who manage the Orchestrator, Gateway, and Edges on their own. Customers using VMware-hosted services do not have CLI access but can still view this output under Remote Diagnostics > List Active Flows. Partner users who have access to download the diagnostic bundle can find the same output in the file "optvcbindebugpy--limitflow_dump_limit--timeout30--flow_dumpallallall.out.txt" under the COMMANDS/ directory. See the "Roles and Privileges" documentation for more information.
During application identification, the packet is checked against the three databases documented below, each serving a distinct purpose:
1. ip_port_db (CLI command: debug.py --app_ip_port_db) - Fast Learning Database:
Example:
2. proto_port_db (CLI command: debug.py --app_proto_port_db) - Fast Learning Database:
Example:
Above are the screenshots taken from the Orchestrator under Edge Image Management > Application Maps.
Operator-level users can view and edit the Application Maps. See the "Roles and Privileges" documentation for more information.
3. ip_port_cache (CLI command: debug.py --app_ip_port_cache) - Slow Learning Database:
Deep Packet Inspection (DPI):
The application classification process is handled by our DPI engine, known for its accuracy in identifying flows. Typically, DPI requires multiple packets to accurately classify flows. A sufficient number of packets containing the application's signature are needed for correct classification, after which the results are inserted into the ip_port_cache.
For instance, a standard TCP 443 flow undergoes three stages of classification based on the received packets (initial SYN/SYN-ACK/ACK messages during the TCP handshake). Ultimately, web traffic flows are classified as APP_SSL or more specific applications like APP_FACEBOOK, APP_LINKEDIN, and so forth.
Under the current expected behavior, the first packet of the initial flow is classified as APP_TCP, so the first flow follows the path defined by the business policy for APP_TCP. If the classification later changes, the Edge updates the app ID, but the route policy and link steering for the first flow remain unchanged. Subsequent flows are looked up in the ip_port_cache and routed based on the final classified application type.
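The first-flow behavior described above can be illustrated with a small model. This is a sketch under stated assumptions, not the Edge implementation: the EdgeModel and Flow classes, the rule table, and the sample IP address are all hypothetical, and the default action stands in for whichever rule APP_TCP matches.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, int]  # (destination IP, destination port)


@dataclass
class Flow:
    dst: Key
    app: str
    action: str  # steering decision, fixed when the flow is created


class EdgeModel:
    """Toy model: a cache keyed by destination plus a per-application rule table."""

    def __init__(self, rules: Dict[str, str], default_action: str) -> None:
        self.rules = rules
        self.default_action = default_action
        self.ip_port_cache: Dict[Key, str] = {}

    def new_flow(self, dst: Key) -> Flow:
        # Without a cache entry the flow starts out as generic APP_TCP.
        app = self.ip_port_cache.get(dst, "APP_TCP")
        return Flow(dst, app, self.rules.get(app, self.default_action))

    def dpi_result(self, flow: Flow, app: str) -> None:
        # DPI finished: update the app ID and the cache, but NOT flow.action.
        flow.app = app
        self.ip_port_cache[flow.dst] = app


edge = EdgeModel({"APP_TIKTOK": "Internet Backhaul"}, "Direct to Cloud")
first = edge.new_flow(("203.0.113.10", 443))
edge.dpi_result(first, "APP_TIKTOK")
second = edge.new_flow(("203.0.113.10", 443))
print(first.app, "->", first.action)    # APP_TIKTOK -> Direct to Cloud
print(second.app, "->", second.action)  # APP_TIKTOK -> Internet Backhaul
```

The first flow ends up with the correct app ID but keeps its original steering decision, which matches the symptom reported in flow output.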
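The static databases (ip_port_db and proto_port_db) are referred to as fast learning. A matching entry is recorded in the ip_port_cache (debug.py --app_ip_port_cache) in order to prevent a subsequent entry lookup for the same destination/port mapping.
DPI is referred to as slow learning. The DPI result is recorded in the ip_port_cache (debug.py --app_ip_port_cache) in order to prevent a subsequent DPI lookup for the same destination/port mapping.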
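The relationship between the three databases and DPI can be sketched as a layered lookup. The order in which the databases are consulted is an assumption made for illustration (this article does not state it), and every table entry below is invented sample data, not real application map content.

```python
from typing import Dict, Optional, Tuple

# Invented sample entries; APP_EXAMPLE_SSH is a hypothetical application name.
IP_PORT_CACHE: Dict[Tuple[str, int], str] = {("203.0.113.10", 443): "APP_TIKTOK"}
IP_PORT_DB: Dict[Tuple[str, int], str] = {("198.51.100.7", 443): "APP_FACEBOOK"}
PROTO_PORT_DB: Dict[Tuple[str, int], str] = {("tcp", 22): "APP_EXAMPLE_SSH"}


def lookup(dst_ip: str, dst_port: int, proto: str) -> Tuple[Optional[str], str]:
    """Return (application, source); application is None when DPI is needed."""
    key = (dst_ip, dst_port)
    # Assumed order: learned cache first, then the two static databases.
    if key in IP_PORT_CACHE:
        return IP_PORT_CACHE[key], "ip_port_cache"
    if key in IP_PORT_DB:
        return IP_PORT_DB[key], "ip_port_db"
    if (proto, dst_port) in PROTO_PORT_DB:
        return PROTO_PORT_DB[(proto, dst_port)], "proto_port_db"
    # No entry anywhere: DPI has to inspect the flow.
    return None, "dpi"


print(lookup("203.0.113.10", 443, "tcp"))  # ('APP_TIKTOK', 'ip_port_cache')
print(lookup("192.0.2.55", 443, "tcp"))    # (None, 'dpi')
```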
The application map offers the following flags to regulate the classification process. You can modify these flags by downloading the application map from the Orchestrator, making the necessary edits, and then uploading it back to the Orchestrator.
"doNotSlowLearn" : 1
DPI will be performed, but the DPI-recognized application result will not be updated in the ip_port_cache.
"mustNotPerformDpi" : 1
DPI will not be performed.
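The effect of the two flags can be summarized as a small decision function. This is a sketch of the semantics described above only: the function name is hypothetical, and the application map entry is represented as a plain dictionary, which is an assumption about its shape rather than the actual file layout.

```python
from typing import Dict, Tuple


def dpi_behaviour(entry: Dict[str, int]) -> Tuple[bool, bool]:
    """Return (perform_dpi, update_ip_port_cache) for one application map entry."""
    if entry.get("mustNotPerformDpi", 0) == 1:
        # No DPI at all, so there is no DPI result to cache either.
        return False, False
    if entry.get("doNotSlowLearn", 0) == 1:
        # DPI runs, but its result is not written to the ip_port_cache.
        return True, False
    # Neither flag set: DPI runs and the result is cached.
    return True, True


print(dpi_behaviour({"doNotSlowLearn": 1}))     # (True, False)
print(dpi_behaviour({"mustNotPerformDpi": 1}))  # (False, False)
print(dpi_behaviour({}))                        # (True, True)
```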
If users encounter any of the issues described in the symptoms above, the cause may be that the ip_port_cache has no entry matching the destination IPs and/or ports in question. Alternatively, there might be no matching entry in the static databases (ip_port_db or proto_port_db). In such cases, DPI is expected to kick in to identify the application.
Because DPI must inspect several packets before it can identify an application accurately, the first flow of the example application mentioned earlier (APP_TIKTOK(3497)), a TCP 443 flow, was initially matched to the default business rule "Default-Internet-Other". The configured action for this rule was to send the traffic out directly.
The expectation is that DPI will ultimately classify the application and update the app-id of the flow, although the route_policy will not be updated. Subsequent or new flows are expected to adhere to the configured business rule.
This is expected behavior per design.
Workaround:
If a customer observes the symptoms mentioned above and encounters difficulties in narrowing down the issues, they can initiate a support ticket with VMware for troubleshooting assistance. Please refer to this article for guidance on filing a support ticket: https://kb.vmware.com/s/article/2006985
Impact/Risks:
Only the initial flow to a given destination might not align with the intended route policy. The impact can be greater for applications such as "Microsoft Office 365" or "Microsoft Teams" that use random IP addresses from a vast address pool, because each previously unseen destination is likely to have no ip_port_cache entry yet.
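To see why a large address pool matters, the sketch below counts how many flows in a sequence would arrive without a cache entry. It is a simplification under stated assumptions: cache entries never expire, DPI completes before the next flow to the same destination starts, and the function name and sample addresses are hypothetical.

```python
from typing import Iterable, Set, Tuple

Key = Tuple[str, int]  # (destination IP, destination port)


def count_uncached_first_flows(destinations: Iterable[Key]) -> int:
    """Count flows whose destination has not been seen (and cached) before."""
    seen: Set[Key] = set()
    misses = 0
    for dst in destinations:
        if dst not in seen:
            misses += 1  # this flow would be steered before classification
            seen.add(dst)
    return misses


# 100 flows to one server versus 100 flows spread over a pool of 50 addresses.
single = [("198.51.100.1", 443)] * 100
pool = [(f"198.51.100.{i % 50 + 1}", 443) for i in range(100)]
print(count_uncached_first_flows(single))  # 1
print(count_uncached_first_flows(pool))    # 50
```

With a single destination only one flow is affected, whereas with a pool of addresses the number of affected flows grows with the number of distinct destinations.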