Notes to help utilize command line tokenization to help create better queries that are easier to write and understand
Command lines are tokenized with the default cbeventsv2 schema in Solr. Tokenization is a way of breaking up the command into smaller chunks that can be searched individually.
For example, let's use this command line to see how tokenization is broken up and how it can be searched against
"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\csc.exe" /noconfig
Tokenization breaks this up into smaller pieces based on last dot, backslashes, spaces, parentheses and other special characters. This command line would be broken up like the following
C: Windows Microsoft.NET .NET Framework64 v4.0.30319 .30319 csc.exe .exe /noconfig
Instead of search by the full command line like this
cmdline:"\"C:\\Windows\\Microsoft.NET\\Framework64\\v4.0.30319\\csc.exe\" /noconfig"
We can simply it to a to some specifics using tokenization. Here's some examples
cmdline:"c:\\windows\\microsoft.net" cmdline:"framework64" cmdline:".net" cmdline:"/noconifg"
In our example command line we know that the version will change. We also know that Framework64 path could just be Framework. Wildcards would not work here, no results would come back. So, how can we search this without a wildcard? It's simple, we can split a command line search up with ANDs. Notice we have the two possible Framework* examples within parentheses while using an OR, this is in place of using a wildcard.
((cmdline:"C:\\Windows\\Microsoft.NET\\Framework64" OR cmdline:"C:\\Windows\\Microsoft.NET\\Framework") AND cmdline:"csc.exe" AND cmdline:"/noconfig")
Additional Tokenization Notes are found in the user guide here.