Does Symantec DLP index all the words it is given, including common words like "the" and "and"? If not, is there a list of words that Symantec DLP ignores?
No. Symantec DLP ignores common words such as "the" and "and" when it creates an index.
Common words occur so frequently that they do not help detect protected data. Matching on them would instead produce a large number of false positives. Ignoring common words improves detection results and reduces the time and resources needed to create the indexes.
The words that Symantec DLP ignores are listed in files in the config/stopwords directory, under the DLP installation directory on the Enforce server.
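The general idea of ignoring common words before indexing can be sketched in a few lines of Python. This is only an illustration of stop-word filtering, not Symantec DLP's actual implementation. The function names load_stopwords and indexable_terms are hypothetical, and the loader assumes plain-text files with one word per line, which may not match the format of the real files in config/stopwords.

```python
import re
from pathlib import Path


def load_stopwords(directory):
    """Collect stop words from every file in `directory`.

    Assumption: each file is plain text with one word per line.
    The real Symantec DLP file format may differ.
    """
    words = set()
    for path in Path(directory).iterdir():
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                word = line.strip().lower()
                if word:
                    words.add(word)
    return words


def indexable_terms(text, stopwords):
    """Return the lowercase word tokens in `text` that are not stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]


# Tiny in-memory stop-word set, used here for illustration only.
sample_stopwords = {"the", "and", "of", "is"}
terms = indexable_terms("The account number and the routing code", sample_stopwords)
print(terms)  # ['account', 'number', 'routing', 'code']
```

Of the seven words in the sample sentence, only four remain to be indexed, which shows how dropping common words shrinks the index and removes terms that would match almost any document.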