ZIP+4 codes detected as Social Security Numbers
Article ID: 160338


Updated On:


Data Loss Prevention Enforce


In limited situations involving PDF files and soft hyphens, Symantec Data Loss Prevention (DLP) may detect 9-digit ZIP+4 codes as Social Security Numbers (SSNs) and produce false positives.


This issue is limited to PDF documents and only occurs when the dash separating the five-digit ZIP code from the 4-digit specifier appears as the last character on the line and the newline is not added by the user (known as a soft hyphen).

For example, if the following text appears in a PDF file, the system may detect the ZIP+4 code as an SSN if the dash separating the ZIP code from the 4-digit specifier is due to a soft hyphen:

ABC Corporation, 1 Main Street, San Francisco, CA 32246-

If this issue is observed when using the SSN Data Identifier, the recommended solution is to use either the medium or wide breadth editions of the data identifier instead of the narrow breadth.

If this issue occurs with another type of detection rule, to resolve it you can add the following parameters to the file \SymantecDLP\Protect\plugins\contentextraction\Verity\<platform>\formats.ini and restart the detection server or endpoint.


NOTE: This latter approach is not recommended because it can lead to false negatives, meaning while it may work to alleviate the PDF file soft hyphen ZIP+4 issue, it may cause your system to miss incidents. In this case, the recommended approach is to leave the system as is and plan for potential false positives or, alternatively, remove soft hyphens from PDF files altogether.