ZIP+4 codes detected as Social Security Numbers

Article ID: 160338

Updated On:

Products

Data Loss Prevention Enforce

Issue/Introduction

In limited situations involving PDF files and soft hyphens, Symantec Data Loss Prevention (DLP) may detect nine-digit ZIP+4 codes as Social Security Numbers (SSNs) and produce false positives.

Resolution

This issue is limited to PDF documents. It occurs only when the dash separating the five-digit ZIP code from the four-digit specifier is the last character on a line and the line break was not entered by the user, that is, when the dash is a soft hyphen.

For example, if the following text appears in a PDF file and the dash after the ZIP code is a soft hyphen, the system may detect the ZIP+4 code as an SSN:

ABC Corporation, 1 Main Street, San Francisco, CA 32246-
6457
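To illustrate how a soft hyphen can turn a ZIP+4 code into an SSN-like value, the following Python sketch simulates text extraction from the example above. It assumes (this is not documented Symantec behavior) that the extractor joins the two halves of a soft-hyphenated line and drops the hyphen unless told to keep it, loosely mirroring the keepsofthyphen setting described in this article. The soft hyphen is modeled as the Unicode character U+00AD. The names extract_text, SSN_LIKE, and keep_soft_hyphen are hypothetical, and the regular expression is a simplified stand-in, not the pattern used by the DLP SSN data identifier.

```python
import re

# Hypothetical, simplified SSN-shaped pattern: nine consecutive digits or the
# DDD-DD-DDDD layout. This is NOT the Symantec DLP data identifier logic.
SSN_LIKE = re.compile(r"\b(?:\d{9}|\d{3}-\d{2}-\d{4})\b")

# Assumption: the layout-generated hyphen is represented as U+00AD.
SOFT_HYPHEN = "\u00ad"


def extract_text(lines: list[str], keep_soft_hyphen: bool) -> str:
    """Join wrapped lines the way a text extractor might (an assumption).

    A line ending in a soft hyphen is treated as a word or number broken by
    layout, so it is joined to the next line without a space. The hyphen is
    dropped unless keep_soft_hyphen is True, in which case it becomes '-'.
    """
    out = ""
    for line in lines:
        if line.endswith(SOFT_HYPHEN):
            out += line[:-1] + ("-" if keep_soft_hyphen else "")
        else:
            out += line + " "
    return out.strip()


# The example address from this article, with the dash as a soft hyphen.
pdf_lines = [
    "ABC Corporation, 1 Main Street, San Francisco, CA 32246" + SOFT_HYPHEN,
    "6457",
]

if __name__ == "__main__":
    for keep in (False, True):
        text = extract_text(pdf_lines, keep_soft_hyphen=keep)
        print(f"keep_soft_hyphen={keep}: {text!r} -> {SSN_LIKE.findall(text)}")
```

With the hyphen dropped, the ZIP+4 code becomes the nine-digit run 322466457, which the SSN-shaped pattern matches. With the hyphen kept, the text stays 32246-6457 and the simplified pattern finds nothing.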

If you observe this issue with the SSN data identifier, the recommended solution is to use the medium or wide breadth of the data identifier instead of the narrow breadth.

If the issue occurs with another type of detection rule, you can resolve it by adding the following setting to the file \SymantecDLP\Protect\plugins\contentextraction\Verity\<platform>\formats.ini and then restarting the detection server or endpoint.

[pdf_flags]
keepsofthyphen=true

NOTE: Editing formats.ini is not recommended because it can lead to false negatives: although it may eliminate the ZIP+4 false positives caused by soft hyphens in PDF files, it may cause the system to miss genuine incidents. For detection rules other than the SSN data identifier, the recommended approach is to leave the configuration unchanged and plan for potential false positives or, alternatively, to remove soft hyphens from the PDF files.