Content filtering regular expressions can match unexpected content for HTML messages


Article ID: 162423


Updated On:


Messaging Gateway


When regular expressions are matched against the bodies of HTML messages, content filters can unexpectedly match groups of characters that are separated by HTML tags. The result is a content filtering match that would not be expected from the raw message content.


For example, a content filtering rule that applies the regular expression "\b([a-zA-Z][12]\d{8})\b" to a message body containing "\0xC5c<html tags>121207753" returns a match, even though the \0xC5c sequence should be interpreted as the character '與'.


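This behavior results from the interaction of two things: content filtering pre-processing, which can strip HTML tags from the message body before the body is tested against content filtering rules, and strings of characters from different character sets. Once the tags are removed, bytes that belonged to a multi-byte character can sit directly next to text that originally followed the tags, and the combined sequence can satisfy the expression.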
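
The effect can be reproduced outside the product with a minimal Python sketch. It is an illustration only and not Messaging Gateway's actual implementation: the strip_tags helper is a hypothetical, deliberately naive stand-in for the pre-processing step, the "<br>" tag stands in for "<html tags>", and the expression is assumed to use the quantifier {8}. The pattern is applied to raw bytes, so the trailing byte of the multi-byte character (the ASCII letter "c") is treated as an ordinary letter.

```python
import re

# The regular expression from the example, applied to raw bytes
# (assumption: the quantifier is {8}).
PATTERN = re.compile(rb"\b([a-zA-Z][12]\d{8})\b")

# Hypothetical, naive tag stripper standing in for the pre-processing step.
TAG = re.compile(rb"<[^>]*>")


def strip_tags(body: bytes) -> bytes:
    """Remove anything that looks like an HTML tag from the body."""
    return TAG.sub(b"", body)


# Byte 0xC5 followed by "c", then a tag, then the digits from the example.
raw_body = b"\xc5c<br>121207753"
stripped_body = strip_tags(raw_body)

raw_match = PATTERN.search(raw_body)
stripped_match = PATTERN.search(stripped_body)

print(raw_match)                # no match against the raw content
print(stripped_match.group(1))  # the letter and digits are now adjacent
```

In the raw body the tag separates "c" from the digits, so the expression finds nothing. After the tag is removed, "c" is immediately followed by "121207753", and the byte-oriented word boundary before "c" is satisfied because 0xC5 is not an ASCII word character.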


This is a known issue with Messaging Gateway's content filtering rules and is being reviewed. Please subscribe to this document to be automatically notified of any updates.