
Content filtering regular expressions can match unexpected content for HTML messages


Article ID: 162423


Updated On:

Products

Messaging Gateway

Issue/Introduction

When regular expressions are matched against the bodies of HTML messages, the content filters can unexpectedly match groups of characters that are separated by HTML tags. The result is a content filtering match that would not be expected from looking at the raw message content.

Example:

A content filtering rule which applies the regular expression "\b([a-zA-Z][12]\d{8})\b" against a message body containing "\0xC5c<html tags>121207753" returns a match, even though the \0xC5c sequence should be interpreted as the single character '與'.
 

Cause

This is the result of content filtering pre-processing, which can strip HTML tags from the message body before the body is tested against content filtering rules. Once the tags are removed, strings of characters from different character sets that were separated by markup become adjacent and can be matched as a single string.
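
The behavior described above can be illustrated with a minimal Python sketch. It is not the Messaging Gateway implementation: the `strip_tags` helper is a hypothetical stand-in for the pre-processing step, the `<br>` tag stands in for "<html tags>", the quantifier in the rule is assumed to be `{8}`, and GBK is an assumed encoding for the 0xC5 0x63 byte pair, since the article does not name the character set.

```python
import re

# Rule from the example above; the quantifier is assumed to be {8}.
PATTERN = r"\b([a-zA-Z][12]\d{8})\b"


def strip_tags(raw: bytes) -> bytes:
    """Hypothetical stand-in for the pre-processing: drop anything in <...>."""
    return re.sub(rb"<[^>]*>", b"", raw)


def matches_bytes(raw: bytes) -> bool:
    """Apply the rule byte by byte, with no knowledge of the character set."""
    return re.search(PATTERN.encode("ascii"), raw) is not None


def matches_text(raw: bytes, charset: str) -> bool:
    """Decode first, so a two-byte sequence becomes one character."""
    return re.search(PATTERN, raw.decode(charset)) is not None


# Byte 0xC5 followed by 'c' (0x63), then a tag, then nine digits.
body = b"\xc5c<br>121207753"
stripped = strip_tags(body)

print(matches_bytes(body))            # False: the tag separates 'c' from the digits
print(matches_bytes(stripped))        # True: 'c' now sits directly before the digits
print(matches_text(stripped, "gbk"))  # False: 0xC5 0x63 decodes to one character
```

In this sketch the unexpected match appears only when the tag-stripped body is tested as raw bytes, because the trailing byte of the two-byte character is read as the ASCII letter 'c'.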

Resolution

This is a known issue with Messaging Gateway's content filtering rules and is being reviewed. Please subscribe to this document to be automatically notified of any updates.