Differences between File Extension, HTTP MIME Type and Apparent Data type


Article ID: 170664



ProxySG Software - SGOS


The ProxySG can block files in a few ways.


In the ProxySG's policy, you can use the following policy objects to block a file.


The file extension:

This works only if the file extension appears in the URL.

For example, song.mp3 has the extension mp3. The URL must look something like: http://www.mywebsite.com/musicfolder/song.mp3?action=download
In the example above, the file extension is included in the URL.

CPL Policy example:

define condition FileExtension1
 ; mp3 is an illustrative value taken from the song.mp3 example above
 url.extension=mp3
end

DENY condition=FileExtension1
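
To illustrate why the extension must be present in the URL, here is a minimal Python sketch of an extension check. It is not how the ProxySG implements the test; the function names and the block list are hypothetical and exist only for this illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical block list, used only for this illustration.
BLOCKED_EXTENSIONS = {"mp3"}

def url_extension(url: str) -> str:
    """Return the lowercase extension of the URL path without the dot, or ''."""
    path = urlsplit(url).path          # drops the query string and fragment
    _, ext = posixpath.splitext(path)  # e.g. ('/musicfolder/song', '.mp3')
    return ext.lstrip(".").lower()

def is_blocked_by_extension(url: str) -> bool:
    """True when the URL path ends in a blocked extension."""
    return url_extension(url) in BLOCKED_EXTENSIONS
```

A URL such as http://www.mywebsite.com/download?id=42 has no extension in its path, so a check of this kind has nothing to match even if the server goes on to deliver an MP3 file.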


The HTTP MIME type:

When a file is downloaded, the web server includes HTTP headers in its HTTP response.
These headers usually contain the 'Content-Type' header, also known as the HTTP MIME type.
Most web servers send the Content-Type by default. However, some web servers are configured not to send the Content-Type of the file.

If the web server sends the Content-Type in the HTTP response, the ProxySG reads it from the HTTP headers and allows or denies access depending on its value.

Please note that the web server does not inspect the file's contents to determine the MIME type.
The web server determines the MIME type from the file name extension. If a genuine PDF file is renamed with an .exe extension, the web server will set the Content-Type header to application/x-msdos-program, which is the MIME type for EXE, not PDF.
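
The extension-based lookup described above can be sketched in Python as follows. The mapping table and the function name are hypothetical; real web servers ship their own, much larger tables, and their fallback type may differ from the one assumed here.

```python
import os

# Hypothetical extension-to-MIME table, standing in for a web server's own mapping.
MIME_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".mp3": "audio/mpeg",
    ".exe": "application/x-msdos-program",
}

def content_type_for(filename: str) -> str:
    """Pick a Content-Type from the file name only; the file contents are never read."""
    _, ext = os.path.splitext(filename)
    # Assumed fallback for unknown extensions.
    return MIME_BY_EXTENSION.get(ext.lower(), "application/octet-stream")
```

With this table, a PDF saved as report.exe is announced as application/x-msdos-program, so a policy that blocks application/pdf would not match it.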

Below is an example of HTTP headers taken from a Wireshark packet capture (in the original screenshot, the request is shown in red and the response in blue).

[Image: HTTP response with MIME type]

CPL Policy example:

define condition HTTPMIMETypes1
 response.header.Content-Type="^application/pdf( |\t)*($|;)"
 response.header.Content-Type="^audio/mpeg( |\t)*($|;)"
end

 DENY condition=HTTPMIMETypes1
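
To show what these patterns accept, here is a small Python sketch that applies equivalent regular expressions to sample Content-Type values. It assumes the whitespace alternative in each pattern is a space or a tab, and that Python's `re` module behaves like the ProxySG's regex engine for these simple patterns; neither point is confirmed by this article. The function name is hypothetical.

```python
import re

# Patterns modelled on the policy example; "\t" is assumed to stand for a tab character.
PATTERNS = [
    re.compile(r"^application/pdf( |\t)*($|;)"),
    re.compile(r"^audio/mpeg( |\t)*($|;)"),
]

def matches_blocked_mime(content_type: str) -> bool:
    """True when the header value is exactly one of the MIME types,
    optionally followed by whitespace and ';parameters'."""
    return any(p.search(content_type) for p in PATTERNS)
```

The trailing `($|;)` keeps the match exact: `application/pdf; charset=binary` matches, while a longer type such as `application/pdfx` does not.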


The Apparent Data Type:

When using the Apparent Data Type object, the ProxySG looks at the initial bytes of a file to determine its type. It ignores the file name and extension entirely and examines only the actual payload of the file.

If a PDF file has been renamed with an .mp3 extension, the Apparent Data Type will remain PDF, and the policy action (Allow or Deny) will be based on that information.

Using the Apparent Data Type is similar to running the `file` command on a Linux machine.

CPL Policy example:

    DENY http.response.apparent_data_type=(PDF, executable)
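
The idea of identifying a file from its initial bytes can be sketched in Python as follows. This is only an illustration of signature matching, not the ProxySG's detection logic; the signature table is a small, hypothetical subset, and the labels are chosen to mirror the policy example above.

```python
# Well-known leading byte sequences ("magic numbers") for a few formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),      # PDF documents start with "%PDF-"
    (b"MZ", "executable"),  # DOS/Windows executables start with "MZ"
    (b"ID3", "MP3"),        # MP3 files that begin with an ID3v2 tag
]

def apparent_data_type(payload: bytes):
    """Return a type label based only on the first bytes, or None if unknown."""
    for magic, label in SIGNATURES:
        if payload.startswith(magic):
            return label
    return None
```

Because only the payload is inspected, a PDF served as song.mp3 is still reported as PDF.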



In Conclusion:

The File Extension is the quickest way to block files based on their extension (file name). This happens before the download of the file.

When the file extension is not present in the URL, the HTTP MIME Type is another way to block based on the file extension information. This information is taken from the web server's HTTP response, when available. This happens after the complete download of the file.

The Apparent Data Type is the most effective way to identify the type of a file, because it relies on the file's content rather than its name. This happens after the complete download of the file.

For more information about the implementation of the CPL rules, please refer to the CPL admin guide according to your ProxySG software version.