Fonts are an integral part of all modern documents and operating systems. A document contains one or more text areas, such as paragraphs, and each text area is composed of a sequence of characters. To show these characters, a document viewer needs to know how each character is represented graphically. This is exactly what fonts are supposed to contain: the graphical representations of characters, called glyphs, and instructions about how to render them.
A font is specified in a language such as TrueType or OpenType, and it specifies glyphs for a character set. It contains information about which character set it can render, how to find glyphs for any given character in the set, how to scale a glyph (for instance, when it needs to be displayed in a big 30 point size or a tiny 6 point size), how to make the glyphs look smooth and accurate in all these sizes, and so on. A font language contains additional instructions for hinting and anti-aliasing, which allow it to describe how to modify the glyphs for an accurate representation of characters and how to smooth the glyphs with appropriate use of gray pixels.
Because only a handful of font languages (e.g. TrueType) are very popular, modern operating systems make things easier for document processors by incorporating engines into their platforms that can process fonts specified in these popular languages and display them on the screen. Modern operating systems, in fact, go further and include a number of popular fonts – specified in these languages – so that a document does not always have to embed these fonts and can still render the characters that use them on the screen.
While all this makes the job of document processors easier and the experience for users better – for instance, by having an efficient font processing and rendering implementation – it provides one more avenue for attackers to exploit. If an attacker crafts a font file so that it exploits a vulnerability in the operating system’s font parsing and interpretation engine, the attacker can do potentially dangerous things on behalf of the user. Duqu is an example of such an attack: it exploited a TrueType font parsing vulnerability in Microsoft Windows and allowed the attacker to execute arbitrary code with the user’s privilege.
Disarm considers every embedded font as a potentially malicious object. Every font file contains information that needs extensive parsing and interpretation, which makes it potentially dangerous. When Disarm encounters a font embedded within a PDF document and the font configuration is enabled in the Symantec Messaging Gateway on the Disarm settings page, Disarm removes the font. In the first Disarm version that comes with Symantec Messaging Gateway 10.5, only PDF font removal is supported.
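As a rough illustration of what removing an embedded font involves, the sketch below works on a toy object table rather than a real PDF file. It relies on one fact from the PDF format: a font descriptor refers to an embedded font program through a FontFile, FontFile2, or FontFile3 entry. The function name, the `enabled` flag, and the dictionary layout are assumptions made for this example and are not Disarm's implementation.

```python
# Keys under which a PDF font descriptor points at an embedded font program.
EMBEDDED_FONT_KEYS = ("FontFile", "FontFile2", "FontFile3")


def strip_embedded_fonts(objects, enabled=True):
    """Return a copy of the toy object table without embedded font programs.

    `objects` maps a hypothetical object id to a dict of entries; an entry
    under one of EMBEDDED_FONT_KEYS holds the id of the font program object.
    """
    if not enabled:  # mirrors an on/off font setting
        return {obj_id: dict(obj) for obj_id, obj in objects.items()}

    # Collect the ids of every embedded font program, good or bad alike.
    font_programs = {
        obj[key]
        for obj in objects.values()
        for key in EMBEDDED_FONT_KEYS
        if key in obj
    }

    cleaned = {}
    for obj_id, obj in objects.items():
        if obj_id in font_programs:
            continue  # drop the font program itself
        # Drop the reference to it; keep everything else, such as the name.
        cleaned[obj_id] = {
            k: v for k, v in obj.items() if k not in EMBEDDED_FONT_KEYS
        }
    return cleaned


# Hypothetical three-object document: a font, its descriptor, and the program.
doc = {
    1: {"Type": "Font", "BaseFont": "ExampleSans", "FontDescriptor": 2},
    2: {"Type": "FontDescriptor", "FontName": "ExampleSans", "FontFile2": 3},
    3: {"Type": "Stream", "Data": b"font program bytes"},
}
cleaned = strip_embedded_fonts(doc)
```

In this toy model the font name survives the removal, so a viewer can still look for a font of that name among the fonts installed on the system.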
So, how much of a problem does this cause in terms of document readability? In practice, the majority of the documents that we receive every day use fonts that are included in the basic operating system installation. In addition, more fonts are installed when popular document processing application packages are installed. So, after removing all the fonts from within a document, most of the popular – and a lot of not-so-popular – fonts are likely to be already present in the system and will be rendered.
In practice, most documents do not embed all the fonts they use – instead, they contain enough information for the application to locate a font in the file system of the computer where the document needs to be rendered. Embedded fonts are, therefore, fonts that are less likely to be present on that computer, which usually means they are not so common. Any maliciously crafted font that arrives inside a document necessarily belongs to this set of embedded fonts. So, Disarm takes a cautious approach and removes all of them without trying to determine if they are good or bad.
When a document is stripped of all its embedded fonts by Disarm, and not all the removed fonts are available on the system, parts of the document usually become unreadable. The document viewer displays some unexpected glyphs, such as squares and dots, in place of the characters whose glyphs are not found. In addition, the document viewer usually displays an error that notifies users about the missing font, and the error message may also include the name of the font. One approach to address this problem is to install the missing font onto the system and restart the viewing application to read the document. However, this leaves it to the user to retrieve the font from the internet, which can be dangerous, because attackers can trick users into downloading a malicious font file that has the same name as the missing font.
A reasonable solution to this problem is to block users from installing fonts and have system administrators install fonts on demand. This allows system administrators to control which fonts are present and used on the endpoints by choosing from a maintained whitelist of only known good fonts. Any font that is not present in the whitelist can be disallowed and manually inspected by an expert, e.g. by rendering it in a controlled environment, before it is added to the whitelist.
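A whitelist of this kind could be keyed in several ways. The minimal sketch below assumes it is keyed by the SHA-256 digest of the font file's contents rather than by font name, so that a malicious file carrying a trusted name does not match. The function names and sample byte strings are hypothetical.

```python
import hashlib


def font_digest(data):
    """Return the SHA-256 hex digest of a font file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_font_allowed(data, whitelist):
    """Allow a font only if its content digest is in the whitelist."""
    return font_digest(data) in whitelist


# Hypothetical contents; in practice these would be read from font files.
known_good = b"bytes of a font an expert has already inspected"
lookalike = b"different bytes distributed under the same file name"

# The administrator's whitelist holds digests of vetted fonts only.
whitelist = {font_digest(known_good)}
```

With this design, `is_font_allowed(known_good, whitelist)` is true, while the lookalike is rejected regardless of what it is called, and it can be handed to an expert for manual inspection.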
Disarm’s "PDF Other" statistic
A PDF document is composed of a number of objects – ranging from tens to thousands of them – connected in a well-defined manner. A file that stores a PDF document may contain objects that are not connected to any object that is part of the document. These objects are not interpreted by the viewing application and are not displayed on the screen or printed on paper.
There are legitimate reasons why such objects might exist in a PDF file. For instance, new document versions may add new content objects without removing the older ones, which are overridden and disconnected from the other objects. However, it is also possible to add objects containing arbitrary data to a PDF file and disconnect them from all other objects so that they do not affect the document. This can allow attackers to embed an arbitrary payload in any PDF document without being noticed by traditional firewalls and anti-virus scanners. The ordering of objects in a PDF file has no correlation with how they are connected to each other, which is determined by the data structures contained within the objects themselves. So, a change in the ordering of the objects, without changing their content, does not really affect the overall behavior of a document.
When a malicious PDF is created by an attacker, it usually contains a payload distributed across one or more objects. These are usually stream objects, which allow the embedding of binary data. Disarm analyzes the objects and their connections with other objects in the document, and when applicable, takes one of the following actions:
- Remove: If an object is unused, i.e. it is disconnected from the objects that are part of the document, then it is removed from the file. Objects that are connected only to the unused objects are also removed. Removal of such unused objects does not affect the visual fidelity of the document in any way.
- Reconstruct: If an object is used, i.e. connected to at least one of the objects that are used, then it is analyzed for the presence of potentially malicious objects within it, which are removed if found. Sometimes such modification may result in more objects getting disconnected (unused), so they are removed too.
- Reorder: All the objects that are used are then reordered without affecting their connectivity. This thwarts certain kinds of attacks that need objects to be present in a particular order.
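The Remove and Reorder actions above can be sketched as a reachability pass over an object graph. This is a simplified model, not Disarm's implementation: objects are represented as a hypothetical mapping from object id to the ids they reference, the root ids stand in for whatever the trailer points to, and the Reconstruct step (inspecting the contents of used objects) is not modelled.

```python
from collections import deque


def reachable_objects(objects, roots):
    """Return the ids of all objects connected to the document roots."""
    seen = set()
    queue = deque(r for r in roots if r in objects)
    while queue:
        obj_id = queue.popleft()
        if obj_id in seen:
            continue
        seen.add(obj_id)
        # Follow references, ignoring dangling ones that point nowhere.
        queue.extend(
            ref for ref in objects[obj_id]
            if ref in objects and ref not in seen
        )
    return seen


def remove_and_reorder(objects, roots):
    """Drop unused objects, then emit the used ones in a new file order.

    Object ids are kept, so every reference still resolves: only the
    position of each object in the output changes.
    """
    used = reachable_objects(objects, roots)
    return [(obj_id, objects[obj_id]) for obj_id in sorted(used)]


# Dict order stands in for the order of objects in the file.
# Objects 7 and 8 are connected only to each other, not to the document.
objects = {7: [8], 3: [], 1: [2], 8: [], 2: [3]}
rewritten = remove_and_reorder(objects, roots=[1])
```

Here object 7 is unused and object 8 is connected only to an unused object, so both are removed, and the remaining objects 1, 2, and 3 are written out in a different order from the original file with their connections intact.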
Figure 1. A PDF document is composed of four main parts: header, body, cross-reference table, and trailer. The trailer and cross-reference table are used to locate all the relevant objects within the document. When one or more objects are removed or reordered, their corresponding entries, if present, are also removed or modified respectively.
All the operations discussed above are part of the basic PDF reconstruction and are carried out every time a PDF document is Disarmed. The removed objects are not known to be malicious: they are simply unused by and disconnected from the document, and they may or may not be related to any other potentially malicious components. This technique enables us to remove or perturb various types of potential zero-day exploit payloads. In other words, it is advisable to carry out these transformations because they preclude many types of malicious payload, but because they are applied to every PDF document, the fact that a document was transformed does not by itself indicate the presence of malicious components in it.
Digitally signed documents
A digital signature provides a method to demonstrate the authenticity of a digital document. A valid digital signature gives the recipient reason to believe that the document was created by a known sender and that the sender cannot deny having created the document. Moreover, it attests to the integrity of the document, i.e. that the document has not been altered in transit. Digital signatures are commonly used for documents such as government-published forms, certificates, and so on. In practice, they are used more often with PDF documents than with Microsoft Office documents, partly because Office applications are more suited to editing and are less often used in read-only mode.
When Disarm transforms a digitally signed document, the content is altered, and therefore, the signature does not remain valid. Because signing a document does not require a big infrastructure and is available to attackers at very little cost, it is difficult to trust a document based on its signature alone. So, it may be risky not to Disarm the signed documents. Because Disarm invalidates the digital signature in any document that it operates on, it is important to understand the repercussions. When a Disarmed PDF document is opened by a document viewer application, it usually presents an alert to the user to show that the document has been changed in transit and that the signature is invalid. In Disarm, this is addressed by removing the digital signature after the PDF document has been transformed.
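The reason any transformation invalidates a signature can be shown with a content digest. The sketch below is a deliberate simplification: a real digital signature also binds the digest to the signer's key, which is omitted here, and the byte strings and function names are hypothetical.

```python
import hashlib


def content_digest(data):
    """Return the SHA-256 hex digest of the document bytes."""
    return hashlib.sha256(data).hexdigest()


def signature_still_matches(signed_digest, current):
    """Check whether the bytes still match what the signer attested to."""
    return content_digest(current) == signed_digest


# Hypothetical document bytes as they were when the signer signed them.
original = b"%PDF-1.7 signed content with an unused object"
signed_digest = content_digest(original)

# Any transformation, even removing an unused object, changes the bytes.
transformed = original.replace(b" with an unused object", b"")
```

The original bytes still match the signed digest, but the transformed bytes do not, which is why a viewer reports the signature as invalid even though the visible content is unchanged.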
Figure 2. A digitally signed Microsoft Word document shows an alert bar to the user implying that the document is not meant to be modified.
In Microsoft Office documents, viewing any digitally signed document shows an alert bar on top of the viewing/editing window area telling the user that the document is final and is signed by its creator. When Disarm modifies such a document and invalidates its signature, the viewing application shows another alert bar showing that the document has been changed since it was created by the signer. Currently, Disarm does nothing to remove the signatures after transforming Microsoft Office documents, so the user is expected to see the additional alert, which can be easily dismissed if the user wants.