

Rich Text Format (RTF) is a document format developed by Microsoft
that has been widely used on various platforms for more than 29 years.
The RTF format is very flexible and therefore complicated, which makes
developing a safe RTF parser challenging. Some notorious
vulnerabilities, such as CVE-2010-3333 and CVE-2014-1761, were caused
by errors in implementing RTF parsing logic.

RTF malware is not limited to exploiting RTF parsing vulnerabilities,
however. Malicious RTF files can also carry exploits for
vulnerabilities unrelated to the RTF parser, because RTF supports the
embedding of objects such as OLE objects and images. CVE-2012-0158 and
CVE-2015-1641 are two typical examples: their root cause does not
reside in the RTF parser, and attackers can exploit them through other
file formats such as DOC and DOCX.

Another type of RTF malware does not use any vulnerabilities. It
simply contains embedded malicious executable files and tricks the
user into launching those malicious files. This allows attackers to
distribute malware via email, which is generally not a vector for
sending executable files directly.

Plenty of malware authors prefer to use RTF as an attack vector
because RTF is an obfuscation-friendly format. As such, their malware
can easily evade static signature-based detection such as YARA or
Snort rules. This is a big reason why, in this scriptable exploit era,
we still see such large volumes of RTF-based attacks.

In this blog, we present some common evasive tricks used by
malicious RTFs.

Common obfuscations

Let’s discuss a couple of different RTF obfuscation strategies.

1.     CVE-2010-3333

This vulnerability, reported by Team509 in 2009, is a typical stack
overflow bug. Exploitation of this vulnerability is so easy and
reliable that it is still used in the wild, seven years after its
discovery. Recently, attackers exploiting this vulnerability targeted
an Ambassador of India.

The root cause of this vulnerability is a stack-based buffer overflow
in the Microsoft RTF parser procedure that parses the pFragments shape
property. Crafting a malicious RTF to exploit this vulnerability
allows attackers to execute arbitrary code. Microsoft has since
addressed the vulnerability, but many old versions of Microsoft Office
were affected, so the threat it posed was very high.

The Microsoft Office RTF parser lacks proper bounds checking when
copying source data to a limited stack-based buffer. The pattern of
this exploit can be simplified as follows:

pFragments}{\sv A;B;[word1][word2][word3][hex value

Because pFragments is rarely seen in normal RTF files, many firms
simply detect this keyword and the oversized number right after \sv
in order to catch the exploit using YARA or Snort rules. This method
works for samples that are not obfuscated, including samples generated
by Metasploit. However, against in-the-wild samples, such
signature-based detection is insufficient. For instance, the malicious
RTF targeting the Ambassador of India is a good sample to illustrate
the downside of signature-based detection. Figure 1 shows this RTF
document in a hex editor. We simplified Figure 1 because of space
limitations; there were plenty of dummy symbols such as { } in the
initial sample.

Figure 1. Obfuscated sample of CVE-2010-3333

As we can see, the pFragments keyword has been split into many
pieces that would bypass most signature-based detection. For instance,
most anti-virus products failed to detect this sample on first
submission to VirusTotal. In fact, not only are the split pieces of
\sn combined, the pieces of \sv are combined as well. The following
example demonstrates this obfuscation:


Obfuscated:

{\rtf1{\shp{\sp{\sn2 pF}{\sn44 ragments}{\sv 1;28}{\sv ;fffffffffffff….}}}}

De-obfuscated:

{\rtf1{\shp{\sp{\sn pFragments}{\sv 1;28

We can come up with a variety of ideas different from the
aforementioned sample to defeat static signature-based detection.

Notice the mixed ‘\x0D’ and ‘\x0A’: they are ‘\r’ and ‘\n’, and the
RTF parser simply ignores them.
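
To make the fragment merging concrete, here is a minimal Python sketch. It is not how any product or Word itself works; it assumes a simplified model in which repeated \sn and \sv groups are concatenated and CR/LF bytes are dropped. The function name merge_shape_properties and the shortened sample string are hypothetical.

```python
import re

# One {\sn ...} or {\sv ...} group. An optional numeric parameter
# (as in \sn2 or \sn44) and a single delimiter space are skipped.
GROUP = re.compile(r"\{\\(sn|sv)-?\d*[ ]?([^{}\\]*)\}")

def merge_shape_properties(rtf: str) -> dict:
    """Concatenate the text of repeated sn and sv groups (simplified model)."""
    # Assumption from the text: CR and LF are ignored by the parser.
    rtf = rtf.replace("\r", "").replace("\n", "")
    merged = {"sn": "", "sv": ""}
    for word, text in GROUP.findall(rtf):
        merged[word] += text
    return merged

# Shortened, hypothetical variant of the obfuscated sample.
sample = r"{\rtf1{\shp{\sp{\sn2 pF}{\sn44 ragments}{\sv 1;28}{\sv ;ffff}}}}"
props = merge_shape_properties(sample)

print("pFragments" in sample)  # a plain substring signature misses the sample
print(props["sn"])             # merged property name
print(props["sv"])             # merged property value
```

A signature that looks for the literal keyword never sees it in the raw bytes, while the merged view contains it. That gap is what the obfuscation exploits.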

2.     Embedded objects

Users can embed a variety of objects into RTF, such as OLE (Object
Linking and Embedding) control objects. This makes it possible for
OLE-related vulnerabilities such as CVE-2012-0158 and CVE-2015-1641 to
be accommodated in RTF files. In addition to exploits, it is not
uncommon to see executable files such as PE, CPL, VBS and JS embedded
in RTF files. These files require some form of social engineering to
trick users into launching the embedded objects. We have even seen
some Data Loss Prevention (DLP) solutions embedding PE files inside
RTF documents. It’s a bad practice because it cultivates poor habits
in users.

Let’s take a glance at the embedded object syntax first.

<objtype> specifies the type of object. \objocx is the most
common type used in malicious RTFs for embedding OLE control objects;
as such, let’s take it as an example. The data right after \objdata is
OLE1 native data, defined as:


<data>    (\binN #BDATA) | #SDATA
#BDATA    Binary data
#SDATA    Hexadecimal data

Attackers would try to insert various elements into the <data>
to evade static signature detection. Let’s take a look at some
examples to understand these tricks:

a.     For example, \binN can be used in place of #SDATA. The data
right after \binN is raw binary data. In the following example, the
characters 123 are treated as binary data and hence become the hex
values 313233 in memory.


Obfuscated:

{\object\objocx\objdata \bin3

De-obfuscated:

{\object\objocx\objdata 313233}
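
The equivalence between \binN raw data and hex data can be sketched in Python as follows. This is a simplified model under stated assumptions, not Word's parser: decode_objdata is a hypothetical helper that handles only hex digits and \binN, reads N as an ordinary decimal number, and skips every other character.

```python
import re
import string

# \binN followed by an optional single delimiter space.
BIN = re.compile(r"\\bin(\d+) ?")

def decode_objdata(payload: str) -> bytes:
    """Decode a payload that mixes hex digits with binN raw runs."""
    out = bytearray()
    high = None  # pending high nibble of the current hex byte
    i = 0
    while i < len(payload):
        m = BIN.match(payload, i)
        if m:
            # The next N characters are copied verbatim as raw bytes.
            count = int(m.group(1))
            start = m.end()
            out += payload[start:start + count].encode("latin-1")
            i = start + count
        elif payload[i] in string.hexdigits:
            nibble = int(payload[i], 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
            i += 1
        else:
            i += 1  # anything else is skipped in this sketch
    return bytes(out)

print(decode_objdata(r"\bin3 123").hex())
print(decode_objdata("313233").hex())
```

Both calls produce the same bytes, which is why a signature written against the hex form does not match the \binN form.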

Let’s look at another example:




{\object\objocx\objdata 313233}

If we try to call atoi or atol with the numeric parameter string
marked in red in the table above, we will get 0x7fffffff while its
true value should be 3.

This happens because \bin takes a 32-bit signed integer numeric
parameter. You would think that the RTF parser calls atoi or atol to
convert the numeric string to an integer; however, that is not the
case. Microsoft Word’s RTF parser does not use these standard C
runtime functions. Instead, the atoi function in Microsoft Word’s RTF
parser is implemented as follows:
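
The original listing is not reproduced here. The Python sketch below is not Word's code. It only illustrates, under the assumption that the parser accumulates digits in a 32-bit variable without an overflow check, how a long digit string can evaluate to 3 while a clamping atol-style conversion returns 0x7fffffff. The function names and the parameter string are hypothetical.

```python
def saturating_atol(digits: str) -> int:
    """Model a C runtime atol with a 32-bit long that clamps on overflow."""
    return min(int(digits), 0x7FFFFFFF)

def wrapping_atoi(digits: str) -> int:
    """Accumulate decimal digits in 32 bits with no overflow check (assumed)."""
    value = 0
    for ch in digits:
        value = (value * 10 + int(ch)) & 0xFFFFFFFF
    return value

# Hypothetical oversized parameter: 2**32 + 3.
param = str(2**32 + 3)

print(hex(saturating_atol(param)))  # what a scanner using atol would compute
print(wrapping_atoi(param))         # what a wrapping accumulator would compute
```

A scanner that trusts the clamped value would treat the following bytes as a huge binary run, while a wrapping parser would consume only three bytes.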

b.     \ucN and \uN
Both of them are ignored, and the
characters right after \uN are not skipped.

c.     The whitespace characters 0x0D (\r), 0x0A (\n) and 0x09 (\t) are ignored.

d.     Escaped characters
RTF has some special symbols that
are reserved. For normal use, users will need to escape these symbols.
Here’s an incomplete list:



All of those escaped characters are ignored, but there’s an
interesting situation with \'hh. Let’s look into an example first:


Obfuscated:

{\object\objocx\objdata 341\'112345

De-obfuscated:

{\object\objocx\objdata 342345}

When parsing \'11, the parser treats the 11 as an encoded hex byte
and discards it before it continues parsing the rest of \objdata. The
1 preceding \'11 is discarded as well. That 1 is the high nibble
(upper 4 bits) of an octet; when the parser immediately encounters
\'11, the internal state for decoding the hex string into binary bytes
is reset, so the pending high nibble is lost.

The table below shows the processing procedure; the two 1s in the
yellow rows are from \'11. It’s clear that the mixed-in \'11 disrupts
the state variable, which causes the high nibble of the second byte
to be discarded:
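
The state reset described above can be sketched in Python as follows. This is a simplified model, not Word's implementation: decode_hex_with_escapes is a hypothetical helper that assumes a \'hh escape is skipped whole and clears any pending high nibble.

```python
import string

def decode_hex_with_escapes(payload: str) -> bytes:
    """Decode hex digits, dropping escaped-hex sequences and resetting state."""
    out = bytearray()
    high = None  # pending high nibble, if any
    i = 0
    while i < len(payload):
        if payload.startswith("\\'", i):
            # Skip the backslash, the quote and the two hex digits, and
            # forget any half-decoded byte (the assumed state reset).
            i += 4
            high = None
        elif payload[i] in string.hexdigits:
            nibble = int(payload[i], 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
            i += 1
        else:
            i += 1
    return bytes(out)

print(decode_hex_with_escapes(r"341\'112345").hex())
```

The lone 1 before the escape never becomes part of a byte, so the result matches the de-obfuscated form 342345.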

e.     Oversized control word and numeric parameter
The RTF specification says that a control word’s name cannot be longer
than 32 letters and that the numeric parameter associated with the
control word must be a signed 16-bit integer or signed 32-bit integer,
but the RTF parser of Microsoft Office doesn’t strictly obey the
specification. Its implementation only reserves a buffer of size 0xFF
for storing the control word string and the numeric parameter string,
both of which are null-terminated. Characters beyond the maximum
buffer length (0xFF) do not become part of the control word or
parameter string; instead, the control word or parameter is terminated
at that point.

In the first obfuscated example, the length of the over-sized
control word is 0xFE. By adding a null-terminator, the control word
string will reach the maximum length of 0xFF, then the remaining data
belongs to objdata.

For the second obfuscated example, the total length of the “\bin”
control word and its parameter is 0xFD. By adding their
null-terminators, the length equals 0xFF.
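
As a rough illustration of the first case, the Python sketch below models a control word name stored in a 0xFF-byte buffer with one byte reserved for the null terminator. It is an assumption-laden model, not the actual parser: split_oversized_word is a hypothetical helper, and it covers only the name, not the numeric parameter.

```python
BUFFER_SIZE = 0xFF  # buffer size stated in the text

def split_oversized_word(text: str) -> tuple:
    """Split a backslash-prefixed word into (stored name, leftover text)."""
    if not text.startswith("\\"):
        raise ValueError("expected a control word")
    max_chars = BUFFER_SIZE - 1  # one byte is kept for the null terminator
    end = 1
    while end < len(text) and text[end].isalpha() and end - 1 < max_chars:
        end += 1
    return text[1:end], text[end:]

# Hypothetical oversized word: 0xFE letters followed by more letters.
word = "\\" + "x" * 0xFE + "abcd"
name, rest = split_oversized_word(word)

print(len(name) == 0xFE)
print(rest)
```

In this model the leftover characters fall outside the control word, so they are treated as part of the surrounding data.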

f.     Additional techniques

The parser uses the last \objdata control word in a list, as shown here:

Obfuscated:

554564{\*\objdata 4444}54545} OR

{\object\objocx\objdata 554445\objdata 444454545} OR

{\object\objocx{{\objdata 554445}{\objdata 444454545}}}

De-obfuscated:

{\object\objocx\objdata 444454545}
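
The "last \objdata wins" behavior can be approximated with a few lines of Python. This is a deliberately naive sketch: last_objdata_hex is a hypothetical helper that keeps every hex digit after the final \objdata and ignores everything else, so it would also pick up hex letters inside other control words.

```python
import re

def last_objdata_hex(rtf: str) -> str:
    """Return the hex digits that follow the last objdata control word."""
    marker = "\\objdata"
    index = rtf.rfind(marker)
    if index == -1:
        return ""
    tail = rtf[index + len(marker):]
    # Keep hex digits only; braces and spaces are dropped.
    return re.sub(r"[^0-9a-fA-F]", "", tail)

print(last_objdata_hex(r"{\object\objocx\objdata 554445\objdata 444454545}"))
print(last_objdata_hex(r"{\object\objocx{{\objdata 554445}{\objdata 444454545}}}"))
```

Both inputs reduce to the data of the final \objdata, matching the de-obfuscated form shown above.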

As shown here, control words other than \binN are ignored:

Obfuscated:

44444444{\par2211 5555}6666}       OR

{\object\objocx\objdata 44444444{\datastore2211 5555}6666}   OR

{\object\objocx\objdata 44444444\datastore2211 55556666}   OR

44444444{\unknown2211 5555}6666}   OR

{\object\objocx\objdata 44444444\unknown2211 55556666}

De-obfuscated:

{\object\objocx\objdata 4444444455556666}

There is another special case that makes the situation a bit more
complicated: the control symbol \*. The RTF specification describes
this control symbol as follows:

    Destinations added after the 1987 RTF Specification may be
    preceded by the control symbol \* (backslash asterisk). This
    control symbol identifies destinations whose related text should
    be ignored if the RTF reader does not recognize the destination
    control word.

Let’s take a look at how it can be used in obfuscations:



Obfuscated:

44444444{\*\par314 5555}6666}

De-obfuscated:

{\object\objocx\objdata 4444444455556666}

\par is a known control word that does not accept any data. The RTF
parser skips the control word, and only the data that follows remains.

Obfuscated:

44444444{\*\datastore314 5555}6666}

De-obfuscated:

{\object\objocx\objdata 444444446666}

The RTF parser also recognizes \datastore and understands that it can
accept data, so the data that follows is consumed by \datastore.

Obfuscated:

44444444{\*\unknown314 5555}6666}

De-obfuscated:

{\object\objocx\objdata 444444446666}

\unknown is not a recognized control word, so the parser ignores the
whole \* group, including its data.
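
The control word and \* cases above can be summarized in a small Python sketch. It is a model of the behavior described in this section, not of Word's parser: strip_noise is a hypothetical helper, KNOWN_PLAIN_WORDS is an assumed one-entry stand-in for "known control words that take no data", and \binN is deliberately not handled.

```python
import re

# Assumption: the only "known, takes no data" word modeled here is par.
KNOWN_PLAIN_WORDS = {"par"}

# A group that starts with the star control symbol and one control word.
STAR_GROUP = re.compile(r"\{\\\*\\([a-z]+)(-?\d+)? ?([^{}]*)\}")
# Any control word with an optional numeric parameter and delimiter space.
CONTROL_WORD = re.compile(r"\\[a-z]+(-?\d+)? ?")

def strip_noise(payload: str) -> str:
    """Reduce an obfuscated hex payload to the digits that would survive."""
    def keep_or_drop(match):
        # Known plain word: only the word is skipped, its data stays.
        # Destination or unknown word: the whole star group is dropped.
        return match.group(3) if match.group(1) in KNOWN_PLAIN_WORDS else ""
    payload = STAR_GROUP.sub(keep_or_drop, payload)
    payload = CONTROL_WORD.sub("", payload)   # other control words are ignored
    return re.sub(r"[{}\s]", "", payload)     # braces and whitespace are dropped

print(strip_noise(r"44444444{\*\par314 5555}6666"))
print(strip_noise(r"44444444{\*\datastore314 5555}6666"))
print(strip_noise(r"44444444{\unknown2211 5555}6666"))
```

A pre-processing step along these lines, applied before signature matching, is one way to compare a sample against its de-obfuscated form. A real normalizer would need a complete table of control words and proper handling of nested groups.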

For an analyst, it’s difficult to manually extract embedded objects
from an obfuscated RTF, and no public tool can handle obfuscated RTF.
However, winword.exe uses the OleConvertOLESTREAMToIStorage function
to convert OLE1 native data to an OLE2 structured storage object.
Here’s the prototype of OleConvertOLESTREAMToIStorage:

    HRESULT OleConvertOLESTREAMToIStorage(
        LPOLESTREAM lpolestream,
        LPSTORAGE pstg,
        const DVTARGETDEVICE *ptd
    );

The object pointed to by lpolestream contains a pointer to the OLE1
native binary data. We can set a breakpoint at
OleConvertOLESTREAMToIStorage and dump out the object data, which has
been de-obfuscated by the RTF parser:

The last command, .writemem, writes a section of memory to
d:\evil_objdata.bin. You can specify another path if you want;
0e170020 is the start address of the memory range, and 831b6 is the
size.

Most of the \objdata obfuscation techniques also apply to embedded
images, but for images there seems to be no obvious counterpart to
OleConvertOLESTREAMToIStorage. To extract an obfuscated picture,
locate the RTF parsing code using a data breakpoint; that will reveal
the best point at which to dump the whole data.


Our adversaries are sophisticated and familiar with the RTF format
and the inner workings of Microsoft Word.  They have managed to devise
these obfuscation tricks to evade traditional signature-based
detection. Understanding how our adversary is performing obfuscation
can in turn help us improve our detection of such malware.


Thanks to Yinhong Chang, Jonell Baltazar and Daniel Regalado for
their contributions to this blog.
