Image credit: [**pixabay**](https://pixabay.com/photos/source-code-software-computer-4280758/) Image credit: pixabay

Extracting the Payload from a CVE-2014-1761 RTF Document

This is a local mirror of a blog written by me and originally published by NCC Group.

Background

In March Microsoft published security advisory 2953095, detailing a remote code execution vulnerability in multiple versions of Microsoft Office (CVE-2014-1761). A Technet blog was released at the same time which contained excellent information on how a typical malicious document would be constructed.

NCC Group’s Cyber Defence Operations team used the information in the Technet blog to identify a malicious document within our malware zoo that exploited this vulnerability which appears to have been used in a targeted attack. In this blog we show one method of analysing the shellcode manually to extract the payload.

Matching the malicious document

The Technet blog gives a number of pointers toward a malicious document. First there is a bad header at the beginning of the document, which should be {rtf in a real document but is {rt{. Our sample matches this:

Hex editor showing RTF header

The MSComctl object is a short way into the document, in this case an ImageComboCtl:

Hex editor showing embedded OLE object

And it is easy to identify the potential ROP chain:

Hex editor showing ROP chain

What will happen if the exploit is successful? If the exploit doesn’t work on our test systems, how can we manually extract the payload?

We know that the document should contain something useful, either saving malicious embedded content or using a URL download/execute. But where is this shellcode?

Analysing the shellcode

After identifying the vulnerability we can now hunt for the shellcode which will run on successful exploitation. The Technet blog suggests the shellcode is placed near the end of the file so this is a good place to start. Upon loading into IDA the correct option to choose is 32-bit disassembly.

Locating the shellcode

How can we quickly identify what might be code? One common technique in shellcode is using the hashes of Windows APIs, searching for these can often yield good results. Running a small IDA Python script over the database returns some possible matches:

Output from a custom IDA Python script showing matches for shellcode hashes

The first four are probably misdetections but the following API names definitely look suspicious. All of them are toward the end of the file, which ends at 0x71CB1. Checking the results for Sleep and ExitProcess shows the following potential shellcode locations:

Screenshot of IDA Pro showing shellcode hashes

Turning the bytes into code

It is now possible to see where some of the hashed APIs might be used, which gives an indication of where the shellcode is located. We can begin to convert the unknown bytes into code (right click and choose “Code”, or use the shortcut C).

If we accidentally choose the wrong place to start analysing then it is possible to end up with “junk” results, as demonstrated below:

Screenshot of IDA Pro with “junk” code

We can fix this by undefining the junk code (right click, “Undefine” or shortcut U), then making code at a slightly different offset. Very quickly the disassembly starts looking like real code:

Screenshot of IDA Pro with correct code

Calling functions by hash

We can now see that the API hash is placed in the EBX register before a function is called, which has been manually named CallByHash by us in the disassembly above.

This function uses the standard mechanism of obtaining the PEB to find loaded modules:

Screenshot of IDA Pro with code to locate the PEB

The correct API is found using a simple ROR 0x13 (19 decimal) loop until the generated hash matches the value in EBX where the desired hash is stored (see comparison instruction at 0x71A83).

Screenshot of IDA Pro with shellcode hashing routine

This allows the shellcode to locate and call any Windows API from kernel32.dll without knowing anything about the process which loaded the RTF file or including API name strings.

Finding ourselves – where is the RTF file?

The shellcode next needs to find the RTF file so it can locate and save the payload. It does this by iterating over all possible file handles until a valid one is found. This will always work because Word must have the RTF file open in order to parse it.

The code below tries each possible handle in turn, starting from 0x4 until 0x4000. It then calls GetFileSize, ensuring that the handle is valid by checking the return code.

Screenshot of IDA Pro with shellcode hashing routine

The code which follows is responsible for finding the start of the payload and saving it to disk. The position is first reset to the start of the file (offset 0) using SetFilePointer. The loop below then looks for the characters S18t in the document and obtains the offset if the string is found. If the characters are not present then the shellcode tries the next handle until an open file containing S18t is located.

Screenshot of IDA Pro with egg hunt code for S18t

Once the payload data is found it is unobfuscated with a simple XOR loop, seen below. This is important to note for when we extract the data manually later.

Screenshot of IDA Pro with XOR loop

Following this are standard calls to GetTempPathA, CreateDirectoryA, CreateFileA and WriteFile, which save the payload to disk. Finally the shellcode calls LoadLibraryExA to launch the payload and then sleeps before calling ExitProcess to terminate Microsoft Word cleanly.

Unusual code or shellcode trickery?

Other typical techniques are also evident, for example this simple sequence:

Screenshot of IDA Pro with obfuscated constants

The constant 0x40000000 (equivalent to GENERIC_WRITE permissions) is obtained by taking the number 0x41010101 and subtracting 0x1010101, avoiding null bytes in the shellcode. The same trick is used for some API hashes, for example CloseHandle below:

Screenshot of IDA Pro with obfuscated constants

A simple calculation shows that the hash for CloseHandle would be 0xED00C776, which contains a null byte.

Extracting the payload

With the information above we can extract the payload data from the document and decode the executable which will be run. By searching for the string S18t the start of data can be found.

Screenshot of hex editor with the start of data

The bytes following S18t look suspiciously like an obfuscated PE header, using our earlier information about the usage of XOR 0x4 we can test to see if this is correct:

Screenshot of hex editor with an obfuscated PE header

From here we can copy all of the bytes from offset 0x6c38 to the end of the file and then apply XOR 0x4 to obtain a PE file. The resulting file will contain the shellcode at the end; this could be removed if desired. Loading the payload into IDA shows a well formed executable which allows us to begin further analysis.

In this instance the payload was a 425KB executable which is often called the “havex RAT”. Crowdstrike attribute the use of this malware to a group called ENERGETIC BEAR in their Global Threat Report 2013. At the time of our analysis only 1 antivirus engine of 50 on VirusTotal detected the payload as malicious, once again highlighting the malicious code arms race.

David Cannings
David Cannings
Cyber Security

My interests include computer security, digital electronics and writing tools to help analysis of cyber attacks.