The FireEye Labs Advanced Reverse Engineering (FLARE) Team is
dedicated to sharing knowledge and tools with the community. We
started with the release of the FLARE On Challenge in early July where thousands
of reverse engineers and security enthusiasts participated. Stay tuned
for a write-up of the challenge solutions in an upcoming blog post.
This post is the start of a series where we look to aid other
malware analysts in the field. Since IDA Pro is the most popular tool
used by malware analysts, we’ll focus on releasing scripts and
plug-ins to help make it an even more effective tool for fighting
evil. In the past, at Mandiant we released scripts on GitHub and we’ll
continue to do so at the following new location: https://github.com/fireeye/flare-ida.
This is where you will also find the plug-ins we released in the past:
Shellcode Hashes and Struct Typer. We hope you find all these scripts
as useful as we do.
Let’s start with a simple challenge. What two strings are printed
when executing the disassembly shown in Figure 1?
Figure 1: Disassembly challenge
If you answered
“Hello world\n” and
“Hello there\n”, good job! If you didn’t see it, then Figure 2 makes
this more obvious. The bytes that make up the strings have been
converted to characters and the local variables are converted to
arrays to show buffer offsets.
Figure 2: Disassembly challenge with markup
Reverse engineers are likely more accustomed to strings that are a
consecutive sequence of human-readable characters in the file, as
shown in Figure 3. IDA generally does a good job of cross-referencing
these strings in code as can be seen in Figure 4.
Figure 3: A simple string
Figure 4: Using a simple string
Manually constructed strings like in Figure 1 are often seen in
malware. The bytes that make up the strings are stored within the
actual instructions rather than a traditional consecutive sequence of
bytes. Simple static analysis with tools such as strings cannot detect
these strings. The code in Figure 5, used to create the challenge
disassembly, shows how easy it is for a malware author to use this technique.
Figure 5: Challenge source code
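To illustrate why a strings-style scan misses these strings, here is a minimal Python sketch. It assumes each character is written by a 4-byte `mov byte ptr [ebp+disp8], imm8` instruction (bytes C6 45 disp8 imm8); the helper names and the base offset are hypothetical and are not taken from the challenge binary.

```python
import re
from typing import List


def ascii_strings(data: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of printable ASCII at least min_len long, similar to strings."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def build_stack_string_code(text: bytes, base_disp: int = -0x20) -> bytes:
    """Emit one `mov byte ptr [ebp+disp8], imm8` per character of text."""
    code = bytearray()
    for i, ch in enumerate(text):
        disp = (base_disp + i) & 0xFF  # disp8 is stored as a two's-complement byte
        code += bytes([0xC6, 0x45, disp, ch])
    return bytes(code)


plain = b"Hello world\n\x00"                      # string stored as consecutive bytes
code = build_stack_string_code(b"Hello world\n")  # same string embedded in instructions

print(ascii_strings(plain))  # the consecutive string is found
print(ascii_strings(code))   # nothing is found: characters are 4 bytes apart
```

Each character in the instruction stream is separated by opcode and offset bytes, so no printable run reaches the minimum length.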
Automating the recovery of these strings during malware analysis is
simple if the compiler follows a basic pattern. A quick examination of
the disassembly in Figure 1 could lead you to write a script that
searches for mov instructions that begin with the opcode bytes
C6 45 and then extracts the stack offset and character
bytes. Modern compilers with optimizations enabled often complicate
matters as they may:
- Load frequently used characters in registers which are used to
copy bytes into the buffer
- Reuse a buffer for multiple strings
- Construct the string out of order
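The simple script described above could look something like the following sketch. It assumes every character is written with the exact 4-byte C6 45 disp8 imm8 encoding and scans raw bytes without disassembling them, so it breaks down as soon as the compiler applies any of the optimizations just listed. The function name is hypothetical.

```python
def recover_c6_45_strings(code: bytes, min_len: int = 4):
    """Collect imm8 bytes from C6 45 disp8 imm8 instructions, ordered by stack offset."""
    writes = {}
    i = 0
    while i + 4 <= len(code):
        if code[i] == 0xC6 and code[i + 1] == 0x45:
            raw = code[i + 2]
            disp = raw - 0x100 if raw >= 0x80 else raw  # sign-extend disp8
            writes[disp] = code[i + 3]
            i += 4
        else:
            i += 1

    strings = []
    current = bytearray()
    prev = None
    for disp in sorted(writes):
        value = writes[disp]
        contiguous = prev is not None and disp == prev + 1
        if not contiguous or value == 0:
            # A gap in offsets or a null terminator ends the current string.
            if len(current) >= min_len:
                strings.append(current.decode("ascii", "replace"))
            current = bytearray()
        if value != 0:
            current.append(value)
        prev = disp
    if len(current) >= min_len:
        strings.append(current.decode("ascii", "replace"))
    return strings


# Hypothetical sample: "Hello world\n" plus terminator written at [ebp-0x20] onward.
sample = b"".join(
    bytes([0xC6, 0x45, (-0x20 + i) & 0xFF, ch])
    for i, ch in enumerate(b"Hello world\n\x00")
)
print(recover_c6_45_strings(sample))
```

Because the bytes are sorted by stack offset before being joined, the order in which the instructions appear does not matter for this simple case.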
Figure 6 shows the disassembly of the same source code that was
compiled with optimizations enabled. This caused the compiler to load
some of the frequently occurring characters in registers to reduce the
size of the resulting assembly. Extra instructions are required to
load the registers with a value, like the 2-byte mov instruction at
0040115A, but using these registers requires only a
4-byte mov instruction, whereas
mov instructions that contain hard-coded byte values are
5 bytes.
Figure 6: Compiler optimizations
The StackStrings IDA Pro Plug-in
To help you defeat malware that contains these manually constructed
strings we’re releasing an IDA Pro plug-in named StackStrings that is
available at https://github.com/fireeye/flare-ida.
The plug-in relies heavily on analysis by a Python library called Vivisect. Vivisect
is a binary analysis framework frequently used to augment our
analysis. StackStrings uses Vivisect’s analysis and emulation
capabilities to track simple memory usage by the malware. The plug-in
identifies memory writes to consecutive memory addresses of likely
string data and then prints the strings and locations, and creates
comments where the string is constructed. Figure 7 shows the result of
running the above program with the plug-in.
Figure 7: StackStrings plug-in results
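The idea of grouping memory writes at consecutive addresses into likely strings can be sketched in plain Python. This is not the plug-in's code and does not use Vivisect; it assumes a hypothetical log of (address, byte) pairs such as an emulator might record, and the function name is made up for illustration.

```python
import string

PRINTABLE = set(string.printable.encode("ascii"))


def strings_from_writes(writes, min_len=4):
    """Group byte writes at consecutive addresses into (start_address, text) pairs.

    writes: iterable of (address, byte_value) in execution order (assumed format).
    """
    memory = {}
    for addr, value in writes:
        memory[addr] = value  # a later write to the same address wins

    found = []
    start, run = None, bytearray()
    for addr in sorted(memory):
        value = memory[addr]
        extends = start is not None and addr == start + len(run)
        if value in PRINTABLE and extends:
            run.append(value)
            continue
        # The run ended: keep it if it is long enough, then start over.
        if len(run) >= min_len:
            found.append((start, run.decode("ascii")))
        if value in PRINTABLE:
            start, run = addr, bytearray([value])
        else:
            start, run = None, bytearray()
    if len(run) >= min_len:
        found.append((start, run.decode("ascii")))
    return found


# Hypothetical write log: the string is built back to front on the stack.
base = 0x0012FF60
text = b"Hello world\n\x00"
log = [(base + i, text[i]) for i in reversed(range(len(text)))]
print(strings_from_writes(log))
```

Working from recorded writes rather than instruction bytes means the source of each byte (an immediate or a register) and the order of construction no longer matter.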
While the plug-in is called StackStrings, its analysis is not just
limited to the stack. It also tracks all memory segments accessed
during Vivisect’s analysis, so manually constructed strings in global
data are identified as well, as shown in Figure 8.
Figure 8: Sample global string
Simple, manually constructed WCHAR strings are also identified by
the plug-in as shown in Figure 9.
Figure 9: Sample WCHAR data
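A simple way to recognize this kind of WCHAR data in a recovered buffer is to look for printable bytes alternating with null bytes (UTF-16LE). The sketch below is an illustration under that assumption, not the plug-in's detection logic; the function name is hypothetical.

```python
def decode_wchar_run(data: bytes, min_chars: int = 4):
    """Return the leading UTF-16LE text in data if it looks like printable ASCII, else None."""
    chars = []
    for i in range(0, len(data) - 1, 2):
        lo, hi = data[i], data[i + 1]
        printable = 0x20 <= lo <= 0x7E or lo in (0x09, 0x0A, 0x0D)
        if hi != 0 or not printable:
            break  # stop at the terminator or at the first non-text code unit
        chars.append(chr(lo))
    return "".join(chars) if len(chars) >= min_chars else None


wide = "Hello".encode("utf-16-le") + b"\x00\x00"
print(decode_wchar_run(wide))            # wide string is recognized
print(decode_wchar_run(b"Hello world"))  # a narrow string is not treated as WCHAR data
```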
Download Vivisect from http://visi.kenshoto.com/viki/MainPage
and add the package to your PYTHONPATH environment variable if you
don’t already have it installed.
Clone the git repository at https://github.com/fireeye/flare-ida.
The python\stackstring.py file is the IDAPython script
that contains the plug-in logic. This can either be copied to your
%IDADIR%\python directory, or it can be in any directory
found in your PYTHONPATH. The
plugins\stackstrings_plugin.py file must be copied to your IDA Pro plugins directory.
Test the installation by running the following Python commands
within IDA Pro and ensure no error messages are produced:
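One way to perform such a check is to confirm that the required modules can be imported. The sketch below is only an assumption about what a check might look like: the module names vivisect and stackstring are guessed from the package and file names mentioned above, and the helper is hypothetical. Adjust the names to match your installation.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names are assumptions based on the file names mentioned above.
print(missing_modules(["vivisect", "stackstring"]))
```

An empty list means both modules were found on the Python path.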
To run the plug-in in IDA Pro, go to Edit – Plugins – StackStrings.
The compiler may aggressively optimize memory and register usage
when constructing strings. The worst-case scenario for recovering
these strings occurs when a memory buffer is reused multiple times
within a function, and if string construction spans multiple basic
blocks. Figure 10 shows the construction of
“Hello there\n”. The plug-in attempts
to deal with this by asking whether you want to
use the basic-block aggregator or the function aggregator. Often the
basic-block level of memory aggregation is fine, but in this situation
running the plug-in both ways provides additional results.
Figure 10: Two strings, one buffer, multiple basic blocks
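To see why the two aggregation levels can give different results, consider the following simplified model. It is not the plug-in's implementation: it assumes each basic block is just a list of (buffer offset, byte) writes, and that the second block overwrites only the bytes that differ between the two strings. All names and data are hypothetical.

```python
def is_text(value):
    """Printable ASCII or newline."""
    return 0x20 <= value <= 0x7E or value == 0x0A


def text_runs(memory, min_len=4):
    """Find runs of text bytes at consecutive offsets in an {offset: byte} map."""
    found, run, prev = [], bytearray(), None
    for offset in sorted(memory):
        gap = prev is not None and offset != prev + 1
        if gap or not is_text(memory[offset]):
            if len(run) >= min_len:
                found.append(bytes(run))
            run = bytearray()
        if is_text(memory[offset]):
            run.append(memory[offset])
        prev = offset
    if len(run) >= min_len:
        found.append(bytes(run))
    return found


def per_block(blocks):
    """Aggregate each basic block's writes in isolation."""
    results = []
    for writes in blocks:
        results.extend(text_runs(dict(writes)))
    return results


def per_function(blocks):
    """Aggregate all writes in the function; later writes overwrite earlier ones."""
    memory = {}
    for writes in blocks:
        memory.update(writes)
    return text_runs(memory)


# Block A builds the first string; block B reuses the buffer and patches 5 bytes.
block_a = list(enumerate(b"Hello world\n\x00"))
block_b = [(6 + i, ch) for i, ch in enumerate(b"there")]

print(per_block([block_a, block_b]))
print(per_function([block_a, block_b]))
```

In this model, per-block aggregation recovers the first string in full but only a fragment of the second, while per-function aggregation recovers the second string and loses the first, so the two runs complement each other.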
You’ll likely get some false positives due to how Vivisect
initializes some data for its emulation. False positives should be
obvious when reviewing results, as seen in Figure 11.
Figure 11: False positive due to memory initialization
The plug-in aggressively checks for strings during aggregation
steps, so you’ll likely get some false positives if the compiler sets
null bytes in a stack buffer before the complete string is constructed.
The plug-in currently loads a separate Vivisect workspace for the
same executable loaded in IDA. If you’ve manually loaded additional
memory segments within your IDB file, Vivisect won’t be aware of that
and won’t process those.
Vivisect’s analysis does not always exactly match that of IDA Pro,
and differences in the way the stack pointer is tracked between the
two programs may affect the reconstruction of stack strings.
If the malware is storing a binary string that is later decoded,
even with a simple XOR mask, this plug-in likely won’t work.
The plug-in was originally written to analyze 32-bit x86 samples. It
has worked on test 64-bit samples, but it hasn’t been extensively
tested for that architecture.
StackStrings is just one of many internally developed tools we use
on the FLARE team to speed up our analysis. We hope it will help speed
up your analysis too. Stay tuned for our next post where we’ll release
another tool to improve your malware analysis workflow.