Obfuscation: Malware's best friend

Here at Malwarebytes, we see a lot of malware. Whether it’s a botnet used to attack web servers or ransomware holding your files hostage, much of today’s malware wants to stay hidden during infection and operation to prevent removal and analysis. Malware achieves this using many techniques to thwart detection and analysis. Some examples include using obscure filenames, modifying file attributes, or operating under the pretense of legitimate programs and services. In more advanced cases, the malware might attempt to subvert modern detection software (e.g., MBAM) to avoid being found, hiding running processes and network connections. The possibilities are nearly endless.

Despite advances in modern malware, dirty programs can’t hide forever. When malware is found, it needs some additional layers of defense to protect itself from analysis and reverse engineering. By implementing additional protection mechanisms, malware can be more difficult to detect and even more resilient to takedown. Although a lot of tricks are used to hide malware’s internals, a technique used in nearly every piece of malware is binary obfuscation.

Obfuscation (in the context of software) is a technique that makes binary and textual data unreadable and/or hard to understand. Software developers sometimes employ obfuscation techniques because they don’t want their programs being reverse-engineered or pirated.

Its implementation can be as simple as a few bit manipulations or as advanced as cryptographic standards (e.g., DES or AES). In the world of malware, it’s useful to hide significant words the program uses (called “strings”) because they give insight into the malware’s behavior. Examples of such strings would be malicious URLs or registry keys. Sometimes the malware goes a step further and obfuscates the entire file with a special program called a packer.

Let’s see some practical obfuscation examples used in a lot of malware today.

Scenario 1: The exclusive or operation (XOR)

The exclusive or operation (represented as XOR) is probably the most commonly used method of obfuscation. This is because it is very easy to implement and easily hides your data from untrained eyes. Consider the following highlighted data.

Obfuscated data is unreadable in its current form.

In its current form, the data is unreadable. But when we apply an XOR value of 0x55, we see something else entirely.

An XOR operation using 0x55 reveals a malicious URL.

Now we have our malicious URL. Looks like this malware contacts “http://tator1157.hostgator.com” to retrieve the file “bot.exe”.
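The single-byte XOR described above can be sketched in a few lines of Python. This is only an illustration: the helper name `xor_bytes` is hypothetical, and the full URL string is an assumption assembled from the host and file name mentioned above. Because XOR is its own inverse, the same function both hides and reveals the data.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)

# Assumed plaintext, built from the host and file name in the example above.
url = b"http://tator1157.hostgator.com/bot.exe"

obfuscated = xor_bytes(url, 0x55)        # what would sit in the binary
recovered = xor_bytes(obfuscated, 0x55)  # applying the same key again undoes it

print(obfuscated)
print(recovered)
```

Note that the first byte, ‘h’ (0x68), becomes 0x3D once XORed with 0x55, which is why the obfuscated data no longer looks like a URL.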

This form of obfuscation is typically very easy to defeat. Even if you don’t have the XOR key, programs exist to automatically cycle through every possible single-byte XOR value in search of a particular string. One popular tool available on both UNIX and Windows platforms is XORSearch, written by Didier Stevens. This tool searches for strings encoded in multiple formats, including XOR.
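The brute-force idea is simple enough to sketch. The snippet below is not how XORSearch itself is implemented; it is a minimal illustration with hypothetical names (`xor_bytes`, `brute_force_xor`) that tries every non-zero single-byte key and reports the ones that reveal a search string.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)

def brute_force_xor(data: bytes, needle: bytes) -> list[tuple[int, bytes]]:
    """Try keys 1..255 and return (key, decoded) for each key that reveals needle."""
    hits = []
    for key in range(1, 256):
        decoded = xor_bytes(data, key)
        if needle in decoded:
            hits.append((key, decoded))
    return hits

# Assumed sample: the URL from the example above, hidden with key 0x55.
sample = xor_bytes(b"http://tator1157.hostgator.com/bot.exe", 0x55)

for key, decoded in brute_force_xor(sample, b"http"):
    print(hex(key), decoded)
```

With only 255 candidate keys, the search finishes almost instantly, which is why a fixed single-byte XOR offers so little protection.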

Because malware authors know programs like these exist, they implement tricks of their own to avoid detection. One thing they might do is a two-cycle approach, performing an XOR against data with a particular value and then making a second pass with another value. (Note that two passes with single-byte values are equivalent to one pass with the XOR of those two values, so this only adds real protection when the passes differ in some other way, such as key length.) A separate technique that is also commonly used is to increment the XOR value in a loop. Using the previous example, we could XOR the letter ‘h’ with 0x55, then the letter ‘t’ with 0x56, and so on. This defeats tools that only try a single fixed XOR value.
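As a rough sketch of the incrementing-key idea, the function below XORs each byte with a key that grows by one per position, wrapping at 0xFF. The name `rolling_xor` and the wrap-around behavior are assumptions for illustration; real samples vary in how they step the key.

```python
def rolling_xor(data: bytes, start_key: int) -> bytes:
    """XOR byte i with (start_key + i), wrapping the key to a single byte."""
    return bytes(b ^ ((start_key + i) & 0xFF) for i, b in enumerate(data))

# Assumed plaintext, built from the host and file name in the earlier example.
url = b"http://tator1157.hostgator.com/bot.exe"

obfuscated = rolling_xor(url, 0x55)
recovered = rolling_xor(obfuscated, 0x55)  # same start key reverses it

print(obfuscated)
print(recovered)
```

Because each position uses a different key, the two ‘t’ characters in “http” no longer encode to the same byte, so a search that assumes one fixed key will not line up with the original string.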

Scenario 2: Base64 encoding

Base64 encoding has been used for a long time to transfer binary data (such as machine code) over a system that only handles text. As the name suggests, its encoding alphabet contains 64 characters, with the equal sign (=) used as a padding character. The alphabet contains the characters A-Z, a-z, 0-9, + and /. Below is an example of some encoded text representing the string pointing to the svchost.exe file, used by Windows to host services.

Base64 is commonly used in malware to disguise text strings.

While the encoded output is completely unreadable, base64 encoding is easier to identify than a lot of encoding schemes, usually because of its padding character. There are a lot of tools that can perform base64 encode/decode functions, both online and via downloaded programs.
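For instance, Python’s standard `base64` module can encode and decode such a string. The exact path below is an assumption (the usual location of svchost.exe on Windows), since the original example is only shown as an image.

```python
import base64

# Assumed path; the article's image may show a different string.
path = rb"C:\Windows\System32\svchost.exe"

encoded = base64.b64encode(path)
decoded = base64.b64decode(encoded)

print(encoded)  # ends in '=' padding because 31 bytes is not a multiple of 3
print(decoded)
```

The trailing padding appears whenever the input length is not a multiple of three bytes, which is what makes many base64 strings easy to spot.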

Because base64 encoding is so easy to overcome, malware authors usually take things a step further and change the order of the base64 alphabet, which breaks standard decoders. This allows for a custom encoding routine that is more difficult to break.
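One way to picture a reordered alphabet is as a character-for-character translation applied on top of standard base64. The sketch below uses a hypothetical custom alphabet (lowercase letters moved ahead of uppercase) and hypothetical function names; actual malware may use any permutation.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Hypothetical custom ordering: lowercase first, then uppercase.
CUSTOM = string.ascii_lowercase + string.ascii_uppercase + string.digits + "+/"

ENCODE_TABLE = bytes.maketrans(STANDARD.encode(), CUSTOM.encode())
DECODE_TABLE = bytes.maketrans(CUSTOM.encode(), STANDARD.encode())

def custom_b64encode(data: bytes) -> bytes:
    """Standard base64, then map each character into the custom alphabet."""
    return base64.b64encode(data).translate(ENCODE_TABLE)

def custom_b64decode(data: bytes) -> bytes:
    """Map characters back to the standard alphabet, then decode normally."""
    return base64.b64decode(data.translate(DECODE_TABLE))

secret = b"svchost.exe"
hidden = custom_b64encode(secret)

print(hidden)
print(base64.b64decode(hidden))  # a standard decoder returns the wrong bytes
print(custom_b64decode(hidden))  # the matching decoder recovers the string
```

An analyst who recovers the custom alphabet from the binary can build the same translation table and decode the strings with ordinary tools.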

Scenario 3: ROT13

Perhaps the simplest of these first three techniques is ROT13. ROT is short for “rotate”, so ROT13 means “rotate by 13”. Unlike the bit-rotation instructions found in assembly (such as ROL and ROR on x86), ROT13 rotates letters through the alphabet, using simple letter substitution to achieve obfuscated output.

Let’s start by encoding the letter ‘a’. Since we’re rotating by thirteen, we count the next thirteen letters of the alphabet until we land at ‘n’. That’s really all there is to it!

ROT13 uses a simple letter substitution to jumble text.

The above image shows a popular registry key used to list programs that run each time a user logs in. ROT13 can also be modified to rotate a different number of characters, like ROT15.
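Python ships a `rot_13` text codec, and a general rotation takes only a few lines. In the sketch below, the registry key string is an assumption (the common per-user Run key), since the original is only shown as an image, and `rot_n` is a hypothetical helper.

```python
import codecs

def rot_n(text: str, n: int) -> str:
    """Rotate ASCII letters by n places, leaving other characters untouched."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + n) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + n) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

# Assumed key; the article's image may show a different string.
key = r"Software\Microsoft\Windows\CurrentVersion\Run"

print(codecs.encode(key, "rot_13"))  # built-in ROT13
print(rot_n(key, 13))                # same result from the general helper
print(rot_n(rot_n(key, 15), 11))     # ROT15 is undone by rotating 11 more
```

Since the alphabet has 26 letters, applying ROT13 twice returns the original text, while any other rotation n is reversed by rotating a further 26 - n places.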

Scenario 4: Runtime packers

In a lot of cases, the entire malware program is obfuscated. This prevents anybody from viewing the malware’s code until it is placed in memory.

This type of obfuscation is achieved using what’s known as a packer program. A packer is a piece of software that takes the original malware file and compresses it, thus making all the original code and data unreadable. At runtime, a wrapper program will take the packed program and decompress it in memory, revealing the program’s original code.
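The pack-then-unpack cycle can be illustrated with a toy example. This is not a real packer: it uses zlib as a stand-in compression scheme, the “program” is just a block of placeholder bytes, and the names `pack` and `unpack_in_memory` are hypothetical. It only shows why strings disappear from the packed file and reappear in memory.

```python
import zlib

# Placeholder bytes standing in for the original program (not a real executable).
ORIGINAL = b"connect http://tator1157.hostgator.com and download bot.exe\n" * 16

def pack(original: bytes) -> bytes:
    """Compress the original bytes, as a packer would before distribution."""
    return zlib.compress(original, 9)

def unpack_in_memory(packed: bytes) -> bytes:
    """What the wrapper does at runtime: restore the original bytes in memory."""
    return zlib.decompress(packed)

packed = pack(ORIGINAL)
restored = unpack_in_memory(packed)

print(len(ORIGINAL), len(packed))  # the packed form is smaller
print(b"bot.exe" in ORIGINAL)      # string visible in the original
print(b"bot.exe" in packed)        # string no longer visible once packed
print(restored == ORIGINAL)        # the wrapper recovers everything
```

A real packer also has to rebuild the executable’s layout and transfer control to the restored code, which is where most of the complexity in manual unpacking comes from.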

Packers have been used for a long time for legitimate purposes, some of which include reducing file sizes and protecting against piracy. They help conceal vital program components and deter novice program crackers.

Fortunately, we aren’t without help when it comes to identifying and unpacking these files. There are many programs available that detect commercial packers and also advise on how to unpack them. Some examples of these file scanners are Exeinfo PE and PEiD (no longer developed, but still available for download).

Exeinfo PE is a great tool for detecting common packers.

However, as you might expect, the situation can get more complicated. Malware authors like to create custom packers to prevent less-experienced reverse engineers from unpacking their malware’s contents. This approach can defeat automated unpacking scripts, forcing reversers to manually unpack the file to see what the program is doing. More rarely, malware authors will pack their files twice, first with a commercial packer and then with their own custom packer.

Conclusion

While this list of techniques is certainly not exhaustive, hopefully this has provided a better understanding of how malware hides itself from plain sight. Obfuscation is a highly reliable technique that’s used to hide file contents, and sometimes the entire file itself if using a packer program.

Obfuscation techniques are always changing, but rest assured that we at Malwarebytes are well aware of this. Our staff has years of experience in fighting malware, and goes to great lengths to see what malicious files are really doing.

Bring it on, malware. Do your worst!

_______________________________________________________________________________

Joshua Cannell is a Malware Intelligence Analyst at Malwarebytes where he performs research and in-depth analysis on current malware threats. He has over 5 years of experience working with US defense intelligence agencies where he analyzed malware and developed defense strategies through reverse engineering techniques. His articles on the Unpacked blog feature the latest news in malware as well as full-length technical analysis. Follow him on Twitter @joshcannell