Encryption 101: a malware analyst's primer

While most in the security industry know what encryption is, many lack a basic understanding of how it is used in malware—especially ransomware. Because of this, we thought it would be beneficial to do an introductory primer on encryption mechanisms and how they are exploited for malicious purposes.

We will start with a introduction to encryption in general, and follow up with the main methods used to encrypt files used by ransomware. In part two of this blog series, we’ll use a recent ransomware variant detected by Malwarebytes as Ransom.ShiOne to highlight the key weaknesses in encryption to look out for when trying to decrypt files.

What is encryption?

In the simplest of terms, encryption is the process of encoding information so that only authorized parties can access it, and those who are not authorized cannot. In computing, encryption is the method by which data is converted from a readable format (plaintext) to an encoded one (ciphertext) that can only be decoded by another entity if they have access to a decryption key. While encryption was long used by the military to facilitate secret communication, today it is used to secure data in transit and in storage, as well as to authenticate and verify identities.

Unfortunately, encryption is also used for malicious purposes, as is the case with ransomware.

Evaluating the encryption

If a malware analyst wants to effectively evaluate a malicious encryption, he needs to observe the encryption on the machine that is creating or receiving the encrypted data. If you have access to any process running before it has performed the encryption phase, you will typically have enough data to be able to decrypt or to simply view the natively decrypted data.

Being the observer is the only chance you have to recover files without needing to crack the encryption. However, for a ransomware attack, being an observer usually isn’t possible. After the malware finishes running and sends off the encryption keys, it is too late. When the malware shuts down, you can no longer be the observer. Now you must rely on analysis and hope there is a flaw in the encryption.

What exactly does it mean to be an observer of decryption and encryption? Someone once asked me:

Why do malware authors not always encrypt communication to a C2 server?

My answer to him was that malware is public, in that it can run on random victim systems throughout the world. As reverse engineers, we always have access to the binary and are able to inspect the software on the lowest, most detailed level. A secure communication is meaningless at this level because we will be looking at the system before encryption and after decryption.

An SSL or https communication received on the client side (victim computer) will be decrypted in memory for the data to be processed in whichever way the malware intends. At this point, we will always be able to inspect memory and pull the decrypted communication out in its raw form, so it really doesn’t hide anything when a server sends encrypted data down to a malware that is being analyzed.

This same logic applies to many use cases in ransomware and file encryption. If we are looking at a ransomware and it is generating its encryption keys locally, we can observe the keys in memory immediately after creation, save those out, and use those keys from memory to decrypt files after the ransomware runs, with a requirement that we are able to determine the algorithm used for the encryption.

If during the process of ransomware running and encrypting files the user dumped its memory, he has a chance of being an observer, and a likely possibility to recover files. This is unfortunately not a typical scenario, as a victim’s first instinct is not to create a memory dump while continuing to allow the process to run. But for a theoretical example, it technically can work.

Ransomware algorithms

Over the years, we have come across many algorithms used to hold victims’ files hostage. The most common of these involve the use of standard, public, and proven algorithms for asymmetric encryption. But occasionally we see custom encryption (which is likely weaker) or even simple, obfuscated methods to holds files hostage.

Years ago, when I would come across ransom malware, it would typically use alternative methods for holding the victim computer as ransom, rather than encrypting all files on the drive. I have seen everything from file hiding and custom crypto to Master Boot Record (MBR) rewriting.

File obfuscation

In the case of file obfuscation, the ransomware simply would move or hide targeted files—documents and anything else it believes the victim would care about—and ask for a ransom to recover the files. In this case, recovery is trivial. You can simply reverse the code that is performing the hiding and be able to reverse the actions it took.

Here’s an example: A fake pop-up claims your hard drive is corrupt, and asks you to call and pay for “support” to recover the files. On this particular malware I analyzed, it displayed a pop-up window (shown below), and simply moved all the documents and desktop files into a new folder in a hidden location. The fix for this was looking at the code to see which files it moved and to what location.

Custom cryptos
When dealing with custom cryptos, typically a file is passed through an algorithm that modifies the file in a standard way. A simple example would be every byte in the file being XORed by a constant or cycling set of bytes. In these cases, you can almost think of the algorithm itself as the key. If you can reverse the algorithm, you can figure out exactly how it is modifying or encrypting the file. Then, simply reversing the steps will give you the original file. In contrast, when faced with asymmetric cryptography, the algorithm itself will not provide you with enough to be able to decrypt a file.
MBR rewriting
In the third scenario, the MBR would be rewritten with a small program that requires a password or serial number for access. The malware would force a reboot of the computer, and before the system would load Windows, it would first prompt the user with instructions on how to pay the ransom to receive a passcode. In this scenario, reversing the serial or password validation algorithm within the boot record (essentially creating a keyGen), would provide you with the ability to know which password would allow access. Alternatively, rewriting that portion of the hard drive with the factory boot record would also disable the lock on your computer.

Aside from reversing the algorithm, the only difficulty here would be knowing how to rewrite the MBR yourself in order to restore the original code to this section of the drive.

Below is an example of an MBR locker. Notice that no ID is asked of you, which means there is nothing unique to the specific blockage, and a static unlock code is likely to be needed.

Now, these three alternative methods are not quite using encryption in the standard sense of the word, but I am mentioning them here because it shows that a custom written, closed-source obfuscation is sometimes trivial to break. The reason most criminals use standardized, public, open-source encryption algorithms to encrypt files for ransom is that they are tested and reliably secure. That means you can know every detail about the encryption algorithm but it won’t matter—you can’t decrypt the encrypt by any method other than having the encryption key.

Why is this important? Encryption using standard, open-source code is built off of the relationship between the encryption keys. The algorithms used simply derive two separate keys that are related. This concept is called asymmetric cryptography, and it is the method of encryption that the vast majority of ransomware authors use today.

Asymmetric cryptography basics

Asymmetric encryption involves generating two keys that are completely different; however, they share a relationship. One key (the public key) is used to encrypt the data into ciphertext, while its companion (the private key) is used to decrypt the ciphertext into the original plaintext. It is called asymmetric because the public key, although it was used to encrypt, cannot be used to decrypt. The private key is required to decrypt.

In encrypted communication that uses asymmetric cryptography, two keys are generated locally—the public key and the private key. The public key is available for the whole word to see. If someone wants to send a message that only one person (Bob) could read, they would encrypt the message using Bob’s public key. Because his public key cannot be used to decrypt the message, it is completely safe. Using his offline, private key, Bob would be able to decrypt the message and see the original text.

Here a few visualizations to help you understand.

“>

This is the same method used by ransomware authors for file encryption. A basic run-through of an encryption process goes as follows:

An array of random numbers is generated. This series of bytes will be used before the first round of the file encryption. Typically, a mathematical series of operations is performed on the public key using this random initialization to, in effect, create a sub-key derived from the initial key. This sub-key will be now used in order to actually encrypt the file data. During the first phase of the encryption process, a second random array is generated, which will be used as an initialization vector (IV). Every stage following will use the output of the previous stage as its new IV.

A random number is first used as the IV, then the generated ciphertext is used for the next round of encryption.

The generation of the keys themselves also relies on a random number generator. So as you can imagine, having a solid and “as random as possible” generator is extremely important.

Ransomware file encryption

Modern ransomware typically does one of a couple things: Either it dynamically generates the keys locally and sends them up to the C2 server attached to the client ID, or the keys are generated by the author and are preloaded into the ransomware itself.

The latter, although arguably more secure, requires a lot of overhead in generating a completely new binary for each victim, or at least distributing a batch of identical malware using the same key in each campaign version. The trade-off here is that although the key generation cannot be observed directly and inspected for weaknesses by an analyst, two victims will actually have the same encryption keys. So if one person pays the ransom and shares his keys, everyone else impacted by that campaign version can decrypt their files for free. This is a weakness via leaked keys.

Now, if the keys were dynamically generated, it allows the slim potential usage of a memory dump for file recovery, and also the slim possibility that an analyst can find a hole in the encryption. (The memory dump is not that big of a problem for the malware author because, as we said earlier, it is rare when a user knows that he should create a dump). However, the benefit to this local key generation method is that the malware is fully dynamic, and no two people will share the same key. Both methods have their share of weaknesses and strengths.

Modern ransomware authors typically use one of these forms of standardized encryption—AES, RSA, Blowfish, etc.—to try and make the victim’s files unrecoverable without the author providing the decryption key. The reason I say “try” is because there are many cases where these good algorithms are being misused (which will allow the same key to be generated twice). In addition, the transfer and generation of keys can be intercepted.

Asymmetric cryptography encryption may be near impossible to crack, but that doesn’t mean it can’t be broken.

To learn how, tune in for part 2 of Encryption 101 – ShiOne ransomware case study, which uses the ShiOne ransomware to demonstrate how malware analysts can look for weaknesses in encryption.