Since June 27th we have been investigating the outbreak of the new Petya-like malware armed with an infector similar to WannaCry. From day one, various contradictory theories started popping up. Some believed that this malware is a rip-off of the original Petya, while others thought that it is another step in Petya’s evolution. However, those were just opinions, and none of them was backed by enough evidence to hold up. In this post, we will try to fill this gap by making a step-by-step comparison of the current kernel and the one on which it is based (Goldeneye Petya).
- 71b6a493388e7d0b40c83ce903bc6b04 – the main DLL
- f3471d609077479891218b0f93a77ceb – the low level part (Petya bootloader + kernel) <- the main focus of this analysis
Why is it important to know whether or not the code was recompiled?
Answering this question and collecting enough evidence is crucial for further discussions on attribution. The source code of the original Petya has never been leaked publicly, so if the code was recompiled, that would prove that the original Petya’s author, Janus, is somehow linked to the current outbreak (either this is his work, or he has sold the code to another actor).
In this analysis, we hope to determine whether this malware could have been recompiled from the original code, or whether it is the work of someone with the appropriate skills to modify a ready-made binary. The latter would not entirely rule out Janus as the creator, but it would make his involvement less likely.
Anyways, let’s take a look at the code.
Looking at the sectors, we can find that the layout of EternalPetya is identical to Goldeneye. Full comparison:
- Petya Goldeneye: sector 1
- Petya Eternal: sector 1
- Petya Goldeneye: 32
- Petya Eternal: 32
- Petya Goldeneye: 33
- Petya Eternal: 33
Original MBR (xored with 7)
- Petya Goldeneye: 34
- Petya Eternal: 34
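To make the layout concrete, here is a minimal sketch of how the saved original MBR could be pulled out of a raw disk image. It assumes 512-byte sectors and a raw image held in memory; the function names are hypothetical helpers, not part of any tool used in this analysis. The sector number (34) and the XOR value (7) come from the layout above.

```python
SECTOR_SIZE = 512          # assumption: classic 512-byte sectors
ORIGINAL_MBR_SECTOR = 34   # from the layout comparison above
XOR_KEY = 0x07             # "xored with 7"


def read_sector(disk_image: bytes, index: int) -> bytes:
    """Return one full sector from a raw disk image."""
    start = index * SECTOR_SIZE
    sector = disk_image[start:start + SECTOR_SIZE]
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"image too short for sector {index}")
    return sector


def recover_original_mbr(disk_image: bytes) -> bytes:
    """Undo the single-byte XOR applied to the saved original MBR."""
    return bytes(b ^ XOR_KEY for b in read_sector(disk_image, ORIGINAL_MBR_SECTOR))
```

Because XOR with a constant is its own inverse, the same routine both obfuscates and recovers the sector.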
Comparing both kernels at hexadecimal level, we can see tiny differences at various points. However, there are big portions of code that are identical in both.
The screenshots below show fragments of the (current) EternalPetya on the left, and Goldeneye on the right.
It’s interesting that, at some point, the layout of the same strings in memory was shifted:
As mentioned, the data sector starts at the same offset in both cases. This sector stores the random Salsa20 key and nonce, which are generated per victim, and this is identical in both cases. However, in Goldeneye the victim ID is much longer. This is not surprising: in the past it was supposed to be the encrypted backup of the Salsa key, while now it is just an arbitrary string, so its length doesn’t really matter.
The first thing that struck me as different was the bootloader. Fragment of the hexdump (as before, EternalPetya is on the left and Goldeneye on the right):
Functionality-wise, it is the same in both cases. It is supposed to read 32 (0x20) sectors from the disk, starting from sector 1, and load them into memory at the address 0x8000. However, the opcodes that are used in both cases to do the same operations are a bit different.
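What the bootloader does can be modeled in a few lines. This is only a sketch of the behavior described above (32 sectors, starting at sector 1, loaded at 0x8000), not a translation of the actual bootloader code. It assumes 512-byte sectors and models memory as a single 64 KiB real-mode segment; `load_kernel` is a hypothetical name.

```python
SECTOR_SIZE = 512            # assumption: classic 512-byte sectors
KERNEL_FIRST_SECTOR = 1      # bootloader starts reading at sector 1
KERNEL_SECTOR_COUNT = 0x20   # 32 sectors
LOAD_ADDRESS = 0x8000        # destination address in memory


def load_kernel(disk_image: bytes) -> bytearray:
    """Return a 64 KiB memory model with the 32 sectors copied to 0x8000."""
    start = KERNEL_FIRST_SECTOR * SECTOR_SIZE
    size = KERNEL_SECTOR_COUNT * SECTOR_SIZE
    kernel = disk_image[start:start + size]
    if len(kernel) != size:
        raise ValueError("image does not contain 32 full sectors after sector 0")
    memory = bytearray(0x10000)  # one real-mode segment, zero-filled
    memory[LOAD_ADDRESS:LOAD_ADDRESS + size] = kernel
    return memory
```

With these numbers the loaded region spans 0x8000 to 0xBFFF, so it fits comfortably inside one segment.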
This is the old bootloader, used in Goldeneye:
And this is the bootloader used in the EternalPetya version:
My first impression upon seeing this was that the code was recompiled with different settings. However, another possibility also exists. The total length of the differing fragments is the same in both versions, so we cannot exclude the possibility that someone manually edited them inside the pre-compiled binary.
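The hex-level comparison is easy to reproduce. The sketch below is one possible way to do it, assuming both kernels have been dumped to byte strings of equal size; `diff_runs` and `identical_ratio` are hypothetical helper names. It reports each run of differing bytes, which makes same-length, in-place substitutions easy to spot.

```python
from typing import List, Tuple


def diff_runs(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for every run of bytes that differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, len(a) - start))
    return runs


def identical_ratio(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that are identical in both inputs."""
    if not a:
        return 1.0
    changed = sum(length for _, length in diff_runs(a, b))
    return 1 - changed / len(a)
```

Short, isolated runs surrounded by identical bytes at unchanged offsets are what in-place edits tend to produce. A recompilation with different settings would more often shift the code that follows, although this is a heuristic rather than proof.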
Optimizations – and why it matters
So far we’ve seen some interesting changes, but they were not enough to prove or disprove whether the code was recompiled. However, the breakthrough in the research may lie in the interesting observation made by David Buchanan.
"The Salsa20 Key expansion was modified using a hexeditor, NOT by modifying the source" pic.twitter.com/Q06ZEle8k9 (David Buchanan, @David3141593, June 29, 2017)
His theory was based on a compiler optimization which ensures that the same character does not need to be loaded twice. We can see this rule applied when examining the code responsible for storing a string in memory. Inside Goldeneye’s key expansion function, we can find that this kind of optimization absolutely happens: every character is unique, and no character is loaded twice:
But in the corresponding fragment of the current kernel, we can find that this rule is broken. The character ‘d’ repeats and optimization was not applied:
If this code had been generated by a compiler, the fragment would look like the code for the other repeated characters, with the value loaded once and stored twice:
    mov al, 'd'
    mov [bp+var_B], al
    mov [bp+var_3], al
This is a very strong argument against the theory of the code being recompiled. But anyway, let’s continue the analysis and see if we can find even more evidence.
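The optimization argument can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how any particular compiler works: `emit_string_stores` mimics the "load each distinct character once" pattern, and `repeated_immediates` flags listings that break it. Both names are hypothetical, and the slot names follow the `var_B` / `var_3` labels from the fragment above.

```python
from typing import Dict, List


def emit_string_stores(chars_by_slot: Dict[str, str]) -> List[str]:
    """Emit one `mov al, imm` per distinct character, then all stores using it."""
    slots_by_char: Dict[str, List[str]] = {}
    for slot, ch in chars_by_slot.items():
        slots_by_char.setdefault(ch, []).append(slot)
    lines = []
    for ch, slots in slots_by_char.items():
        lines.append(f"mov al, '{ch}'")
        lines.extend(f"mov [bp+{slot}], al" for slot in slots)
    return lines


def repeated_immediates(listing: List[str]) -> List[str]:
    """Return immediates that are loaded into al more than once."""
    seen = set()
    repeated = []
    for line in listing:
        line = line.strip()
        if line.startswith("mov al, "):
            imm = line[len("mov al, "):]
            if imm in seen:
                repeated.append(imm)
            seen.add(imm)
    return repeated
```

A listing produced by `emit_string_stores` never triggers `repeated_immediates`, whereas a hand-patched listing that reloads 'd' before the second store does.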
Closer look at the changes
In a previous post I presented a quick comparison of the current kernel and Goldeneye, done with the help of the IDA plugin BinDiff:
We can see that significant modifications have been made only in the functions related to displaying the information screen. Let’s check how exactly these changes have been applied.
main_info_screen (offset 0x8426):
Changes of the main_info_screen pointed out by the BinDiff (left: current, right: Goldeneye):
As we can see, the call to the function at 0x008848E was replaced with NOPs (No Operation instructions). This is a common way to remove an unwanted function call when patching compiled binaries. Yet it can sometimes also be introduced by #ifdef directives. The rest of the code matches the previous version, even using the same offsets. However, the addresses of the displayed strings differ between the two binaries.
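Calls that were NOPed out can also be searched for automatically. The sketch below assumes 16-bit code, where a near relative call is the opcode 0xE8 followed by a 2-byte displacement (3 bytes total) and NOP is 0x90. `find_nopped_calls` is a hypothetical helper. It scans raw bytes rather than disassembling, so 0xE8 bytes inside data or operands can yield false positives.

```python
from typing import List, Tuple

CALL_REL16 = 0xE8   # near relative call: opcode + 16-bit displacement
NOP = 0x90
CALL_LENGTH = 3


def find_nopped_calls(original: bytes, patched: bytes,
                      base: int = 0) -> List[Tuple[int, int]]:
    """Return (call_site, target) where a call in `original` is NOPs in `patched`."""
    hits = []
    limit = min(len(original), len(patched)) - CALL_LENGTH + 1
    for i in range(max(limit, 0)):
        if original[i] != CALL_REL16:
            continue
        if patched[i:i + CALL_LENGTH] != bytes([NOP]) * CALL_LENGTH:
            continue
        # Displacement is relative to the address of the next instruction.
        disp = int.from_bytes(original[i + 1:i + 3], "little", signed=True)
        target = (base + i + CALL_LENGTH + disp) & 0xFFFF
        hits.append((base + i, target))
    return hits
```

Passing the load address as `base` makes the reported call sites and targets comparable with addresses shown in a disassembler.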
The unreferenced function is still present in the current binary:
…and called in some other places of code:
Compared to Goldeneye’s call graph, it lacks one of the references, but the other ones are consistent:
sub_86E0 (offset 0x86E0):
The second change is in another function that is also part of the information screen. It is not referenced from any other place in the code:
As we can see, it is called at the beginning of the previously discussed function:
In the Goldeneye kernel, the corresponding function was the one responsible for printing the skull:
The first jump leads to the loop responsible for displaying the skull and waiting for the key to be pressed by the user. Fragment of the code:
Looking inside the EternalPetya code, we are almost sure that this function was patched post-compilation rather than recompiled. The first jump, which was supposed to lead to the loop, leads directly to the end of the function:
The original code is still in the binary, but it is never referenced (dead code).
Are the patches reversible?
I thought that, as a finishing touch to this research, it would be interesting to reverse the changes and bring the dead code back to life. As input, I used the dumped code of:
- EternalPetya kernel + bootloader (f3471d609077479891218b0f93a77ceb).
My version (reverse patch): (7957520271edf003742db63fc250c231).
Indeed, after applying the patches, we are back to seeing the same blinking screen, only the skull is gone (the corresponding strings have been overwritten):
I think the presented evidence is enough to prove that the code was not recompiled from the original source (contrary to what I initially suspected). Thus, the involvement of the original Petya author, Janus, seems unlikely. It seems that in this case he was just chosen as a scapegoat by some other actor.
The edits made in the code are well crafted: the person making them was fluent in assembly and knew exactly what to change and why. Thus, they gave a first impression of very neat and clean modifications that could possibly be the result of recompilation. Yet, after a deeper analysis, we have identified numerous nuances that show otherwise.
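A reverse patch of this kind boils down to writing bytes back at fixed offsets. The sketch below shows the general mechanics only. The exact offsets and bytes used for my version are not listed in this post, so any values passed in are placeholders, and `apply_patches` and `md5_hex` are hypothetical helper names. MD5 is used here on the assumption that the 32-digit hashes quoted above are MD5 digests.

```python
import hashlib
from typing import Iterable, Tuple


def apply_patches(data: bytes, patches: Iterable[Tuple[int, bytes]]) -> bytes:
    """Overwrite bytes in place at each offset; the total size never changes."""
    out = bytearray(data)
    for offset, new_bytes in patches:
        end = offset + len(new_bytes)
        if offset < 0 or end > len(out):
            raise ValueError(f"patch at {offset:#x} falls outside the binary")
        out[offset:end] = new_bytes
    return bytes(out)


def md5_hex(data: bytes) -> str:
    """Hex digest used to identify a specific dump."""
    return hashlib.md5(data).hexdigest()
```

For example, restoring a NOPed call means writing the original 3-byte call back over the three NOPs. Comparing `md5_hex` before and after confirms that a different binary was produced.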
EternalPetya seems to be a patchwork made of code stolen from various sources. In addition to the modified version of the GoldenEye Petya kernel, we can find the leaked NSA exploits from the “Eternal” series as well as legitimate applications, such as PsExec.
It is common practice among unsophisticated actors (script-kiddies) to steal and repurpose someone else’s code. However, in this case, the composition was done well by a person or team with good technical knowledge and careful execution. A possible reason for using so many stolen elements, apart from saving the actor’s time, could have been to throw off any obvious signs of attribution.
There are still many mysteries to solve about this malware. They give rise to many theories that, until proven true, are nothing more than speculation.
This was a guest post written by Hasherezade, an independent researcher and programmer with a strong interest in InfoSec. She loves going in details about malware and sharing threat information with the community. Check her out on Twitter @hasherezade and her personal blog: https://hshrzd.wordp