In computers, are random numbers really random?

Sometimes it is necessary to use random numbers in programs. Depending on the need, it can be important how random the numbers used really are.

The discussion about random numbers comes down to TRNG (True Random Number Generators) vs PRNG (Pseudo-Random Number Generators).

Why does this even matter? Random is random, right?

Not really. Since computers have no imagination whatsoever, it is physically impossible for them to come up with a truly random number.

If you use built-in functions to randomize a number, it will produce a pseudo-random number using a complex algorithm.

If all you need is one random number this may be fine, but if you need a sequence of them, the series typically starts with a common ‘seed’ number and then follows a pattern, because of the algorithm it uses.

Knowing the algorithm and identifying the seed, you can predict the rest of the series by calculating it. Imagine being able to do that on an online gambling site!

So, what are you going to do about it?

Well, methods were needed that use more than just an algorithm or a given set of numbers.

The .net framework for example offers the function System.Security.Cryptography.RandomNumberGenerator that is more than just an algorithm. It also incorporates the following environmental factors in its calculations:

The current process ID
The current thread ID
The tick count since boot time
The current time
Various high precision CPU performance counters
An MD4 hash of the user’s environment (username, computer name, search path, etc)

But again, all these numbers are not random.

While the real world seems to be random by nature, computers are not. Therefore, methods were developed to sample numbers from the real world and use them as random numbers.

TrueCrypt and other programs like it, instruct you to move your mouse when they create an encryption key. The random input is used to make the key less predictable.

Other types of data used to collect random number sets are atmospheric noise, radioactive decay, patterns in a video frame and even lava lamps.

Basically, all these methods measure some physical phenomenon that is expected to be random and then compensate for possible biases in the measurement process. In all these cases, compensations would be necessary to get an even distribution, since extremes are usually more sparse then numbers near the median.

A slightly easier, and hybrid, method is to make the seed a TRNG and achieve output unpredictability by using a high quality PRNG, but as mentioned, this only works if you start with an unpredictable seed.

photodune-1104832-table-of-random-numbers-s

Can we test a set of numbers for randomness?

An easy test to check if a given set of numbers is truly random is to compress the file with the numbers (e.g. using WinRar) and see how much smaller the resulting file is compared to the original.

Compression programs look for repetitions and patterns and store these rather than the actual data. So a totally random set of numbers would not shrink by using this method.

How are random numbers used by malware?

There are two main uses that most malware has for random number generators, encryption and filename creation.

With encryption, be it file encryption or network traffic encryption, malware can use a custom or built-in PRNG and use a unique seed found by obtaining specific system variables, such as the ID number for your hard drive or, as mentioned before, the tick count since boot time.

The malware uses the result of the PRNG and uses it as a key to encrypt what they need.

In many cases, this key is transmitted, via plain-text traffic, back to the command and control. The C2 saves that key in order to read any incoming encrypted messages or store a remote decryption key if say the system was being held ransom.

Random number generation and filename creation are useful to a lot of malware, because it allows the input of similar seeds as mentioned above, but using that random number, creates a string of characters used to create a new filename.

This filename can be given to any files the malware drops on the system in an effort to mask its identity from scanners.

The above method has also been effective in domain name generation, allowing malware to be deployed without the need of a hard-coded domain name when it wants to connect to the C2.

This method is particularly interesting because it uses a built-in or custom PRNG and a seed obtained either through being hard-coded into the malware, present on all attack victim systems or available remotely via a web page being hosted that provides seeds to the malware.

Using this seed, the malware can cycle through a predictable series of domain names, all created pseudo-randomly. The attackers also know the algorithm and have used it to register the domain names from the algorithm results.

Take for example an algorithm that will create three seven-character strings when provided with a seed. The attackers create a webpage at badguy.com/seed.php that provides a seed for that particular algorithm.

The malware, upon execution on the system reaches out and grabs that seed from online and feeds it into the PRNG algorithm.

The algorithm comes up with the following strings:

Ietyskk
Loijihd
Yeuuahg

Now the attackers behind the command and control have already run this seed through their own local version of the algorithm and registered the above strings as domain names, for examples letyskk.com. The malware will cycle through all three generated strings and attempt to connect to those domains until it gets a response.

If the domain name gets revoked, all the bad guys need to do is change the value at badguy.com/seed.php to a new one and register those domains, the same malware will retrieve that value and attempt to connect to the resulting domains.

Summary

It is important to keep in mind that computers do not work easily with truly random numbers and sometimes it pays off to understand how pseudo-random numbers are used and created. Be it for your own use or that of the malware creators.