26 December 2012

When random numbers go One

Random Numbers

Random numbers are great. They let you write interesting computer simulations, accurately sample statistics, implement cryptographic algorithms and, most importantly, gamble without thinking. In Java the usual way of accessing random numbers is through a class called java.util.Random. You can instantiate it with a seed, or the current time, and then you have a sequence of values that have easy type specific accessor methods, for example:

Random random = new Random();
// Prints out either 0 or 1
System.out.println(random.nextInt(2));

Audience Participation Time

If your expectations hadn't been biased by asking this question: what would you expect the following code to print out:

public static void main(String[] args) {
    int start = 0;
    int end = 1000;
    int countOnes = 0;
    for (int i = start; i < end; i++) {
        countOnes += new Random(i).nextInt(2);
    }
    System.out.println(countOnes);
}

If we run the program what do we get:

1000

That's quite surprising - if you'd asked me what number I would expect, I would tell you that I couldn't give you an exact number but I'd expect it to be about 500. This is based upon the idea that you're sampling from a distribution with a 50% probability of a 1, and a 50% probability of a 0. It won't come up with 500 exactly, but you'd expect with a sample size of 1000 that you'd approach a total of 500. If we change the 'end' variable to be 4000, we get:

4000

Again - the call to nextInt(2) should be returning a random number between 0 and 1, but its only returning 1. How about start = 4000 and end = 5000.

96

Ok - I've got more zeros now, but a lot less of them than I'd expect.

Why?

Well people frequently refer to computer based random number generators as pseudo-random number generators, but its rarely described what pseudo-random actually means. What we really mean in this case is that we have an algorithm that produces a deterministic sequence of numbers, but is initialised with a seed. The sequence looks random, but its just a facade. What the small program above does is simply sample the first random number out of the sequence.

This is a cheap parlour trick, but it does raise some interesting issues. java.util.Random uses a linear congruential generator in order to generate its numbers, which has the upside of being simple and fast. Unfortunately that comes at the cost that its not very random - even when the programmer uses it properly. Its not at all suitable for monte carlo simulations or crytographic algorithms.

This isn't an issue specific to Java either - early versions of python used a 1982 algorithm by Wichmann & Hill. This suffers from a short period - which in this context means that the numbers start repeating after a while. More recent versions use the Mersenne Twister as their algorithm - which has much nicer properties, but is still unsuitable for cryptographic use.

Conclusions

The thing to bear in mind in all these cases is that pseudo-random numbers aren't random numbers, and if you care about statistical randomness then you need to take care in both your algorithm selection and usage of the implementation.

Read More: Date and Time in Java 8 (2)