Tuesday, November 25, 2008


Random is as random does

When I did that “random things meme” recently, I took issue with its misuse of “random”, preferring the word “arbitrary”. That’s because I’m a mathematician and computer scientist (though far more the latter, these days, not having done real mathematics since college), and “random” has a specific meaning in those fields.

New Scientist has an article about recent work by engineers in Japan that fits into the “random” conversation, and calls for more explanation. The engineers have used lasers to generate random numbers:

Now a new method that uses lasers to produce streams of truly random numbers faster than ever before could help improve security at a time when digital traffic and cybercrime are both growing.

Strings of random numbers are used to make secret keys and other parts of encryption protocols. But software that generates random numbers can generally only manage a close approximation to random. Statistical analysis reveals underlying if near-invisible patterns that mean an attacker could predict the sequence and break the code.

At first glance, “truly random” looks like “truly unique”, a meaningless phrase using an inappropriate qualifier. Something is unique, or it isn’t. A number sequence is random, or it isn’t. Isn’t it?

Well, no. To understand that, we have to talk for a moment about what the properties of a random-number sequence are, and how random numbers — properly called “pseudo-random numbers” — are generated by computer programs.

First: we don’t talk about “a random number” when we’re being rigorous; we talk about a number taken from a random sequence (or a pseudo-random sequence). That’s an important distinction when we consider primary properties of a (truly) random sequence:

  1. The sequence is unpredictable — knowing the sequence so far gives no clue to the next number in the sequence.
  2. The sequence is not repeatable — we can’t run the algorithm again and get the same sequence.
  3. The sequence exhibits a uniform distribution — for any position in the sequence, any number is equally likely to appear there.

Computer programs generally simulate this selection of a random sequence by using a pseudo-random-number generator (PRNG) algorithm and a random seed. The algorithm — and there are good ones out there — takes the seed and generates a sequence that

  1. is unpredictable and unrepeatable, if you don’t know the seed, and
  2. exhibits, to close examination of the results, what appears to be a uniform distribution.
Given a good algorithm, then, the key point is seeding it properly, because these algorithms also have the property that if you do know the seed, the sequence is entirely repeatable.

And since cryptographic systems use PRNGs in critical places, such as key generation, there have been many attacks on such systems by finding weaknesses in their choices of random seeds. A number representing the current time is often used as part of the seed, for example. If you know with significant accuracy when the program is run, you can reliably grab many bits of the seed. If you can get enough of the seed so that you only have, say, 10,000 seeds to try, you may be able to crack the encryption system.

Some systems, therefore, try to incorporate aspects of the outside world into their choice of a seed. When the BlackBerry Desktop Manager needs to create a new encryption key, for example, it will tell the user to move the mouse around and will use samples of the mouse position as it moves as a means to create a random seed. Such data is very hard to replicate — perhaps impossible — and will generate a very good seed.

That gets us back to the work with the lasers. Here, the combination of the signals from two lasers can be used as the external input to a seed generator. More than that, the signals can actually be used as part of the algorithm to generate the numbers — the random sequence is generated by the lasers themselves, not by an algorithm manipulating a seed.

I have not yet read the paper, so I haven’t seen the proof of the unpredictability nor of the uniform distribution. But assuming that these aspects of the laser system are true and reliable, this technique certainly could be used to build a random-number generator in hardware that would work in highly secure applications.

A final note: We don’t always want truly random numbers. There are lots of applications (especially in development and testing) for which it’s useful to be able to reproduce the sequence if we have the original random seed. As I mentioned in passing above, this laser system can still be used to create the seed for conventional PRNG algorithms, providing the robustness needed there, while also allowing us to repeat the sequence when the application calls for it.

No comments: