
Computer Randomness?

One thing people often fail to understand is probability, partly because they try to apply it after the fact. That is why some numbers or sequences are considered more "random looking" than others. But once we already have a sequence, there is NO WAY to determine whether it is random or not. Only by trying to predict the number or sequence IN ADVANCE can we hope to determine whether it is random.

An example:

A

Is that random? We cannot say.

AB

Random now? Still cannot say, but if I now say: The next letter will be 'C', and we subsequently get

ABC

THEN I can say that it is probably NOT random.

Hans
 
ReFLeX-
Gotcha. It's random number-generation
rather than random-number generation.

And a random number table is not a table of random numbers, but a random table of numbers.

This is probably stunningly obvious to you, but suddenly makes a lot more sense to me. :)
 
A Monte Carlo algorithm takes as input a sequence of numbers ("random numbers"), and produces as output an answer to whatever question it was designed for. It doesn't know or care about how the numbers were generated, i.e., whether they were generated via a random process or not; its answer just depends on what the numbers themselves are. Some number sequences will cause it to give the right answer, or very nearly the right answer, and other number sequences will cause it to give a wrong answer.

The important point, however, is that although we generally don't know which are the good sequences and which are the bad ones, we do know that the vast majority of them are good. Therefore, generating one randomly is very likely to result in a good one. So that's what we do.
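
To make that concrete, here is a minimal Python sketch (my own illustration, not from the thread; the function name and the choice of estimating pi are just for the example). The algorithm only sees the numbers it is handed, so a "bad" input sequence quietly produces a wrong answer:

import random

def estimate_pi(numbers):
    """Monte Carlo estimate of pi from a flat sequence of values in [0, 1).

    The algorithm neither knows nor cares where the numbers came from;
    it just pairs them up as (x, y) points and counts how many fall
    inside the unit quarter-circle.
    """
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    inside = sum(1 for x, y in pairs if x * x + y * y <= 1.0)
    return 4.0 * inside / len(pairs)

# A "good" sequence: output of a pseudorandom generator.
good = [random.random() for _ in range(100_000)]
print(estimate_pi(good))   # close to 3.14159...

# A "bad" sequence: perfectly legal input, but it gives a wrong answer.
bad = [0.1] * 100_000
print(estimate_pi(bad))    # 4.0, since every point lands inside the circle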
 
Originally posted by Soapy Sam
In the OP, ReFlex uses the expression "Perfectly random".

I wonder - does that term have a rigorously defined meaning?

There seems an inherent contradiction here. If a number is perfectly random, then no number can be "more" random. So the number should be, in principle, calculable in advance, hence not random at all. Only imperfectly random numbers would be wholly unpredictable.

"Random numbers", like "infinite numbers", leave me feeling that I was off school the day they were explained.
You could have a bunch of numbers that are all "equally random." :D

But, actually, you're very close to the truth. One can rigorously define "random" (Google for "algorithmic information complexity"), but it turns out that, for most random numbers, it's impossible to prove that they are random.

Roughly, the idea is that a number is not random if it can be compressed, e.g., by a file compression program like WinZip. But even if WinZip can't compress it, perhaps some other program can, in which case it's still not random. So, you can see why proving that a number is random might be harder than proving that it's not random.
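
As a crude illustration of the compressibility idea (my own sketch, using Python's zlib rather than WinZip): a string with an obvious pattern shrinks a lot, while output from the OS entropy source usually doesn't shrink at all. Of course, failing to compress proves nothing; some other program might still manage it.

import os
import zlib

def compressed_size(data: bytes) -> int:
    # Size of the data after zlib compression at maximum effort.
    return len(zlib.compress(data, 9))

patterned = b"0123456789" * 100    # 1000 bytes with an obvious pattern
random_ish = os.urandom(1000)      # 1000 bytes from the OS entropy source

print(len(patterned), compressed_size(patterned))    # compresses to a few dozen bytes
print(len(random_ish), compressed_size(random_ish))  # usually comes out slightly larger than 1000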
 
69dodge said:
A Monte Carlo algorithm takes as input a sequence of numbers ("random numbers"), and produces as output an answer to whatever question it was designed for. It doesn't know or care about how the numbers were generated, i.e., whether they were generated via a random process or not; its answer just depends on what the numbers themselves are.
I'm only a third-year student, and this came from my required stats class for psych, so I don't know a lot about this, but are you sure that applies to using the algorithm for hypothesis testing, i.e. that the numbers don't need to be random? As in the quote above from Wikipedia, it was the advent of Monte Carlo methods that led to the need for random number generation.
Some number sequences will cause it to give the right answer, or very nearly the right answer, and other number sequences will cause it to give a wrong answer.

The important point, however, is that although we generally don't know which are the good sequences and which are the bad ones, we do know that the vast majority of them are good. Therefore, generating one randomly is very likely to result in a good one. So that's what we do.
Again, I'm not sure what a "bad" sequence is. If the vast majority of sequences are desirable, then generating one in any other way is also very likely to result in a good one. When I get home I'll get out the notes and give you the explanation we got of Monte Carlo simulations.
Roughly, the idea is that a number is not random if it can be compressed...
?!? This is the same misconception as Soapy Sam's. Randomness is not a property of a number. It is a property of the method of arriving at that number. If a number exists, then there must be a way to arrive at that number randomly.
 
69dodge said:
Roughly, the idea is that a number is not random if it can be compressed, e.g., by a file compression program like WinZip. But even if WinZip can't compress it, perhaps some other program can, in which case it's still not random. So, you can see why proving that a number is random might be harder than proving that it's not random.
Are there some restrictions that a program must satisfy for it to be considered a legitimate compression program?

Suppose a program specifies that some particular big number (for example, a hundred decimal digits) is compressed to (or encoded as) some particular small number (for example, twenty decimal digits). For any particular big number and small number combination, there is such a program. Thus, it would seem that no big number is random.
 
The idea said:

Suppose a program specifies that some particular big number (for example, a hundred decimal digits) is compressed to (or encoded as) some particular small number (for example, twenty decimal digits). For any particular big number and small number combination, there is such a program. Thus, it would seem that no big number is random.
A non-mathematician's answer, which will no doubt get shot down in flames by the experts here...

Although the output of the program is 20 digits, to re-create the original 100 digits you need more than just those 20 digits: you also need the program itself, which can also be expressed as a string of digits of some length.

So to refine the previous explanation, the sequence of 100 digits is apparently random (or at least, not shown to be non-random) if the 20 output digits plus the digits in the string representing the decompression program itself add up to more than 100.

The program itself carries information which has to be included when deciding on the compressibility or otherwise of the digit string.
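
A toy way to make that concrete (my own sketch; the tiny "decompressor" is a hypothetical lambda, not any standard tool). What has to come in under the original length is the compressed data plus the program that expands it:

# Hypothetical example: the 100-digit string is just "12" repeated 50 times.
original = "12" * 50                     # 100 digits

# A tiny compressed form, plus the source code of its decompressor.
compressed = "12,50"                     # 5 characters
decompressor_source = 'lambda s: s.split(",")[0] * int(s.split(",")[1])'

decompress = eval(decompressor_source)   # eval is fine for a toy like this
assert decompress(compressed) == original

# What counts is the total description length, program included.
total = len(compressed) + len(decompressor_source)
print(total, "vs", len(original))        # still well under 100, so this string is not random-looking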
 
Originally posted by Thumbo
A non-mathematician's answer, which will no doubt get shot down in flames by the experts here...
No, your answer is exactly right.
 
Originally posted by ReFLeX
are you sure that applies to using the algorithm for hypothesis testing, that the numbers don't need to be random?
How would a computer know whether the numbers it's given are "really random" or not? It just does whatever it does, on whatever numbers it's given. If you give it the same numbers, it follows the same steps, and comes up with the same answer, regardless of where you got the numbers from originally.
?!? This is the same misconception as Soapy Sam's. Randomness is not a property of a number. It is a property of the method of arriving at that number.
We could give a different name, like "compressibility", to the property of the number itself, to avoid confusion with the concept of a random process. But compressibility corresponds pretty well to one's intuitive idea of a random-looking number. If a number's digits contain any sort of nonrandom pattern to them, the number will be more or less compressible, depending on exactly how nonrandom the pattern is. And the output of a random process will (probably) not be compressible. So all the concepts are closely related.
 
Ok, I can't find my stats binder, so I'm going to try to explain how I understand Monte Carlo simulations. It would help if you are familiar with the t-test, which is the standard statistical test for checking whether study results are significant. What it tells you is how likely it is that a real effect has been measured in, say, a double-blind medication test. If the scores are 2, 2, and 2 in the control group and 3, 3, and 3 in the treated group, it will report a high probability that an effect is being measured. But it has limited power, for a number of reasons I can't remember; I do know that it uses a fixed table of numbers.

A better way to find out if your results are significant is the Monte Carlo method. That is when you use a computer to assign a label to each datum (ostensibly, the label could be a number) and then randomly re-assign them all to either the treated or control group. Somehow it takes note of the result of doing that; I'm blurry on the details. Then it repeats the re-assignment again and again, and that is what tells you whether the treatment has an effect or not. Right now it's all slipped away from me, but you do see why they need to be randomly assigned. With only 20 participants there are millions of possible re-assignments, which is why a computer is needed to do them.

...see what you make of that, because I believe we are talking at cross-purposes here. Sorry for the horrific explanation, I would have much rather looked it up, but maybe this way will clear up my understanding as well.
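
(For concreteness, here is a minimal sketch of the kind of random re-assignment described above, i.e. a randomization/permutation test approximated by Monte Carlo sampling. The made-up scores and function name are just for illustration.)

import random

def permutation_test(control, treated, n_resamples=10_000):
    # Approximate p-value: how often does a random relabelling of the pooled
    # data produce a difference in group means at least as large as observed?
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = control + treated
    count = 0
    for _ in range(n_resamples):
        random.shuffle(pooled)
        new_control = pooled[:len(control)]
        new_treated = pooled[len(control):]
        diff = abs(sum(new_treated) / len(new_treated) - sum(new_control) / len(new_control))
        if diff >= observed:
            count += 1
    return count / n_resamples

# Made-up scores; a small result suggests the group difference is unlikely to be chance.
print(permutation_test([2, 2, 2, 3, 2], [3, 3, 4, 3, 3]))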
 
Thumbo said:
So to refine the previous explanation, the sequence of 100 digits is apparently random (or at least, not shown to be non-random) if the 20 output digits plus the digits in the string representing the decompression program itself add up to more than 100.
Of course, the program itself has to be expressed in some language, so I have to ask the following.

Are there some restrictions that the language must satisfy for it to be considered a legitimate language?

The concern is that if the language allows various particular, complicated programs to be expressed in very concise form, then the fact that those programs can be expressed in concise form may tell us more about the language than about the programs.
 
The Wikipedia article on algorithmic information theory addresses the issue of different languages. The complexity of a string differs by at most a constant between any two languages, regardless of the length of the string. So, for sufficiently long strings, which language you choose hardly matters.

Whichever language you choose, you're supposed to stick with it for all strings.
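
Stated slightly more formally (my paraphrase of the invariance theorem described there; U, V and c are just notation for the two languages and the constant):

|K_U(x) - K_V(x)| <= c_{U,V}   for every string x,

where K_U(x) is the length of the shortest program in language U that outputs x, and c_{U,V} depends only on the two languages, not on x.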
 
random number generation

From what I remember, the "Monte Carlo" method just means running the experiment over and over and tabulating the results to determine the probabilities. If I wanted to figure out the statistical results of rolling two dice and adding them up (a la Craps), I could:

a) do some math (not Monte Carlo)
b) physically roll the dice 10,000,000 times, plot the results, and look at them (Monte Carlo by definition)
c) write a computer program to SIMULATE rolling the dice 10 million times and look at the results (also Monte Carlo; see the sketch below)
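
A minimal sketch of option (c), assuming Python's built-in pseudorandom generator counts as "simulating" the dice:

import random
from collections import Counter

# Roll two dice 10 million times and tabulate the sums.
rolls = Counter(random.randint(1, 6) + random.randint(1, 6) for _ in range(10_000_000))

for total in range(2, 13):
    print(total, rolls[total] / 10_000_000)   # 7 should come out near 1/6; 2 and 12 near 1/36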

Now, as to whether computers can REALLY generate random numbers, I'd say no. If I knew everything the computer did before the generation of the number, then I could tell you 100% of the time what the next value would be.

No matter how much info I have about physical dice, the table, wind, humidity, dust particles in the air... I would NOT be able to accurately predict the next roll.

Now, computers can generate SIMULATED random numbers. What this usually means is that, given enough samples, every possible outcome is represented about equally often.

If I write a program that generates the numbers 1-10, then as the number of trials approaches infinity, the distribution of the 10 outcomes approaches a flat line.

Note that this doesn't mean the absolute count for each value will settle at exactly 1/10 of the total, or that the counts will all become equal. In fact, it's possible for the counts to drift further and further apart.

Take a coin flip. If I flip it 10 times, I may end up with 7 heads and 3 tails. If I flip it 1000 times, I may end up with 527 heads and 473 tails. I've gone from a difference of 4 to a difference of 54, but the percentages actually get closer to even: from 70/30 to 52.7/47.3.

As the number of trials increases, I would expect the percentages to get closer and closer to 50/50, while the absolute difference keeps getting bigger.
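
That behaviour is easy to see with a quick simulation (my own sketch; the exact counts will vary from run to run):

import random

heads = tails = 0
checkpoints = {10, 1_000, 100_000, 10_000_000}

for flips in range(1, 10_000_001):
    if random.random() < 0.5:
        heads += 1
    else:
        tails += 1
    if flips in checkpoints:
        print(flips, abs(heads - tails), 100.0 * heads / flips)

# The percentage column drifts toward 50, while the absolute difference
# typically (though not always) grows from one checkpoint to the next.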
 
ReFLeX said:

*snip*

A better way to find out if your results are significant is the Monte Carlo method. That is when you use a computer to assign a label to each datum (ostensibly, the label could be a number) and then randomly re-assign them all to either the treated or control group.

*snip*

I think what you are referring to here is the process of bootstrapping. IIRC this is not a Monte Carlo approach in the strict sense, as it does not deal in probabilistic outcomes. The rationale is that if, due to a small sample size, you do not have a good idea of your data's distribution, the best indicator of that distribution is (duh) the data itself. So the algorithm re-samples the entire data set a number of times, re-distributing the data at random each time. Thus a larger data set than you actually have available is simulated, which might enable you to draw better conclusions.
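
(For comparison, a minimal sketch of bootstrap resampling as I understand that idea: resampling with replacement from the observed data to see how much a statistic varies. The data and function name are made up for illustration.)

import random

def bootstrap_means(sample, n_resamples=10_000):
    # Means of resamples drawn with replacement from the observed data.
    return [
        sum(random.choices(sample, k=len(sample))) / len(sample)
        for _ in range(n_resamples)
    ]

data = [2, 3, 3, 2, 4, 3]          # small observed sample
means = sorted(bootstrap_means(data))
print(means[250], means[9750])     # rough 95% interval for the mean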
 
DrDave said:
So he got a coin and flipped it 128 times and recorded heads as 1, and tails as 0.

The real daft bit came at the end when he looked at the number and decided it still wasn't random enough, and swapped a bunch of the digits
Doing that reduces the entropy of the sequence. It reduces the set of possible sequences from "all 128-bit sequences" to "all 128-bit sequences that also look random".
 
It's cute when non-computer scientists try to answer computer questions...

Most random numbers are generated these days with an entropy pool. This pool is derived from network packet timings, interrupt timings from keyboards and mice, and other sources that are non-deterministic from the computer's point of view. The system only switches to the PRNG when there is not enough entropy data to use for generating the random numbers. In fact, the entropy pool data is used to seed the PRNG, and when there is no entropy, the previous few random numbers are used. In this way, most random number generators on a computer will pass every single statistical test for randomness you care to choose.

For even more randomness, CDs of random data are generated from nuclear decay and other *absolutely* random sources. I think there is even a web service where you can get such data, but I am too lazy to do a Google search for you.

Neat experiment if you can find a UNIX box...

$ cat /dev/random

You should see a burst of data, and then it stops once the entropy pool is drained. Move your mouse, and new data can be generated. If you use /dev/urandom instead (which falls back to the PRNG rather than blocking), the output never stops.
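
The same OS-level source can be read from most programming languages too; for example, in Python (a small sketch, assuming a reasonably recent version, since the secrets module is newer than os.urandom):

import os
import secrets

print(os.urandom(16).hex())       # 16 bytes from the OS random source, as hex
print(secrets.randbelow(100))     # an integer in [0, 100) suitable for security use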
 
