
pseudo-random number generators

wjousts (Critical Thinker, joined Mar 15, 2004, 324 messages)
This is a follow-up to the "Ian's Question for Dummies" thread. Yahweh and I both created our own programs to generate random numbers, and both noticed that repetitive numbers such as 1111 seem to occur less often than non-repetitive numbers (e.g. 5261). I believe this is probably an artifact of the way the computer generates pseudo-random numbers, and it reminded me that I really have no idea how the computer generates these numbers.
What I do know is that (usually) you seed the random number generator and it will then generate a sequence of numbers, but it will always produce the same sequence for a given seed. IIRC, it is also the case that the next number generated depends on the last number generated. As I understand it, the random number generator takes the seed plus the last number generated (I think) and produces the next random number by some obscure bit of mathematical trickery. But I have no idea what the trickery is.
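[Editor's note: the "trickery" described above is often something as simple as a linear congruential generator (LCG): the next state is a fixed arithmetic function of the previous state, so the seed determines the entire sequence. A minimal sketch in Python, using the well-known Park-Miller constants; the class name and structure are illustrative only, and many (but not all) library rand() functions use this general form.]

```python
class LCG:
    """Minimal linear congruential generator (Park-Miller constants)."""

    def __init__(self, seed):
        self.state = seed          # the seed is simply the initial state

    def next(self):
        # the next number depends only on the previous one:
        # state <- (a * state) mod m, with a = 16807, m = 2^31 - 1
        self.state = (16807 * self.state) % 2147483647
        return self.state

g = LCG(42)
print(g.next())   # 705894 -- the same seed always gives the same sequence
```

Because the state is a single number and the update is deterministic, the sequence must eventually cycle; the cycle length (the "period") is one of the main quality measures discussed later in this thread.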
The numbers generated are considered pseudo-random because if enough trials are run you should get approximately a uniform statistical distribution over all possible numbers. For example, if you use the random number generator to generate a number from 0 to 9 inclusive (i.e. 0,1,2,3,4,5,6,7,8 or 9 - 10 possible values) and you run it 10 million times, you should get roughly a million 0s, a million 1s, a million 2s and so on. I believe this is the objective when designing a random number generator. But is it possible that if you consider two consecutive digits, which have 100 possible values (e.g. 00, 01, 02, ... 99), you might find the distribution is skewed away from what would be expected statistically? I.e. some values, say 37, might have slightly more than the expected 100,000 occurrences and others, say 77, slightly less? Does this skewing (if it even exists) get worse with longer sequences of random numbers, e.g. 3 consecutive digits? 4? 100?
Can anybody enlighten me on this?
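[Editor's note: the pair question above can be probed empirically with a chi-square test on consecutive digit pairs. A rough sketch; the sample size, seed, and the loose bound in the comment are arbitrary choices for illustration.]

```python
import random
from collections import Counter

random.seed(1)
N = 1_000_000
digits = [random.randrange(10) for _ in range(N)]

# count non-overlapping consecutive pairs: key 37 means the pair (3, 7)
pairs = Counter(10 * digits[i] + digits[i + 1] for i in range(0, N, 2))

expected = (N / 2) / 100          # each of the 100 pairs equally likely
chi2 = sum((pairs[p] - expected) ** 2 / expected for p in range(100))
# for 99 degrees of freedom, values near 99 are typical; a value far
# larger than that would indicate the kind of skew asked about here
```

The same idea extends to triples, quadruples, and so on, at the cost of needing a much larger sample to keep the expected count per cell reasonable.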
 
wjousts said:
This is a follow-up to the "Ian's Question for Dummies" thread. Yahweh and I both created our own programs to generate random numbers, and both noticed that repetitive numbers such as 1111 seem to occur less often than non-repetitive numbers (e.g. 5261). I believe this is probably an artifact of the way the computer generates pseudo-random numbers, and it reminded me that I really have no idea how the computer generates these numbers.
How did you 'notice' this? After all, the collection of non-repetitive sequences is much larger than that of repetitive sequences, so your observation is to be expected given a uniform random distribution.

Any industrial strength random number generator is tested for artifacts such as this.

Simpler algorithms are amenable to mathematical analysis to determine their strength. More complicated algorithms will not yield to analytical methods, so mathematicians rely on statistical methods to test them (generate a lot of numbers, then test for a uniform random distribution).

In short, a good algorithm will not exhibit the behavior you are describing.

Here's a good link to get you started
 
Re: Re: pseudo-random number generators

roger said:
How did you 'notice' this? After all, the collection of non-repetitive sequences is much larger than that of repetitive sequences, so your observation is to be expected given a uniform random distribution.

I absolutely agree that I really have no basis for saying that what I was observing was anything other than mere chance. In the trials I ran, searching for sequences such as 999 was taking longer than searching for, say, 425. That suggests there might be a bias for certain sequences, but it doesn't follow that repetitive sequences are what's being biased against; it would take many more trials to conclude that. I was just wondering whether such effects might exist.
I'll check out your link. Thanks.
 
Feel free to post your code. You may have a bug, and a second pair of eyes always helps.
 
wjousts said:
This is a follow-up to the "Ian's Question for Dummies" thread. Yahweh and I both created our own programs to generate random numbers, and both noticed that repetitive numbers such as 1111 seem to occur less often than non-repetitive numbers (e.g. 5261). I believe this is probably an artifact of the way the computer generates pseudo-random numbers, and it reminded me that I really have no idea how the computer generates these numbers.
What I do know is that (usually) you seed the random number generator and it will then generate a sequence of numbers, but it will always produce the same sequence for a given seed. IIRC, it is also the case that the next number generated depends on the last number generated. As I understand it, the random number generator takes the seed plus the last number generated (I think) and produces the next random number by some obscure bit of mathematical trickery. But I have no idea what the trickery is.
The numbers generated are considered pseudo-random because if enough trials are run you should get approximately a uniform statistical distribution over all possible numbers. For example, if you use the random number generator to generate a number from 0 to 9 inclusive (i.e. 0,1,2,3,4,5,6,7,8 or 9 - 10 possible values) and you run it 10 million times, you should get roughly a million 0s, a million 1s, a million 2s and so on. I believe this is the objective when designing a random number generator. But is it possible that if you consider two consecutive digits, which have 100 possible values (e.g. 00, 01, 02, ... 99), you might find the distribution is skewed away from what would be expected statistically? I.e. some values, say 37, might have slightly more than the expected 100,000 occurrences and others, say 77, slightly less? Does this skewing (if it even exists) get worse with longer sequences of random numbers, e.g. 3 consecutive digits? 4? 100?
Can anybody enlighten me on this?
Sure can, but how deep do you want to go?

PRNGs all use different algorithms, and all have different periods (the number of values generated before the sequence repeats itself). They generate a uniform distribution on [0,1], which can then be run through other algorithms to produce binomial, Poisson, normal, etc., distributions.
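[Editor's note: a standard example of "running a uniform variate through another algorithm" is inverse-transform sampling. The sketch below turns uniform draws into exponentially distributed ones; the rate and sample count are arbitrary choices.]

```python
import math
import random

random.seed(0)

def exponential(rate):
    # inverse-transform sampling: if U is uniform on (0, 1), then
    # -ln(1 - U) / rate is exponentially distributed with that rate
    u = random.random()
    return -math.log(1.0 - u) / rate

samples = [exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)   # theory says the mean is 1/rate = 0.5
```

The same trick works for any distribution whose cumulative distribution function can be inverted; distributions without a closed-form inverse (such as the normal) typically use other transforms instead.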

NIST has a suite of tests that can be used to qualify PRNGs. The quality of PRNGs delivered with various compilers and packages is not the same. Some fail various NIST tests. Depending on how you need to use them, that may put them out of the running. A PRNG perfectly fine to simulate a poker game doesn't necessarily cut it in a "Monte Carlo" simulation of radiation.

If your observation is real, then something seems wrong with your PRNG.
 
There are good pseudo-random number generators, and bad ones. You can try this one if you want.

Just remove the txt extension from the attached file, and include it in any C++ project that you want to use it with.

There is documentation for the classes here.

I strongly doubt that you will find any statistically significant artifacts.

Dr. Stupid
 
wjousts said:
For example, if you use the random number generator to generate a number from 0 to 9 inclusive (i.e. 0,1,2,3,4,5,6,7,8 or 9 - 10 possible values) and you run it 10 million times, you should get roughly a million 0s, a million 1s, a million 2s and so on. I believe this is the objective when designing a random number generator.
Note that the requirement for a uniform distribution is just one requirement (and the most basic) that must be met by a good random number generator. A thorough description of PRNG design is covered in the classic Cambridge University Press text "Numerical Recipes in C", now available online at Cornell University's web site here. If you skip down to Chapter 7 you'll find a comprehensive treatment of the subject, including source code for top-notch PRNGs as well as some famous random number functions that came packaged in the standard C libraries (from memory, for IBM mainframes) that had fatal flaws.
 
wjousts said:

The numbers generated are considered pseudo-random because if enough trials are run you should get approximately a uniform statistical distribution over all possible numbers. For example, if you use the random number generator to generate a number from 0 to 9 inclusive (i.e. 0,1,2,3,4,5,6,7,8 or 9 - 10 possible values) and you run it 10 million times, you should get roughly a million 0s, a million 1s, a million 2s and so on. I believe this is the objective when designing a random number generator. But is it possible that if you consider two consecutive digits, which have 100 possible values (e.g. 00, 01, 02, ... 99), you might find the distribution is skewed away from what would be expected statistically? I.e. some values, say 37, might have slightly more than the expected 100,000 occurrences and others, say 77, slightly less? Does this skewing (if it even exists) get worse with longer sequences of random numbers, e.g. 3 consecutive digits? 4? 100?
Can anybody enlighten me on this?

Hi wjousts,

Great questions.

Let's say that we are going to generate N whole numbers from the range 1 to R, and that each number has probability 1/R of being generated.

Let x_i be an indicator variable that is 1 when the number x occurs (where x is a fixed number, like 37 or 999), and 0 if another number occurs, for each of the N generated numbers. As N increases, we'd expect:

sum(x_i)/N = 1/R, or

sum(x_i) = N/R.

You are interested in the probability of the event sum(x_i) lt N/R and the probability of the event sum(x_i) gt N/R (where lt and gt stand for "less than" and "greater than"; I was having a hard time getting those symbols to display correctly). I believe you are also wondering what happens as N and R vary.

The x_i is what is called a 'Bernoulli' random variable; 1 or 0, success or failure, occur or not occur. The probability of "success", the number x being generated, is 1/R and the probability of "failure" is 1-(1/R).

The sum(x_i) is what is called a 'Binomial' random variable. The probability of x being generated say r times out of N total generations is given by:

P(x=r) = [N!/((N-r)!r!)]*(1/R)^r*[1-(1/R)]^(N-r)

You are interested in evaluating P(x lt N/R) and P(x gt N/R).

In your first example, N = 10,000,000 and R = 10. N/R is therefore 1,000,000.
P(x lt 1,000,000) = sum[P(x=r), from r=0 to r=999,999] = a, and
P(x gt 1,000,000) = sum[P(x=r), from r=1,000,001 to r=10,000,000] = b.

The calculator I'm using chokes on the very small numbers involved, so I just get an error when I try to obtain a and b. :( I should use a computer program (such as S+ or R), or a clever alternative to the binomial distribution, to avoid these errors, but it is late here.

I'll instead show some calculations.

N = 10 and R = 10:
P(x lt 1) = .3486
P(x gt 1) = .2639

N = 50 and R = 10:
P(x lt 5) = .4311
P(x gt 5) = .3838

N = 100 and R = 10:
P(x lt 10) = .4512
P(x gt 10) = .4168

N = 200 and R = 10:
P(x lt 20) = .4655
P(x gt 20) = .4408

N = 500 and R = 10:
P(x lt 50) = .4781
P(x gt 50) = .4624

N = 1000 and R = 10:
P(x lt 100) = .4845
P(x gt 100) = .4734

It seems that P(x lt N/R) gt P(x gt N/R), but also that as N increases the "skewness" becomes negligible.
(Why? Because the binomial approaches a normal "bell curve" distribution, where the mean equals the median equals N/R, so 50% of the values lie below it and 50% lie above it. This can be seen at http://bcs.whfreeman.com/ips4e/cat_010/applets/CLT-Binomial.html. Slide the p slider to 1/10 = .1 and watch what happens as you slide the other slider.)

I hope that helps a bit. I'd probably do some simulation with various N and R and see how close the observations agree with what theory predicts.
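[Editor's note: the calculator "choking" mentioned above can be avoided in a language with arbitrary-precision integers. A sketch using Python's `math.comb` that reproduces the N = 10, R = 10 figures from this post:]

```python
from math import comb

def binom_pmf(r, N, p):
    # exact binomial coefficient via Python's arbitrary-precision ints,
    # so large N does not overflow the combinatorial term
    return comb(N, r) * p**r * (1 - p)**(N - r)

def below_above(N, R):
    """P(x lt N/R) and P(x gt N/R) for the count of one value in N draws."""
    p = 1.0 / R
    m = N // R                      # the expected count N/R
    below = sum(binom_pmf(r, N, p) for r in range(m))
    above = 1.0 - below - binom_pmf(m, N, p)
    return below, above

below, above = below_above(10, 10)
# reproduces the N = 10, R = 10 case above: below ~ .3486, above ~ .2639
```

For very large N the individual terms underflow to zero in floating point, so the huge cases (such as N = 10,000,000) are better handled with the normal approximation described in the post.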
 
Thanks for all the replies. I've figured out what was going on in my program (see the "Ian's Question for Dummies" thread, and thanks to 69dodge), and what I was observing was exactly what should be expected. No artifacts are present and everything is right with the universe.
Nevertheless, I've learnt a lot from this thread about PRNGs, and I'm going to keep it stored somewhere so if I ever need an industrial-strength PRNG I'll know where to look.
 
The Mersenne Twister is an algorithm for generating random numbers. It was designed with the flaws of various earlier generators in mind. Its period, 2^19937 - 1, and its order of equidistribution, 623 dimensions, are far greater than those of most other generators. The generator is also fast; it avoids multiplication and division, and it benefits from caches and pipelines. For more information see the inventors' web page at
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

more specifically:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/mt.pdf
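[Editor's note: Python users already have this generator at hand; the standard library's `random` module is documented to use MT19937 as its core. A quick sketch of the seed-reproducibility property discussed earlier in the thread:]

```python
import random

gen = random.Random(12345)          # CPython's Random is a Mersenne Twister
first = [gen.random() for _ in range(3)]

gen.seed(12345)                     # reseeding replays the exact sequence
replay = [gen.random() for _ in range(3)]
assert replay == first
```

Note that MT19937 is excellent for simulation but, like most general-purpose PRNGs, is not suitable for cryptography, since its state can be recovered from enough outputs.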


Luceiia
 
Luceiia,

I believe you meant a period of 2^19937 - 1. Just thought I'd point it out so others don't go through the same brief mental confusion I did.

Walt
 
Walter Wayne said:
Luceiia,

I believe you meant a period of 2^19937 - 1. Just thought I'd point it out so others don't go through the same brief mental confusion I did.

Walt

Yep yep, thanks!


Luceiia
 
