Random Numbers

MNZ [mnzaki at gmail.com]

Tue, 7 Aug 2007 17:48:20 +0300

Hey Gang, A question about computer "generated" random numbers, in short how random are they really?

I was trying to write this little perl script to play random songs and I noticed it tends to repeat a certain collection of songs (which changes everyday, or even every couple of hours). So what's really happening?

Thanks.

-- 
//MNZ\\

Top Back

Thomas Adam [thomas.adam22 at gmail.com]

Tue, 7 Aug 2007 16:14:16 +0100

On 07/08/07, MNZ <mnzaki@gmail.com> wrote:

> Hey Gang,
> A question about computer "generated" random numbers, in short how
> random are they really?

Because random doesn't necessarily mean unique. And since your list of songs is finite, so are the bounds of the randomness.

-- Thomas Adam

Top Back

Rick Moen [rick at linuxmafia.com]

Tue, 7 Aug 2007 10:11:19 -0700

Quoting Thomas Adam (thomas.adam22@gmail.com):

> Because random doesn't necessarily mean unique.

Reminds me: I heard a marvellous story about a college professor who would always start the semester of his statistics class with a small object lesson: Two teams of students would volunteer to write series of digits on the blackboard, with the professor absent from the room. (The two teams would pick the right or left side based on a coin toss.)

One team would write digits from a decent-quality random binary number generator. The other would write the same quantity of bits, except using they thought of personally rather than from a RNG, attempting to create a random sequence manually.

The professor was then called back into the classroom -- and (almost) unfailingly identify which was the RNG-produced series, and which was the manually created one.

How? Because practically everyone, asked to produce a random bit sequence without mechanical help, will shy away from repeating digits far more often than is likely to happen randomly. The professor thus needs only to spot which sequence looks "more random", and that's (almost always) the one that isn't very random.

The larger lesson is that, in genuinely random trials, weird coincidences, repetitions, and apparent patterns will occur. Probably. ;->

Top Back

Kapil Hari Paranjape [kapil at imsc.res.in]

Wed, 8 Aug 2007 06:09:22 +0530

Hello,

On Tue, 07 Aug 2007, Rick Moen wrote:

> The larger lesson is that, in genuinely random trials, weird
> coincidences, repetitions, and apparent patterns will occur.
> Probably.  ;->

In fact, figuring out how likely it is that you will find such "unlikely phenomena" is a win-win thing --- as S. R. S. Varadhan who wone the Abel prize this year will testify. His work on "Large deviations" is precisely the study of these "unlikely phenomena".

Regards,

Kapil. --

Top Back

René Pfeiffer [lynx at luchs.at]

Tue, 7 Aug 2007 17:24:19 +0200

Hello!

On Aug 07, 2007 at 1748 +0300, MNZ appeared and said:

> Hey Gang,
> A question about computer "generated" random numbers, in short how
> random are they really?

Most random number functions use a pseudo-random number generator (PRNG) since it is very difficult to create sufficient random number with deterministic machines. PRNGs use different algorithms. Some of them are more random than others. Most use an initial value, called the seed, that gives you a specific sequence of numbers. You can repeat this sequence by using the same seed before generating random numbers.

You could also use cryptographically secure pseudorandom number generators (CSPRNGs), which can be modified encryption algorithms for example.

Real randomness is best retrieved from physical processes such as radioactive decay or cosmis radiation. Sound cards can also be used (see http://www.mindrot.org/audio-entropyd.html for an example). The soundchip of the venerable Commodore 64 had an hardware random generator (white noise is useful for some sound effects). Some even use a lava lamp and a CCD chip as used in webcams (http://www.lavarnd.org/). A webcam inside a dark cylinder would also do, because CCD sensors show noise when exposed to "dark light").

The Linux kernel uses a mix of algorithms and hardware-induced effects for the "creation" of randomness.

http://www.random.org/ has some more insights into randomness and its uses (plus some sample of randomness gathered from athmospheric noise, they even offer subscriptions, as strange as it sounds).

> I was trying to write this little perl script to play random songs and
> I noticed it tends to repeat a certain collection of songs (which
> changes everyday, or even every couple of hours). So what's really
> happening?

I think Perl's rand() function is a wrapper for the random number function of the C library which in turn contains a simple PRNG with seed and random number sequences. The effect you have seen may be due to the PRNG, but if your number of songs is sufficiently low then you might hit the effect of discretisation. rand() returns fractional numbers. When creating an integer to address a specific song you lose some randomness by rounding the random number. It's hard to tell without seeing the code.

You could probably add a counter per song and check if the selected song was played more than $n times and prefer a song that was played less times.

Best wishes, René.

Top Back

Thomas Adam [thomas.adam22 at gmail.com]

Tue, 7 Aug 2007 16:37:42 +0100

On 07/08/07, René Pfeiffer <lynx@luchs.at> wrote:

> You could probably add a counter per song and check if the selected song
> was played more than $n times and prefer a song that was played less
> times.

"Collections". XMMS2 allows for this:

http://wiki.xmms2.xmms.se/index.php/Collections

-- Thomas Adam

Top Back

René Pfeiffer [lynx at luchs.at]

Tue, 7 Aug 2007 18:03:18 +0200

On Aug 07, 2007 at 1637 +0100, Thomas Adam appeared and said:

> On 07/08/07, René Pfeiffer <lynx@luchs.at> wrote:
> > You could probably add a counter per song and check if the selected song
> > was played more than $n times and prefer a song that was played less
> > times.
>=20
> "Collections".  XMMS2 allows for this:
>=20
> http://wiki.xmms2.xmms.se/index.php/Collections

Oh, I missed that feature apparently (but I am still "stuck" with xmms1, too lazy to upgrade).

Thanks, René.

Top Back

Kapil Hari Paranjape [kapil at imsc.res.in]

Wed, 8 Aug 2007 06:24:31 +0530

Hello,

On Tue, 07 Aug 2007, René Pfeiffer wrote:

> Real randomness is best retrieved from physical processes such as
> radioactive decay or cosmis radiation.

Actually, it turns out that "real randomness" is hard to verify. [*]

The best that one can hope for is "computationally random" which is to say that no program will be able to distinguish between the given choices and a "really random" choice in a reasonable amount of time.

This is essentially the same as Cryptographic Randomness and recent results (see the talk by Avi Widgerson at ICM, Madrid last year) show that there is strong crypto if and only if there are good random number generators.

Regards,

Kapil.

[*] This may be counting angels on the head of a pin but if something cannot be computed then it cannot be experimentally verified and hence (in the Popperian view) is not science. So one could assert that René's statement quoted above is not scientific :D . (I duck to avoid thrown shoes and tomatoes as I say this!)

Top Back

René Pfeiffer [lynx at luchs.at]

Wed, 8 Aug 2007 09:37:02 +0200

On Aug 08, 2007 at 0624 +0530, Kapil Hari Paranjape appeared and said:

> On Tue, 07 Aug 2007, René Pfeiffer wrote:
> > Real randomness is best retrieved from physical processes such as
> > radioactive decay or cosmis radiation.
>=20
> Actually, it turns out that "real randomness" is hard to verify. [*]

Using the words "real" and "reality" without quotes is asking for trouble. Of course you are perfectly "right". ;)

> The best that one can hope for is "computationally random" which is
> to say that no program will be able to distinguish between the given
> choices and a "really random" choice in a reasonable amount of time.

When attending the lectures about numerical mathematics long ago our professor greeted us with the following statement: "Forget 0. 0 is a fiction and only used in theoretical mathematics." Physicists might add that Nature doesn't use 0 either.

So I agree with you about the computational problems. Hardware RNGs are essentially analog-to-digital and have to omit information (and probably randomness) by design.

> [...]
> [*] This may be counting angels on the head of a pin but if something
> cannot be computed then it cannot be experimentally verified and
> hence (in the Popperian view) is not science. So one could assert
> that René's statement quoted above is not scientific :D .

Well, yes, but I was taking a shortcut (which is rather common when programming). ;)

Best, René, not throwing shoes or tomatoes.

Top Back

MNZ [mnzaki at gmail.com]

Wed, 8 Aug 2007 15:31:53 +0300

The collection is 300+ songs, not small enough. I count all the files in the directory then choose a number from among them randomly (using rand() and rounding), hmmm not very practical maybe. There is no actuall "pattern", but it (is alive!) tends to choose a couple of numbers then "randomly" choose from among them (ie. certain songs get repeated, then the collection changes). But I think I understand randomness now (!) so I guess checking if a song was played more than $n times is a good idea.

Anyway, this script wasn't supposed to be put to practical use. But having it on the rox context menu for directories is pretty cool for when you are not sure about what you want to listen to. Plus (command line) mplayer rules :P

-- 
//MNZ\\

Top Back

Ben Okopnik [ben at linuxgazette.net]

Wed, 8 Aug 2007 12:21:08 -0400

On Wed, Aug 08, 2007 at 03:31:53PM +0300, MNZ wrote:

> The collection is 300+ songs, not small enough. I count all the files
> in the directory then choose a number from among them randomly (using
> rand() and rounding), hmmm not very practical maybe.

It's not. You tend to lose the edge cases that way. There's a nifty little "random picker" algorithm that I got from Randal Schwartz a while ago - I don't think he invented it, but it works really well for choosing a random line from a file:

perl -we'rand($.) < 1 && ($pick = $_) while <>; print $pick' file

It's not quite as obvious as it could be, but it weights all the lines equally and works well.

> There is no
> actuall "pattern", but it (is alive!) tends to choose a couple of
> numbers then "randomly" choose from among them (ie. certain songs get
> repeated, then the collection changes). But I think I understand
> randomness now (!) so I guess checking if a song was played more than
> $n times is a good idea.

You could also use a weighted algorithm, changing the weight for a given song every time it's played. Here's a script with a sample list; it has a one-second delay between picks so that there will be a delay between songs - something that usually provided by playing a song.

#!/usr/bin/perl -w
# Created by Ben Okopnik on Tue Aug  7 22:51:38 EDT 2007
 
@songs = split /\n/, <<"Song_list";
/home/ben/Music/Vladimir Vysotsky - Humor/Mao Zedong.mp3
/home/ben/Music/Sea Shanties/Haul Away Joe Evans And Doherty.mp3
/home/ben/Music/NiN/I'm Afrraid Of Americans.mp3
/home/ben/Music/Sea Shanties/A-Rovin' Richard Burbank.mp3
/home/ben/Music/Elton John - Live in Australia/Madman Across The Water.mp3
/home/ben/Music/Great Big Sea - Road Rage/Hangin' Johnny.mp3
/home/ben/Music/ZZ Top/02. Sharp Dressed Man.mp3
/home/ben/Music/Ken Anoff - A Time To Drum/A Time To Drum.mp3
/home/ben/Music/Great Big Sea - Turn/Captain Wedderburn.mp3
Song_list
 
sub relative_weight {
	my ($weights) = shift;
	my ($total, %dist);
	$total += $_ for values %$weights;
	while (my ($key, $value) = each %$weights){
		$dist{$key} = $value / $total;
	}
	return \%dist;
}
 
sub roll_loaded_dice {
    my ($dist, $rand) = (shift, rand);
	for (keys %$dist){
		return $_ if ($rand -= $dist->{$_}) < 0;
	}
}
 
$diff = time - 1;
$list{$_} = time - $diff for 0 .. $#songs;
 
for ($c = 2 * @songs; $c--; ){
	sleep 1;
	# $dist = relative_weight(\%list);
	$index = roll_loaded_dice(relative_weight(\%list));
	# Save the new time snapshot
	$list{$index} = 1/(time - $diff);
	print "$songs[$index]\n";
}

This "remembers" when you played a given song, giving preference to the oldest ones.

> Anyway, this script wasn't supposed to be put to practical use. But
> having it on the rox context menu for directories is pretty cool for
> when you are not sure about what you want to listen to. Plus (command
> line) mplayer rules :P

Agreed! I'm definitely a fan.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top Back

Ben Okopnik [ben at linuxgazette.net]

Wed, 8 Aug 2007 13:09:03 -0400

On Wed, Aug 08, 2007 at 12:21:08PM -0400, Benjamin Okopnik wrote:

> 
>```
> for ($c = 2 * @songs; $c--; ){
> 	sleep 1;
> 	# $dist = relative_weight(\%list);
> 	$index = roll_loaded_dice(relative_weight(\%list));
> 	# Save the new time snapshot
> 	$list{$index} = 1/(time - $diff);
> 	print "$songs[$index]\n";
> }
> '''

Oh, whoops - that last part is a test loop (prints 2x as many random picks as there are songs in the list.) If you actually want to use it, you'll probably want to use a command-line argument to decide how many songs you want from that list, and play the songs as the script picks them:

# Command line arg or list length (validate arg as number)
$count = $ARGV[0] =~ /^\d+$/ ? $ARGV[0] : @songs;
 
while ($count--){
   $index = roll_loaded_dice(relative_weight(\%list));
   $list{$index} = 1/(time - $diff);
   # print "$songs[$index]\n";
   system '/usr/bin/mpg123', "$songs[$index]";
}

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top Back

MNZ [mnzaki at gmail.com]

Wed, 8 Aug 2007 21:01:45 +0300

On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:

> On Wed, Aug 08, 2007 at 03:31:53PM +0300, MNZ wrote:
> > The collection is 300+ songs, not small enough. I count all the files
> > in the directory then choose a number from among them randomly (using
> > rand() and rounding), hmmm not very practical maybe.
>
> It's not. You tend to lose the edge cases that way.  There's a nifty
> little "random picker" algorithm that I got from Randal Schwartz a while
> ago - I don't think he invented it, but it works really well for
> choosing a random line from a file:
>
> ```
> perl -we'rand($.) < 1 && ($pick = $_) while <>; print $pick' file
> '''

> It's not quite as obvious as it could be, but it weights all the lines
> equally and works well.

:-/ I don't get it, but won't the lines at the end have a lower chance of being picked? I mean the first line has a 99% chance of getting picked if I understand rand() correctly. rand(1) has to return a fraction between 0 and 1 right? I know the other 300 songs/lines still have a chance but much slimmer.

> You could also use a weighted algorithm, changing the weight for a given
> song every time it's played. Here's a script with a sample list; it has
> a one-second delay between picks so that there will be a delay between
> songs - something that usually provided by playing a song.

[snip]

Neat. I wouldn't actually mind songs being repeated but not at this rate I'm getting.

> This "remembers" when you played a given song, giving preference to the
> oldest ones.
>
> > Anyway, this script wasn't supposed to be put to practical use. But
> > having it on the rox context menu for directories is pretty cool for
> > when you are not sure about what you want to listen to. Plus (command
> > line) mplayer rules :P
>
> Agreed! I'm definitely a fan.

Note: Sorry about the last E-mail. That was an accident.

[[[ Thanks for the heads up, MNZ. I went ahead and deleted it. - Kat ]]]

-- 
//MNZ\\

Top Back

Ben Okopnik [ben at linuxgazette.net]

Wed, 8 Aug 2007 15:04:10 -0400

On Wed, Aug 08, 2007 at 09:01:45PM +0300, MNZ wrote:

> On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:
> > On Wed, Aug 08, 2007 at 03:31:53PM +0300, MNZ wrote:
> > > The collection is 300+ songs, not small enough. I count all the files
> > > in the directory then choose a number from among them randomly (using
> > > rand() and rounding), hmmm not very practical maybe.
> >
> > It's not. You tend to lose the edge cases that way.  There's a nifty
> > little "random picker" algorithm that I got from Randal Schwartz a while
> > ago - I don't think he invented it, but it works really well for
> > choosing a random line from a file:
> >
> > ```
> > perl -we'rand($.) < 1 && ($pick = $_) while <>; print $pick' file
> > '''
> 
> > It's not quite as obvious as it could be, but it weights all the lines
> > equally and works well.
> 
> :-/ I don't get it, but won't the lines at the end have a lower chance
> of being picked?

That would be the "not as obvious as it could be" part. When I understood it, I was just tickled pink by the neat algorithm. Here's how it works:

1. line 1: test 'rand(1)<1' (100% chance: 'rand' always returns <1)
2. line 2: test 'rand(2)<1' (1/2 chance that line 2 will replace $pick)
3. line 3: test 'rand(3)<1' (1/3 chance that line 3 will replace $pick)
...

Pretty cool stuff.

> I mean the first line has a 99% chance of getting picked if I
> understand rand() correctly.

100%, actually. The important thing is the chance of replacement on every line; it ends up spreading out very fairly.

> rand(1) has to return a fraction between
> 0 and 1 right?

0 => x < 1

> I know the other 300 songs/lines still have a chance but much slimmer.

Spin it, say, a hundred times and look at the results.

> > You could also use a weighted algorithm, changing the weight for a given
> > song every time it's played. Here's a script with a sample list; it has
> > a one-second delay between picks so that there will be a delay between
> > songs - something that usually provided by playing a song.
> [snip]
> 
> Neat. I wouldn't actually mind songs being repeated but not at this
> rate I'm getting.

Yep. That algorithm doesn't prevent previously-played songs from being considered; it just decreases their chances of "winning" the roll.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top Back

MNZ [mnzaki at gmail.com]

Wed, 8 Aug 2007 22:11:59 +0300

On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:

> > > ```
> > > perl -we'rand($.) < 1 && ($pick = $_) while <>; print $pick' file
> > > '''
> >
> > > It's not quite as obvious as it could be, but it weights all the lines
> > > equally and works well.
> >
> > :-/ I don't get it, but won't the lines at the end have a lower chance
> > of being picked?
>
> That would be the "not as obvious as it could be" part.  When I
> understood it, I was just tickled pink by the neat algorithm. Here's
> how it works:
>
> ``
> 1. line 1: test 'rand(1)<1' (100% chance: 'rand' always returns <1)
> 2. line 2: test 'rand(2)<1' (1/2 chance that line 2 will replace $pick)
> 3. line 3: test 'rand(3)<1' (1/3 chance that line 3 will replace $pick)
> ...
> ''
>
> Pretty cool stuff.

Err but line 300 has a chance of only 1/300 of replacing $pick. Still don't quite get it, I'll eventually......

> > I mean the first line has a 99% chance of getting picked if I
> > understand rand() correctly.
>
> 100%, actually. The important thing is the chance of replacement on
> every line; it ends up spreading out very fairly.
>
> > rand(1) has to return a fraction between
> > 0 and 1 right?
>
> 0 => x < 1
>
> > I know the other 300 songs/lines still have a chance but much slimmer.
>
> Spin it, say, a hundred times and look at the results.

Already done Had to wait another minute before checking my mail.

-- 
//MNZ\\

Top Back

Ben Okopnik [ben at linuxgazette.net]

Wed, 8 Aug 2007 15:28:16 -0400

On Wed, Aug 08, 2007 at 10:11:59PM +0300, MNZ wrote:

> On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:
> >
> > Pretty cool stuff.
> 
> Err but line 300 has a chance of only 1/300 of replacing $pick. Still
> don't quite get
> it, I'll eventually......

Yes - but the chances for any line of both being picked and NOT being replaced are equal. That's the trick to it.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

Top Back

MNZ [mnzaki at gmail.com]

Wed, 8 Aug 2007 22:36:09 +0300

On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:

> On Wed, Aug 08, 2007 at 10:11:59PM +0300, MNZ wrote:
> > On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:
> > >
> > > Pretty cool stuff.
> >
> > Err but line 300 has a chance of only 1/300 of replacing $pick. Still
> > don't quite get
> > it, I'll eventually......
>
> Yes - but the chances for any line of both being picked and NOT being
> replaced are equal. That's the trick to it.

Oh understood finally thanks.

-- 
//MNZ\\

Top Back

Neil Youngman [neil.youngman at youngman.org.uk]

Wed, 8 Aug 2007 20:30:51 +0100

On or around Wednesday 08 August 2007 20:11, MNZ reorganised a bunch of electrons to form the message:

> Err but line 300 has a chance of only 1/300 of replacing $pick. Still
> don't quite get
> it, I'll eventually......

Given that there are 300 lines, what chance should it have?

Neil

Top Back

MNZ [mnzaki at gmail.com]

Wed, 8 Aug 2007 22:05:11 +0300

On 8/8/07, Ben Okopnik <ben@linuxgazette.net> wrote:

> On Wed, Aug 08, 2007 at 03:31:53PM +0300, MNZ wrote:
> > The collection is 300+ songs, not small enough. I count all the files
> > in the directory then choose a number from among them randomly (using
> > rand() and rounding), hmmm not very practical maybe.
>
> It's not. You tend to lose the edge cases that way.  There's a nifty
> little "random picker" algorithm that I got from Randal Schwartz a while
> ago - I don't think he invented it, but it works really well for
> choosing a random line from a file:
>
> ```
> perl -we'rand($.) < 1 && ($pick = $_) while <>; print $pick' file
> '''

#!/usr/bin/perl -w
## MNZ
 
$n = 10000;
 
for(1..$n){
	my ($pick,$i);
	## Line in Question
	rand($i) < 1 && ($pick = $i) while(++$i && $i < 6);
	$result{$pick}++ if $pick;
}
 
foreach (keys %result){print "$_: " . $result{$_}/$n*100 . "%\n";}

Just couldn't resist. And it does come out pretty even.

-- 
//MNZ\\

Top Back