
Random Number Generation

Dr. John Mellor-Crummey

Department of Computer Science
Rice University

COMP 528 Lecture 21 5 April 2005

Topics for Today


Desired properties of a good generator
Linear congruential generators
multiplicative and mixed

Tausworthe generators
Combined generators
Seed selection
Myths about random number generation
What's used today: MATLAB, R, Linux

Why Random Number Generation?

Simulation must generate random values for variables in a specified random distribution
examples: normal, exponential, ...

How? Two steps

random number generation: generate a sequence of uniform FP
random numbers in [0,1]
random variate generation: transform a uniform random
sequence to produce a sequence with the desired distribution

How Random Number Generators Work

Most commonly use recurrence relation

$x_n = f(x_{n-1}, x_{n-2}, \ldots)$

recurrence is a function of the last number (or the last few numbers), e.g.

$x_n = (5x_{n-1} + 1) \bmod 16$

For $x_0 = 5$, first 32 numbers are 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9,
14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5
x's are integers in [0, 15]
dividing by 16, get random numbers in interval [0, 1)
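
The example recurrence above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `lcg` and its parameter names are illustrative choices, not part of the lecture.

```python
from itertools import islice

def lcg(a, b, m, seed):
    """Yield x_1, x_2, ... for the recurrence x_n = (a*x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

# First 32 integers for x_n = (5*x_{n-1} + 1) mod 16 with x_0 = 5
xs = list(islice(lcg(5, 1, 16, seed=5), 32))

# Dividing by the modulus maps the integers into [0, 1)
us = [x / 16 for x in xs]
```

The second half of `xs` repeats the first half, which makes the period of 16 visible directly.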

Properties of pseudo-random number sequences

from seed value, can determine entire sequence
they pass statistical tests for randomness
reproducibility (often desirable)

Random Number Sequences

Some generators do not repeat the initial part of a sequence

cycle length


Desired Properties of a Good Generator

Efficiently computable
Period should be large
don't want random numbers in a simulation to recycle

Successive values should be

uniformly distributed

Linear-Congruential Generators

1951: D.H. Lehmer found that residues of successive powers of a number have good randomness

$x_n = a^n \bmod m$

after computing $x_{n-1}$: $x_n = a x_{n-1} \bmod m$



Lehmer's generator: multiplicative LCG

Modern generalization: mixed LCG

$x_n = (a x_{n-1} + b) \bmod m$

$a, b, m > 0$

Result: $x_n$ are integers in [0, m-1]

Popular because
analyzed easily
certain guarantees can be made about their properties

Properties of LCGs

Choice of a, b, m affects

Observations about LCGs

period can never be more than m, so modulus m should be large
$m = 2^k$ yields efficient implementation by truncation
if b is non-zero, obtain period of m iff

m and b are relatively prime
every prime that is a factor of m is also a factor of a - 1
if m is a multiple of 4, a - 1 must be too

all of these conditions are met for $x_n = (a x_{n-1} + b) \bmod m$ if
$m = 2^k$, for some integer k
a = 4c + 1, for some integer c
b is an odd integer
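
The three full-period conditions can be checked mechanically and compared against a brute-force period count for small moduli. The sketch below is illustrative only; the function names `prime_factors`, `has_full_period`, and `period` are assumptions made for this example.

```python
from math import gcd

def prime_factors(n):
    """Return the set of prime factors of n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, b, m):
    """Check the three conditions for x_n = (a*x_{n-1} + b) mod m with b != 0."""
    return (gcd(b, m) == 1                                        # m and b relatively prime
            and all((a - 1) % p == 0 for p in prime_factors(m))   # primes of m divide a - 1
            and (m % 4 != 0 or (a - 1) % 4 == 0))                 # 4 | m implies 4 | a - 1

def period(a, b, m, seed=0):
    """Brute-force period: steps until the seed recurs (None if it never does)."""
    x = seed
    for n in range(1, m + 1):
        x = (a * x + b) % m
        if x == seed:
            return n
    return None
```

For example, `has_full_period(5, 1, 16)` holds and `period(5, 1, 16)` is 16, whereas a = 3 violates the multiple-of-4 condition and `period(3, 1, 16)` is only 8.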

Full-period generator = one with period m

not all are equally good
lower autocorrelation between adjacent elements = better

Example: Two Candidate LCGs

Which is better?

$x_n = ((2^{34} + 1)x_{n-1} + 1) \bmod 2^{35}$

$x_n = ((2^{18} + 1)x_{n-1} + 1) \bmod 2^{35}$

Both must be full-period generators, since for $x_n = (a x_{n-1} + b) \bmod m$

$m = 2^k$, for some integer k
a = 4c + 1, for some integer c
b is an odd integer

Multiplicative LCGs

More efficient than mixed LCGs: no addition

Two classes: $m = 2^k$, $m \neq 2^k$


Multiplicative LCG with $m = 2^k$


$x_n = a x_{n-1} \bmod 2^k$

Most efficient LCG: mod = truncation

Not full-period: maximum possible period for $m = 2^k$ is $2^{k-2}$
only possible if multiplier $a = 8i \pm 3$ and $x_0$ is odd

$x_n = 5x_{n-1} \bmod 2^5$ (lcg_m2k_good)

$x_n = 7x_{n-1} \bmod 2^5$ (lcg_m2k_bad)

If $2^{k-2}$ period suffices, may use multiplicative LCG for efficiency
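
The periods of the two example generators, and the effect of an even seed, can be counted directly. This is a small sketch; `mult_lcg_period` is a hypothetical helper name.

```python
def mult_lcg_period(a, m, seed):
    """Steps until x_n = a*x_{n-1} mod m returns to the seed (None if never)."""
    x = seed
    for n in range(1, m + 1):
        x = (a * x) % m
        if x == seed:
            return n
    return None

good = mult_lcg_period(5, 2**5, seed=1)       # a = 8*1 - 3, odd seed
bad = mult_lcg_period(7, 2**5, seed=1)        # a is not of the form 8i +/- 3
even_seed = mult_lcg_period(5, 2**5, seed=2)  # good multiplier, even seed
```

With modulus $2^5$ the maximum period is $2^{5-2} = 8$; only the first combination reaches it, while the other two give a period of 4.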


Multiplicative LCG with $m \neq 2^k$


$x_n = a x_{n-1} \bmod m$, $m \neq 2^k$

Avoid small period of LCG when $m = 2^k$: use prime modulus

Full period generator with proper choice of a
when a is primitive root of m

i.e. $a^n \bmod m \neq 1$ for $n = 1, 2, \ldots, m-2$

$x_n = 3x_{n-1} \bmod 31$ (lcg_mprime_good)

$x_n = 5x_{n-1} \bmod 31$ (lcg_mprime_bad)

Note: $5^3 \bmod 31 = 125 \bmod 31 = 1$

unlike mixed LCG, $x_n$ can never be 0 when m is prime
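
The primitive-root condition amounts to checking that the multiplicative order of a modulo m equals m - 1. A minimal sketch follows; the function names are illustrative, and the brute-force loop is only sensible for small m.

```python
def multiplicative_order(a, m):
    """Smallest n >= 1 with a**n mod m == 1 (assumes gcd(a, m) == 1)."""
    x, n = a % m, 1
    while x != 1:
        x, n = (x * a) % m, n + 1
    return n

def is_primitive_root(a, m):
    """For prime m, a is a primitive root iff its order is m - 1."""
    return multiplicative_order(a, m) == m - 1

order_good = multiplicative_order(3, 31)   # multiplier of lcg_mprime_good
order_bad = multiplicative_order(5, 31)    # multiplier of lcg_mprime_bad
```

The order of the multiplier is also the period of the generator for any non-zero seed, so 3 gives period 30 and 5 gives period 3.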


Examining Bits of a Multiplicative LCG

decimal binary
------- -----------------
25173 01100010 01010101
12345 00110000 00111001
54509 11010100 11101101
27825 01101100 10110001
55493 11011000 11000101
25449 01100011 01101001
13277 00110011 11011101
53857 11010010 01100001
64565 11111100 00110101
1945 00000111 10011001
6093 00010111 11001101
24849 01100001 00010001
48293 10111100 10100101
52425 11001100 11001001
61629 11110000 10111101
18625 01001000 11000001
2581 00001010 00010101
25337 01100010 11111001
11949 00101110 10101101
47473 10111001 01110001

$x_n = 25{,}173\,x_{n-1} \bmod 2^{16}$

bit 1: always 1
bit 2: always 0
bit 3: cycle (10) of length 2
bit 4: cycle (0110) of length 4
In general: kth bit follows cycle of length $2^{k-2}$, $k \geq 2$
Typical of multiplicative LCG with modulus $2^k$

Examining Bits of a Mixed LCG

decimal binary
------- -----------------
39022 10011000 01101110
61087 11101110 10011111
20196 01001110 11100100
45005 10101111 11001101
3882 00001111 00101010
21259 01010011 00001011
65216 11111110 11000000
19417 01001011 11011001
30502 01110111 00100110
20919 01010001 10110111
26076 01100101 11011100
16421 01000000 00100101
44130 10101100 01100010
63139 11110110 10100011
32824 10000000 00111000
14513 00111000 10110001
51934 11001010 11011110
36303 10001101 11001111
35284 10001001 11010100
8573 00100001 01111101

$x_n = (25{,}173\,x_{n-1} + 13{,}849) \bmod 2^{16}$

bit 1: cycle (10) of length 2
bit 2: cycle (1100) of length 4
bit 3: cycle (11110000) of length 8

In general: kth bit follows cycle of length $2^k$
Typical of mixed LCG with modulus $2^k$
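
The bit-cycle observations for the multiplicative and mixed generators with multiplier 25,173 can be checked by enumerating one full cycle and measuring how often each bit position repeats. This is a sketch under stated assumptions: the helper names are hypothetical, the seeds 1 and 0 are arbitrary choices, and the code relies on the multiplier being odd and the modulus a power of two (so the seed lies on the cycle and all cycle lengths are powers of two).

```python
def lcg_cycle(a, b, m, seed):
    """Return one full cycle of x_n = (a*x_{n-1} + b) mod m, ending at the seed."""
    xs, x = [], seed
    while True:
        x = (a * x + b) % m
        xs.append(x)
        if x == seed:
            return xs

def bit_cycle_length(xs, bit):
    """Cycle length of bit position `bit` (1 = least significant) in cyclic list xs."""
    bits = [(x >> (bit - 1)) & 1 for x in xs]
    n, p = len(bits), 1
    while p < n:
        if all(bits[i] == bits[(i + p) % n] for i in range(n)):
            return p
        p *= 2   # only powers of two need to be tried here
    return n

mult = lcg_cycle(25173, 0, 2**16, seed=1)        # multiplicative LCG
mixed = lcg_cycle(25173, 13849, 2**16, seed=0)   # mixed LCG

mult_lengths = [bit_cycle_length(mult, k) for k in range(1, 7)]
mixed_lengths = [bit_cycle_length(mixed, k) for k in range(1, 5)]
```

Bits 1 and 2 of the multiplicative generator are constant (cycle length 1), and the lengths then double with each higher bit; for the mixed generator the lengths start at 2 and double from there.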

LCG Cautions

Properties guaranteed only if

computations are exact: no roundoff
use integer arithmetic without overflow

Low-order bits not very random, high-order bits better

if one wants k bits and k < machine word length,
better to choose high-order k bits than low-order k bits


Tausworthe Generators

Significant interest in huge random numbers

cryptographic applications want many-bit random numbers
produce k-bit numbers by
producing a random sequence of bits
chunking the bit stream into k-bit quantities

1965: Tausworthe generator

$b_n = c_{q-1} b_{n-1} \oplus c_{q-2} b_{n-2} \oplus c_{q-3} b_{n-3} \oplus \cdots \oplus c_0 b_{n-q}$

$c_i$ and $b_i$ are binary variables
$\oplus$ is the xor operation (mod 2 addition)
uses last q bits of bit stream to compute next bit
autoregressive, order q: AR(q)

AR(q) generator maximum period = $2^q - 1$


Tausworthe Generator Notation

Characteristic polynomial notation

characteristic polynomial
$x^7 + x^3 + 1$
$b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0, 1, 2, ...
$b_{n+7} = b_{n+3} \oplus b_n$, n = 0, 1, 2, ...
$b_n = b_{n-4} \oplus b_{n-7}$, n = 7, 8, 9, ...

Most polynomials for Tausworthe generators are trinomials

Period depends on characteristic polynomial
if period = $2^q - 1$, characteristic polynomial is a primitive polynomial


Implementing Tausworthe Generators

Linear feedback shift registers

example: $x^7 + x^3 + 1$
$b_{n+7} = b_{n+3} \oplus b_n$, n = 0, 1, 2, ...
$b_n = b_{n-4} \oplus b_{n-7}$, n = 7, 8, 9, ...
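
The recurrence for $x^7 + x^3 + 1$ can be simulated in software. The sketch below keeps the whole bit history in a list rather than modelling a 7-stage hardware register; the function name and the all-ones seed are illustrative assumptions.

```python
def tausworthe_bits(seed_bits, count):
    """Bits of b_n = b_{n-4} XOR b_{n-7} (characteristic polynomial x^7 + x^3 + 1).

    seed_bits holds b_0..b_6; an all-zero seed would produce only zeros.
    """
    bits = list(seed_bits)
    while len(bits) < count:
        bits.append(bits[-4] ^ bits[-7])
    return bits[:count]

# Two periods' worth of bits from an all-ones seed
bits = tausworthe_bits([1] * 7, 2 * 127)
```

Because the polynomial is primitive, the bit stream repeats with period $2^7 - 1 = 127$, which can be confirmed by comparing the two halves of `bits`.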






Disadvantage of Tausworthe generators

while sequence is good overall, local behavior may not be
known to perform poorly on runs up and down test

first-order serial correlation almost 0

suspected that some polynomials may give poor high-order correlation


Generating k-bit Random Numbers

k-bit random numbers $x_n$ from binary sequence $b_n$
Generalized feedback shift register method (Lewis & Payne, 1973)

$x_n = 0.b_n b_{n+s} b_{n+2s} \cdots b_{n+(k-1)s}$

s is carefully selected delay

$s \geq k$: $x_n$ and $x_j$ have no bits in common for $n \neq j$

s relatively prime to $2^q - 1$: guarantees full period for $x_n$
$x_n$ can be generated very efficiently with wide-word shift and
exclusive-or operations

storing an array of seed numbers
careful initialization of seed array

Extended Fibonacci Generators

Fibonacci sequence: $x_n = x_{n-1} + x_{n-2}$
Fibonacci RNG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$

not very good randomness

high serial correlation

Extended Fibonacci generator (Marsaglia 1983)

$x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$

state: ring buffer B with 17 values

initialize B with 17 integers (not all even)

initialize cursors j = 16, l = 4 into the buffer

x = (B[j] + B[l]) mod 2^k
B[j] = x
j = (j - 1) mod 17; l = (l - 1) mod 17
return x

passes most statistical tests
period = $2^k (2^{17} - 1)$ (much longer than LCGs)
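
The ring-buffer steps above translate almost line for line into Python. This is a minimal sketch; the class name, the default word size k = 32, and the cursor name `l` are assumptions made for the example.

```python
class ExtendedFibonacci:
    """x_n = (x_{n-5} + x_{n-17}) mod 2**k, kept in a 17-entry ring buffer."""

    def __init__(self, seed_values, k=32):
        if len(seed_values) != 17 or all(v % 2 == 0 for v in seed_values):
            raise ValueError("need 17 seed integers, not all even")
        self.mask = (1 << k) - 1                 # x & mask == x mod 2**k
        self.buf = [v & self.mask for v in seed_values]
        self.j, self.l = 16, 4                   # cursors: oldest value and 5th most recent

    def next(self):
        x = (self.buf[self.j] + self.buf[self.l]) & self.mask
        self.buf[self.j] = x                     # overwrite the oldest entry
        self.j = (self.j - 1) % 17
        self.l = (self.l - 1) % 17
        return x

gen = ExtendedFibonacci(list(range(1, 18)), k=16)
first = [gen.next() for _ in range(5)]
```

With this layout `buf[0]` is read as the most recent seed value and `buf[16]` as the oldest, so the two cursors always sit 17 and 5 positions behind the value being produced.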


Some Combined Generators

Can combine 2 or more generators to produce a better one

Adding random numbers from 2 or more generators

if $x_n$ and $y_n$ are random sequences in [0, m-1], then

$w_n = (x_n + y_n) \bmod m$

can be used as a random number

why do this?

can increase period and randomness if two generators have different periods

Exclusive-or random numbers from 2 or more generators

Santha & Vazirani (1984)

xor of 2 random n-bit streams generates a more random sequence

Shuffle: use sequence a to pick which recent element in sequence b to return
Marsaglia & Bray (1964)

keep 100 items of sequence b

use sequence a to select which to return next and replace

claim: better k-distributivity than LFSR methods

problem: not easy to skip long sequence for multi-stream simulations
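
The shuffle scheme can be sketched as follows. The two LCGs used as sequence a and sequence b are arbitrary illustrative choices borrowed from earlier slides, not the generators used by Marsaglia & Bray, and the function names are hypothetical.

```python
from itertools import islice

def lcg(a, b, m, seed):
    """Yield x_1, x_2, ... for x_n = (a*x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def shuffled(gen_a, m_a, gen_b, table_size=100):
    """Use gen_a (values in [0, m_a - 1]) to pick which buffered gen_b value to return."""
    table = [next(gen_b) for _ in range(table_size)]   # keep 100 items of sequence b
    while True:
        idx = next(gen_a) * table_size // m_a          # scale a's output to a table index
        out = table[idx]
        table[idx] = next(gen_b)                       # replace the item just returned
        yield out

seq_a = lcg(25173, 13849, 2**16, seed=1)
seq_b = lcg(2**18 + 1, 1, 2**35, seed=12345)
outputs = list(islice(shuffled(seq_a, 2**16, seq_b), 1000))
```

Every output is a value of sequence b, only in a different order, which is also why skipping ahead in the combined stream is awkward: the table contents depend on the whole history.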

Seed Selection Issues

Wrong combination of seed and RNG can hurt

especially if RNG is flawed
e.g. seed might be RNG fixed point

one stream needed
if RNG has full period, then any seed as good as another

multiple streams needed

e.g. queue simulation requires
interarrival time stream
service time stream
requires special care!


Seed Selection Guidelines I

Don't use 0
multiplicative LCGs and Tausworthe generators would stick at 0

Avoid even values

seed should be odd for multiplicative LCG with $m = 2^k$
for full period generators, all non-zero values equally good

Don't subdivide one stream

don't use a single stream for all random variables

might be a strong correlation between items in same stream

Use non-overlapping streams

each stream requires separate seed
don't use same seed for 2 or more streams!

if seeds are bad, streams will overlap and not be independent

right way: select seeds so streams don't overlap at all
example: need 3 streams of 20,000 numbers
pick $u_0$ as seed for first stream
pick $u_{20,000}$ as seed for second stream
pick $u_{40,000}$ as seed for third stream
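
For a multiplicative LCG the seeds of non-overlapping streams can be computed without generating every intermediate value, because $x_{n+s} = a^s x_n \bmod m$. The sketch below assumes a hypothetical generator with a = 16807 and m = $2^{31} - 1$; these parameters and the function names are illustrative choices, not taken from the lecture.

```python
A, M = 16807, 2**31 - 1   # assumed multiplicative LCG: x_n = A*x_{n-1} mod M

def jump(x, steps):
    """Value reached `steps` draws after x, using x_{n+s} = A**s * x_n mod M."""
    return (pow(A, steps, M) * x) % M

def stream_seeds(u0, n_streams, stream_len):
    """Seeds u_0, u_L, u_2L, ... so each stream owns a separate block of L values."""
    return [jump(u0, i * stream_len) for i in range(n_streams)]

seeds = stream_seeds(u0=12345, n_streams=3, stream_len=20_000)
```

Each stream then draws at most 20,000 numbers from its own seed, so no stream runs into the block that belongs to the next one.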


Seed Selection Guidelines II

Reuse seeds in successive replications

if simulation experiment is replicated several times
can use seeds from end of previous replication in next one

Don't use random seeds

simulation can't be reproduced
impossible to guarantee multiple streams won't overlap


Myths I

A complex set of operations leads to random results

complicated code does not guarantee a random sequence of numbers that will pass
tests of uniformity and independence

A single test of goodness suffices

sequence 0, 1, ..., m-1
not random but passes chi-square test
will fail run test

use as many tests as possible

Pseudo-random numbers are unpredictable

e.g. can identify LCG parameters with a few numbers and predict
LCG unsuitable for cryptographic applications where
unpredictability is desired

Some seeds are better than others

e.g. odd vs. even, avoid particular seeds, etc.
$x_n = (9806\,x_{n-1} + 1) \bmod (2^{17} - 1)$
37,911 is a fixed point!
may be true for some generators, but these should be avoided!
any non-zero seed should produce equally valid results
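
The fixed point in the example can be verified, and also derived, by solving $x = (ax + b) \bmod m$, i.e. $(a - 1)x \equiv -b \pmod m$. A short sketch follows; the function name is hypothetical, and the three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
def lcg_fixed_point(a, b, m):
    """Solve x = (a*x + b) mod m for x, assuming a - 1 is invertible mod m."""
    return (-b * pow(a - 1, -1, m)) % m

a, b, m = 9806, 1, 2**17 - 1
x_star = lcg_fixed_point(a, b, m)

# Seeding the generator with x_star reproduces x_star forever
stuck = (a * x_star + b) % m == x_star
```

Here `x_star` evaluates to 37911, the seed the slide warns about.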

Myths II

Accurate implementation is not important

period and randomness are guaranteed only if formula is
implemented without overflow or truncation

overflows and truncations can

change the path of a generator
reduce the period

Bits of successive words are equally-randomly distributed

if an algorithm produces a k-bit wide number, randomness is
only guaranteed when all k bits are used
unless specified otherwise, assume any particular bit position
(or sequence thereof) will not be equally random


What's Used Today: MATLAB

rand function
lagged Fibonacci generator
cache of 32 floating point numbers
combined with a shift register random integer generator
core: j ^= (j<<13); j ^= (j>>17); j ^= (j<<5)

period: > $2^{1492}$
fairly sure all FP numbers in [ε/2, 1-ε/2] are generated
ε = $2^{-52}$
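
The shift-register core quoted above is written for a fixed-width machine word. In Python, integers are unbounded, so the left shifts have to be masked explicitly. The sketch below assumes j is a 32-bit unsigned word; the slide does not state the width, and this is only the integer core, not MATLAB's full rand.

```python
MASK32 = 0xFFFFFFFF   # assumption: j is a 32-bit unsigned integer

def xorshift_step(j):
    """One pass of j ^= j<<13; j ^= j>>17; j ^= j<<5 on a 32-bit word."""
    j ^= (j << 13) & MASK32
    j ^= j >> 17
    j ^= (j << 5) & MASK32
    return j

first = xorshift_step(1)
```

Starting from j = 1 the first step gives 270369. A state of 0 maps to 0, so the register must never be seeded with zero.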


What's Used Today: R

Mersenne-Twister (Matsumoto and Nishimura,1998) [default]

twisted GFSR based on Mersenne primes
seed: 623-dimensional set of 32-bit integers + a cursor
period: $2^{19937} - 1$
equi-distribution in 623 consecutive dimensions (whole period)
[note: variant of MT for independent parallel streams exists too]

Knuth-TAOCP (Knuth, 1997)

GFSR using lagged Fibonacci sequences with subtraction

X[j] = (X[j-100] - X[j-37]) mod $2^{30}$

seed: the set of the 100 last numbers + cyclic shift of buffer
period: about 2^129.

initialization of GFSR from seed was altered

What's Used Today: R (continued)

Wichmann-Hill

seed: integer vector of length 3
seed[i] is in 1:(p[i] - 1)
p is the length 3 vector of primes, p = (30269, 30307, 30323)

cycle length: 6.9536e12 = prod(p-1)/4

reference: Applied Statistics (1984) 33, 123

Marsaglia-Multicarry multiply-with-carry RNG (Marsaglia)

seed: two integers, all values allowed
period: > $2^{60}$
has passed all tests (according to Marsaglia)

Super-Duper (Marsaglia)
doesn't pass the MTUPLE test of the Diehard battery
period: about 4.6*10^18 for most initial seeds
seed: 2 integers (first: all values allowed; second: odd value).
default seeds are the Tausworthe and congruence long integers


What's Used Today: Linux

random function
non-linear additive feedback-based generator
state: 8, 32, 64, 128, or 256 bytes
all bits considered random

rand function
bottom 12 bits go through cyclic pattern
higher-order bits more random

