Distinguishing Features of Simulation
- Time (CLK): DYNAMIC. The modeling section of the course focused on this aspect.
- Pseudorandom variables (RND): STOCHASTIC. The coming weeks will focus on this aspect.
(Pseudo) Random Number Generation
Properties of pseudo-random numbers:
- Continuous numbers between 0 and 1
- Probability of selecting a number in the interval (a, b) is b - a, i.e. uniformly distributed
- Numbers are statistically independent
Can't really generate random numbers: that would take infinite information, and we only have a finite algorithm or table.
Example: Excel spreadsheet function =RAND()
Also want generation to be fast and repeatable.
Random Number Generation
How to generate random numbers:
- Table look-up
- Computer generation: these values cannot be truly random, and a computer cannot express a number to an infinite number of decimal places, so they are called pseudorandom numbers
Random Number Generation
Random number seed: Virtually all computer methods of random number generation start with an initial random number seed. This seed is used to generate the next random number and is then transformed into a new seed value.
Random Generators
Reasons for pseudorandom numbers:
- Flexible policies
- Lack of knowledge
- Generate stochastic processes
- Decision making (random decision)
- Numerical analysis (numerical integration), e.g. Monte Carlo integration
Desirable Properties of Random Number Generators
- Fast
- Should not require much memory
- Long cycle or period
- Should support multiple streams
- Sequence should be replicable (for debugging, and to compare various scenarios under similar conditions)
- Numbers should come close to uniformity (or a known distribution) and independence
Historical Generator
Midsquare method:
1. Start with an initial seed (e.g. a 4-digit integer).
2. Square the number (write the result with 8 digits, padding with leading zeros if needed).
3. Take the middle 4 digits.
4. This value becomes the new seed. Divide the number by 10,000. This becomes the random number. Go to 2.
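Midsquare Method, example
$x_0 = 5497$
$x_1$: $5497^2 = 30217009 \Rightarrow x_1 = 2170$, $R_1 = 0.2170$
$x_2$: $2170^2 = 04708900 \Rightarrow x_2 = 7089$, $R_2 = 0.7089$
$x_3$: $7089^2 = 50253921 \Rightarrow x_3 = 2539$, $R_3 = 0.2539$
Drawback: It is hard to state conditions for picking an initial seed that will generate a “good” sequence.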
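Midsquare Generator, examples
“Bad” sequences:
$x_0 = 5197$
$x_1$: $5197^2 = 27008809 \Rightarrow x_1 = 0088$, $R_1 = 0.0088$
$x_2$: $0088^2 = 00007744 \Rightarrow x_2 = 0077$, $R_2 = 0.0077$
$x_3$: $0077^2 = 00005929 \Rightarrow x_3 = 0059$, $R_3 = 0.0059$
$x_i = 6500$
$x_{i+1}$: $6500^2 = 42250000 \Rightarrow x_{i+1} = 2500$, $R_{i+1} = 0.2500$
$x_{i+2}$: $2500^2 = 06250000 \Rightarrow x_{i+2} = 2500$, $R_{i+2} = 0.2500$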
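The midsquare steps can be sketched in a few lines of Python. This is only an illustration: the function name `midsquare` and the `digits` parameter are assumptions made for this sketch, not part of the course material.

```python
def midsquare(seed, count, digits=4):
    """Return `count` (x_i, R_i) pairs produced by the midsquare method."""
    results = []
    x = seed
    for _ in range(count):
        # Square and pad with leading zeros to 2*digits characters.
        squared = str(x * x).zfill(2 * digits)
        start = digits // 2
        # Keep the middle `digits` digits as the new seed.
        x = int(squared[start:start + digits])
        results.append((x, x / 10 ** digits))
    return results

print(midsquare(5497, 3))  # seeds 2170, 7089, 2539
print(midsquare(5197, 3))  # collapses toward small values: 88, 77, 59
print(midsquare(6500, 2))  # gets stuck at 2500
```

Running it on the seeds from the slides shows how quickly a poor seed degenerates.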
Linear Congruential Generator (LCG)
- Start with a random seed $Z_0 < m$, where $m$ is typically the largest possible integer on the machine
- Recursively generate integers between 0 and $m - 1$: $Z_i = (a Z_{i-1} + c) \bmod m$
- Use $U = Z/m$ as the pseudo-random number (avoid returning exactly 0 or 1)
- When $c = 0$: called a Multiplicative Congruential Generator
- When $c > 0$: Mixed LCG
Linear Congruential Generator (LCG) (Lehmer 1951)
Let $Z_i$ be the $i$th number (integer) in the sequence:
$Z_i = (a Z_{i-1} + c) \bmod m$, with $Z_i \in \{0, 1, 2, \ldots, m-1\}$
where $Z_0$ = seed, $a$ = multiplier, $c$ = increment, $m$ = modulus.
Define $U_i = Z_i / m$ (to obtain a U(0,1) value).
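LCG, example (16-bit machine)
$a = 1217$, $c = 0$, $Z_0 = 23$, $m = 2^{15} - 1 = 32767$
$Z_1 = (1217 \cdot 23) \bmod 32767 = 27991$, $U_1 = 27991/32767 \approx 0.8542$
$Z_2 = (1217 \cdot 27991) \bmod 32767 = 20134$, $U_2 = 20134/32767 \approx 0.6145$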
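A minimal Python sketch of the LCG recursion, using the parameters of the 16-bit example. The generator name `lcg` is an assumption made for illustration.

```python
def lcg(seed, a, c, m):
    """Yield (Z_i, U_i) pairs for Z_i = (a * Z_{i-1} + c) mod m."""
    z = seed
    while True:
        z = (a * z + c) % m
        yield z, z / m

gen = lcg(seed=23, a=1217, c=0, m=32767)
first_two = [next(gen) for _ in range(2)]
print(first_two)  # Z values 27991 and 20134
```

Because the state is just the last integer, restarting with the same seed replays the same stream, which is what makes runs repeatable.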
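An LCG can be expressed as a function of the seed $Z_0$.
THEOREM: $Z_i = \left[a^i Z_0 + \frac{c(a^i - 1)}{a - 1}\right] \bmod m$
Proof: By induction on $i$.
$i = 0$: $Z_0 = \left[a^0 Z_0 + \frac{c(a^0 - 1)}{a - 1}\right] \bmod m$
Assume the expression holds for $i$. Show that it holds for $i + 1$:
$Z_{i+1} = [a Z_i + c] \bmod m$
$= \left[a \left\{\left[a^i Z_0 + \frac{c(a^i - 1)}{a - 1}\right] \bmod m\right\} + c\right] \bmod m$
$= \left[a^{i+1} Z_0 + \frac{a c (a^i - 1)}{a - 1} + c\right] \bmod m$
$= \left[a^{i+1} Z_0 + \frac{c(a^{i+1} - 1)}{a - 1}\right] \bmod m$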
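The closed form can be checked numerically against the recursion. The sketch below assumes $a \neq 1$ and uses small illustrative parameters ($a = 5$, $c = 3$, $m = 16$) that are not from the slides. Note that $(a^i - 1)/(a - 1)$ is the integer $1 + a + \cdots + a^{i-1}$, so the division must be done exactly before reducing mod $m$.

```python
def lcg_iterate(z0, a, c, m, i):
    """Apply Z <- (a*Z + c) mod m exactly i times."""
    z = z0
    for _ in range(i):
        z = (a * z + c) % m
    return z

def lcg_closed_form(z0, a, c, m, i):
    """Evaluate [a^i Z_0 + c (a^i - 1)/(a - 1)] mod m, assuming a != 1."""
    geometric_sum = (a ** i - 1) // (a - 1)  # exact integer division
    return (a ** i * z0 + c * geometric_sum) % m

for i in range(10):
    print(i, lcg_iterate(7, 5, 3, 16, i), lcg_closed_form(7, 5, 3, 16, i))
```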
Examples:
- $Z_i = (69069 Z_{i-1} + 1) \bmod 2^{32}$, $U_i = Z_i / 2^{32}$
- $Z_i = (65539 Z_{i-1} + 76) \bmod 2^{31}$, $U_i = Z_i / 2^{31}$
- $Z_i = (\ldots Z_{i-1}) \bmod (\ldots)$, $U_i = Z_i / 2^{31}$
- $Z_i = \ldots Z_{i-1} \bmod 2^{59}$, $U_i = Z_i / 2^{59}$
What makes one LCG better than another?
A full period (full cycle) LCG generates all $m$ values before it cycles.
Consider $Z_i = (3 Z_{i-1} + 2) \bmod 9$ with $Z_0 = 7$. Then $Z_1 = 5$, $Z_2 = 8$, $Z_3 = 8$, and $Z_j = 8$ for $j = 3, 4, 5, 6, \ldots$
On the other hand, $Z_i = (4 Z_{i-1} + 2) \bmod 9$ has full period. Why?
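For small moduli the cycle length can simply be measured by brute force. The helper below is an illustrative sketch (the name `lcg_cycle_length` is an assumption); it records when each state was first seen, so it also handles sequences that have a lead-in before they start repeating.

```python
def lcg_cycle_length(z0, a, c, m):
    """Length of the cycle that Z <- (a*Z + c) mod m eventually enters."""
    first_seen = {}
    z, step = z0, 0
    while z not in first_seen:
        first_seen[z] = step
        z = (a * z + c) % m
        step += 1
    return step - first_seen[z]

print(lcg_cycle_length(7, 3, 2, 9))  # 1: the sequence gets stuck at 8
print(lcg_cycle_length(7, 4, 2, 9))  # 9: full period
```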
Random Number Generation
Mixed congruential generator is full period if
1. $m = 2^B$ ($B$ is often the number of bits in a word), which is also fast
2. $c$ and $m$ are relatively prime (g.c.d. = 1)
3. If 4 divides $m$, then 4 divides $a - 1$ (e.g., $a = 1, 5, 9, 13, \ldots$)
The period of an LCG is $m$ (full period or full cycle) if and only if
- If $q$ is a prime that divides $m$, then $q$ divides $a - 1$
- The only positive integer that divides both $m$ and $c$ is 1
- If 4 divides $m$, then 4 divides $a - 1$
Examples:
$Z_{i+1} = (16807 Z_i + 3) \bmod 451605$, where $16807 = 7^5$, $16806 = (2)(3)(2801)$, $451605 = (3)(5)(7)(11)(17)(23)$. This LCG does not satisfy the first two conditions.
$Z_{i+1} = (16807 Z_i + 5) \bmod (\ldots)$, where $16807 = 7^5$, $16806 = (2)(3)(2801)$, and the modulus factors as $(3^4)(\ldots)$. This LCG satisfies all three conditions.
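The three conditions are easy to check mechanically. The sketch below uses trial division for the prime factors, which is fine for small classroom moduli; the function names are assumptions made for illustration.

```python
from math import gcd

def prime_factors(n):
    """Set of prime divisors of n, found by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def full_period_conditions(a, c, m):
    """Return the three full-period conditions as a tuple of booleans."""
    cond1 = all((a - 1) % q == 0 for q in prime_factors(m))  # primes of m divide a-1
    cond2 = gcd(c, m) == 1                                    # c and m relatively prime
    cond3 = (m % 4 != 0) or ((a - 1) % 4 == 0)                # 4 | m implies 4 | a-1
    return cond1, cond2, cond3

print(full_period_conditions(16807, 3, 451605))  # first two conditions fail
print(full_period_conditions(4, 2, 9))           # all three hold
```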
- $m = 2^B$, where $B$ = number of bits in the machine, is often a good choice to maximize the period.
- If $c = 0$, we have a power residue or multiplicative generator. Note that $Z_n = (a Z_{n-1}) \bmod m \Rightarrow Z_n = (a^n Z_0) \bmod m$.
If $m = 2^B$, where $B$ = number of bits in the machine, the longest period is $m/4$ (best one can do) if and only if
- $Z_0$ is odd
- $a = 8k \pm 3$, $k \in \mathbb{Z}^+$ ($a = 5, 11, 13, 19, 21, 27, \ldots$)
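The $m/4$ result can be observed on a small power-of-two modulus. The sketch below uses $m = 2^8$ purely as an illustrative choice and measures the period of the multiplicative generator for several odd multipliers; the helper name is an assumption.

```python
def multiplicative_period(a, m, z0=1):
    """Period of Z <- (a*Z) mod m for m a power of 2, with a and z0 odd."""
    if a % 2 == 0 or z0 % 2 == 0:
        raise ValueError("a and z0 must be odd when m is a power of 2")
    z, steps = z0, 0
    while True:
        z = (a * z) % m
        steps += 1
        if z == z0:
            return steps

m = 2 ** 8
for a in (3, 5, 7, 9, 11, 13):
    # Multipliers with a mod 8 equal to 3 or 5 reach the maximum m/4 = 64.
    print(a, a % 8, multiplicative_period(a, m))
```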
Random Number Generation
Other kinds of generators:
- Quadratic Congruential Generator: $S_{\text{new}} = (a_1 S_{\text{old}}^2 + a_2 S_{\text{old}} + b) \bmod L$
- Combination of Generators: Shuffling, L'Ecuyer, Wichmann/Hill
- Tausworthe Generator: generates a sequence of random bits
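Feedback Shift Generators
Tausworthe, Mathematics of Computation, 1965
If $\{a_k\}$ is a sequence of binary digits (0 or 1) defined by $a_k = (c_1 a_{k-1} + c_2 a_{k-2} + \cdots + c_p a_{k-p}) \bmod 2$ and the characteristic polynomial $x^p + c_1 x^{p-1} + \cdots + c_p$ is primitive over GF(2), then $\{a_k\}$ has period $2^p - 1$ (for any nonzero starting state).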
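A small sketch of the bit recurrence, assuming $p = 4$ for illustration. The coefficient choice $(c_1, \ldots, c_4) = (0, 0, 1, 1)$, i.e. $a_k = (a_{k-3} + a_{k-4}) \bmod 2$, reaches the maximum period $2^4 - 1 = 15$, while $(1, 1, 1, 1)$ does not. The function name and the state layout are assumptions of this sketch.

```python
def lfsr_period(coeffs, state):
    """Steps until the state (a_{k-1}, ..., a_{k-p}) first returns to its start.

    Returns None if the start state does not recur within 2**p steps.
    """
    start = tuple(state)
    current = start
    for steps in range(1, 2 ** len(coeffs) + 1):
        new_bit = sum(c * a for c, a in zip(coeffs, current)) % 2
        current = (new_bit,) + current[:-1]  # shift the new bit in
        if current == start:
            return steps
    return None

print(lfsr_period((0, 0, 1, 1), (1, 0, 0, 0)))  # 15
print(lfsr_period((1, 1, 1, 1), (1, 0, 0, 0)))  # 5
```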
IBM RANDU
If $c = 0$: power residue generator (multiplicative generator)
$u_n = a u_{n-1} \bmod m \Rightarrow u_n = a^n u_0 \bmod m$ (homework)
NOTES —Never “invent” your own LCG. It will probably not be “good.” —All simulation languages and many software packages have their own PRN generator. Most use some variation of a linear congruential generator. —Power residue generators are the most common.
Tests of RNG, cont'd
Theoretical tests:
- Prove sample moments over the entire cycle are correct
- Lattice structure of LCGs: “random numbers fall mainly in the planes” (Marsaglia)
- Spacing between hyperplanes: the smaller, the better
Tests of Random Number Generators
Empirical tests:
- Uniformity: compute sample moments; goodness of fit
- Independence: Gap Test, Runs Test, Poker Test, Spectral Test, Autocorrelation Test
Testing Random Number Generators
Desirable Properties: Mean and Variance
Theorem: $E[U] \to 1/2$ and $V[U] \to 1/12$ as $m \to +\infty$.
Proof: For a full period LCG, every integer value from 0 to $m - 1$ is represented. Therefore
$E[U] = \frac{0 + 1 + \cdots + (m-1)}{m^2} = \frac{(m-1)m/2}{m^2} = \frac{1}{2} - \frac{1}{2m}$
$V[U] = \frac{0^2 + 1^2 + \cdots + (m-1)^2}{m^3} - E[U]^2 = \frac{m(m-1)(2m-1)/6}{m^3} - \left[\frac{1}{2} - \frac{1}{2m}\right]^2 = \frac{1}{12} - \frac{1}{12m^2}$
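The finite-$m$ expressions can be confirmed exactly with rational arithmetic. The sketch below assumes a small full-period generator ($a = 5$, $c = 3$, $m = 16$, chosen for illustration) and averages over one complete cycle.

```python
from fractions import Fraction

def full_cycle_moments(a, c, m, z0=0):
    """Exact mean and variance of U = Z/m over m consecutive LCG outputs."""
    z = z0
    values = []
    for _ in range(m):
        z = (a * z + c) % m
        values.append(Fraction(z, m))
    mean = sum(values) / m
    variance = sum(u * u for u in values) / m - mean ** 2
    return mean, variance

m = 16
mean, variance = full_cycle_moments(a=5, c=3, m=m)
print(mean, Fraction(1, 2) - Fraction(1, 2 * m))            # both 15/32
print(variance, Fraction(1, 12) - Fraction(1, 12 * m * m))  # both 85/1024
```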
Uniformity: $\chi^2$ Goodness of Fit Test
- Divide $n$ observations into $k$ (equal) intervals
- Do a frequency count $f_i$, $i = 1, 2, \ldots, k$
- Compute $X^2 = \sum_i \frac{(f_i - n/k)^2}{n/k} = \sum_i \frac{(f_i - n p_i)^2}{n p_i}$, where $p_i = 1/k$, $i = 1, 2, \ldots, k$.
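Data Classification
[Figure: histogram over the $k$ intervals of [0, 1], showing observed counts $f_1, f_2, \ldots, f_{k-1}, f_k$ against expected counts $e_1, e_2, \ldots, e_{k-1}, e_k$.]
$e_i$ = expected number of observations in interval $i$ = $n p_i = n/k$, $i = 1, 2, \ldots, k$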
NOTE
- $(f_i - n p_i)/(n p_i)^{1/2}$ is the N(0,1) approximation of a multinomial distribution for $p_i$ small, where $E[f_i] = n p_i$ and $Var[f_i] = n p_i (1 - p_i)$.
- For $n$ large, $X^2$ is distributed $\chi^2$ with $k - 1$ degrees of freedom.
- Reject the randomness assumption if $X^2 > \chi^2_{\alpha, k-1}$.
NOTE: if $X^2$ is too close to zero, it may be because the numbers have been “fudged.” BE WARY OF PRN WHICH LOOK TOO RANDOM.
$\chi^2$ Goodness of Fit Test
[Figure: $\chi^2$ density with the “Do Not Reject $H_0$” region marked.]
- Repeat the test $m$ times with independent samples of size $n$
- If $H_0$ is true, the test will reject $H_0$ $\alpha m$ times (on average)
Trouble Spots
- Choosing the intervals evenly
- Choosing the intervals such that you would expect each class to contain at least 5 or 10 observations
- $p_i$ should (ideally) be small (< .05)
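Example: $n = 1000$, $k = 10$ intervals, expected count $n/k = 100$ per interval
Interval    $f_i$
[0, .1)     87
[.1, .2)    93
[.2, .3)    113
[.3, .4)    106
[.4, .5)    108
[.5, .6)    99
[.6, .7)    91
[.7, .8)    95
[.8, .9)    103
[.9, 1.]    105
$X^2 = 628/100 = 6.28$. This is well below the $\chi^2$ critical value with 9 degrees of freedom (e.g. 16.92 at $\alpha = 0.05$), so do not reject $H_0$: U(0,1).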
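The statistic in this example can be reproduced with a short sketch. The helper names `bin_counts` and `chi_square_uniform` are assumptions for illustration; the counts are the ones from the example.

```python
def bin_counts(values, k):
    """Count U(0,1) observations in k equal intervals; 1.0 goes in the last one."""
    counts = [0] * k
    for u in values:
        counts[min(int(u * k), k - 1)] += 1
    return counts

def chi_square_uniform(counts):
    """X^2 = sum (f_i - n/k)^2 / (n/k) for equal-probability intervals."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((f - expected) ** 2 / expected for f in counts)

counts = [87, 93, 113, 106, 108, 99, 91, 95, 103, 105]
x2 = chi_square_uniform(counts)
print(round(x2, 2))  # 6.28
```

The result is then compared with a tabulated $\chi^2$ critical value for $k - 1$ degrees of freedom.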
NOTE
- The $\chi^2$ goodness of fit test is also used to fit distributions to data, where $X^2 = \sum_i (f_i - e_i)^2 / e_i$ and $e_i$ = expected number of observations in interval $i$.
Kolmogorov-Smirnov Goodness-of-fit Test
- Order the $n$ U[0,1] variates $\{x_{[i]}\}$
- Construct an empirical CDF for the $n$ variates $\{x_{[i]}\}$ (i.e., $F(x_{[i]}) = i/n$, $i = 1, 2, \ldots, n$)
- Construct a hypothesized CDF for $n$ uniform variates (i.e., $F_0(x) = x$, $0 \le x \le 1$)
- Compute $D = \max\{D^+, D^-\}$, where $D^+ = \max_{1 \le i \le n} [(i/n) - F_0(x_{[i]})]$ and $D^- = \max_{1 \le i \le n} [F_0(x_{[i]}) - ((i-1)/n)]$
- Check tables. Reject if $D$ is too large, with a risk $\alpha$, which means that we reject (uniformity) falsely with probability $\alpha$.
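[Figure: empirical CDF against the hypothesized CDF, with $D^+$ and $D^-$ marked.]
For $\{U_i\} = \{.1, .2, .3, .9\}$: $D^+ = \max\{.15, .30, .45, .10\} = .45$ and $D^- = \max\{.10, -.05, -.20, .15\} = .15$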
Examples
- If $\{U_i\} = \{.1, .2, .3, .9\}$, then $D = .45$.
- If $\{U_i\} = \{.2, .6, .8, .9\}$, then $D = .35$.
- If $\{U_i\} = \{.25, .5, .75, 1.\}$, then $D = .25$.
NOTE: The minimum value that $D$ can take on is $1/(2n)$. (How?)
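The K-S statistic against the U(0,1) CDF takes only a few lines. This is a sketch (the function name `ks_statistic` is an assumption) that reproduces the three examples.

```python
def ks_statistic(values):
    """Return (D, D_plus, D_minus) for a sample tested against U(0,1)."""
    xs = sorted(values)
    n = len(xs)
    # enumerate is 0-based, so i + 1 plays the role of the rank i on the slide.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus, d_minus

for sample in ([.1, .2, .3, .9], [.2, .6, .8, .9], [.25, .5, .75, 1.]):
    d, d_plus, d_minus = ks_statistic(sample)
    print(sample, round(d, 2), round(d_plus, 2), round(d_minus, 2))
```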
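Independence
- Sign Test
  * Test statistic: $S$ = number of runs of numbers above or below the median
  * For large $N$, $S$ is approximately distributed $N(\mu = 1 + N/2,\ \sigma^2 \approx (N-1)/4)$
Example: $N = 15$, $S = 7$, approximately distributed $N(\mu = 8.5,\ \sigma^2 = 14/4 = 3.5)$
Maximum value for $S$: $N$ (negative dependency)
Minimum value for $S$: 2 (positive dependency)
Normal Curve Rejection Regions
[Figure: standard normal curve with a central “Do Not REJECT” region between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ and two rejection tails labeled “REJECT (+ve)” and “REJECT (-ve)”.]
$H_0$: Independence; $H_A$: Dependence
Reject $H_0$ in favor of $H_A$ if $Z = \frac{S - (1 + N/2)}{((N-1)/4)^{1/2}}$ satisfies $Z > Z_{\alpha/2}$ or $Z < -Z_{\alpha/2}$.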
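A sketch of the runs-about-the-median computation. As an assumption of this sketch, it uses the general Wald-Wolfowitz moments for $n_1$ values above and $n_2$ values below the median, which give $\mu = 1 + N/2$ when the sample splits evenly; values equal to the median are dropped. The function name is illustrative.

```python
from math import sqrt
from statistics import median

def runs_about_median(values):
    """Return (runs, mean, variance, z) for runs above/below the median."""
    med = median(values)
    signs = [v > med for v in values if v != med]  # drop ties with the median
    n1 = sum(signs)                 # number above the median
    n2 = len(signs) - n1            # number below the median
    n = n1 + n2
    runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, mean, variance, (runs - mean) / sqrt(variance)

print(runs_about_median([.1, .9, .2, .8, .3, .7, .4, .6]))  # many runs, z > 0
print(runs_about_median([.1, .2, .3, .4, .6, .7, .8, .9]))  # two runs, z < 0
```

The resulting $z$ is compared with $\pm Z_{\alpha/2}$.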
- Runs Up and Down Test (runs of increasing and decreasing numbers)
Assign + if $x_i < x_{i+1}$ and - if $x_i > x_{i+1}$.
Test statistic: $S$ = number of runs up AND down (in the sequence of + and -)
$E(S) = (2N - 1)/3$, $V(S) = (16N - 29)/90$
Use the Normal approximation for $N > 30$.
Example: $N = 15$, $S = 8$, distributed $N(\mu = 29/3,\ \sigma^2 = 211/90)$
Maximum value for $S$: $N - 1$ (negative dependency)
Minimum value for $S$: 1?
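Normal Curve Rejection Regions
[Figure: standard normal curve with a central “Do Not REJECT” region between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ and rejection tails labeled “REJECT (-ve)” and “REJECT”.]
$H_0$: Independence; $H_A$: Dependence
Reject $H_0$ in favor of $H_A$ if $Z = \frac{S - (2N-1)/3}{((16N-29)/90)^{1/2}}$ satisfies $Z > Z_{\alpha/2}$ or $Z < -Z_{\alpha/2}$.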
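A sketch of the runs up and down statistic with the moments $E(S) = (2N-1)/3$ and $V(S) = (16N-29)/90$. The function name is an assumption, and equal neighbours are simply skipped, which is also an assumption of this sketch.

```python
from math import sqrt

def runs_up_down(values):
    """Return (runs, mean, variance, z) for the runs up and down test."""
    # True for a step up (+), False for a step down (-); ties are skipped.
    signs = [b > a for a, b in zip(values, values[1:]) if a != b]
    runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    n = len(values)
    mean = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return runs, mean, variance, (runs - mean) / sqrt(variance)

print(runs_up_down([.1, .9, .2, .8, .3]))  # alternating: N - 1 = 4 runs
print(runs_up_down(list(range(15))))       # monotone: 1 run, mean 29/3
```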
Test of Cycling
Floyd's Test for Cycling
Assume $u_i = G(u_{i-1})$.
$x_0 = y_0$ = seed
$x_i = G(x_{i-1})$, $y_i = G(G(y_{i-1}))$, i.e. skip every other one so $y$ will go twice as fast as $x$.
Then check to see if there is some value of $n \ge 1$ for which $x_n = y_n$. If $x_n = y_n$, cycling occurred.
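Floyd's test needs only two copies of the generator state. The sketch below (function name and `max_steps` cap are assumptions) returns the first $n \ge 1$ with $x_n = y_n$; that $n$ is a multiple of the cycle length, not necessarily the cycle length itself.

```python
def floyd_first_meeting(step, seed, max_steps=10 ** 6):
    """Return the first n >= 1 with x_n == y_n, or None if none is found."""
    x = y = seed
    for n in range(1, max_steps + 1):
        x = step(x)          # x advances one step
        y = step(step(y))    # y advances two steps
        if x == y:
            return n
    return None

print(floyd_first_meeting(lambda z: (4 * z + 2) % 9, 7))  # 9
print(floyd_first_meeting(lambda z: (3 * z + 2) % 9, 7))  # 2
```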
Marsaglia's Theorem
All $N$-tuples generated by a congruential generator will fall in fewer than $(N!\,m)^{1/N}$ hyperplanes. (Proc. Nat. Acad. Sci. 61, 1968, pp. 25-28)
e.g. all 10-tuples fall in fewer than 13 9-dimensional planes for $m = \ldots$
RANDU falls in ONLY 15 PLANES in the 3D cube.
(Solution: Make $m$ bigger, limited by computer word size.)
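The bound is simple to evaluate, and RANDU's poor 3D behaviour can be illustrated numerically. The sketch below assumes the commonly cited RANDU parameters ($a = 65539$, $c = 0$, $m = 2^{31}$), which are not stated on the slide. With them, every three consecutive values satisfy $Z_{k+2} - 6 Z_{k+1} + 9 Z_k \equiv 0 \pmod{2^{31}}$, which is consistent with triples lying on a small number of planes, far fewer than the bound allows.

```python
from math import factorial

def marsaglia_bound(n_dim, m):
    """Upper bound (N! * m)^(1/N) on the number of hyperplanes."""
    return (factorial(n_dim) * m) ** (1.0 / n_dim)

def randu(seed, count):
    """Assumed RANDU recursion: Z <- 65539 * Z mod 2^31 (odd seed)."""
    z, out = seed, []
    for _ in range(count):
        z = (65539 * z) % 2 ** 31
        out.append(z)
    return out

print(marsaglia_bound(3, 2 ** 31))  # bound for triples, a few thousand planes

zs = randu(1, 1000)
relation_holds = all(
    (zs[k + 2] - 6 * zs[k + 1] + 9 * zs[k]) % 2 ** 31 == 0
    for k in range(len(zs) - 2)
)
print(relation_holds)  # True
```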
[Figure: plot of $RND_{i+1}$ vs $RND_i$ using the LCG in SIGMA.]