Fall 2011 CSC 446/546 Part 6: Random Number Generation.

Slides:



Advertisements
Similar presentations
1 Chi-Square Test -- X 2 Test of Goodness of Fit.
Advertisements

Generating Random Numbers
Random Number Generation. Random Number Generators Without random numbers, we cannot do Stochastic Simulation Most computer languages have a subroutine,
CHAPTER 2 Building Empirical Model. Basic Statistical Concepts Consider this situation: The tension bond strength of portland cement mortar is an important.
Random Numbers. Two Types of Random Numbers 1.True random numbers: True random numbers are generated in non- deterministic ways. They are not predictable.
Random Number Generators. Why do we need random variables? random components in simulation → need for a method which generates numbers that are random.
Introduction to Hypothesis Testing
Simulation Modeling and Analysis
Agenda Purpose Prerequisite Inverse-transform technique
1/55 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 10 Hypothesis Testing.
Topic 2: Statistical Concepts and Market Returns
Random Number Generation
Distinguishing Features of Simulation Time (CLK)  DYNAMIC focused on this aspect during the modeling section of the course Pseudorandom variables (RND)
Statistical Background
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 7 th Edition Chapter 9 Hypothesis Testing: Single.
Experimental Evaluation
Inferences About Process Quality
1 Simulation Modeling and Analysis Pseudo-Random Numbers.
Properties of Random Numbers
Random-Number Generation. 2 Properties of Random Numbers Random Number, R i, must be independently drawn from a uniform distribution with pdf: Two important.
Random Number Generation Pseudo-random number Generating Discrete R.V. Generating Continuous R.V.
ETM 607 – Random Number and Random Variates
Fundamentals of Hypothesis Testing: One-Sample Tests
Chapter 5 Sampling and Statistics Math 6203 Fall 2009 Instructor: Ayona Chatterjee.
Chapter 3 – Descriptive Statistics
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Random-Number Generation Andy Wang CIS Computer Systems Performance Analysis.
1 Statistical Distribution Fitting Dr. Jason Merrick.
CPSC 531: RN Generation1 CPSC 531:Random-Number Generation Instructor: Anirban Mahanti Office: ICT Class Location:
Chapter 7 Random-Number Generation
CS 450 – Modeling and Simulation Dr. X. Topics What Does Randomness Mean? Randomness in games Generating Random Values Random events in real life: measuring.
Tests for Random Numbers Dr. Akram Ibrahim Aly Lecture (9)
CS433: Modeling and Simulation Dr. Anis Koubâa Al-Imam Mohammad bin Saud University 15 October 2010 Lecture 05: Statistical Analysis Tools.
Maximum Likelihood Estimator of Proportion Let {s 1,s 2,…,s n } be a set of independent outcomes from a Bernoulli experiment with unknown probability.
Module 1: Statistical Issues in Micro simulation Paul Sousa.
Basic Concepts in Number Theory Background for Random Number Generation 1.For any pair of integers n and m, m  0, there exists a unique pair of integers.
Modeling and Simulation Random Number Generators
The Examination of Residuals. Examination of Residuals The fitting of models to data is done using an iterative approach. The first step is to fit a simple.
MEGN 537 – Probabilistic Biomechanics Ch.5 – Determining Distributions and Parameters from Observed Data Anthony J Petrella, PhD.
Monte Carlo Methods So far we have discussed Monte Carlo methods based on a uniform distribution of random numbers on the interval [0,1] p(x) = 1 0  x.
Lecture 9 Chap 9-1 Chapter 2b Fundamentals of Hypothesis Testing: One-Sample Tests.
Tests of Random Number Generators
Physics 270 – Experimental Physics. Let say we are given a functional relationship between several measured variables Q(x, y, …) x ±  x and x ±  y What.
Chapter 2 Statistical Background. 2.3 Random Variables and Probability Distributions A variable X is said to be a random variable (rv) if for every real.
Chap 8-1 Fundamentals of Hypothesis Testing: One-Sample Tests.
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall Chapter 5: Credibility: Evaluating What’s Been Learned.
Testing Random-Number Generators Andy Wang CIS Computer Systems Performance Analysis.
© Copyright McGraw-Hill 2004
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Copyright © Cengage Learning. All rights reserved. 9 Inferences Based on Two Samples.
R ANDOM N UMBER G ENERATORS Modeling and Simulation CS
Chapter 14 – 1 Chi-Square Chi-Square as a Statistical Test Statistical Independence Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
0 Simulation Modeling and Analysis: Input Analysis 7 Random Numbers Ref: Law & Kelton, Chapter 7.
MONTE CARLO METHOD DISCRETE SIMULATION RANDOM NUMBER GENERATION Chapter 3 : Random Number Generation.
MEGN 537 – Probabilistic Biomechanics Ch.5 – Determining Distributions and Parameters from Observed Data Anthony J Petrella, PhD.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
1.  How does the computer generate observations from various distributions specified after input analysis?  There are two main components to the generation.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
CHI SQUARE DISTRIBUTION. The Chi-Square (  2 ) Distribution The chi-square distribution is the probability distribution of the sum of several independent,
Random Number Generators
Hypothesis Theory examples.
Chapter 7 Random Number Generation
Chapter 7 Random-Number Generation
Properties of Random Numbers
Discrete Event Simulation - 4
Chapter 8 Random-Variate Generation
Hypothesis Tests for a Standard Deviation
Computer Simulation Techniques Generating Pseudo-Random Numbers
Random Number Generation
Presentation transcript:

Fall 2011 CSC 446/546 Part 6: Random Number Generation

Fall 2011 CSC 446/546 Agenda 1.Properties of Random Numbers 2.Generation of Pseudo-Random Numbers (PRN) 3.Techniques for Generating Random Numbers 4.Tests for Random Numbers 5.Caution

Fall 2011 CSC 446/ Properties of Random Numbers (1) A sequence of random numbers R 1, R 2, …, must have two important statistical properties: Uniformity Independence. Random Number, R i, must be independently drawn from a uniform distribution with pdf: pdf for random numbers

Fall 2011 CSC 446/ Properties of Random Numbers (2) Uniformity: If the interval [0,1] is divided into n classes, or subintervals of equal length, the expected number of observations in each interval is N/n, where N is the total number of observations Independence: The probability of observing a value in a particular interval is independent of the previous value drawn

Fall 2011 CSC 446/ Generation of Pseudo-Random Numbers (PRN) (1) “Pseudo”, because generating numbers using a known method removes the potential for true randomness. If the method is known, the set of random numbers can be replicated!! Goal: To produce a sequence of numbers in [ 0,1 ] that simulates, or imitates, the ideal properties of random numbers (RN) - uniform distribution and independence.

Fall 2011 CSC 446/ Generation of Pseudo-Random Numbers (PRN) (2) Problems that occur in generation of pseudo-random numbers (PRN) Generated numbers might not be uniformly distributed Generated numbers might be discrete-valued instead of continuous-valued Mean of the generated numbers might be too low or too high Variance of the generated numbers might be too low or too high There might be dependence (i.e., correlation)

Fall 2011 CSC 446/ Generation of Pseudo-Random Numbers (PRN) (3) Departure from uniformity and independence for a particular generation scheme can be tested. If such departures are detected, the generation scheme should be dropped in favor of an acceptable one.

Fall 2011 CSC 446/ Generation of Pseudo-Random Numbers (PRN) (4) Important considerations in RN routines: The routine should be fast. Individual computations are inexpensive, but a simulation may require many millions of random numbers Portable to different computers – ideally to different programming languages. This ensures the program produces same results Have sufficiently long cycle. The cycle length, or period represents the length of random number sequence before previous numbers begin to repeat in an earlier order. Replicable. Given the starting point, it should be possible to generate the same set of random numbers, completely independent of the system that is being simulated Closely approximate the ideal statistical properties of uniformity and independence.

Fall 2011 CSC 446/ Techniques for Generating Random Numbers 3.1 Linear Congruential Method (LCM). Most widely used technique for generating random numbers 3.2 Combined Linear Congruential Generators (CLCG). Extension to yield longer period (or cycle) 3.3 Random-Number Streams.

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (1) To produce a sequence of integers, X 1, X 2, … between 0 and m-1 by following a recursive relationship: X 0 is called the seed The selection of the values for a, c, m, and X 0 drastically affects the statistical properties and the cycle length. If c  0 then it is called mixed congruential method When c=0 it is called multiplicative congruential method The multiplier The increment The modulus

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (2) The random integers are being generated in the range [ 0,m-1 ], and to convert the integers to random numbers:

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (3) EXAMPLE: Use X 0 = 27, a = 17, c = 43, and m = 100. The X i and R i values are: X 1 = (17*27+43) mod 100 = 502 mod 100 = 2,R 1 = 0.02; X 2 = (17*2+43) mod 100 = 77 mod 100 =77, R 2 = 0.77 ; X 3 = (17*77+43) mod 100 = 1352 mod 100 = 52R 3 = 0.52; … Notice that the numbers generated assume values only from the set I = {0,1/m,2/m,….., (m-1)/m} because each X i is an integer in the set {0,1,2,….,m-1} Thus each R i is discrete on I, instead of continuous on interval [0,1]

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (4) Maximum Density Such that the values assumed by R i, i = 1,2,…, leave no large gaps on [0,1] Problem: Instead of continuous, each R i is discrete Solution: a very large integer for modulus m (e.g., , 2 48 ) Maximum Period To achieve maximum density and avoid cycling. Achieved by: proper choice of a, c, m, and X 0. Most digital computers use a binary representation of numbers Speed and efficiency are aided by a modulus, m, to be (or close to) a power of 2.

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (5) Maximum Period or Cycle Length: For m a power of 2, say m=2 b, and c  0, the longest possible period is P=m=2 b, which is achieved when c is relatively prime to m (greatest common divisor of c and m is 1) and a=1+4k, where k is an integer For m a power of 2, say m=2 b, and c=0, the longest possible period is P=m/4=2 b-2, which is achieved if the seed X 0 is odd and if the multiplier a is given by a=3+8k or a=5+8k for some k=0,1,…. For m a prime number and c=0, the longest possible period is P=m- 1, which is achieved whenever the multiplier a has the property that the smallest integer k such that a k -1 is divisible by m is k=m-1

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (6) Example: Using the multiplicative congruential method, find the period of the generator for a=13, m=2 6 =64 and X 0 =1,2,3 and 4 m=64, c=0; Maximal period P=m/4 = 16 is achieved by using odd seeds X 0 =1 and X 0 =3 ( a=13 is of the form 5+8k with k=1 ) With X 0 =1, the generated sequence {1,5,9,13,…,53,57,61} has large gaps Not a viable generator !! Density insufficient, period too short i Xi Xi Xi Xi

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (7) Example: Speed and efficiency in using the generator on a digital computer is also a factor Speed and efficiency are aided by using a modulus m either a power of 2 (= 2 b )or close to it After the ordinary arithmetic yields a value of aX i +c, X i+1 can be obtained by dropping the leftmost binary digits and then using only the b rightmost digits

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (1): Linear Congruential Method (8) Example: c=0; a=7 5 =16807; m= =2,147,483,647 (prime #) Period P=m-1 (well over 2 billion) Assume X 0 =123,457 X 1 =7 5 (123457)mod( )=2,074,941,799 R 1 =X 1 /2 31 = X 2 =7 5 (2,074,941,799) mod( )=559,872,160 R 2 =X 2 /2 31 = X 3 =7 5 (559,872,160) mod( )=1,645,535,613 R 3 =X 3 /2 31 = ………. Note that the routine divides by m+1 instead of m. Effect is negligible for such large values of m.

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (2): Combined Linear Congruential Generators (1) With increased computing power, the complexity of simulated systems is increasing, requiring longer period generator. Examples: 1) highly reliable system simulation requiring hundreds of thousands of elementary events to observe a single failure event; 2) A computer network with large number of nodes, producing many packets Approach: Combine two or more multiplicative congruential generators in such a way to produce a generator with good statistical properties

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (2): Combined Linear Congruential Generators (2) L’Ecuyer suggests how this can be done: If W i,1, W i,2,….,W i,k are any independent, discrete valued random variables (not necessarily identically distributed) If one of them, say W i,1 is uniformly distributed on the integers from 0 to m 1 -2, then is uniformly distributed on the integers from 0 to m 1 -2

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (2): Combined Linear Congruential Generators (3) Let X i,1, X i,2, …, X i,k, be the i th output from k different multiplicative congruential generators. The j th generator: –Has prime modulus m j and multiplier a j and period is m j -1 –Produced integers X i,j is approx ~ Uniform on integers in [ 1, m j -1 ] – W i,j = X i,j -1 is approx ~ Uniform on integers in [ 0, m j -2 ]

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (2): Combined Linear Congruential Generators (4) Suggested form: The maximum possible period for such a generator is:

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (2): Combined Linear Congruential Generators (5) Example: For 32-bit computers, L’Ecuyer [1988] suggests combining k = 2 generators with m 1 = 2,147,483,563, a 1 = 40,014, m 2 = 2,147,483,399 and a 2 = 20,692. The algorithm becomes: Step 1: Select seeds – X 1,0 in the range [ 1, 2,147,483,562] for the 1 st generator – X 2,0 in the range [ 1, 2,147,483,398] for the 2 nd generator. Step 2: For each individual generator, X 1,j+1 = 40,014 X 1,j mod 2,147,483,563 X 2,j+1 = 40,692 X 1,j mod 2,147,483,399. Step 3: X j+1 = ( X 1,j+1 - X 2,j+1 ) mod 2,147,483,562. Step 4:Return Step 5: Set j = j+1, go back to step 2. Combined generator has period: (m 1 – 1)(m 2 – 1)/2 ~ 2 x 10 18

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (3): Random-Numbers Streams (1) The seed for a linear congruential random-number generator: Is the integer value X 0 that initializes the random-number sequence. Any value in the sequence can be used to “seed” the generator. A random-number stream : Refers to a starting seed taken from the sequence X 0, X 1, …, X P. If the streams are b values apart, then stream i could defined by starting seed: Older generators: b = 10 5 ; Newer generators: b =

Fall 2011 CSC 446/ Techniques for Generating Random Numbers (3): Random-Numbers Streams (2) A single random-number generator with k streams can act like k distinct virtual random-number generators To compare two or more alternative systems. Advantageous to dedicate portions of the pseudo-random number sequence to the same purpose in each of the simulated systems.

Fall 2011 CSC 446/ Tests for Random Numbers (1): Principles (1) Desirable properties of random numbers: Uniformity and Independence Number of tests can be performed to check these properties been achieved or not Two type of tests: Frequency Test : Uses the Kolmogorov-Smirnov or the Chi- square test to compare the distribution of the set of numbers generated to a uniform distribution Autocorrelation test: Tests the correlation between numbers and compares the sample correlation to the expected correlation, zero

Fall 2011 CSC 446/ Tests for Random Numbers (1): Principles (2) Two categories: Testing for uniformity. The hypotheses are: H 0 : R i ~ U[0,1] H 1 : R i ~ U[0,1] –Failure to reject the null hypothesis, H 0, means that evidence of non-uniformity has not been detected. Testing for independence. The hypotheses are: H 0 : R i ~ independently distributed H 1 : R i ~ independently distributed –Failure to reject the null hypothesis, H 0, means that evidence of dependence has not been detected. / /

Fall 2011 CSC 446/ Tests for Random Numbers (1): Principles (3) For each test, a Level of significance  must be stated. The level  is the probability of rejecting the null hypothesis H 0 when the null hypothesis is true:  = P(reject H 0 |H 0 is true) The decision maker sets the value of  for any test Frequently  is set to 0.01 or 0.05

Fall 2011 CSC 446/ Tests for Random Numbers (1): Principles (4) When to use these tests: If a well-known simulation languages or random-number generators is used, it is probably unnecessary to test If the generator is not explicitly known or documented, e.g., spreadsheet programs, symbolic/numerical calculators, tests should be applied to many sample numbers. Types of tests: Theoretical tests : evaluate the choices of m, a, and c without actually generating any numbers Empirical tests : applied to actual sequences of numbers produced. Our emphasis.

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (1) Test of uniformity Two different methods: Kolmogorov-Smirnov test Chi-square test Both these tests measure the degree of agreement between the distribution of a sample of generated random numbers and the theoretical uniform distribution Both tests are based on null hypothesis of no significant difference between the sample distribution and the theoretical distribution

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (2) Kolmogorov-Smirnov Test (1) Compares the continuous cdf, F(x), of the uniform distribution with the empirical cdf, S N (x), of the N sample observations. We know: If the sample from the RN generator is R 1, R 2, …, R N, then the empirical cdf, S N (x) is: The cdf of an empirical distribution is a step function with jumps at each observed value.

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (2) Kolmogorov-Smirnov Test (2) Test is based on the largest absolute deviation statistic between F(x) and S N (x) over the range of the random variable : D = max| F(x) - S N (x)| The distribution of D is known and tabulated (A.8) as function of N Steps: 1.Rank the data from smallest to largest. Let R (i) denote i th smallest observation, so that R (1)  R (2)  …  R (N) 2.Compute 3.Compute D= max(D +, D - ) 4.Locate in Table A.8 the critical value D , for the specified significance level  and the sample size N 5.If the sample statistic D is greater than the critical value D , the null hypothesis is rejected. If D  D , conclude there is no difference

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (2) Kolmogorov-Smirnov Test (3) Example: Suppose 5 generated numbers are 0.44, 0.81, 0.14, 0.05, Step 1: Step 2: Step 3: D = max(D +, D - ) = 0.26 Step 4: For  = 0.05, D  = > D Hence, H 0 is not rejected. Arrange R (i) from smallest to largest D + = max {i/N – R (i) } D - = max {R (i) - (i-1)/N} R (i) i/N i/N – R (i) R (i) – (i-1)/N

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (3) Chi-Square Test (1) Chi-square test uses the sample statistic: Approximately the chi-square distribution with n-1 degrees of freedom (where the critical values are tabulated in Table A.6) For the uniform distribution, E i, the expected number in the each class is: Valid only for large samples, e.g. N >= 50 Reject H 0 if  0 2 >  ,N-1 2 n is the # of classes O i is the observed # in the i th class E i is the expected # in the i th class

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (3) Chi-Square Test (2) Example 7.7: Use Chi-square test for the data shown below with  =0.05. The test uses n=10 intervals of equal length, namely [0,0.1),[0.1,0.2), …., [0.9,1.0)

Fall 2011 CSC 446/ Tests for Random Numbers (2): Frequency Tests (3) Chi-Square Test (3) The value of  0 2 =3.4; The critical value from table A.6 is  0.05,9 2 =16.9. Therefore the null hypothesis is not rejected

Fall 2011 CSC 446/ Tests for Random Numbers (3): Tests for Autocorrelation (1) The test for autocorrelation are concerned with the dependence between numbers in a sequence. Consider: Though numbers seem to be random, every fifth number is a large number in that position. This may be a small sample size, but the notion is that numbers in the sequence might be related

Fall 2011 CSC 446/ Tests for Random Numbers (3): Tests for Autocorrelation (2) Testing the autocorrelation between every m numbers ( m is a.k.a. the lag), starting with the i th number The autocorrelation  im between numbers: R i, R i+m, R i+2m, R i+(M+1)m M is the largest integer such that Hypothesis: If the values are uncorrelated: For large values of M, the distribution of the estimator of  im, denoted is approximately normal.

Fall 2011 CSC 446/ Tests for Random Numbers (3): Tests for Autocorrelation (3) Test statistics is: Z 0 is distributed normally with mean = 0 and variance = 1, and: If  im > 0, the subsequence has positive autocorrelation High random numbers tend to be followed by high ones, and vice versa. If  im < 0, the subsequence has negative autocorrelation Low random numbers tend to be followed by high ones, and vice versa.

Fall 2011 CSC 446/ Tests for Random Numbers (3): Tests for Autocorrelation (4) After computing Z 0, do not reject the hypothesis of independence if –z  /2  Z 0  z  /2  is the level of significance and z  /2 is obtained from table A.3

Fall 2011 CSC 446/ Tests for Random Numbers (3): Tests for Autocorrelation (5) Example: Test whether the 3 rd, 8 th, 13 th, and so on, for the output on Slide 37 are auto-correlated or not. Hence,  = 0.05, i = 3, m = 5, N = 30, and M = 4. M is the largest integer such that 3+(M+1)5  30. From Table A.3, z = Hence, the hypothesis is not rejected.

Fall 2011 CSC 446/ Tests for Random Numbers (3): Tests for Autocorrelation (6) Shortcoming: The test is not very sensitive for small values of M, particularly when the numbers being tested are on the low side. Problem when “fishing” for autocorrelation by performing numerous tests: If  = 0.05, there is a probability of 0.05 of rejecting a true hypothesis. If 10 independent sequences are examined, –The probability of finding no significant autocorrelation, by chance alone, is = –Hence, the probability of detecting significant autocorrelation when it does not exist = 40%

Fall 2011 CSC 446/ Caution Caution: Even with generators that have been used for years, some of which still in use, are found to be inadequate. This chapter provides only the basics Also, even if generated numbers pass all the tests, some underlying pattern might have gone undetected.