# Selecting Input Probability Distribution

## Presentation transcript:

Selecting Input Probability Distribution

Introduction

- need to specify probability distributions of random inputs
  - processing times at a specific machine
  - interarrival times of customers/pieces
  - demand size
- evaluate data sets (if available)
- failure to choose the correct distribution can affect the accuracy of the model's results!

040669 || WS 2008 || Dr. Verena Schmid || PR KFK PM/SCM/TL Praktikum Simulation I

Assessing Sample Independence

- correlation plot
- scatter diagram

Assessing Sample Independence

- important assumption
  - observations are supposed to be independent
- graphical techniques for informally assessing whether data are independent
  - correlation plot
  - scatter diagram

correlation plot

- graph of the sample correlation $\hat\rho_j$, an estimate of the true correlation $\rho_j$ between two observations that are $j$ observations apart in time
  - if the observations $X_1, X_2, \ldots, X_n$ are independent, then $\rho_j = 0$ for $j = 1, 2, \ldots, n-1$
- the estimates won't be exactly zero, even if the $X_i$'s are independent, since each estimate is an observation of a random variable
- if the estimates differ from 0 by a significant amount, then it's strong evidence that the $X_i$'s are not independent
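The slide does not give the formula for the sample correlation, so the following is a minimal sketch that assumes one common estimator: the lag-$j$ sample covariance divided by the sample variance. The function name `lag_correlation` and the two test series are hypothetical, chosen only for illustration.

```python
import random

def lag_correlation(xs, j):
    """Estimate the correlation between observations that are j apart."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Average product of deviations over the n - j pairs (X_i, X_{i+j}).
    covariance = sum(
        (xs[i] - mean) * (xs[i + j] - mean) for i in range(n - j)
    ) / (n - j)
    return covariance / variance

random.seed(1)
# Independent draws: estimates should hover near zero.
independent = [random.expovariate(1.0) for _ in range(1000)]
# Hypothetical dependent series: each value is pulled toward its predecessor.
dependent = [0.0]
for _ in range(999):
    dependent.append(0.8 * dependent[-1] + random.gauss(0.0, 1.0))

for j in (1, 2, 3):
    print(j,
          round(lag_correlation(independent, j), 3),
          round(lag_correlation(dependent, j), 3))
```

Plotting these estimates against $j$ gives the correlation plot; the independent series should stay close to zero at every lag, while the dependent one should not.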

correlation plot (example)

correlation plot (example)

scatter diagram

- plot of the pairs $(X_i, X_{i+1})$
  - if the $X_i$'s are independent, one would expect the points $(X_i, X_{i+1})$ to be scattered randomly throughout the first quadrant of the plane
  - the nature of the scattering depends on the underlying distribution of the $X_i$'s
  - if the $X_i$'s are positively (negatively) correlated, the points will tend to lie along a line with positive (negative) slope

scatter diagram (example)

scatter diagram (example 2)

Specifying Distribution

- useful distributions
- use values directly
- define empirical distribution
- fit theoretical distribution

useful probability distribution

- parameters of continuous distributions
  - location parameter $\gamma$
    - x-axis location, usually the midpoint (mean for the normal distribution) or the lower endpoint
    - also called "shift" parameter
    - changes in $\gamma$ shift the distribution left or right without changing it otherwise
  - scale parameter $\beta$
    - determines the scale (unit) of measurement
    - standard deviation $\sigma$ for the normal distribution
    - changes in $\beta$ compress or expand the associated distribution without altering its basic form

useful probability distribution

- parameters of continuous distributions
  - shape parameter $\alpha$
    - determines the basic form or shape of a distribution within the general family of distributions of interest
    - a change in $\alpha$ generally alters a distribution's properties (skewness) more fundamentally than a change in location or scale

Approaches to specify distribution

if data collection on an input random variable is possible:

- use the data values directly in the simulation (trace driven)
  - only reproduces what happened
  - seldom enough data to make all simulation runs
  - useful for model validation
- define an empirical distribution
  - at least (for continuous data) any value between min and max can be generated
  - no values outside the range can be generated
  - may have irregularities
- fit to a theoretical distribution
  - preferred method
  - easy to change


Uniform U(a,b)

- application
  - used as a "first" model for a quantity that is felt to be randomly varying between a and b but about which little else is known

exponential distribution exp($\lambda$)

- application
  - interarrival times of entities to a system that occur at a constant rate
  - time to failure of a piece of equipment
- parameters
  - scale parameter $\lambda > 0$

gamma($k$, $\theta$)

- application
  - time to complete some task (customer service, machine repair)
- parameters
  - shape parameter $k > 0$
  - scale parameter $\theta > 0$

weibull($k$, $\lambda$)

- application
  - time to complete some task, time to failure of a piece of equipment
  - used as a rough model in the absence of data
- parameters
  - shape parameter $k > 0$, scale parameter $\lambda > 0$

normal N($\mu$, $\sigma^2$)

- application
  - errors of various types
  - quantities that are the sum of a large number of other quantities
- parameters
  - location parameter $-\infty < \mu < \infty$
  - scale parameter $\sigma > 0$

triangular (a,b,m)

- application
  - used as a rough model in the absence of data
  - a, b, m are real numbers (a < m < b)
- parameters
  - location parameter a
  - scale parameter b-a
  - shape parameter m

poisson($\lambda$)

- application
  - number of events that occur in an interval of time when events are occurring at a constant rate
  - number of items demanded from inventory
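As a rough illustration of how the continuous distributions listed above might be sampled, here is a sketch using Python's standard `random` module. All parameter values are hypothetical. Note two conventions of that module that are easy to get wrong: `expovariate` takes a rate (so the mean is `1 / rate`), and `weibullvariate` takes the scale before the shape. The module has no Poisson generator, so that distribution is left out.

```python
import random

random.seed(0)

a, b, m = 2.0, 10.0, 4.0   # hypothetical lower limit, upper limit and mode
rate = 0.5                 # hypothetical rate (events per time unit)
shape, scale = 2.0, 3.0    # hypothetical shape and scale parameters

samples = {
    "uniform": random.uniform(a, b),
    "exponential": random.expovariate(rate),         # argument is the rate
    "gamma": random.gammavariate(shape, scale),      # (shape, scale)
    "weibull": random.weibullvariate(scale, shape),  # (scale, shape)
    "normal": random.gauss(5.0, 2.0),                # (mean, std. deviation)
    "triangular": random.triangular(a, b, m),        # (low, high, mode)
}

for name, value in samples.items():
    print(f"{name:12s} {value:.3f}")
```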


Empirical Distributions

- use the observed data themselves to specify the distribution directly
  - generate random variables from the empirical distribution
  - (if no theoretical distribution can be fitted)
- define a continuous piecewise-linear distribution function
  - sort the $X_j$'s into increasing order
  - $X_{(i)}$ denotes the $i$th smallest value of all $X_j$'s

Empirical Distribution (example)

- observations: $X_1 = 3$, $X_2 = 8$, $X_3 = 18$, $X_4 = 10$, $X_5 = 13$, $X_6 = 6$
- sorted observations: $X_{(1)} = 3$, $X_{(2)} = 6$, $X_{(3)} = 8$, $X_{(4)} = 10$, $X_{(5)} = 13$, $X_{(6)} = 18$
- distribution function at the sorted observations: $F(X_{(i)}) = (i-1)/(n-1)$
  - $F(X_{(1)}) = F(3) = 0/5 = 0$
  - $F(X_{(2)}) = F(6) = 1/5$
  - $F(X_{(3)}) = F(8) = 2/5$
  - etc.
- distribution function between observations: if $X_{(i)} \le X \le X_{(i+1)}$, then $F(X) = \frac{i-1}{n-1} + \frac{X - X_{(i)}}{(n-1)\,(X_{(i+1)} - X_{(i)})}$
- $F(12) = ?$
  - interval: $X_{(4)} \le 12 < X_{(5)}$ ($n = 6$, $i = 4$)
  - $F(12) = 3/5 + 2/(5 \cdot 3) = 11/15 \approx 0.73$
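The piecewise-linear distribution function from this example can be sketched as follows. The function names are hypothetical, and the sampling routine assumes the usual inverse-transform approach (feed a uniform random number through the inverse of $F$), which the slides do not spell out.

```python
import bisect
import random

def empirical_cdf(sorted_xs, x):
    """Piecewise-linear empirical distribution function F(x)."""
    n = len(sorted_xs)
    if x < sorted_xs[0]:
        return 0.0
    if x >= sorted_xs[-1]:
        return 1.0
    # i is the 1-based index with X_(i) <= x < X_(i+1).
    i = bisect.bisect_right(sorted_xs, x)
    lo, hi = sorted_xs[i - 1], sorted_xs[i]
    return (i - 1) / (n - 1) + (x - lo) / ((n - 1) * (hi - lo))

def sample_empirical(sorted_xs, u):
    """Invert F: map u in [0, 1) to a value between the min and the max."""
    n = len(sorted_xs)
    p = (n - 1) * u
    i = min(int(p), n - 2)  # 0-based index of the left endpoint
    return sorted_xs[i] + (p - i) * (sorted_xs[i + 1] - sorted_xs[i])

data = sorted([3, 8, 18, 10, 13, 6])
print(empirical_cdf(data, 12))                  # 3/5 + 2/15
print(sample_empirical(data, random.random()))  # a value in [3, 18]
```

Because values are interpolated between neighbouring observations, nothing below 3 or above 18 can ever be generated from this data set.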

Empirical Distribution (example)


Necessary Steps for fitting a theoretical distribution

- hypothesize family
  - summary statistics
  - histogram
  - quantile summary & box plots
- estimate parameters
- how representative is the fitted distribution?
  - Chi-Square Goodness-of-Fit Test
  - Kolmogorov-Smirnov Test

Hypothesizing families of distributions

- first step in selecting a particular input distribution:
  - decide which general family appears to be appropriate
  - prior knowledge might be helpful
  - service times should never be generated from a normal distribution. Why?
- approaches
  - summary statistics
  - histograms
  - quantile summaries and box plots
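One likely answer to the question on this slide, offered as an illustration rather than as the presenter's own wording: a normal random variable can take negative values, and a service time cannot. A minimal sketch, assuming a hypothetical service time with mean 5 and standard deviation 2 minutes:

```python
from statistics import NormalDist

# Hypothetical service-time model: mean 5 minutes, std. deviation 2 minutes.
service_time = NormalDist(mu=5.0, sigma=2.0)

# Probability that a generated "service time" comes out negative.
p_negative = service_time.cdf(0.0)
print(f"P(service time < 0) = {p_negative:.4f}")
```

The probability is small here but not zero, and over a long simulation run some negative service times would eventually be generated.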

Summary Statistics

- some distributions are characterized at least partially by functions of their true parameters
- sample estimate
  - estimate for range
    - minimum $X_{(1)}$
    - maximum $X_{(n)}$
  - measure of central tendency
    - mean $\mu$
    - median $x_{0.5}$

Summary Statistics (cont.)

- sample estimate
  - measure of variability
    - variance $\sigma^2$
    - coefficient of variation cv
  - measure of symmetry
    - skewness
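The estimator formulas are not preserved in this transcript, so the sketch below assumes common choices: the sample variance with divisor $n-1$, cv as standard deviation over mean, and skewness as the average cubed deviation divided by the variance to the power 3/2. The function name and the exponential test sample are hypothetical.

```python
import math
import random

def summary_statistics(xs):
    """Sample estimates of range, central tendency, variability and symmetry."""
    s = sorted(xs)
    n = len(s)
    mean = sum(s) / n
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    variance = sum((x - mean) ** 2 for x in s) / (n - 1)
    third_moment = sum((x - mean) ** 3 for x in s) / n
    return {
        "min": s[0],
        "max": s[-1],
        "mean": mean,
        "median": median,
        "variance": variance,
        "cv": math.sqrt(variance) / mean,
        "skewness": third_moment / variance ** 1.5,
    }

random.seed(4)
sample = [random.expovariate(0.25) for _ in range(5000)]  # mean 4
for name, value in summary_statistics(sample).items():
    print(f"{name:9s} {value:8.3f}")
```

For an exponential distribution the coefficient of variation is 1 and the skewness is 2, so estimates near those values would point toward that family.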

Histograms

- graphical estimate of the plot of the density function corresponding to the distribution of the data
  - density functions tend to have recognizable shapes in many cases
  - a graphical estimate of a density should provide a good clue to the distribution that might be tried as a model for the data

Histograms

- how to
  - break up the range of values into $k$ disjoint adjacent intervals of the same width $\Delta b = b_j - b_{j-1}$: $[b_0, b_1), [b_1, b_2), \ldots, [b_{k-1}, b_k)$
  - you might want to throw out a few extremely large or small $X_i$'s to avoid getting an unwieldy-looking histogram plot
  - let $h_j$ be the proportion of $X_i$'s that are in the $j$th interval $[b_{j-1}, b_j)$
  - hint: try several values of $\Delta b$ and choose the smallest one that gives a "smooth" histogram

Histogram (example)

- create 1000 random variables ~ N(0,1)
  - create histogram
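The example could be reproduced roughly as follows. This is a text-only sketch; the function name, the starting point $b_0 = -4$, the width $\Delta b = 0.5$ and $k = 16$ are assumptions picked for illustration, and observations outside $[b_0, b_k)$ are simply left out.

```python
import random

def histogram_proportions(xs, b0, width, k):
    """Proportion h_j of the xs falling in each of k equal-width intervals."""
    counts = [0] * k
    for x in xs:
        j = int((x - b0) // width)  # 0-based interval index
        if 0 <= j < k:              # values outside [b_0, b_k) are left out
            counts[j] += 1
    return [c / len(xs) for c in counts]

random.seed(3)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]

b0, width, k = -4.0, 0.5, 16
for j, h in enumerate(histogram_proportions(data, b0, width, k)):
    left = b0 + j * width
    print(f"[{left:5.1f}, {left + width:5.1f})  {'#' * round(200 * h)}")
```

Rerunning with a few different values of `width` is a quick way to follow the hint about choosing the smallest interval width that still looks smooth.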

Quantile Summaries

- useful for determining whether the underlying probability density function is skewed to the right or left
  - if $F(x)$ is the distribution function of a continuous random variable,
  - the $q$-quantile of $F(x)$ is that number $x_q$ such that $F(x_q) = q$
- median: $x_{0.5}$
- lower/upper quartiles: $x_{0.25}$ / $x_{0.75}$
- lower/upper octiles: $x_{0.125}$ / $x_{0.875}$

Quantile Summaries

| Quantile | Depth | Sample values | Midpoint |
|---|---|---|---|
| Median | $i = (n+1)/2$ | $X_{(i)}$ | $X_{(i)}$ |
| Quartiles | $j = (\lfloor i \rfloor + 1)/2$ | $X_{(j)}$, $X_{(n-j+1)}$ | $[X_{(j)} + X_{(n-j+1)}]/2$ |
| Octiles | $k = (\lfloor j \rfloor + 1)/2$ | $X_{(k)}$, $X_{(n-k+1)}$ | $[X_{(k)} + X_{(n-k+1)}]/2$ |
| Extremes | $1$ | $X_{(1)}$, $X_{(n)}$ | $[X_{(1)} + X_{(n)}]/2$ |

- if the underlying distribution of the $X_i$'s is symmetric, then the midpoints should be approximately equal
- if the underlying distribution is skewed to the right (left), then the midpoints should be increasing (decreasing)
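A sketch of the quantile summary follows. It uses one assumption the slide does not state: when a depth is a half-integer (for example $i = 3.5$), the sample value is taken as the average of the two neighbouring order statistics. The function names are hypothetical.

```python
import math

def at_depth(sorted_xs, d):
    """X_(d) for a 1-based depth d; a half-integer depth averages its neighbours."""
    return (sorted_xs[math.floor(d) - 1] + sorted_xs[math.ceil(d) - 1]) / 2

def quantile_summary(xs):
    """Return {name: (lower value, upper value, midpoint)} for each quantile row."""
    s = sorted(xs)
    n = len(s)
    i = (n + 1) / 2
    j = (math.floor(i) + 1) / 2
    k = (math.floor(j) + 1) / 2
    rows = {}
    for name, depth in (("median", i), ("quartiles", j),
                        ("octiles", k), ("extremes", 1)):
        low = at_depth(s, depth)
        high = at_depth(s, n - depth + 1)
        rows[name] = (low, high, (low + high) / 2)
    return rows

for name, row in quantile_summary([3, 8, 18, 10, 13, 6]).items():
    print(f"{name:10s} {row}")
```

For these six observations the midpoints come out as 9, 9.5, 10 and 10.5. They increase from the median to the extremes, which hints at a distribution skewed to the right.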

Box Plots (example)

- graphical representation of the quantile summary
  - fifty percent of the observations fall within the horizontal boundaries of the box $[x_{0.25}, x_{0.75}]$


Estimation of Parameters

After one or more candidate families of distributions have been hypothesized, we must somehow specify the values of their parameters in order to have completely specified distributions for possible use in simulation.

- maximum-likelihood estimators (MLEs)
  - estimator = numerical function of the data
  - unknown parameter $\theta$
  - hypothesized density function $f_\theta(x)$
  - likelihood function $L(\theta)$
  - the estimator is the value of $\theta$ that maximizes $L(\theta)$ over all permissible values of $\theta$

Estimation of Parameters (example)

- exponential distribution with unknown parameter $\beta$ ($\theta = \beta$)
  - $f_\beta(x) = (1/\beta)\, e^{-x/\beta}$ for $x \ge 0$
  - likelihood function $L(\beta)$
  - we seek the value of $\beta$ that maximizes $L(\beta)$ over all $\beta > 0$
  - easier to work with its logarithm (maximize $l(\beta)$ instead of $L(\beta)$)
  - maximize: set the derivative equal to zero and solve for $\beta$
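The formulas for this example are not preserved in the transcript. Under the stated density and the assumption that the observations $X_1, \ldots, X_n$ are independent, the steps would work out as follows:

$$
L(\beta) = \prod_{i=1}^{n} \frac{1}{\beta}\, e^{-X_i/\beta} = \beta^{-n} \exp\!\left(-\frac{1}{\beta} \sum_{i=1}^{n} X_i\right)
$$

$$
l(\beta) = \ln L(\beta) = -n \ln \beta - \frac{1}{\beta} \sum_{i=1}^{n} X_i
$$

$$
\frac{dl}{d\beta} = -\frac{n}{\beta} + \frac{1}{\beta^2} \sum_{i=1}^{n} X_i = 0
\quad\Longrightarrow\quad
\hat\beta = \frac{1}{n} \sum_{i=1}^{n} X_i
$$

So the maximum-likelihood estimator of $\beta$ is the sample mean. The second derivative of $l$ is negative at $\hat\beta$, which confirms that this point is a maximum.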


Goodness-of-Fit Tests

- statistical hypothesis tests used to assess formally whether the observations $X_1, X_2, \ldots, X_n$ are independent samples from a particular distribution with distribution function $\hat F$
  - $H_0$: the $X_i$'s are IID random variables with distribution function $\hat F$
- be careful: failure to reject $H_0$ should not be interpreted as "accepting $H_0$ as being true"
- we'll concentrate on two different ones
  - chi-square test
  - Kolmogorov-Smirnov test

Chi-Square Goodness-of-Fit Test

- more formal comparison of a histogram with the fitted density or mass function
- how to
  - divide the range into $k$ adjacent intervals $[a_0, a_1), [a_1, a_2), \ldots, [a_{k-1}, a_k)$
    - how to choose the number and size of the intervals? → equiprobable
  - determine $N_j$ (number of $X_i$'s in the $j$th interval $[a_{j-1}, a_j)$)
  - compute $p_j$ (expected proportion of the $X_i$'s that would fall in the $j$th interval if we were sampling from the fitted distribution)
  - determine the test statistic $\chi^2 = \sum_{j=1}^{k} (N_j - n p_j)^2 / (n p_j)$ and reject $H_0$ if it's too large

Chi-Square Goodness-of-Fit Test (cont.)

- case 1: all parameters of the fitted distribution are known
  - if $H_0$ is true, $\chi^2$ converges in distribution (as $n \to \infty$) to a chi-square distribution with $k-1$ degrees of freedom
  - for large $n$, a test with approximate level $\alpha$ is obtained by rejecting $H_0$ if $\chi^2 > \chi^2_{k-1,1-\alpha}$
  - $\chi^2_{k-1,1-\alpha}$: upper $1-\alpha$ critical point for a chi-square distribution with $k-1$ dfs

Chi-Square Goodness-of-Fit Test (cont.)

- case 2: $m$ parameters had to be estimated to specify the fitted distribution
  - if $H_0$ is true, then as $n \to \infty$ the distribution function of $\chi^2$ converges to a distribution function that lies between the chi-square distribution functions with $k-1$ and $k-m-1$ degrees of freedom
  - $\chi^2_{1-\alpha}$: the upper $1-\alpha$ critical point of the asymptotic distribution of $\chi^2$ (in general not known)
  - reject $H_0$ if $\chi^2 > \chi^2_{k-1,1-\alpha}$
  - do not reject $H_0$ if $\chi^2 < \chi^2_{k-m-1,1-\alpha}$
  - ambiguous situation if $\chi^2_{k-m-1,1-\alpha} \le \chi^2 \le \chi^2_{k-1,1-\alpha}$
  - recommendation: reject $H_0$ only if $\chi^2 > \chi^2_{k-1,1-\alpha}$ (conservative)
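The procedure can be sketched for a fitted exponential distribution as follows. This is an illustration under several assumptions: the usual statistic $\sum_j (N_j - n p_j)^2 / (n p_j)$, $k = 10$ equiprobable intervals (so $p_j = 1/k$), synthetic data, and the tabulated upper 0.95 critical point 16.919 for 9 degrees of freedom. The function name is hypothetical.

```python
import math
import random

def chi_square_statistic(xs, cdf_inverse, k):
    """Chi-square statistic using k equiprobable intervals (p_j = 1/k)."""
    n = len(xs)
    # Interior endpoints a_1 .. a_{k-1}; a_0 and a_k are the support limits.
    edges = [cdf_inverse(j / k) for j in range(1, k)]
    counts = [0] * k
    for x in xs:
        j = sum(1 for a in edges if x >= a)  # 0-based interval containing x
        counts[j] += 1
    expected = n / k                         # n * p_j
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(2)
data = [random.expovariate(1 / 4.0) for _ in range(500)]  # synthetic, mean 4
beta_hat = sum(data) / len(data)  # MLE of the exponential mean (m = 1)

statistic = chi_square_statistic(
    data, lambda q: -beta_hat * math.log(1 - q), k=10)

CRITICAL_9_DF_95 = 16.919  # upper 0.95 point, chi-square with k - 1 = 9 df
print(round(statistic, 3),
      "reject H0" if statistic > CRITICAL_9_DF_95 else "do not reject H0")
```

Comparing against the $k-1$ critical point, even though one parameter was estimated, corresponds to the conservative recommendation for case 2.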

Kolmogorov-Smirnov Goodness-of-Fit Test

- compares an empirical distribution function with the distribution function of the hypothesized distribution
  - not necessary to group the data
  - valid for any sample size $n$
  - tends to be more powerful than chi-square tests
  - but: only valid if all parameters of the hypothesized distribution are known and the distribution is continuous

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)

- compute the test statistic
  - define the empirical distribution function $F_n(x) = (\text{number of } X_i\text{'s} \le x)/n$
  - the test statistic $D_n$ corresponds to the largest (vertical) distance between $F_n(x)$ and the hypothesized distribution function $\hat F(x)$: $D_n = \sup_x |F_n(x) - \hat F(x)|$

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)

- case 1: all parameters of the distribution function $\hat F$ are known
  - the distribution of $D_n$ does not depend on $\hat F$ (if $\hat F$ is continuous)
  - reject $H_0$ if the adjusted test statistic exceeds $c_{1-\alpha}$
  - $c_{1-\alpha}$ (does not depend on $n$) is given in the following table

| $1-\alpha$ | 0.850 | 0.900 | 0.950 | 0.975 | 0.990 |
|---|---|---|---|---|---|
| $c_{1-\alpha}$ | 1.138 | 1.224 | 1.358 | 1.480 | 1.628 |
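A sketch of the case 1 test follows. The rejection rule itself is not preserved in this transcript; the code assumes the adjustment $(\sqrt{n} + 0.12 + 0.11/\sqrt{n})\, D_n$ that simulation textbooks commonly pair with these critical values, so check it against your own reference before relying on it. The function names and test data are hypothetical.

```python
import math
import random

def ks_statistic(xs, cdf):
    """D_n: largest vertical distance between the empirical and hypothesized CDF."""
    s = sorted(xs)
    n = len(s)
    # The empirical CDF jumps at each observation, so check both sides of the jump.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(s))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(s))
    return max(d_plus, d_minus)

def ks_reject_all_known(xs, cdf, critical=1.358):
    """Case 1 decision at level 0.05 (critical value taken from the table)."""
    n = len(xs)
    # Assumed adjustment factor; it is not given in the transcript.
    adjusted = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * ks_statistic(xs, cdf)
    return adjusted > critical

random.seed(5)
sample = [random.random() for _ in range(200)]  # U(0,1) data

print(ks_statistic(sample, lambda x: x))             # against the true CDF
print(ks_reject_all_known(sample, lambda x: x ** 3)) # against a wrong CDF
```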

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)

- case 2:
  - hypothesized distribution is N($\mu$, $\sigma^2$) with both $\mu$ and $\sigma^2$ unknown (estimated); estimated distribution function $\hat F$
  - $D_n$ is calculated the same way as in case 1, but with different critical points
  - reject $H_0$ if the adjusted test statistic exceeds $c'_{1-\alpha}$
  - $c'_{1-\alpha}$ (does not depend on $n$) is given in the following table

| $1-\alpha$ | 0.850 | 0.900 | 0.950 | 0.975 | 0.990 |
|---|---|---|---|---|---|
| $c'_{1-\alpha}$ | 0.775 | 0.819 | 0.895 | 0.955 | 1.035 |

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)

- case 3:
  - hypothesized distribution is exponential (exp($\lambda$))
  - with $\lambda$ unknown (estimated from the data)
  - estimated distribution function $\hat F$
  - reject $H_0$ if the adjusted test statistic exceeds $c''_{1-\alpha}$
  - $c''_{1-\alpha}$ (does not depend on $n$) is given in the following table

| $1-\alpha$ | 0.850 | 0.900 | 0.950 | 0.975 | 0.990 |
|---|---|---|---|---|---|
| $c''_{1-\alpha}$ | 0.926 | 0.990 | 1.094 | 1.190 | 1.308 |
