Simulation Modeling and Analysis: Input Modeling
Outline: Introduction; Data Collection; Matching Distributions with Data; Parameter Estimation; Goodness of Fit Testing; Input Models without Data; Multivariate and Time Series Input Models.
Introduction: Steps in Developing an Input Data Model: (1) collect data from the real system; (2) identify a probability distribution that represents the data; (3) select the distribution parameters; (4) test goodness of fit.
Data Collection: Useful Suggestions: plan, practice, preobserve; analyze data as it is collected; combine homogeneous data sets; watch out for censoring; build scatter diagrams; check for autocorrelation.
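The autocorrelation check suggested above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `lag_autocorrelation` and the `service_times` values are hypothetical, and the estimator shown (lag-h sum of cross-products divided by the total sum of squares) is one common choice among several.

```python
def lag_autocorrelation(xs, h=1):
    """Sample lag-h autocorrelation of observations listed in collection order."""
    n = len(xs)
    if not 0 < h < n:
        raise ValueError("lag h must satisfy 0 < h < len(xs)")
    m = sum(xs) / n
    den = sum((x - m) ** 2 for x in xs)
    if den == 0:
        raise ValueError("all observations are equal")
    # Cross-products of deviations that are h observations apart.
    num = sum((xs[t] - m) * (xs[t + h] - m) for t in range(n - h))
    return num / den


# Hypothetical service times, recorded in the order they were observed.
service_times = [4.1, 3.9, 4.4, 4.6, 5.0, 4.8, 5.3, 5.1, 5.6, 5.4]
r1 = lag_autocorrelation(service_times, h=1)
print(f"lag-1 autocorrelation: {r1:.3f}")
```

A value far from zero suggests that successive observations are dependent, in which case an independent-sample input model may not be appropriate.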
Identifying the Distribution: Construction of Histograms: (1) divide the range of the data into equal subintervals; (2) label the horizontal and vertical axes appropriately; (3) determine the frequency of occurrences within each subinterval; (4) plot the frequencies.
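The counting part of the histogram construction can be sketched as below. The function name `histogram`, the number of cells `k`, and the data values are illustrative assumptions; plotting is left out so the sketch needs no external libraries.

```python
def histogram(data, k):
    """Count observations in k equal-width subintervals spanning the data range."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data have zero range; cannot form subintervals")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The largest observation belongs to the last cell, not a (k+1)-th one.
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts


# Hypothetical observations.
edges, counts = histogram([1, 2, 2, 3, 3, 3, 4, 4, 5, 10], k=3)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"[{left:5.1f}, {right:5.1f})  {'*' * c}")
```

The choice of `k` matters: too few cells hide the shape of the data, too many make the plot ragged.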
Physical Basis of Common Distributions: Binomial: number of successes in n independent trials, each with success probability p. Negative Binomial (Geometric when k = 1): number of trials required to achieve k successes. Poisson: number of independent events occurring in a fixed amount of time or space (the time between events is Exponential).
Physical Basis of Common Distributions (contd): Normal: processes that are the sum of component processes. Lognormal: processes that are the product of component processes. Exponential: times between independent events (the number of events is Poisson). Gamma: many applications; non-negative random variables only.
Physical Basis of Common Distributions (contd): Beta: many applications; bounded random variables only. Erlang: processes that are the sum of several exponential component processes. Weibull: time to failure. Uniform: complete uncertainty. Triangular: when only the minimum, most likely and maximum values are known.
Quantile-Quantile Plots: If X is a RV with cdf F, the q-quantile of X is the value $\gamma$ such that $F(\gamma) = P(X \le \gamma) = q$. Let $\{x_i\}$ be the raw data and $\{y_j\}$ the same data rearranged in increasing order of magnitude. Then $y_j$ is an estimate of the $(j - 1/2)/n$ quantile of X, i.e. $y_j \approx F^{-1}[(j - 1/2)/n]$.
Quantile-Quantile Plots (contd): If F is a member of an appropriate family, then a plot of $y_j$ vs. $F^{-1}[(j - 1/2)/n]$ is approximately a straight line. If F also has the appropriate parameter values, the line has slope 1.
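The points of a Q-Q plot can be computed as sketched below, pairing each ordered observation with the (j - 1/2)/n quantile of a candidate distribution. The helper name `qq_points` and the sample values are hypothetical; the candidate here is a normal distribution fitted with the sample mean and standard deviation, using `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data, inv_cdf):
    """Return (theoretical quantile, ordered observation) pairs for a Q-Q plot."""
    ys = sorted(data)
    n = len(ys)
    return [(inv_cdf((j - 0.5) / n), y) for j, y in enumerate(ys, start=1)]


# Hypothetical observations, compared against a fitted normal distribution.
sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.0, 9.7, 10.3]
fitted = NormalDist(mean(sample), stdev(sample))
points = qq_points(sample, fitted.inv_cdf)
for q, y in points:
    print(f"{q:7.3f}  {y:7.3f}")
```

If the family and parameters are appropriate, the printed pairs should fall close to the line y = q; plotting them with any charting tool makes departures easier to see.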
Parameter Estimation: Once a distribution family has been determined, its parameters must be estimated. The usual starting points are the sample mean $\bar{X}$ and the sample standard deviation $S$.
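Parameter Estimation (contd): Suggested Estimators: Poisson (mean $\alpha$): $\hat{\alpha} = \bar{X}$. Exponential (rate $\lambda$): $\hat{\lambda} = 1/\bar{X}$. Uniform (on $[0, b]$): $\hat{b} = (n+1)\max(X)/n$. Normal: $\hat{\mu} = \bar{X}$; $\hat{\sigma}^2 = S^2$.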
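The suggested estimators can be computed from a sample as in the sketch below. The function name `suggested_estimates`, the dictionary keys, and the data are illustrative assumptions; `variance` from the standard library is the sample variance with divisor n - 1.

```python
from statistics import mean, variance


def suggested_estimates(xs):
    """Evaluate the suggested estimators on one sample (n >= 2, nonzero mean)."""
    n = len(xs)
    xbar = mean(xs)
    return {
        "poisson_mean": xbar,                 # Poisson mean ~ sample mean
        "exponential_rate": 1 / xbar,         # exponential rate ~ 1 / sample mean
        "uniform_b": (n + 1) * max(xs) / n,   # upper end of uniform on [0, b]
        "normal_mu": xbar,                    # normal mean ~ sample mean
        "normal_sigma2": variance(xs),        # normal variance ~ S^2
    }


# Hypothetical observations.
estimates = suggested_estimates([2, 4, 6, 8])
for name, value in estimates.items():
    print(f"{name}: {value:.4f}")
```

Only the estimates belonging to the family chosen in the identification step are relevant; the others are computed here simply to show all the formulas side by side.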
Goodness of Fit Tests: Test the hypothesis that a random sample of size n of the random variable X follows a specific distribution. Chi-Square Test (large n; continuous and discrete distributions). Kolmogorov-Smirnov Test (small n; continuous distributions only).
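Chi-Square Test: Statistic: $\chi_0^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$. It follows approximately the chi-square distribution with $k - s - 1$ degrees of freedom, where s is the number of parameters of the hypothesized distribution estimated from the data. Here $E_i = n p_i$ is the expected frequency in cell i, while $O_i$ is the observed frequency.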
Chi-Square Test (contd): Steps: (1) arrange the n observations into k cells; (2) compute the statistic $\chi_0^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$; (3) find the critical value of $\chi^2$ (Handout); (4) accept or reject the null hypothesis based on the comparison. Example: Stat::Fit.
Chi-Square Test (contd): If the test involves a discrete distribution, each value of the RV should be its own class interval, unless adjacent values must be combined. If the test involves a continuous distribution, the class intervals should be chosen to be equal in probability rather than in width.
Chi-Square Test (contd): Example: Exponential distribution. Example: Weibull distribution. Example: Normal distribution.
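A minimal sketch of the chi-square computation for a hypothesized exponential distribution, using cells of equal probability, is given below. It is not the worked example from the slides: the function name `chi_square_exponential`, the number of cells, and the simulated data are assumptions. The rate is estimated as 1/mean, so s = 1 and the degrees of freedom are k - 2. The critical value still has to be looked up in a table.

```python
import math
import random


def chi_square_exponential(data, k):
    """Chi-square statistic for an exponential fit with k equal-probability cells."""
    n = len(data)
    rate = n / sum(data)  # estimated rate = 1 / sample mean
    # Interior cell boundaries a_i with F(a_i) = i/k, i.e. a_i = -ln(1 - i/k) / rate.
    bounds = [-math.log(1 - i / k) / rate for i in range(1, k)]
    observed = [0] * k
    for x in data:
        cell = sum(1 for b in bounds if x >= b)  # number of boundaries at or below x
        observed[cell] += 1
    expected = n / k  # E_i = n * p_i with p_i = 1/k for every cell
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    dof = k - 1 - 1  # k - s - 1 with s = 1 estimated parameter
    return chi2, dof, observed


# Hypothetical data: 200 simulated interarrival times with rate 0.5.
rng = random.Random(1)
data = [rng.expovariate(0.5) for _ in range(200)]
chi2, dof, observed = chi_square_exponential(data, k=8)
print(f"chi-square = {chi2:.3f} with {dof} degrees of freedom; observed = {observed}")
```

The null hypothesis is rejected when the statistic exceeds the tabulated critical value for the chosen significance level and `dof` degrees of freedom.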
Kolmogorov-Smirnov Test: Identify the maximum absolute difference D between the empirical cdf of a random sample and the cdf of a specified theoretical distribution. Compare D against its critical value (Handout). Accept or reject $H_0$ accordingly. Example.
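The statistic D can be computed as sketched below. Because the empirical cdf jumps at each ordered observation, the largest gap is checked just after and just before every jump. The function name `ks_statistic` and the five data values are illustrative; the hypothesized distribution in the usage lines is uniform on [0, 1], whose cdf is F(x) = x.

```python
def ks_statistic(data, cdf):
    """Maximum absolute difference between the empirical cdf and a theoretical cdf."""
    ys = sorted(data)
    n = len(ys)
    # Gap just after each jump of the empirical cdf (empirical value is j/n).
    d_plus = max(j / n - cdf(y) for j, y in enumerate(ys, start=1))
    # Gap just before each jump (empirical value is (j - 1)/n).
    d_minus = max(cdf(y) - (j - 1) / n for j, y in enumerate(ys, start=1))
    return max(d_plus, d_minus)


# Hypothetical sample tested against the uniform distribution on [0, 1].
sample = [0.44, 0.81, 0.14, 0.05, 0.93]
D = ks_statistic(sample, lambda x: x)
print(f"D = {D:.4f}")
```

The hypothesis is rejected when D exceeds the tabulated critical value for the sample size and significance level.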
Input Models without Data: When hard data are not available, use: engineering data (specs); expert opinion; physical and/or conventional limitations; information on the nature of the process; uniform, triangular or beta distributions. Check sensitivity!
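As an illustration of building an input model from expert opinion alone, the sketch below samples from a triangular distribution specified only by a minimum, a most likely value, and a maximum. The function name `triangular_samples` and the three values (2, 3 and 7) are hypothetical. Note that the standard-library call `random.triangular` takes its arguments in the order (low, high, mode).

```python
import random


def triangular_samples(low, mode, high, n, seed=0):
    """Draw n values from a triangular distribution given min, most likely, max."""
    rng = random.Random(seed)
    # Argument order of random.triangular is (low, high, mode).
    return [rng.triangular(low, high, mode) for _ in range(n)]


# Hypothetical expert estimates: minimum 2, most likely 3, maximum 7.
xs = triangular_samples(2.0, 3.0, 7.0, 10_000)
sample_mean = sum(xs) / len(xs)
theoretical_mean = (2.0 + 3.0 + 7.0) / 3  # mean of a triangular distribution
print(f"sample mean = {sample_mean:.3f}, theoretical mean = {theoretical_mean:.3f}")
```

A simple sensitivity check is to rerun the simulation with the three values perturbed, or with a uniform distribution on the same range, and see whether the conclusions change.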
Multivariate and Time-Series Input Models: If input variables are not independent, their relationship must be taken into consideration (multivariate input model). If input variables constitute a sequence (in time) of related random variables, their relationship must be taken into account (time-series input model).
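Covariance and Correlation: Measure the linear dependence between two random variables $X_1$ (mean $\mu_1$, std dev $\sigma_1$) and $X_2$ (mean $\mu_2$, std dev $\sigma_2$): $X_1 - \mu_1 = \beta (X_2 - \mu_2) + \epsilon$. Covariance: $\mathrm{cov}(X_1, X_2) = E(X_1 X_2) - \mu_1 \mu_2$. Correlation: $\rho = \mathrm{cov}(X_1, X_2) / (\sigma_1 \sigma_2)$.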
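In practice the covariance and correlation are estimated from paired observations. The sketch below uses the sample versions with divisor n - 1, which is an assumption on my part since the definitions above are stated for the population quantities; the function names and the data are hypothetical.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance of paired observations (divisor n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def sample_corr(xs, ys):
    """Sample correlation: covariance divided by the two standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


# Hypothetical paired observations, e.g. lead time and demand.
lead_time = [6.5, 4.3, 6.9, 6.0, 6.9, 6.9, 5.8, 7.3, 4.5, 6.3]
demand = [103, 83, 116, 97, 112, 104, 106, 109, 92, 96]
print(f"cov = {sample_cov(lead_time, demand):.3f}")
print(f"corr = {sample_corr(lead_time, demand):.3f}")
```

The correlation is dimensionless and lies between -1 and 1, which makes it easier to interpret than the covariance.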
Multivariate Input Models: If $X_1$ and $X_2$ are normally distributed and interrelated, they can be modeled by a bivariate normal distribution. Steps: (1) generate $Z_1$ and $Z_2$, independent standard normal RVs; (2) set $X_1 = \mu_1 + \sigma_1 Z_1$; (3) set $X_2 = \mu_2 + \sigma_2 (\rho Z_1 + \sqrt{1 - \rho^2}\, Z_2)$.
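The generation steps above can be sketched as follows. The function name `bivariate_normal`, the seed argument, and the parameter values in the usage lines are illustrative assumptions.

```python
import math
import random


def bivariate_normal(n, mu1, sigma1, mu2, sigma2, rho, seed=0):
    """Generate n pairs (X1, X2) from a bivariate normal with correlation rho."""
    if not -1 <= rho <= 1:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Step 1: two independent standard normal variates.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Step 2: X1 depends on Z1 only.
        x1 = mu1 + sigma1 * z1
        # Step 3: X2 mixes Z1 and Z2 so that corr(X1, X2) = rho.
        x2 = mu2 + sigma2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x1, x2))
    return pairs


# Hypothetical parameters.
pairs = bivariate_normal(5, mu1=10.0, sigma1=2.0, mu2=5.0, sigma2=1.0, rho=0.7)
for x1, x2 in pairs:
    print(f"{x1:8.3f} {x2:8.3f}")
```

With a large sample, the sample correlation of the generated pairs should be close to the `rho` passed in, which is a quick way to check the generator.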
Time-Series Input Models: Let $X_1, X_2, X_3, \ldots$ be a sequence of identically distributed and covariance-stationary RVs. The lag-h correlation is $\rho_h = \mathrm{corr}(X_t, X_{t+h}) = \rho^h$. If all $X_t$ are normal: AR(1) model. If all $X_t$ are exponential: EAR(1) model.
AR(1) model: For a time series model $X_t = \mu + \phi (X_{t-1} - \mu) + \epsilon_t$, where the $\epsilon_t$ are independent normal with mean 0 and variance $\sigma_\epsilon^2$.
AR(1) model (contd): 1. Generate $X_1$ from a normal with mean $\mu$ and variance $\sigma_\epsilon^2 / (1 - \phi^2)$. Set t = 2. 2. Generate $\epsilon_t$ from a normal with mean 0 and variance $\sigma_\epsilon^2$. 3. Set $X_t = \mu + \phi (X_{t-1} - \mu) + \epsilon_t$. 4. Set t = t + 1 and go to 2.
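The four AR(1) steps above translate directly into a loop, sketched below. The function name `ar1`, the seed argument, and the parameter values are illustrative assumptions; the check that phi lies strictly between -1 and 1 is added so that the stationary variance in step 1 is defined.

```python
import math
import random


def ar1(n, mu, phi, sigma_eps, seed=0):
    """Generate X_1..X_n from a stationary AR(1) process."""
    if not -1 < phi < 1:
        raise ValueError("phi must satisfy -1 < phi < 1")
    rng = random.Random(seed)
    # Step 1: X_1 from the stationary distribution, variance sigma_eps^2 / (1 - phi^2).
    x = rng.gauss(mu, sigma_eps / math.sqrt(1 - phi ** 2))
    series = [x]
    for _ in range(2, n + 1):
        eps = rng.gauss(0.0, sigma_eps)   # Step 2: normal noise with mean 0
        x = mu + phi * (x - mu) + eps     # Step 3: AR(1) recursion
        series.append(x)                  # Step 4: advance t and repeat
    return series


# Hypothetical parameters.
xs = ar1(10, mu=20.0, phi=0.8, sigma_eps=1.5)
print([round(x, 2) for x in xs])
```

Note that `random.gauss` takes a standard deviation, not a variance, hence the square root in step 1.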
EAR(1) model: For a time series model $X_t = \phi X_{t-1}$ with probability $\phi$, and $X_t = \phi X_{t-1} + \epsilon_t$ with probability $1 - \phi$, where the $\epsilon_t$ are exponential with mean $1/\lambda$ and $0 \le \phi < 1$.
EAR(1) model (contd): 1. Generate $X_1$ from an exponential with mean $1/\lambda$. Set t = 2. 2. Generate U from a uniform on [0, 1]. If $U < \phi$, set $X_t = \phi X_{t-1}$. Otherwise generate $\epsilon_t$ from an exponential with mean $1/\lambda$ and set $X_t = \phi X_{t-1} + \epsilon_t$. 3. Set t = t + 1 and go to 2.
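The EAR(1) steps above can be sketched as a loop in the same way. The function name `ear1`, the seed argument, and the parameter values are illustrative assumptions. The standard-library call `random.expovariate` takes the rate, so passing the rate gives an exponential with mean 1/rate.

```python
import random


def ear1(n, lam, phi, seed=0):
    """Generate X_1..X_n from an EAR(1) process with exponential mean 1/lam."""
    if lam <= 0 or not 0 <= phi < 1:
        raise ValueError("need lam > 0 and 0 <= phi < 1")
    rng = random.Random(seed)
    # Step 1: X_1 is exponential with mean 1/lam.
    x = rng.expovariate(lam)
    series = [x]
    for _ in range(2, n + 1):
        u = rng.random()  # Step 2: U uniform on [0, 1)
        if u < phi:
            x = phi * x                          # with probability phi
        else:
            x = phi * x + rng.expovariate(lam)   # with probability 1 - phi
        series.append(x)                         # Step 3: advance t and repeat
    return series


# Hypothetical parameters: mean 1/lam = 0.5 and lag-1 correlation phi = 0.5.
xs = ear1(10, lam=2.0, phi=0.5)
print([round(x, 3) for x in xs])
```

Unlike AR(1), this model can only represent non-negative lag-1 correlation, since phi is restricted to the interval from 0 up to (but not including) 1.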