Chapter 4 – Modeling Basic Operations and Inputs

Collecting Data (p. 154)
Generally hard, expensive, frustrating, and boring.
The system might not exist.
Data may be available on the wrong things, so the model might have to change according to what is available.
Incomplete, “dirty” data.
Too much data (!).
Sensitivity of outputs to uncertainty in inputs.
Match model detail to the quality of the data.
Cost: should be budgeted in the project.
Capture variability in the data (model validity).
Garbage In, Garbage Out (GIGO).

Recall that in the tarsand mining paper assigned in a previous class, the system did not exist. Hence, much of the data was based on the best judgment of the modeler. Tough estimates had to be made about how the machinery (draglines, bucket-wheel excavators, trains, conveyor belts, etc.) would operate in a cold environment.

Sensitivity analysis tests a range of estimates with the simulation model. If varying an estimate over that range does not significantly affect the output, the estimate is assumed to be good enough. Otherwise, more data about the system would have to be collected, which may itself require more estimates.

Again in the tarsand mining paper, how detailed should the model of the dragline operation be? Could it be a process with one basic motion, represented in the simulation by a single probability distribution, or several motions, each with its own probability distribution? Think of the added cost of the latter.

Capturing the variability in the data is especially important when comparing the performance of alternative system designs (in the tarsand mining paper, the alternative designs were based on draglines or bucket-wheel excavators). While we can obtain mean system performance for each alternative, the key question is whether the alternatives are SIGNIFICANTLY DIFFERENT! Of course, in any use of data, GIGO makes the entire simulation effort meaningless.
Simulation with Arena Chapter 4 – Modeling Basic Operations and Inputs
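The sensitivity analysis described in the notes on collecting data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single-server queue, the function names `mean_wait` and `sensitivity`, and the candidate estimates are all hypothetical stand-ins for a real Arena model and its inputs.

```python
import random
import statistics


def mean_wait(mean_service, mean_interarrival=1.0, customers=5000, seed=1):
    """Average time in queue for a single-server FIFO queue.

    Hypothetical stand-in for a full simulation model: exponential
    interarrival and service times, waits computed by Lindley's recursion.
    """
    rng = random.Random(seed)
    wait = 0.0
    total_wait = 0.0
    for _ in range(customers):
        total_wait += wait
        service = rng.expovariate(1.0 / mean_service)
        interarrival = rng.expovariate(1.0 / mean_interarrival)
        # The next customer waits for whatever work is left when it arrives.
        wait = max(0.0, wait + service - interarrival)
    return total_wait / customers


def sensitivity(estimates, replications=10):
    """Map each candidate input estimate to the output averaged over replications."""
    results = {}
    for estimate in estimates:
        outputs = [mean_wait(estimate, seed=r) for r in range(replications)]
        results[estimate] = statistics.mean(outputs)
    return results


# Sweep a range of (assumed) estimates for the mean service time.
for estimate, output in sensitivity([0.5, 0.6, 0.7]).items():
    print(f"mean service time {estimate:.1f} -> mean wait in queue {output:.2f}")
```

If the output barely moves across the range, the rough estimate is probably adequate; if it moves a lot, as it tends to in a queue that is close to saturation, that input deserves more data collection.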

Using Data: Alternatives and Issues
Use data “directly” in the simulation:
  Read actual observed values to drive the model inputs (interarrivals, service times, part types, …).
  All values will be “legal” and realistic.
  But you can never go outside your observed data.
  You may not have enough data for long or many runs.
  Computationally slow (reading disk files).
Or, fit a probability distribution to the data:
  “Draw” or “generate” synthetic observations from this distribution to drive the model inputs.
  We’ve done it this way so far.
  Can go beyond the observed data (good and bad).
  May not get a good “fit” to the data, which raises a validity question.

We can try to make direct observations of measures such as interarrival times and service times, but the cost and effort of obtaining good frequency distributions (which could be entered in the simulation as empirical distributions) can be high. Long waits to observe and record the data are one example. Model validity suffers if cost savings result in shorter data collection periods. Also, detailed data from empirical distributions could slow down the simulation by increasing the computational burden.

Another approach is to collect as much data as reasonably possible (given time and cost considerations) and try to fit it to an established (theoretical) probability distribution commonly found in statistics (e.g., a normal, beta, exponential, or triangular distribution). Software packages generate such distributions more easily than empirical distributions. However, care must be taken that the established distribution is a good fit to the data (i.e., is the distribution a valid representation of the observed data?).
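The two alternatives above can be contrasted with a small Python sketch. The observed values, the choice of an exponential distribution, and the helper names `trace_driven` and `fitted_exponential` are assumptions made for illustration; they are not taken from the text or from Arena.

```python
import random
import statistics

# Hypothetical observed interarrival times (minutes).
observed = [2.1, 0.4, 3.7, 1.2, 0.9, 5.3, 0.2, 1.8]


def trace_driven(data):
    """Use the data directly: replay observed values, stopping when they run out."""
    for value in data:
        yield value


def fitted_exponential(data, rng):
    """Draw endless synthetic values from an exponential with the sample mean."""
    mean = statistics.mean(data)
    while True:
        yield rng.expovariate(1.0 / mean)


rng = random.Random(42)
direct = list(trace_driven(observed))
generator = fitted_exponential(observed, rng)
synthetic = [next(generator) for _ in range(1000)]

print("direct:    n =", len(direct), " max =", max(direct))
print("synthetic: n =", len(synthetic), " max =", round(max(synthetic), 2))
```

The replayed values stop after eight draws and never exceed the largest observation, while the fitted distribution supplies as many values as a long run needs and will occasionally produce values larger than anything observed.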

Fitting Distributions via the Arena Input Analyzer
Assume:
  Have sample data: an Independent and Identically Distributed (IID) list of observed values from the actual physical system.
  Want to select or fit a probability distribution for use in generating inputs for the simulation model.
Arena Input Analyzer:
  Separate application, also accessible via the Tools menu in Arena.
  Fits distributions and gives a valid Arena expression for generation that can be pasted directly into the simulation model.

In probability theory, a sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each has the same probability distribution as the others and all are mutually independent. The abbreviation i.i.d. is particularly common in statistics (often as iid, sometimes written IID), where observations in a sample are often assumed to be (more or less) i.i.d. for the purposes of statistical inference. The assumption (or requirement) that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods. However, in practical applications this may or may not be realistic.

The following are examples of i.i.d. random variables: (1) a sequence of outcomes of spins of a roulette wheel is i.i.d.; from a practical point of view, an important implication is that if the roulette ball lands on 'red' 20 times in a row, the next spin is no more or less likely to be 'black' than on any other spin; (2) a sequence of dice rolls is i.i.d.; (3) a sequence of coin flips is i.i.d. (The above follows from Wikipedia.)

Accessible from the Tools pull-down menu, the Input Analyzer is a powerful tool that lets us fit our data to an established theoretical probability distribution (e.g., normal, triangular, beta, uniform, or exponential).
However, we can also build a frequency distribution from observations and treat it as a probability distribution: for each time period in our observations, divide the number of observations in that period by the total number of observations over all time periods. The division converts frequencies into approximate probabilities, and these make up the empirical (or observed) distribution. The Input Analyzer has some features to help us do this.
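The conversion from frequencies to approximate probabilities is simple arithmetic, sketched here in Python. The period labels and counts are made up for illustration, and `empirical_distribution` is a hypothetical helper, not an Input Analyzer function.

```python
def empirical_distribution(counts):
    """Turn per-period observation counts into probabilities and cumulative probabilities."""
    total = sum(counts.values())
    probabilities = {period: n / total for period, n in counts.items()}
    cumulative = {}
    running = 0.0
    for period, p in probabilities.items():
        running += p
        cumulative[period] = running
    return probabilities, cumulative


# Hypothetical number of observations recorded in each time period.
counts = {"08-09": 12, "09-10": 30, "10-11": 45, "11-12": 13}
probabilities, cumulative = empirical_distribution(counts)

for period in counts:
    print(period, f"p = {probabilities[period]:.2f}", f"cumulative = {cumulative[period]:.2f}")
```

The probabilities sum to 1 by construction, and the cumulative column is the form in which an empirical distribution is usually entered into a simulation.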

Fitting Distributions via the Arena Input Analyzer (cont’d.)
Fitting = deciding on the distribution form (exponential, gamma, empirical, etc.) and estimating its parameters.
  Several different methods (maximum likelihood, moment matching, least squares, …).
Assess goodness of fit via hypothesis tests.
  H0: the fitted distribution adequately represents the data.
  Get the p value for the test (small = poor fit).
Fitted “theoretical” vs. empirical distribution.
Continuous vs. discrete data and distribution.
“Best” fit from among several distributions.

Most of the above you might recall from past statistics courses. Our interest is in fitting a set of observed data that we have collected about, for example, a server (such as a store cashier checking out customers with their purchases). The methods used to fit the distributions are not of interest to us. We are interested in finding the distribution with the highest p value when our observed data are matched against the several distributions automatically generated by the Input Analyzer. The distribution with the highest p value can be used (with some reservations) as, for example, the distribution for a resource in an Arena Process module (from the Project Bar).

An example of a continuous distribution is the normal distribution; the binomial distribution is an example of a discrete distribution. Recall that the former is smooth and the latter is not.
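Although these notes treat the estimation methods as a black box, a short Python sketch may help show what "estimating its parameters" means in practice. It uses two textbook estimators: the sample mean as the maximum likelihood estimate of an exponential mean, and moment matching for a gamma distribution. The data and function names are hypothetical, and nothing here is claimed to be what the Input Analyzer does internally.

```python
import statistics


def fit_exponential_mle(data):
    """Maximum likelihood estimate of the exponential mean: the sample mean."""
    return statistics.mean(data)


def fit_gamma_moments(data):
    """Moment matching for a gamma distribution.

    Chooses shape and scale so that shape * scale equals the sample mean
    and shape * scale**2 equals the sample variance.
    """
    mean = statistics.mean(data)
    variance = statistics.variance(data)
    shape = mean * mean / variance
    scale = variance / mean
    return shape, scale


# Hypothetical service times (minutes).
service_times = [1.0, 2.0, 3.0, 4.0, 5.0]
print("exponential mean:", fit_exponential_mle(service_times))
print("gamma (shape, scale):", fit_gamma_moments(service_times))
```

Either fitted form would then have to pass a goodness-of-fit check before being used as a model input.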

Data Files for the Input Analyzer
Create the data file (editor, word processor, spreadsheet, ...).
  Must be plain ASCII text (save as text or export).
  Data values separated by white space (blanks, tabs, linefeeds).
  Otherwise free format.
Open the data file from within the Input Analyzer.
  File/New menu.
  File/Data File/Use Existing … menu.
  Get histogram, basic summary of data.
  To see the data file: Window/Input Data menu.
Can generate a “fake” data file to play around with.
  File/Data File/Generate New … menu.

We will create a data file with either an editor or a word processor such as Microsoft Word. Remember that all files must be saved as text files. Try the instructions in the slide above (but don’t save any of the files). It will give you a feel for generating probability distributions.
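A data file in the format described above (plain ASCII, values separated by white space, otherwise free format) can also be produced by a script. The following Python sketch is one way to do it; the file name, the sample values, and the helper names are assumptions for illustration.

```python
import os
import tempfile


def write_data_file(path, values, per_line=5):
    """Write values as plain ASCII text, separated by blanks and linefeeds."""
    with open(path, "w", encoding="ascii") as handle:
        for start in range(0, len(values), per_line):
            row = values[start:start + per_line]
            handle.write(" ".join(f"{v:g}" for v in row) + "\n")


def read_data_file(path):
    """Read whitespace-separated numbers back, whatever the line layout."""
    with open(path, encoding="ascii") as handle:
        return [float(token) for token in handle.read().split()]


# Hypothetical observed service times (minutes).
values = [1.2, 0.7, 3.4, 2.2, 0.9, 1.1, 4.8]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "service_times.txt")  # hypothetical file name
    write_data_file(path, values)
    round_trip = read_data_file(path)

print(round_trip)
```

Because the reader splits on any white space, it does not matter how many values appear on each line.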

The Fit Menu
Fits distributions, does goodness-of-fit tests.
Fit a specific distribution form:
  Plots density over histogram for a visual “test”.
  Gives the exact expression to Copy and Paste (Ctrl+C, Ctrl+V) into the simulation model.
  May include an “offset”, depending on the distribution.
  Gives results of goodness-of-fit tests:
    Chi-square and Kolmogorov-Smirnov tests.
    Most important part: the p-value, always between 0 and 1: the probability of getting a data set that’s more inconsistent with the fitted distribution than the data set you actually have, if the fitted distribution is truly “the truth”.
    “Small” p (< 0.05 or so): poor fit (try again or give up).

For the next slides, we suggest you first read pp (stopping before the third bullet). As you read these pages, try to do what the reading packet describes, then come back to this slide and the remaining slides. That way, the bullets on the slides serve as an overview of what you did with the Arena Input Analyzer while following the reading packet.
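The definition of the p-value given above can be made concrete by simulation. The Python sketch below fits an exponential distribution, measures inconsistency with the Kolmogorov-Smirnov distance, and estimates the p-value as the fraction of synthetic data sets (drawn from the fitted distribution and refitted) that are at least as inconsistent as the real one. This Monte Carlo approach is an assumption chosen for illustration; it is not claimed to be how the Input Analyzer computes its p-values, and the data sets are made up.

```python
import math
import random


def exponential_cdf(mean):
    """Return the CDF of an exponential distribution with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return cdf


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of the data and the given CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def monte_carlo_p_value(data, trials=1000, seed=0):
    """Estimate P(synthetic data at least as inconsistent as the real data)."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    observed_gap = ks_statistic(data, exponential_cdf(mean))
    at_least_as_bad = 0
    for _ in range(trials):
        synthetic = [rng.expovariate(1.0 / mean) for _ in range(n)]
        refit_mean = sum(synthetic) / n  # refit, as was done for the real data
        if ks_statistic(synthetic, exponential_cdf(refit_mean)) >= observed_gap:
            at_least_as_bad += 1
    return at_least_as_bad / trials


data_rng = random.Random(7)
plausible = [data_rng.expovariate(0.5) for _ in range(40)]   # exponential-looking data
nearly_constant = [10 + 0.01 * i for i in range(30)]         # clearly not exponential

print("p for exponential-looking data:", monte_carlo_p_value(plausible))
print("p for nearly constant data:    ", monte_carlo_p_value(nearly_constant))
```

The nearly constant data set produces a p-value at or near zero, which is the "poor fit" case on the slide.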

The Fit Menu (cont’d.)
Fit all of Arena’s (theoretical) distributions at once:
  Fit/Fit All menu.
  Returns the minimum square-error distribution.
    Square error = sum of squared discrepancies between histogram frequencies and fitted-distribution frequencies.
    Can depend on the histogram intervals chosen: different intervals can lead to a different “best” distribution.
    Could still be a poor fit, though (check the p value).
  To see all distributions, ranked: Window/Fit All Summary.
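The square-error criterion and its dependence on the histogram intervals can be illustrated with a short Python sketch. It follows the definition on the slide (sum of squared differences between histogram relative frequencies and the probabilities the fitted distribution assigns to the same intervals); the equal-width intervals, the synthetic data, and the function names are assumptions, and the Input Analyzer's exact computation may differ.

```python
import math
import random


def square_error(data, cdf, bins):
    """Sum of squared gaps between histogram and fitted-distribution frequencies."""
    low, high = min(data), max(data)
    width = (high - low) / bins  # assumes the data are not all identical
    counts = [0] * bins
    for x in data:
        index = min(int((x - low) / width), bins - 1)  # top edge goes in last bin
        counts[index] += 1
    error = 0.0
    for i, count in enumerate(counts):
        left = low + i * width
        right = left + width
        observed = count / len(data)
        expected = cdf(right) - cdf(left)
        error += (observed - expected) ** 2
    return error


def exponential_cdf(mean):
    """Return the CDF of an exponential distribution with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return cdf


rng = random.Random(3)
data = [rng.expovariate(1.0 / 2.0) for _ in range(200)]  # synthetic sample
fitted = exponential_cdf(sum(data) / len(data))

for bins in (5, 10, 20):
    print(f"{bins:2d} intervals -> square error {square_error(data, fitted, bins):.5f}")
```

The same data and the same fitted distribution give a different square error for each choice of intervals, which is why the ranking of candidate distributions can change with the histogram settings.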

The Fit Menu (cont’d.)
“Fit” an empirical distribution (continuous or discrete): Fit/Empirical.
  Can interpret the results as a discrete or continuous distribution.
    Discrete: get pairs (Cumulative Probability, Value).
    Continuous: Arena will linearly interpolate within the data range according to these pairs (so you can never generate values outside the range, which might be good or bad).
  An empirical distribution can be used when “theoretical” distributions fit poorly, or intentionally.
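How (Cumulative Probability, Value) pairs turn into generated values can be sketched in Python with inverse-transform sampling. The pairs below are hypothetical, and the two functions only mimic the discrete and linearly interpolated continuous interpretations described above; they are not Arena's implementation.

```python
import bisect
import random


def sample_discrete(pairs, u):
    """Return the value of the first pair whose cumulative probability is >= u.

    pairs: [(cumulative_probability, value), ...] in increasing order,
    with the last cumulative probability equal to 1.0; u lies in [0, 1).
    """
    cumulative = [p for p, _ in pairs]
    return pairs[bisect.bisect_left(cumulative, u)][1]


def sample_continuous(pairs, u):
    """Linearly interpolate between neighbouring pairs.

    pairs must start at cumulative probability 0.0, end at 1.0, and have
    strictly increasing cumulative probabilities.
    """
    cumulative = [p for p, _ in pairs]
    i = bisect.bisect_right(cumulative, u)
    if i >= len(pairs):
        return pairs[-1][1]
    (p0, v0), (p1, v1) = pairs[i - 1], pairs[i]
    return v0 + (u - p0) * (v1 - v0) / (p1 - p0)


# Hypothetical empirical distributions.
discrete_pairs = [(0.2, 1), (0.7, 2), (1.0, 3)]
continuous_pairs = [(0.0, 1.0), (0.25, 2.0), (0.75, 4.0), (1.0, 5.0)]

rng = random.Random(11)
draws = [sample_continuous(continuous_pairs, rng.random()) for _ in range(1000)]
print("continuous draws stay within", min(draws), "and", max(draws))
print("discrete draw for u = 0.5:", sample_discrete(discrete_pairs, 0.5))
```

Every continuous draw falls between the smallest and largest values in the pairs, which is the "never outside the range" property noted on the slide.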

Some Issues in Fitting Input Distributions
Not an exact science: there is no “right” answer.
Consider theoretical vs. empirical.
Consider the range of the distribution:
  Infinite both ways (e.g., normal).
  Positive (e.g., exponential, gamma).
  Bounded (e.g., beta, uniform).
Consider ease of parameter manipulation to affect means, variances.
Simulation model sensitivity analysis.
Outliers, multimodal data:
  Maybe split the data set (see textbook for details).
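The point about the range of a distribution can be checked numerically. As a minimal Python sketch, assume a normal distribution has been fitted to a quantity that cannot be negative, such as a service time; the means and standard deviations below are hypothetical.

```python
from statistics import NormalDist


def probability_negative(mean, stdev):
    """Probability that a normal(mean, stdev) draw falls below zero."""
    return NormalDist(mean, stdev).cdf(0.0)


# Hypothetical fitted parameters for a service time in minutes.
for mean, stdev in [(5.0, 5.0), (5.0, 2.0), (10.0, 1.0)]:
    share = probability_negative(mean, stdev)
    print(f"normal(mean={mean}, sd={stdev}): P(value < 0) = {share:.4f}")
```

When the standard deviation is large relative to the mean, a noticeable share of draws would be negative, which is one reason to prefer a positive or bounded distribution for such inputs.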
