Choosing A Distribution Risk Analysis for Water Resources Planning and Management Institute for Water Resources May 2008.

1 Choosing A Distribution Risk Analysis for Water Resources Planning and Management Institute for Water Resources May 2008

2 Probability x Consequence Quantitative risk assessment requires you to use probability distributions to Describe data Model variability Represent our uncertainty What distribution do I use?

3 First! Do you have data? If so, do you need a distribution or can you just use your data? Answer depends on the question(s) you’re trying to answer as well as your data

4 Use Data If your data are representative of the population germane to your problem use them One problem could be bounding data What are the true min & max? Any dataset can be converted into a Cumulative distribution function General density function

5 Fitting Empirical Distribution to Data If continuous & reasonably extensive * May have to estimate minimum & maximum * Rank data x(i) in ascending order * Calculate the percentile for each value * Use data and percentiles to create cumulative distribution function
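The ranking and percentile steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's method: the slide does not say which percentile formula to use, so the plotting position i/(n+1) is an assumption, and the function names `empirical_cdf_points` and `cdf` are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf_points(data, minimum, maximum):
    """Build the (values, percentiles) lists of a piecewise-linear empirical CDF.

    minimum and maximum are the analyst's estimates of the true bounds.
    Percentiles use i/(n+1), an assumed plotting position.
    """
    xs = sorted(data)  # rank data in ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    if minimum > xs[0] or maximum < xs[-1]:
        raise ValueError("estimated bounds must enclose the data")
    values = [minimum] + xs + [maximum]
    percentiles = [0.0] + [i / (n + 1) for i in range(1, n + 1)] + [1.0]
    return values, percentiles


def cdf(x, values, percentiles):
    """Evaluate the empirical CDF at x by linear interpolation."""
    if x <= values[0]:
        return 0.0
    if x >= values[-1]:
        return 1.0
    k = bisect_right(values, x)  # values[k-1] <= x < values[k]
    x0, x1 = values[k - 1], values[k]
    p0, p1 = percentiles[k - 1], percentiles[k]
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)
```

For example, `empirical_cdf_points([3, 1, 2], 0, 4)` yields percentiles 0.25, 0.5 and 0.75 for the three observations, with 0 and 1 assigned to the estimated bounds.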

6 When You Need a Distribution Have some data but not enough for empirical distribution A probability model is needed Many random events do follow standard probability models Not everything follows a model

7 Dilemma Given wide variety of distributions it is not always easy to select the most appropriate one Results can be very sensitive to distribution choice Using wrong assumption in a model can produce incorrect results Incorrect results can lead to poor decisions Poor decisions can lead to undesirable outcomes
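To illustrate how sensitive results can be to distribution choice, the sketch below compares the probability of exceeding a threshold under a normal and a lognormal distribution that share the same mean and standard deviation. The numbers (mean 100, SD 30, threshold 200) are hypothetical and are not taken from the presentation; the function names are likewise made up for illustration.

```python
import math
from statistics import NormalDist


def normal_exceedance(mean, sd, threshold):
    """P(X > threshold) when X is normal with the given mean and SD."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


def lognormal_exceedance(mean, sd, threshold):
    """P(X > threshold) when X is lognormal with the given mean and SD.

    The mean and SD of X are converted to the parameters of ln(X) by
    moment matching.
    """
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return 1.0 - NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(threshold))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the effect in the upper tail.
    print(normal_exceedance(100.0, 30.0, 200.0))
    print(lognormal_exceedance(100.0, 30.0, 200.0))
```

With these illustrative inputs the lognormal assigns roughly an order of magnitude more probability to the upper tail than the normal does, even though both have identical means and standard deviations.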

8 First, Understand Your Data 1. Is your variable discrete or continuous ? Do not overlook this! Discrete distributions- take one of a set of identifiable values, each of which has a calculable probability of occurrence. Continuous distributions- a variable that can take any value within a defined range

9 What Values Are Possible? 2. Is your variable bounded or unbounded? Bounded-value confined to lie between two determined values Unbounded-value theoretically extends from minus infinity to plus infinity Partially bounded-constrained at one end (truncated distributions) Use a distribution that matches

10 Are There Parameters 3. Does your variable have parameters that are meaningful? Parametric--model-based distributions, for which the shape is determined by the mathematics describing a conceptual probability model Require a greater knowledge of the underlying Non-parametric—empirical distributions for which the mathematics is defined by the shape required Intuitively easy to understand Flexible and therefore useful

11 Is It Dependent on Other Variables 4. Univariate and multivariate distributions Univariate--describes a single parameter or variable that is not probabilistically linked to any other in the model Multivariate--describe several parameters that are probabilistically linked in some way

12 Do You Know the Parameters? 5. First or Second order distribution First order: a probability distribution with precisely known parameters (N(100,10)) Second order: a probability distribution with some uncertainty about its parameters (N(μ, σ))
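The difference between the two orders can be sketched as a one-level versus a two-level simulation. Several things here are assumptions rather than content from the slide: the second argument of N(100,10) is read as a standard deviation, and the distributions placed on the uncertain mean and SD (a normal and a uniform) are invented purely for illustration.

```python
import random


def sample_first_order(rng, n, mean=100.0, sd=10.0):
    """Draw n values from N(mean, sd) with precisely known parameters."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def sample_second_order(rng, n_outer, n_inner):
    """Two-level simulation: sample the parameters, then sample the variable.

    The parameter distributions below are hypothetical placeholders for
    whatever uncertainty the analyst actually has about the mean and SD.
    """
    runs = []
    for _ in range(n_outer):
        mean = rng.gauss(100.0, 5.0)  # assumed uncertainty about the mean
        sd = rng.uniform(8.0, 12.0)   # assumed uncertainty about the SD
        draws = [rng.gauss(mean, sd) for _ in range(n_inner)]
        runs.append((mean, sd, draws))
    return runs


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_first_order(rng, 3))
    for mean, sd, draws in sample_second_order(rng, 2, 3):
        print(round(mean, 1), round(sd, 1), draws)
```

Keeping the outer (parameter) loop separate from the inner (variable) loop lets uncertainty about the parameters be reported apart from the variability of the quantity itself.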

13 Continuous Distribution Examples Unbounded: Normal, t, Logistic. Left Bounded: Chi-square, Exponential, Gamma, Lognormal, Weibull. Bounded: Beta, Cumulative, General/histogram, Pert, Uniform, Triangle.

14 Discrete Distribution Examples Unbounded: None. Left Bounded: Poisson, Negative binomial, Geometric. Bounded: Binomial, Hypergeometric, Discrete, Discrete Uniform.

15 Parametric and Non-Parametric Parametric: Normal, Lognormal, Exponential, Poisson, Binomial, Gamma. Non-Parametric: Uniform, Pert, Triangular, Cumulative.

16 2 General Approaches to Choosing Distributions Choose the math (parametric) Choose the shape (nonparametric) Empirical data exist No data are available

17 Choose Parametric Distribution If Theory supports choice Distribution proven accurate for modelling your specific variable (without theory) Distribution matches observed data well Need distribution with tail extending beyond the observed minimum or maximum

18 Choose Non-Parametric Distribution If Theory is lacking There is no commonly used model Data are severely limited Knowledge is limited to general beliefs and some evidence

19 What is source of data? Experiments Observation Surveys Computer databases Literature searches Simulations Test case The source of the data may affect your decision to use it or not.

20 Checklist for Choosing a Distribution From Some Data 1. Understand your variable (preceding) 2. Look at your data (plot it) 3. Use theory 4. Calculate statistics 5. Use previous experience 6. Distribution fitting 7. Expert opinion 8. Sensitivity analysis

21 Plot--Old Faithful Eruptions Find this distribution! You could calculate the mean & SD of these data and assume it's normal Beware, danger lurks Always plot your data

22 Which Distribution? Examine a histogram Look for distinctive shapes of specific distributions Single peaks Symmetry Positive skew Negative values Gamma, Weibull, beta are useful and flexible forms

23 Which Distribution? Summary statistics can provide clues Normal has low coefficient of variation and equal mean and median Exponential has positive skew and equal mean and standard deviation Go to RiskView and check this out.
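The clues named on this slide are straightforward to compute. The sketch below returns the mean, median, standard deviation, coefficient of variation and skewness of a sample. The function name `summary_clues` is hypothetical, and the skewness is the simple moment-based estimate, one of several definitions in use.

```python
import statistics


def summary_clues(data):
    """Return summary statistics that hint at a distribution family."""
    n = len(data)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation
    # Moment-based skewness: third central moment over (second moment)^1.5.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "median": median,
        "sd": sd,
        "cv": sd / mean if mean != 0 else float("inf"),
        "skew": skew,
    }
```

A sample with mean close to median and skewness near zero is consistent with a symmetric form such as the normal; a sample with positive skew and a mean roughly equal to its standard deviation points toward the exponential.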

24 Outliers Extreme observations can drastically influence a probability model No prescriptive method for addressing them If observation is an error remove it If not what is data point telling you? What about your world-view is inconsistent with this result? Should you reconsider your perspective? What possible explanations have you not yet considered?

25 Outliers (cont) Your explanation must be correct, not merely plausible Consensus is poor measure of truth If you must keep it and can't explain it Use conventional practices and live with skewed consequences Choose methods less sensitive to such extreme observations (Gumbel, Weibull)

26 Goodness of Fit Provides statistical evidence to test a hypothesis about the nature of the distribution H0: these data come from an “x” distribution Small test statistic and large p are “desirable” for accepting H0 Another piece of evidence, not a determining factor

27 Chi-Square Test Most common; applies to discrete & continuous data Tests H0 that sample data come from a specific distribution versus the alternative hypothesis that they do not Non-parametric and one-sided Data are divided into a number of cells, each cell with at least five observations Usually 50 observations or more
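As a rough sketch of the cell-based calculation, the code below divides the range of a hypothesized distribution into equal-probability cells and sums (observed - expected)^2 / expected across them. Equal-probability cells are one common choice, not something the slide prescribes. The function name is hypothetical, and `dist` is assumed to be any object with an `inv_cdf` method, such as `statistics.NormalDist`.

```python
from bisect import bisect_right
from statistics import NormalDist


def chi_square_statistic(data, dist, n_cells):
    """Return (statistic, observed counts) for equal-probability cells."""
    n = len(data)
    expected = n / n_cells  # same expected count in every cell
    if expected < 5:
        raise ValueError("expected count per cell should be at least five")
    # Interior cell boundaries at the 1/k, 2/k, ... quantiles of dist.
    edges = [dist.inv_cdf(k / n_cells) for k in range(1, n_cells)]
    observed = [0] * n_cells
    for x in data:
        observed[bisect_right(edges, x)] += 1
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, observed


if __name__ == "__main__":
    # Hypothetical sample tested against a standard normal with two cells.
    sample = [-1.0] * 8 + [1.0] * 2
    print(chi_square_statistic(sample, NormalDist(0.0, 1.0), 2))
```

The statistic is compared with a chi-square distribution whose degrees of freedom equal the number of cells minus one, minus the number of parameters estimated from the data.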

28 Kolmogorov-Smirnov Test Tests H0 that continuous sample data come from a specific distribution versus the alternative hypothesis that they do not More suitable for small samples than Chi-Square Sort data in ascending order and find the greatest difference between the empirical cumulative probability of each ranked observation and that observation’s theoretical counterpart Better fit for means than tails Less than 0.03 indicates a good fit to hypothesized distribution

29 Kolmogorov-Smirnov Statistic
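The statistic is the largest vertical gap between the empirical step function of the sorted data and the hypothesized cumulative distribution function. A minimal sketch follows; the function name `ks_statistic` is hypothetical, and `cdf` is assumed to be any callable that returns the hypothesized cumulative probability.

```python
def ks_statistic(data, cdf):
    """Return the Kolmogorov-Smirnov distance D between data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so the gap
        # is checked on both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Hypothetical sample compared with a Uniform(0, 1) CDF.
    print(ks_statistic([0.25, 0.75], lambda x: x))
```

Checking both sides of each step matters because the largest gap can occur just before a jump rather than at it.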

30 Anderson-Darling Test Similar to K-S; for continuous variables Weights differences between theoretical and empirical distributions at their tails more heavily than at their midranges Desirable when a better fit at the extreme tails of the distribution is desired Value less than 1.5 is usually a good fit
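The tail weighting comes from the logarithms in the standard computing formula A^2 = -n - (1/n) * sum over i of (2i - 1) * [ln F(x_i) + ln(1 - F(x_(n+1-i)))], where the x_i are the sorted observations and F is the hypothesized CDF. A minimal sketch, with a hypothetical function name and `cdf` assumed to return values strictly between 0 and 1 for every observation:

```python
import math


def anderson_darling(data, cdf):
    """Return the Anderson-Darling statistic A^2 for data against cdf."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = cdf(xs[i - 1])      # F at the i-th smallest value
        upper = 1.0 - cdf(xs[n - i])  # 1 - F at the i-th largest value
        # log() grows without bound as its argument nears 0, which is
        # what gives observations in the tails their extra weight.
        total += (2 * i - 1) * (math.log(lower) + math.log(upper))
    return -n - total / n


if __name__ == "__main__":
    # Hypothetical sample compared with a Uniform(0, 1) CDF.
    print(anderson_darling([0.25, 0.75], lambda x: x))
```

Critical values depend on the hypothesized family and on whether its parameters were estimated from the same data, so a single cutoff should be treated as a rule of thumb.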

31 No Data Available Modelers must resort to judgment Knowledge of distributions is valuable in this situation

32 Defining Distributions w/ Expert Opinion Data never collected Data too expensive or impossible Past data irrelevant Opinion needed to fill holes in sparse data New area of inquiry, unique situation that never existed

33 What Experts Estimate The distribution itself Judgment about distribution of value in population E.g. population is normal Parameters of the distribution E.g. mean is x and standard deviation is y

34 Modeling Techniques Disaggregation (Reduction) Subjective Probability Elicitation * PDF or CDF * Parametric or Non-parametric distributions

35 Take Away Points Choosing the best distribution is where most new risk assessors feel least comfortable. Choice of distribution matters. Distributions come from data and expert opinion. Distribution fitting should never be the basis for distribution choice.

