1
Chapter 3 INTERVAL ESTIMATES
BAE 5333 Applied Water Resources Statistics
Biosystems and Agricultural Engineering Department
Division of Agricultural Sciences and Natural Resources
Oklahoma State University
Source: Dr. Dennis R. Helsel & Dr. Edward J. Gilroy, 2006 Applied Environmental Statistics Workshop, and Statistical Methods in Water Resources
2
Population vs. Sample
We measure characteristics of a sample and infer that they apply to the population.
3
Intervals
An interval computed from sample data indicates how certain we can be about the value of the true population parameter.
4
Interval Estimates
Confidence Interval – contains an unknown parameter (mean, median) of the population with a specified probability
Prediction Interval – contains one or more future observations with a specified probability
Tolerance Interval – contains a proportion (percentile) of future observations with a specified probability
5
What is Inside the Interval?
6
Your Interval May Not Contain the True Value!
7
Meaning of a Confidence Interval
If you compute ten 90% confidence intervals, each from a different sample of data collected under identical conditions with identical methods (so each sample is equally valid), then on average nine of the 10 intervals (90%) will contain the true mean. One will not! You never know whether yours is that one!
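The repeated-sampling meaning above can be checked by simulation. This is a minimal sketch, assuming normally distributed data with a known σ so that the z-interval is exact; the true mean, σ, sample size, trial count, and seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, conf):
    """Two-sided z-interval for the mean, assuming the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z * sigma / math.sqrt(len(sample))
    center = fmean(sample)
    return center - half_width, center + half_width


def coverage(true_mean=10.0, sigma=2.0, n=20, conf=0.90, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, conf)
        hits += low <= true_mean <= high   # True counts as 1
    return hits / trials


print(coverage())  # near 0.90; the exact value depends on the seed
```

Any single interval either contains the true mean or it does not; the 90% describes the long-run hit rate of the procedure.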
8
Meaning of a Confidence Interval
Figure: Ten 90% confidence intervals
9
Meaning of a Confidence Interval
Example: 90% Confidence Interval about the Mean
We are 90% confident that the true mean turbidity in the Poteau River is between 5 and 200 NTU (Nephelometric Turbidity Units).
10
Computing Confidence Intervals
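Parametric Intervals
$\bar{X} - z\,\sigma_{\bar{X}} \le \mu \le \bar{X} + z\,\sigma_{\bar{X}}$
where $\mu$ = population mean, $\bar{X}$ = sample mean, $z$ = standard normal quantile for the chosen confidence level (e.g., 1.645 for a two-sided 90% interval), and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ = standard error of the mean
Symmetric around the sample mean
Confidence levels are valid if the data are normally distributed or there is a large amount of data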
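A minimal sketch of the parametric interval above, using Python's standard `statistics` module. It assumes the sample is large enough that the sample standard deviation s can stand in for σ and the z quantile is appropriate; for small samples a Student's t quantile would normally replace z, which the standard library does not provide. The function name and data values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def parametric_mean_ci(data, conf=0.90):
    """Symmetric interval X-bar +/- z * s / sqrt(n) around the sample mean."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)       # about 1.645 for conf=0.90
    std_error = stdev(data) / math.sqrt(len(data))  # s / sqrt(n) estimates sigma_X-bar
    x_bar = fmean(data)
    return x_bar - z * std_error, x_bar + z * std_error


low, high = parametric_mean_ci([2, 4, 4, 4, 5, 5, 7, 9])
print(round(low, 2), round(high, 2))  # 3.76 6.24
```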
11
Computing Confidence Intervals Nonparametric Intervals
Usually computed on the median or another percentile
Endpoints are data values
Count the same number of observations in from each end of the ranked data set
Does not depend on the assumption that the data are normally distributed
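The "count in from each end" rule can be sketched with the binomial distribution: the number of observations below the true median follows Binomial(n, 0.5), so the interval from the k-th smallest to the k-th largest value has coverage 1 − 2·P(B ≤ k − 1). This is one common way to choose k, shown here as an illustration; the function name is hypothetical.

```python
from math import comb


def median_ci(data, conf=0.95):
    """Nonparametric CI on the median: the k-th smallest and k-th largest values.

    k is the largest rank whose interval still reaches the requested confidence,
    using B ~ Binomial(n, 0.5) for the count of observations below the median.
    """
    x = sorted(data)
    n = len(x)

    def binom_cdf(j):  # P(B <= j) for B ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(j + 1)) / 2 ** n

    k = 0
    # Coverage of [x_(k), x_(n-k+1)] is 1 - 2 * P(B <= k - 1).
    while k + 1 <= n // 2 and 1 - 2 * binom_cdf(k) >= conf:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence")
    return x[k - 1], x[n - k]  # 1-based ranks k and n - k + 1


print(median_ci(range(1, 26)))  # (8, 18): ranks 8 and 18 of 25 values
```

Because the endpoints are actual data values, the achieved confidence is usually a little above the requested level (about 95.7% for n = 25).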
12
Confidence Intervals on Skewed Data
Parametric intervals assume that the data, or the sample mean, follow a normal distribution. If this assumption is incorrect, the confidence intervals will not include the true value as often as the stated confidence level suggests.
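A small simulation illustrates the undercoverage. This sketch assumes lognormal data (a common model for skewed water-quality data), a sample size of 10, and a z-based 95% interval with the sample standard deviation; all of these settings and the function name are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def skewed_coverage(n=10, sigma_log=1.5, conf=0.95, trials=2000, seed=7):
    """Share of z-intervals (using sample s) that contain the true lognormal mean."""
    rng = random.Random(seed)
    true_mean = math.exp(sigma_log ** 2 / 2)  # mean of lognormal(mu=0, sigma_log)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(0.0, sigma_log) for _ in range(n)]
        half_width = z * stdev(sample) / math.sqrt(n)
        hits += abs(fmean(sample) - true_mean) <= half_width
    return hits / trials


print(skewed_coverage())  # noticeably below the nominal 0.95
```

Raising `n` or lowering `sigma_log` (less skew) moves the coverage back toward 0.95.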
13
Confidence Intervals on Skewed Data
First Approach
Transform the data to approximate normality
Compute the confidence interval
Problem: when the confidence interval is retransformed back to the original units, it is no longer a confidence interval on the mean (the interval on the mean is valid only in transformed space)
With logs, the retransformed interval is a confidence interval on the geometric mean, which is an estimate of the median
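A minimal sketch of the first approach with natural logs, showing that the retransformed interval brackets the geometric mean rather than the arithmetic mean. It assumes all values are positive and uses a z quantile (a large-sample assumption); the function name and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, geometric_mean, stdev


def log_space_ci(data, conf=0.95):
    """CI computed on ln(data), then retransformed with exp().

    The result is an interval on the geometric mean (an estimate of the
    median), not on the arithmetic mean.
    """
    logs = [math.log(x) for x in data]  # all values must be > 0
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z * stdev(logs) / math.sqrt(len(logs))
    center = fmean(logs)
    return math.exp(center - half_width), math.exp(center + half_width)


values = [1, 10, 100, 1000]  # hypothetical, strongly skewed data
print(log_space_ci(values), geometric_mean(values))
```

Note that the interval is not symmetric around the geometric mean in the original units, even though it is symmetric in log space.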
14
Confidence Intervals on Skewed Data
Example: Arsenic Concentrations in New Hampshire Ground Water
15
Confidence Intervals on Skewed Data
Second Approach
Hope that the Central Limit Theorem applies. Whether it does depends on the skewness of the data and the sample size.
See Chapter 2 for a discussion of the Central Limit Theorem
16
Bootstrapping
Currently the best way to compute a confidence interval from skewed data or a small sample
Does not require the assumption of normality
17
Confidence Intervals on Skewed Data
Third Approach: Bootstrapping
Sample from the data set with replacement, so that any data point can be sampled multiple times or not sampled at all
Compute the estimated statistic from the resample
Repeat many times
Determine the confidence endpoints from the ranked values of the estimated statistic
Because it is based on the data set, it works best with more data
18
Confidence Intervals on Skewed Data
Bootstrapping Example: Arsenic Data Set
Randomly pick 25 values, with replacement, from a 25-point arsenic data set
Compute the mean of these 25 values
Repeat 1000 times
A two-sided 95% confidence interval for the mean runs from the 25th (0.025 × 1000) to the 975th (0.975 × 1000) ranked value of the 1000 means
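The bootstrap steps above can be sketched as a percentile bootstrap for the mean. The real arsenic values are not reproduced in this transcript, so the data below are a hypothetical, seeded lognormal stand-in; the function name, seeds, and data are assumptions for illustration only.

```python
import random
from statistics import fmean


def bootstrap_mean_ci(data, n_boot=1000, conf=0.95, seed=None):
    """Percentile bootstrap CI for the mean.

    Resamples len(data) values with replacement n_boot times, ranks the
    resampled means, and reads off the endpoints by rank.
    """
    rng = random.Random(seed)
    n = len(data)
    boot_means = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    tail = (1 - conf) / 2
    low_rank = max(1, round(tail * n_boot))   # 25 when n_boot=1000, conf=0.95
    high_rank = round((1 - tail) * n_boot)    # 975
    return boot_means[low_rank - 1], boot_means[high_rank - 1]  # ranks are 1-based


# Hypothetical stand-in for a 25-point arsenic data set (not the real values).
gen = random.Random(42)
arsenic_like = [round(gen.lognormvariate(2.0, 1.2), 1) for _ in range(25)]
print(bootstrap_mean_ci(arsenic_like, seed=1))
```

Fixing `seed` makes the interval reproducible; with a different seed the endpoints shift slightly, which is inherent to resampling.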
19
Confidence Intervals on Skewed Data
Bootstrapping Example: Arsenic Data Set
20
Confidence Intervals on Skewed Data
Bootstrapping Example: Arsenic Data Set
21
Other Confidence Intervals
Confidence intervals can be computed for other parameters:
Variance
Standard deviation
Median
Other percentiles
A confidence interval on a percentile is often called a “tolerance interval”
22
Prediction Intervals (contain one or more future observations with a specified probability)
The simplest (nonparametric) prediction interval uses the percentiles of the data set
For a two-sided 90% prediction interval, use the 5th and 95th percentiles
90% of the observed data fall within this interval, so we expect 90% of future observations to fall within it as well
Requires ample data
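A minimal sketch of the percentile-based 90% prediction interval using `statistics.quantiles` with its default "exclusive" method; other percentile definitions give slightly different endpoints. The data (the integers 1 to 100) and the function name are hypothetical.

```python
from statistics import quantiles


def percentile_prediction_interval(data):
    """Two-sided 90% nonparametric prediction interval: 5th and 95th percentiles."""
    cuts = quantiles(data, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


observed = list(range(1, 101))  # hypothetical data: the values 1..100
low, high = percentile_prediction_interval(observed)
inside = sum(low <= x <= high for x in observed)
print(low, high, inside)  # 5.05 95.95 90
```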
23
Prediction Intervals
A parametric prediction interval will be shorter than a nonparametric interval if the data follow the distribution assumed by the interval calculation
Easy method for a prediction interval:
Transform the data to look normal
Compute the interval
Transform the interval back to the original units
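The easy method can be sketched with a log transform. This assumes the logs are approximately normal, all values are positive, and the sample is large enough to use a z quantile (a t quantile with n − 1 degrees of freedom would normally be used for small n). The log-space half-width z·s·√(1 + 1/n) is the standard normal-theory prediction interval for one future value; the function name and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def log_prediction_interval(data, conf=0.90):
    """Prediction interval for one future value, computed on ln(data) and retransformed."""
    logs = [math.log(x) for x in data]  # all values must be > 0
    n = len(logs)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z * stdev(logs) * math.sqrt(1 + 1 / n)  # extra "1" = new observation's own variability
    center = fmean(logs)
    return math.exp(center - half_width), math.exp(center + half_width)


print(log_prediction_interval([3.1, 4.7, 5.2, 8.8, 12.5, 20.3, 41.0]))  # hypothetical data
```

Unlike a confidence interval on the mean, the retransformed endpoints keep their meaning: exponentiation is monotonic, so a future value lies inside the back-transformed interval exactly when its log lies inside the log-space interval.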
24
Confidence vs. Prediction Intervals
A prediction interval will always be wider than the confidence interval for the same alpha. Why? The mean of 10 observations, for example, is always less variable than the individual observations themselves.
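Under normal-theory assumptions with σ known (a simplification for illustration), the variances behind the two half-widths make the comparison concrete: the prediction interval must allow for the new observation's own variance as well as the uncertainty in the mean.

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},
\qquad
\operatorname{Var}(X_{\text{new}} - \bar{X}) = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1 + \frac{1}{n}\right)

\frac{\text{PI half-width}}{\text{CI half-width}}
= \frac{z\,\sigma\sqrt{1 + 1/n}}{z\,\sigma/\sqrt{n}}
= \sqrt{n + 1},
\qquad \sqrt{11} \approx 3.3 \text{ for } n = 10
```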
25
Tolerance Intervals (contain a proportion of future observations with a specified probability)
An interval around a proportion of the distribution
The proportion is called the “coverage”
Example question: what cutoff(s) will cover 95% of all future observations, with 90% confidence?
Easy method for a tolerance interval:
Transform the data to look normal
Compute the interval
Transform the interval back to the original units
Retransformation works for prediction and tolerance intervals, but not for confidence intervals on the mean
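The distinction between coverage and confidence can be illustrated with a distribution-free alternative to the transform method above: Wilks' nonparametric tolerance limits, swapped in here for illustration. The sketch asks how many observations are needed before the sample extremes serve as tolerance limits; it assumes only independent observations from a continuous distribution, and the function names are hypothetical.

```python
def confidence_one_sided(n, coverage):
    """Confidence that the sample maximum lies above the `coverage` quantile."""
    return 1 - coverage ** n


def confidence_two_sided(n, coverage):
    """Confidence that [min, max] of n values covers at least `coverage` of the population."""
    return 1 - n * coverage ** (n - 1) + (n - 1) * coverage ** n


def min_sample_size(coverage, conf, two_sided=False):
    """Smallest n whose sample extreme(s) give the requested coverage and confidence."""
    confidence = confidence_two_sided if two_sided else confidence_one_sided
    n = 1
    while confidence(n, coverage) < conf:
        n += 1
    return n


# "Cover 95% of all future observations, with 90% confidence":
print(min_sample_size(0.95, 0.90))                  # 45 (upper limit = sample max)
print(min_sample_size(0.95, 0.90, two_sided=True))  # 77 (limits = sample min and max)
```

The large sample sizes show why the parametric (transform) method is attractive when the data can be made to look normal.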
26
MINITAB Laboratory 2 Reading Assignment
Chapter 3, Describing Uncertainty (pages 65 to 96), Statistical Methods in Water Resources by D.R. Helsel and R.M. Hirsch
MINITAB Laboratory 2