Sampling And Resampling Risk Analysis for Water Resources Planning and Management Institute for Water Resources May 2007.

Presentation on theme: "Sampling And Resampling Risk Analysis for Water Resources Planning and Management Institute for Water Resources May 2007."— Presentation transcript:

1 Sampling And Resampling Risk Analysis for Water Resources Planning and Management Institute for Water Resources May 2007

2 Learning Objectives At the end of this session participants will be able to: List sampling techniques Estimate desired sample size Describe a resample and a bootstrap procedure

3 Why Sample? Expense Time Impossible/impractical Better data from a sample

4 We Want Representativeness [Figure: population of men (M) and women (W) with three samples, labeled Representative, Not Representative, Not Representative]

5 Impediments to Representativeness Bias Systematic differences between sample and population Can be eliminated by good sample design Sampling error Unavoidable differences due to chance in selection of a sample This is “dumb luck” It cannot be eliminated or avoided so we have to address it

6 Sampling Techniques Simple Random Sample Stratified Sample Cluster Sample Sequential Sample Non-Probability Sample/Convenience Sample

7 Think of Random Like This You want to identify each element of population Place a unique number from 1 to N on it Randomly select a number between 1 and N Find item with that number and measure it Replace number and repeat If same number comes up again, ignore it, replace it and choose again
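
The numbering procedure on this slide can be sketched in a few lines of Python. This is only an illustration: the frame of 501 numbered houses and the sample size of 43 are borrowed from the later examples, and the function name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every item has the same chance."""
    rng = random.Random(seed)
    # random.sample selects without replacement, which has the same effect
    # as ignoring a number that comes up a second time and drawing again.
    return rng.sample(population, n)

# Hypothetical sampling frame: items numbered 1..N with N = 501
frame = list(range(1, 502))
chosen = simple_random_sample(frame, 43, seed=1)
print(sorted(chosen)[:5])
```

Fixing `seed` only makes the draw repeatable for checking; in practice it can be left as `None`.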

8 How do you take a sample? Objectives of survey What question(s) are you trying to answer? What information do you need? ID target population Obtain sample frame Sample design Method of measurement Measurement instrument

9 How do you take a sample? Select and train field workers Pretest Organize field work Organize data management Data analysis

10 How big should a sample be? Trade-offs determine this precision (size of interval estimate) accuracy (capturing the value) sample size (what’s it going to cost?) What is important to your decision process? You pick any two and third is determined for you

11 Sample Size for Mean n = (z*s/E)^2 n is size of sample E is allowable error (precision) z is z-value (accuracy, level of confidence) s is sample SD (from a pilot survey or a guesstimate)

12 Example Mean house value N = 501 E = $3000 z = 1.96 (95%) s = $10,000 n=[(1.96*10000)/3000]^2 ≈ 43
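
The arithmetic in this example can be checked with a short Python sketch. It assumes the formula n = (z*s/E)^2 implied by the numbers on the slide and rounds up to a whole unit; the function name `sample_size_for_mean` is hypothetical.

```python
import math

def sample_size_for_mean(z, s, E):
    """Sample size for estimating a mean: n = (z * s / E)^2, rounded up."""
    return math.ceil((z * s / E) ** 2)

# Values from the house value example
n = sample_size_for_mean(z=1.96, s=10_000, E=3_000)
print(n)  # 43
```

Halving the allowable error E roughly quadruples the required sample, which is the precision versus cost trade-off described earlier.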

13 Sample Size for Proportion n = p*(1-p)*(z/E)^2 p is proportion w/ characteristic 1-p is proportion w/o characteristic z and E as before

14 Example Proportion of homes with basement N=501 p=.5 1-p=.5 z=1.96 E=.03 n=.5*.5*(1.96/.03)^2 ≈ 1067
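
A minimal Python sketch of this calculation follows, assuming the formula n = p*(1-p)*(z/E)^2 and the allowable error of 0.03 that appears in the slide's arithmetic. The function name is hypothetical, and the value is returned unrounded so the rounding rule stays visible.

```python
import math

def sample_size_for_proportion(p, z, E):
    """Unrounded sample size for a proportion: p * (1 - p) * (z / E)^2."""
    return p * (1 - p) * (z / E) ** 2

raw = sample_size_for_proportion(p=0.5, z=1.96, E=0.03)
print(round(raw))      # nearest whole number, the figure quoted on the slide
print(math.ceil(raw))  # rounding up is the more conservative choice
```

Using p = 0.5 gives the largest possible value of p*(1-p), so it is the safe choice when nothing is known about the proportion in advance.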

15 What happens when the population has fewer members than the calculated sample size requires? Step One: Calculate the sample size as before. Step Two: Calculate the new sample size: n = n_o / (1 + n_o/N), where n_o is the sample size calculated in step one.

16 What Happens if n > N? First, calculate the sample size as before. Second, calculate the new sample size using: n_new = n_old / [1 + (n_old / N)] n_new = 1067 / [1 + (1067/501)] ≈ 341
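
The two-step adjustment can be sketched in Python as below. The function name `adjust_for_finite_population` is hypothetical; the inputs are the 1067 and N = 501 from this example.

```python
import math

def adjust_for_finite_population(n_old, N):
    """Shrink a computed sample size for a population of N members."""
    return n_old / (1 + n_old / N)

n_new = adjust_for_finite_population(1067, 501)
print(n_new)             # unrounded value, between 340 and 341
print(math.ceil(n_new))  # rounded up to whole units
```

The unrounded result falls just under 341, so the reported figure depends on whether it is truncated or rounded up. The adjusted size is always below N, which is the point of the correction.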

17 How n is Chosen in Practice Arbitrarily select a sample size As large a sample as you can get for a budget Pick a percentage for your sample Identify sample size required to obtain precision and accuracy desired!

18 With Good Samples…. We have classical statistical techniques that enable us to make inferences about the populations from which the samples were drawn Confidence intervals Hypothesis testing

19 Resampling Statistics is changing Computers make computational methods once inconceivable, possible Bootstrap Permutation tests Other resampling methods

20 Advantages of Resampling Fewer assumptions—normal and large n not required Greater accuracy—can be better than classical methods in some cases Generality—approach is pretty similar Promote understanding—not so theoretical

21 Bootstrapping Procedure 1) Resample 2) Calculate bootstrap distribution 3) Use bootstrap distribution

22 Bootstrap Idea Original sample represents population Take resamples by sampling with replacement from original random sample They “represent” many samples from population Bootstrap distribution of statistic represents sampling distribution

23 Concept 594 structure values ($1,000s) You want the population mean Glance says not normal Mean = 155.4 SD = 20.6

24 Original & Resample

25 Calculate Bootstrap Distribution Calculate statistic for each resample and make distribution of them

26 Resampling Distribution Took 500 samples of n = 594 with replacement from the original sample Calculated (500) means of these 500 samples Plot the resampling distribution of means (nearly normal) Mean = 155.9 (close) SD = 0.8
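
The resampling step described here can be sketched in Python. The 594 structure values are not included in the transcript, so the sketch generates a hypothetical, mildly skewed stand-in sample; its numbers will not match the slide. The function name `bootstrap_means` is also an assumption.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=500, seed=None):
    """Return the mean of each resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size as the original sample.
    return [statistics.fmean(rng.choices(sample, k=n))
            for _ in range(n_resamples)]

# Hypothetical stand-in for the 594 structure values ($1,000s)
data_rng = random.Random(0)
sample = [data_rng.lognormvariate(5.0, 0.13) for _ in range(594)]

means = bootstrap_means(sample, n_resamples=500, seed=1)
print("sample mean:", statistics.fmean(sample))
print("mean of resample means:", statistics.fmean(means))
print("SD of resample means:", statistics.stdev(means))
```

The SD of the resample means should land near the sample SD divided by the square root of n, which is consistent with the slide's 20.6 and 0.8.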

27 Bootstrap a Statistic Draw hundreds of resamples with replacement from original sample Inspect the bootstrap distribution of resampled statistics Bootstrap distribution approximates sampling distribution Approximate shape and spread, centers on original statistic not parameter Does not replace or add to data

28 Use Bootstrap Distribution Study characteristics of resampling distribution for insight

29 Bootstrap Mean & Confidence Intervals
Sample mean: 155.4
Resamples mean: 155.9
Bias: +0.5
Standard error: 0.8
2.5 percentile: 154.3
5 percentile: 154.6
95 percentile: 157.3
97.5 percentile: 157.6
Confidence interval, 95% (t): 155.9 ± 1.6
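
The quantities on this slide (bias, standard error, percentiles) can all be read off a bootstrap distribution. The sketch below is one possible way to do that in Python, using hypothetical data and a simple nearest-rank percentile rule; both are assumptions, and the function name `bootstrap_summary` is illustrative only.

```python
import random
import statistics

def bootstrap_summary(sample, n_resamples=1000, seed=None):
    """Summarize the bootstrap distribution of the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    boot = sorted(statistics.fmean(rng.choices(sample, k=n))
                  for _ in range(n_resamples))

    def pct(q):
        # Nearest-rank percentile of the sorted resample means
        return boot[min(n_resamples - 1, int(q * n_resamples))]

    estimate = statistics.fmean(sample)
    return {
        "sample_mean": estimate,
        "resamples_mean": statistics.fmean(boot),
        "bias": statistics.fmean(boot) - estimate,
        "standard_error": statistics.stdev(boot),
        "percentile_95": (pct(0.025), pct(0.975)),
    }

# Hypothetical data standing in for the structure values
data_rng = random.Random(0)
sample = [data_rng.lognormvariate(5.0, 0.13) for _ in range(594)]
summary = bootstrap_summary(sample, seed=1)
for key, value in summary.items():
    print(key, value)
```

The percentile interval is the simplest bootstrap interval; the t interval on the slide instead combines a point estimate with the bootstrap standard error.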

30 Why Bootstrapping Works Seems to create data out of nothing? Resamples not used as if real data Resample means are used to estimate how the sample mean for a sample of size 594 varies because of random sampling Use data twice Once to estimate population mean (original) Once to estimate variation in sample mean (resamples)

31 Applies to Other Statistics 25% trimmed mean (middle 50%) Difference between means Ratio of means Median Correlation coefficient Most anything
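
Because the procedure only needs a function that turns a sample into a number, swapping in another statistic is a small change. The Python sketch below, with hypothetical data and hypothetical helper names, estimates a bootstrap standard error for the median and for a 25% trimmed mean.

```python
import random
import statistics

def trimmed_mean_25(values):
    """Mean of the middle 50%: drop the lowest and highest quarter."""
    ordered = sorted(values)
    k = len(ordered) // 4
    return statistics.fmean(ordered[k:len(ordered) - k])

def bootstrap_se(sample, statistic, n_resamples=1000, seed=None):
    """Bootstrap standard error of any statistic of one sample."""
    rng = random.Random(seed)
    n = len(sample)
    stats = [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(stats)

# Hypothetical skewed data
data_rng = random.Random(0)
sample = [data_rng.lognormvariate(5.0, 0.3) for _ in range(200)]

for name, stat in [("median", statistics.median),
                   ("25% trimmed mean", trimmed_mean_25)]:
    print(name, stat(sample), bootstrap_se(sample, stat, seed=1))
```

Statistics of two samples, such as a difference or ratio of means, follow the same pattern but resample each group separately.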

32 Take Away Points Sampling is a cost effective way to gather data Resampling offers analysts a powerful numerical technique for statistical analysis Resampling is relatively simple with resampling software

33 Accuracy Bootstrap based on large sample (n>100) Shape and spread do not depend much on original sample Does show shape and spread of sampling distribution Bootstrap based on small samples Almost all variation for a statistic comes from original sample; reduce variation with a larger sample size Does not overcome weakness of small samples as basis for inference Some methods (BCa, tilting) are better than standard methods

34 Beyond the Basics Bootstrap bias-corrected accelerated (BCa) Adjusts percentile endpoints for 95% CI E.g., 4.1 to 98.6 instead of 2.5 to 97.5 for the 95% Bootstrap tilting Adjusts process of randomly forming resamples More efficient than BCa Use one of these more accurate methods if your software offers it

35 Permutation Tests Imagine experiment with 23 assigned randomly to control and 25 to treatment (n=48) Choose 25 of 48 at random and call this treatment (others to control) This is SRS without replacement—permutation resample Repeat 100s of times, calculate statistic of interest Permutation distribution—for 2 sample problems We can see if observed difference is so large that it would rarely occur if treatment did not matter!
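
A permutation test for the two-group design on this slide might look like the Python sketch below. The outcome data are hypothetical, the statistic is assumed to be the difference in means, and the one-sided p-value with a +1 correction is one common convention rather than anything stated in the slides.

```python
import random
import statistics

def permutation_test(treatment, control, n_resamples=1000, seed=None):
    """One-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(treatment) + list(control)
    k = len(treatment)
    count = 0
    for _ in range(n_resamples):
        # Reassign group labels at random: the first k become "treatment".
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the p-value is never zero.
    return observed, (count + 1) / (n_resamples + 1)

# Hypothetical outcomes: 25 treatment and 23 control subjects (n = 48)
data_rng = random.Random(0)
treatment = [data_rng.gauss(55, 5) for _ in range(25)]
control = [data_rng.gauss(50, 5) for _ in range(23)]

observed, p_value = permutation_test(treatment, control, seed=1)
print("observed difference:", observed)
print("one-sided p-value:", p_value)
```

A small p-value means that few random relabelings produce a difference as large as the observed one, which is the sense in which the difference "would rarely occur if treatment did not matter."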

