Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/18/12 Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap.

Slides:



Advertisements
Similar presentations
Panel at 2013 Joint Mathematics Meetings
Advertisements

What Can We Do When Conditions Arent Met? Robin H. Lock, Burry Professor of Statistics St. Lawrence University BAPS at 2012 JSM San Diego, August 2012.
Confidence Intervals: Bootstrap Distribution
Hypothesis Testing: Intervals and Tests
Early Inference: Using Bootstraps to Introduce Confidence Intervals Robin H. Lock, Burry Professor of Statistics Patti Frazer Lock, Cummings Professor.
Estimation in Sampling
Chapter 8: Estimating with Confidence
Hypothesis Testing I 2/8/12 More on bootstrapping Random chance
Section 3.4 Bootstrap Confidence Intervals using Percentiles.
Chapter 19 Confidence Intervals for Proportions.
Describing Data: One Quantitative Variable
Estimation from Samples Find a likely range of values for a population parameter (e.g. average, %) Find a likely range of values for a population parameter.
1 Hypothesis Testing In this section I want to review a few things and then introduce hypothesis testing.
Estimates and sample sizes Chapter 6 Prof. Felix Apfaltrer Office:N763 Phone: Office hours: Tue, Thu 10am-11:30.
Confidence Intervals: The Basics
Connecting Simulation- Based Inference with Traditional Methods Kari Lock Morgan, Penn State Robin Lock, St. Lawrence University Patti Frazer Lock, St.
Starting Inference with Bootstraps and Randomizations Robin H. Lock, Burry Professor of Statistics St. Lawrence University Stat Chat Macalester College,
Bootstrapping: Let Your Data Be Your Guide Robin H. Lock Burry Professor of Statistics St. Lawrence University MAA Seaway Section Meeting Hamilton College,
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 101 Dr. Kari Lock Morgan SECTION 4.1 Statistical test Null and alternative.
Chapter 7 Confidence Intervals and Sample Sizes
Confidence Intervals: Bootstrap Distribution
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal.
Normal Distribution Chapter 5 Normal distribution
Confidence Intervals I 2/1/12 Correlation (continued) Population parameter versus sample statistic Uncertainty in estimates Sampling distribution Confidence.
Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)
Using Lock5 Statistics: Unlocking the Power of Data
Ch 8 Estimating with Confidence. Today’s Objectives ✓ I can interpret a confidence level. ✓ I can interpret a confidence interval in context. ✓ I can.
What Can We Do When Conditions Aren’t Met? Robin H. Lock, Burry Professor of Statistics St. Lawrence University BAPS at 2011 JSM Miami Beach, August 2011.
Estimation: Sampling Distribution
Estimating a Population Proportion
CHAPTER 20: Inference About a Population Proportion ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture.
Ch 8 Estimating with Confidence. Today’s Objectives ✓ I can interpret a confidence level. ✓ I can interpret a confidence interval in context. ✓ I can.
Stats 120A Review of CIs, hypothesis tests and more.
Chapter 18: Sampling Distribution Models
Rule of sample proportions IF:1.There is a population proportion of interest 2.We have a random sample from the population 3.The sample is large enough.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 101 Dr. Kari Lock Morgan 10/18/12 Chapter 5 Normal distribution Central limit theorem.
Active Learning Lecture Slides For use with Classroom Response Systems Statistical Inference: Confidence Intervals.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Confidence Intervals: Bootstrap Distribution
Statistics: Unlocking the Power of Data Lock 5 Bootstrap Intervals Dr. Kari Lock Morgan PSU /12/14.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
Statistics : Statistical Inference Krishna.V.Palem Kenneth and Audrey Kennedy Professor of Computing Department of Computer Science, Rice University 1.
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
The Sampling Distribution of
Sampling distributions rule of thumb…. Some important points about sample distributions… If we obtain a sample that meets the rules of thumb, then…
Give your data the boot: What is bootstrapping? and Why does it matter? Patti Frazer Lock and Robin H. Lock St. Lawrence University MAA Seaway Section.
Confidence Interval Estimation For statistical inference in decision making:
Chapter 19 Confidence intervals for proportions
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: One Quantitative Variable SECTIONS 2.2, 2.3 One quantitative.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: In a recent poll, 70% of 1501 randomly selected adults said they believed.
Review Normal Distributions –Draw a picture. –Convert to standard normal (if necessary) –Use the binomial tables to look up the value. –In the case of.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Constructing Bootstrap Confidence Intervals
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution (5.1) Central limit theorem.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
1 Probability and Statistics Confidence Intervals.
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
Chapter 7: Sampling Distributions Section 7.2 Sample Proportions.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: In a recent poll, 70% of 1501 randomly selected adults said they believed.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
SECTION 7.2 Estimating a Population Proportion. Where Have We Been?  In Chapters 2 and 3 we used “descriptive statistics”.  We summarized data using.
Notes on Bootstrapping Jeff Witmer 10 February 2016.
Political Science 30: Political Inquiry. The Magic of the Normal Curve Normal Curves (Essentials, pp ) The family of normal curves The rule of.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 1.
Confidence Intervals: Sampling Distribution
Week 10 Chapter 16. Confidence Intervals for Proportions
Elementary Statistics
Chapter 8: Estimating with Confidence
Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/18/12 Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap distribution (3.3) 95% CI using standard error (3.3) Percentile method (3.4)

Statistics: Unlocking the Power of Data Lock 5 Common Misinterpretations “A 95% confidence interval contains 95% of the data in the population” “I am 95% sure that the mean of a sample will fall within a 95% confidence interval for the mean” “The probability that the population parameter is in this particular 95% confidence interval is 0.95”

Statistics: Unlocking the Power of Data Lock 5 Standard Error The standard error of a statistic, SE, is the standard deviation of the sample statistic The standard error can be calculated as the standard deviation of the sampling distribution

Statistics: Unlocking the Power of Data Lock 5 Confidence Intervals Population Sample... Calculate statistic for each sample Sampling Distribution Standard Error (SE): standard deviation of sampling distribution Margin of Error (ME) (95% CI: ME = 2×SE) statistic ± ME

Statistics: Unlocking the Power of Data Lock 5 Summary To create a plausible range of values for a parameter: o Take many random samples from the population, and compute the sample statistic for each sample o Compute the standard error as the standard deviation of all these statistics o Use statistic  2  SE One small problem…

Statistics: Unlocking the Power of Data Lock 5 Reality … WE ONLY HAVE ONE SAMPLE!!!! How do we know how much sample statistics vary, if we only have one sample?!? BOOTSTRAP!

Statistics: Unlocking the Power of Data Lock 5 Sample: 52/100 orange Where might the “true” p be? ONE Reese’s Pieces Sample

Statistics: Unlocking the Power of Data Lock 5 Imagine the “population” is many, many copies of the original sample (What do you have to assume?) “Population”

Statistics: Unlocking the Power of Data Lock 5 Reese’s Pieces “Population” Sample repeatedly from this “population”

Statistics: Unlocking the Power of Data Lock 5 To simulate a sampling distribution, we can just take repeated random samples from this “population” made up of many copies of the sample In practice, we can’t actually make infinite copies of the sample… … but we can do this by sampling with replacement from the sample we have (each unit can be selected more than once) Sampling with Replacement

Statistics: Unlocking the Power of Data Lock 5 Suppose we have a random sample of 6 people:

Statistics: Unlocking the Power of Data Lock 5 Original Sample A simulated “population” to sample from

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Sample: Sample with replacement from the original sample, using the same sample size. Original Sample Bootstrap Sample

Statistics: Unlocking the Power of Data Lock 5 How would you take a bootstrap sample from your sample of Reese’s Pieces? Reese’s Pieces

Statistics: Unlocking the Power of Data Lock 5 Reese’s Pieces “Population”

Statistics: Unlocking the Power of Data Lock 5 Bootstrap A bootstrap sample is a random sample taken with replacement from the original sample, of the same size as the original sample A bootstrap statistic is the statistic computed on a bootstrap sample A bootstrap distribution is the distribution of many bootstrap statistics

Statistics: Unlocking the Power of Data Lock 5 Original Sample Bootstrap Sample Bootstrap Statistic Sample Statistic Bootstrap Statistic Bootstrap Distribution

Statistics: Unlocking the Power of Data Lock 5 Your original sample has data values 18, 19, 19, 20, 21 Is the following a possible bootstrap sample? 18, 19, 20, 21, 22 a)Yes b)No Bootstrap Sample 22 is not a value from the original sample

Statistics: Unlocking the Power of Data Lock 5 Your original sample has data values 18, 19, 19, 20, 21 Is the following a possible bootstrap sample? 18, 19, 20, 21 a)Yes b)No Bootstrap Sample Bootstrap samples must be the same size as the original sample

Statistics: Unlocking the Power of Data Lock 5 Your original sample has data values 18, 19, 19, 20, 21 Is the following a possible bootstrap sample? 18, 18, 19, 20, 21 a)Yes b)No Bootstrap Sample Same size, could be gotten by sampling with replacement

Statistics: Unlocking the Power of Data Lock 5 You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples. What is the sample size of each bootstrap sample? (a) 50 (b) 1000 Bootstrap Sample Bootstrap samples are the same size as the original sample

Statistics: Unlocking the Power of Data Lock 5 You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples. How many bootstrap statistics will you have? (a) 50 (b) 1000 Bootstrap Distribution One bootstrap statistic for each bootstrap sample

Statistics: Unlocking the Power of Data Lock 5 “Pull yourself up by your bootstraps” Why “bootstrap”? Lift yourself in the air simply by pulling up on the laces of your boots Metaphor for accomplishing an “impossible” task without any outside help

Statistics: Unlocking the Power of Data Lock 5 StatKey lock5stat.com/statkey/

Statistics: Unlocking the Power of Data Lock 5 Sampling Distribution Population µ BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Distribution Bootstrap “Population” What can we do with just one seed? Grow a NEW tree! µ

Statistics: Unlocking the Power of Data Lock 5 Bootstrap statistics are to the original sample statistic as the original sample statistic is to the population parameter Golden Rule of Bootstrapping

Statistics: Unlocking the Power of Data Lock 5 Standard Error The variability of the bootstrap statistics is similar to the variability of the sample statistics The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

Statistics: Unlocking the Power of Data Lock 5 Confidence Intervals Sample Bootstrap Sample... Calculate statistic for each bootstrap sample Bootstrap Distribution Standard Error (SE): standard deviation of bootstrap distribution Margin of Error (ME) (95% CI: ME = 2×SE) statistic ± ME Bootstrap Sample

Statistics: Unlocking the Power of Data Lock 5 Based on this sample, give a 95% confidence interval for the true proportion of Reese’s Pieces that are orange. a)(0.47, 0.57) b)(0.42, 0.62) c)(0.41, 0.51) d)(0.36, 0.56) e)I have no idea Reese’s Pieces 0.52 ± 2 × 0.05

Statistics: Unlocking the Power of Data Lock 5 What about Other Parameters? Generate samples with replacement Calculate sample statistic Repeat...

Statistics: Unlocking the Power of Data Lock 5 We can use bootstrapping to assess the uncertainty surrounding ANY sample statistic! If we have sample data, we can use bootstrapping to create a 95% confidence interval for any parameter! (well, almost…) The Magic of Bootstrapping

Statistics: Unlocking the Power of Data Lock 5 Atlanta Commutes What’s the mean commute time for workers in metropolitan Atlanta? Data: The American Housing Survey (AHS) collected data from Atlanta in 2004

Statistics: Unlocking the Power of Data Lock 5 Where might the “true” μ be? Random Sample of 500 Commutes WE CAN BOOTSTRAP TO FIND OUT!!!

Statistics: Unlocking the Power of Data Lock 5 Original Sample

Statistics: Unlocking the Power of Data Lock 5 “Population” = many copies of sample

Statistics: Unlocking the Power of Data Lock 5 Atlanta Commutes 95% confidence interval for the average commute time for Atlantans: (a) (28.2, 30.0) (b) (27.3, 30.9) (c) 26.6, 31.8 (d) No idea ± 2 × 0.915

Statistics: Unlocking the Power of Data Lock 5 What percentage of Americans believe in global warming? A survey on 2,251 randomly selected individuals conducted in October 2010 found that 1328 answered “Yes” to the question “Is there solid evidence of global warming?” Global Warming Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/

Statistics: Unlocking the Power of Data Lock 5 Global Warming We are 95% sure that the true percentage of all Americans that believe there is solid evidence of global warming is between 57% and 61%

Statistics: Unlocking the Power of Data Lock 5 Does belief in global warming differ by political party? “Is there solid evidence of global warming?” The sample proportion answering “yes” was 79% among Democrats and 38% among Republicans. (exact numbers for each party not given, but assume n=1000 for each group) Give a 95% CI for the difference in proportions. Global Warming Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/

Statistics: Unlocking the Power of Data Lock 5 Global Warming We are 95% sure that the difference in the proportion of Democrats and Republicans who believe in global warming is between 0.37 and 0.45.

Statistics: Unlocking the Power of Data Lock 5 Global Warming Based on the data just analyzed, can you conclude with 95% certainty that the proportion of people believing in global warming differs by political party? (a) Yes (b) No Yes. We are 95% confident that the difference is between 0.37 and 0.45, and this interval does not include 0 (no difference)

Statistics: Unlocking the Power of Data Lock 5 Body Temperature What is the average body temperature of humans? We are 95% sure that the average body temperature for humans is between  and  Shoemaker, What's Normal: Temperature, Gender and Heartrate, Journal of Statistics Education, Vol. 4, No. 2 (1996) 98.6  ???

Statistics: Unlocking the Power of Data Lock 5 Other Levels of Confidence What if we want to be more than 95% confident? How might you produce a 99% confidence interval for the average body temperature?

Statistics: Unlocking the Power of Data Lock 5 Percentile Method For a P% confidence interval, keep the middle P% of bootstrap statistics For a 99% confidence interval, keep the middle 99%, leaving 0.5% in each tail. The 99% confidence interval would be (0.5 th percentile, 99.5 th percentile) where the percentiles refer to the bootstrap distribution.

Statistics: Unlocking the Power of Data Lock 5 Body Temperature We are 99% sure that the average body temperature is between  and 

Statistics: Unlocking the Power of Data Lock 5 Level of Confidence Which is wider, a 90% confidence interval or a 95% confidence interval? (a) 90% CI (b) 95% CI A 95% CI contains the middle 95%, which is more than the middle 90%

Statistics: Unlocking the Power of Data Lock 5 Mercury and pH in Lakes Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993) For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake? Give a 90% CI for 

Statistics: Unlocking the Power of Data Lock 5 Mercury and pH in Lakes We are 90% confident that the true correlation between average mercury level and pH of Florida lakes is between and

Statistics: Unlocking the Power of Data Lock 5 Bootstrap CI Option 1: Estimate the standard error of the statistic by computing the standard deviation of the bootstrap distribution, and then generate a 95% confidence interval by Option 2: Generate a P% confidence interval as the range for the middle P% of bootstrap statistics

Statistics: Unlocking the Power of Data Lock 5 Two Methods for 95%

Statistics: Unlocking the Power of Data Lock 5 Two Methods For a symmetric, bell-shaped bootstrap distribution, using either the standard error method or the percentile method will given similar 95% confidence intervals If the bootstrap distribution is not bell- shaped or if a level of confidence other than 95% is desired, use the percentile method

Statistics: Unlocking the Power of Data Lock 5 Continuous versus Discrete A continuous distribution can take any value within some range. The distribution will look like a smooth curve, without any gaps A discrete distribution only takes certain values. The distribution will be spiky, with gaps. Continuous Discrete

Statistics: Unlocking the Power of Data Lock 5 Criteria for Bootstrap CI Using the percentile method for a confidence interval bootstrapping for a confidence interval works for any statistic, as long as the bootstrap distribution is 1.Approximately symmetric 2.Approximately continuous Using the standard error method also requires 3.Approximately bell-shaped Always look at the bootstrap distribution to make sure these are true!

Statistics: Unlocking the Power of Data Lock 5 Criteria for Bootstrap CI

Statistics: Unlocking the Power of Data Lock 5 Number of Bootstrap Samples When using bootstrapping, you may get a slightly different confidence interval each time. This is fine! The more bootstrap samples you use, the more precise your answer will be. For the purposes of this class, 1000 bootstrap samples is fine. In real life, you probably want to take 10,000 or even 100,000 bootstrap samples

Statistics: Unlocking the Power of Data Lock 5 Summary The standard error of a statistic is the standard deviation of the sample statistic, which can be estimated from a bootstrap distribution Confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous

Statistics: Unlocking the Power of Data Lock 5 Project 1 Pose a question that you would like to investigate. If possible, choose something related to your major! Find or collect data that will help you answer this question (you may need to edit your question based on available data) You can choose either a single variable or a relationship between two variables

Statistics: Unlocking the Power of Data Lock 5 Project 1 The result will be a five page paper including  Description of the data collection method, and the implications this has for statistical inference  Descriptive statistics (summary stats, visualization)  Confidence intervals  Hypothesis testing Proposal due 9/27  Can submit earlier if want feedback sooner  Include data if you are using existing data  If collecting your own data, proposal should include a detailed data collection plan

Statistics: Unlocking the Power of Data Lock 5 Data for Project 1 Collect your own data  Representative sample?  Randomized experiment? Use existing data  Internet!  Ongoing research in your home department 

Statistics: Unlocking the Power of Data Lock 5 To Do Read Sections 3.3, 3.4 Do Homework 3 (due Tuesday, 9/25)Homework 3 Idea and data for Project 1 (proposal due 9/27)Project 1proposal