April 21, 2010 STAT 950 Chris Wichman



Motivation Every ten years, the U.S. government conducts a population census, and every five years the U.S. National Agricultural Statistics Service conducts an Agriculture Census. Notice that, at the “moment in time” when a census is taken, the total population, N, is known. In the intervening years, the numbers from each census are used to make inferences, for example about the mean population in urban areas or about farm output (average bushels per acre).

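Motivation Of interest is an intervening-year population average, $\theta = \bar{X} = N^{-1}\sum_{j=1}^{N} x_j$, where each unit also has a census-year value $u_j$ whose population mean $\bar{U}$ is known. Two statistics are commonly employed in these situations. The ratio estimator: $t_{\mathrm{rat}} = \bar{U}\,\bar{x}/\bar{u}$, where $\bar{x}$ and $\bar{u}$ are the sample means. The regression estimator: $t_{\mathrm{reg}} = \hat\beta_0 + \hat\beta_1 \bar{U}$, where $\hat\beta_0$ and $\hat\beta_1$ are the least-squares intercept and slope from regressing $x$ on $u$ in the sample.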
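As a rough illustration of the two estimators named above, the following Python sketch assumes the usual textbook forms: the ratio estimator scales the known population mean of the census-year variable u by the sample ratio of means, and the regression estimator evaluates the least-squares line of x on u at that known mean. The function names, argument names, and toy numbers are hypothetical and do not come from the slides.

```python
from statistics import mean


def ratio_estimate(u, x, U_bar):
    """Ratio estimator (assumed form): U_bar * mean(x) / mean(u)."""
    return U_bar * mean(x) / mean(u)


def regression_estimate(u, x, U_bar):
    """Regression estimator (assumed form): least-squares line of x on u,
    evaluated at the known population mean U_bar."""
    u_bar, x_bar = mean(u), mean(x)
    sxy = sum((a - u_bar) * (b - x_bar) for a, b in zip(u, x))
    sxx = sum((a - u_bar) ** 2 for a in u)
    slope = sxy / sxx
    intercept = x_bar - slope * u_bar
    return intercept + slope * U_bar


# Hypothetical sample of (u, x) pairs and a hypothetical known mean of u.
u_sample = [1.0, 2.0, 3.0]
x_sample = [2.0, 4.0, 6.0]
t_rat = ratio_estimate(u_sample, x_sample, U_bar=5.0)
t_reg = regression_estimate(u_sample, x_sample, U_bar=5.0)
```

When x is exactly proportional to u, as in this toy sample, the two estimators coincide; they differ once the fitted line has a non-zero intercept.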

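Sample Average: Without-Replacement Samples Population average: $\mu = N^{-1}\sum_{j=1}^{N} Y_j$, where the unbiased estimator of $\mu$ is the sample average $\bar{y} = n^{-1}\sum_{j=1}^{n} y_j$. When $\bar{y}$ is based on a sample taken without replacement, the true variance of $\bar{y}$ is $\operatorname{var}(\bar{y}) = (1-f)\,\gamma/n$, with sampling fraction $f = n/N$ and $\gamma = (N-1)^{-1}\sum_{j=1}^{N}(Y_j-\mu)^2$, the unbiased estimator of which is $v = (1-f)\,s^2/n$, with $s^2 = (n-1)^{-1}\sum_{j=1}^{n}(y_j-\bar{y})^2$.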
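The finite-population variance result can be checked by brute force on a tiny population. The sketch below assumes the standard forms var(ȳ) = (1 − f)γ/n, with γ the population variance using divisor N − 1, and v = (1 − f)s²/n; it enumerates every without-replacement sample of size n. The population values are made up for illustration.

```python
from itertools import combinations
from statistics import mean, variance


def true_var_of_mean(population, n):
    """(1 - f) * gamma / n, where gamma uses divisor N - 1."""
    N = len(population)
    return (1 - n / N) * variance(population) / n


def estimated_var_of_mean(sample, N):
    """v = (1 - f) * s^2 / n, computed from the sample alone."""
    n = len(sample)
    return (1 - n / N) * variance(sample) / n


population = [3, 7, 8, 12, 15, 21]  # hypothetical finite population
n = 3
all_samples = list(combinations(population, n))  # every sample is equally likely

mu = mean(population)
# Exact variance of the sample mean over all possible samples.
exact_var = mean((mean(s) - mu) ** 2 for s in all_samples)
# Average of the estimator v over all possible samples.
avg_v = mean(estimated_var_of_mean(s, len(population)) for s in all_samples)
```

Here `exact_var` agrees with `true_var_of_mean(population, n)`, and `avg_v` agrees with both, which is what unbiasedness of v means.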

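The Problem with the Ordinary Bootstrap Recall that when a resample $y_1^*,\dots,y_n^*$ is taken with replacement from the original sample $y_1,\dots,y_n$, then $\operatorname{var}^*(\bar{y}^*) = n^{-2}\sum_{j=1}^{n}(y_j-\bar{y})^2 = \frac{n-1}{n}\cdot\frac{s^2}{n}$, where $s^2 = (n-1)^{-1}\sum_{j=1}^{n}(y_j-\bar{y})^2$. Note that this matches the form of the unbiased variance estimator $v = (1-f)\,s^2/n$ only if the sampling fraction is $f = 1/n$. In other words, the ordinary bootstrap fails to capture the “contraction” in $\operatorname{var}(\bar{y})$ produced by the factor $(1-f)$.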
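To make the mismatch concrete, this sketch enumerates all n^n with-replacement resamples of a three-point sample and compares the exact bootstrap variance of the resample mean with the finite-population estimator v = (1 − f)s²/n. The sample values and the population size N are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance, variance


def ordinary_bootstrap_var(sample):
    """Exact variance of the resample mean over all n**n equally likely
    with-replacement resamples of size n."""
    n = len(sample)
    resample_means = [mean(r) for r in product(sample, repeat=n)]
    return pvariance(resample_means)


sample = [2.0, 5.0, 11.0]  # hypothetical sample
N = 12                     # hypothetical population size
n = len(sample)
f = n / N

boot = ordinary_bootstrap_var(sample)
closed_form = (n - 1) / n * variance(sample) / n  # (n - 1) s^2 / n^2
v = (1 - f) * variance(sample) / n                # finite-population estimator
```

With f = 0.25 the bootstrap variance carries the factor (n − 1)/n = 2/3 where v carries 1 − f = 0.75, so the two disagree; they would agree only if N were n² = 9, giving f = 1/n.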

Proposed Resampling Methods Modified sample size (with replacement or without replacement); mirror match; population; superpopulation.

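Modified Sample Size Find a resampling size $n'$ such that $\operatorname{var}(\bar{y})$ is approximately matched by $\operatorname{var}^*(\bar{y}^*)$. Process: (1) find the form of $\operatorname{var}^*(\bar{y}^*)$ for resamples of size $n'$; (2) take the expected value of $\operatorname{var}^*(\bar{y}^*)$ and set it equal to $\operatorname{var}(\bar{y})$; (3) solve for $n'$.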

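Modified Sample Size With-Replacement For with-replacement resampling with resample size $n'$, the bootstrapped variance of $\bar{y}^*$ is $\operatorname{var}^*(\bar{y}^*) = (n-1)\,s^2/(n\,n')$, where $s^2 = (n-1)^{-1}\sum_{j=1}^{n}(y_j-\bar{y})^2$. Matching this to $(1-f)\,s^2/n$ leads to a modified sample size $n' = (n-1)/(1-f)$, which is greater than $n$ whenever $f > 1/n$.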
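A small numerical check of the with-replacement case, assuming the bootstrap variance of a resample mean of size n' is (n − 1)s²/(n n') and the target is (1 − f)s²/n. The values n = 10 and N = 49 and the sample itself are illustrative assumptions, not figures taken from the slides.

```python
from statistics import variance


def modified_size_with_replacement(n, N):
    """Resample size n' that makes the with-replacement bootstrap variance
    of the mean equal to (1 - f) * s^2 / n."""
    f = n / N
    return (n - 1) / (1 - f)


def resample_mean_var(sample, n_prime):
    """Bootstrap variance of the mean of a with-replacement resample of
    size n_prime drawn from `sample`."""
    n = len(sample)
    return (n - 1) * variance(sample) / (n * n_prime)


sample = [4.0, 9.0, 1.0, 7.0, 12.0, 3.0, 8.0, 5.0, 10.0, 6.0]  # hypothetical
N = 49                                                          # hypothetical
n = len(sample)

n_prime = modified_size_with_replacement(n, N)   # about 11.3 here
target = (1 - n / N) * variance(sample) / n
matched = resample_mean_var(sample, n_prime)
```

The solution is generally not a whole number, so in practice n' would have to be rounded or otherwise handled before resampling.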

Modified Sample Size Without-Replacement For without-replacement resampling, notice that the effective $N$ for each resample is really $n$, making the obvious choice for $n'$ one in which $n'/n = f$, that is, $n' = nf$.

Mirror Match Goals: capture the dependence due to sampling without replacement, and minimize the instability of the resampled statistic by matching the original sample size. Process: suppose $m = nf$ and $k = n/m$ are both integers. Then simply concatenate $k$ without-replacement resamples of size $m$ to form a resample of size $n$.
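The integer case of mirror match can be sketched as follows, assuming m = nf = n²/N and k = n/m. The function name and the example sizes (n = 6, N = 18) are chosen for illustration; the sketch deliberately refuses the non-integer case rather than guess at the randomization it requires.

```python
import random


def mirror_match_resample(sample, N, rng=None):
    """Concatenate k without-replacement resamples of size m = n*f.
    Only the case where m and k are whole numbers is handled."""
    rng = rng or random.Random()
    n = len(sample)
    m, remainder = divmod(n * n, N)  # m = n * (n / N)
    if remainder or m == 0 or n % m:
        raise ValueError("m = n*f and k = n/m must both be integers")
    k = n // m
    resample = []
    for _ in range(k):
        # Within each block of size m, no sample value is repeated.
        resample.extend(rng.sample(list(sample), m))
    return resample


# Hypothetical use: n = 6 observations from a population of N = 18,
# so m = 2 and k = 3.
values = [10, 20, 30, 40, 50, 60]
one_resample = mirror_match_resample(values, N=18, rng=random.Random(0))
```

Each block of m values mimics the original sampling fraction, while the concatenation restores the original sample size n.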

Mirror Match When m and k are not integers: round $m = nf$ to the nearest whole number; choose $k$ such that $km \le n < (k+1)m$; then randomly select either $k$ or $(k+1)$ without-replacement resamples of size $m$ from the original sample $y_1,\dots,y_n$. The selection probabilities should be chosen to match $f$.

Population Bootstrap If $N/n = k$ is an integer: create a fake population Y* by repeating the original sample $y_1,\dots,y_n$ $k$ times. Generate R replicate samples of size n by sampling without replacement from Y*. Each resample will have the same sampling fraction as the original sample.

Population Bootstrap If $N/n$ is not an integer: find $k$ and $l$ such that $N = nk + l$ and $0 < l < n$. Create a fake population Y* by repeating the original sample $y_1,\dots,y_n$ $k$ times and joining it with a without-replacement sample of size $l$ from $y_1,\dots,y_n$. This step is repeated R times. Generate R replicate samples of size n, one from each Y*, by sampling without replacement. Each resample will have the same sampling fraction as the original sample.
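Both cases of the population bootstrap can be sketched in one routine: write N = nk + l with 0 ≤ l < n, build the fake population from k copies of the sample plus l values drawn without replacement, then draw a without-replacement sample of size n from it. The names and the example sizes below are hypothetical, and `statistic` stands for whatever estimator is being bootstrapped.

```python
import random


def fake_population(sample, N, rng):
    """k full copies of the sample plus l extra values drawn without
    replacement, where N = n*k + l and 0 <= l < n."""
    k, l = divmod(N, len(sample))
    return list(sample) * k + rng.sample(list(sample), l)


def population_bootstrap(sample, N, R, statistic, rng=None):
    """Return R replicates of `statistic`, each computed on a size-n
    without-replacement sample from a freshly built fake population."""
    rng = rng or random.Random()
    n = len(sample)
    replicates = []
    for _ in range(R):
        population = fake_population(sample, N, rng)  # rebuilt every time
        replicates.append(statistic(rng.sample(population, n)))
    return replicates


# Hypothetical use: n = 10 observations, population size N = 49.
data = list(range(1, 11))
reps = population_bootstrap(data, N=49, R=20,
                            statistic=lambda s: sum(s) / len(s),
                            rng=random.Random(1))
```

Rebuilding the fake population on every replicate only matters when l > 0; when N/n is an integer the same Y* is produced each time.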

Superpopulation Bootstrap For each resample $r = 1,\dots,R$: create a fake population Y* of size N by sampling with replacement, N times, from the original sample $y_1,\dots,y_n$. From each fake population $Y^* = (Y_1^*,\dots,Y_N^*)$, take a without-replacement sample of size n. Each resample will have the same sampling fraction as the original sample.
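The superpopulation scheme differs from the population bootstrap only in how the fake population is built. A minimal sketch, with hypothetical names and sizes, and with `statistic` again standing for the estimator of interest:

```python
import random


def superpopulation_bootstrap(sample, N, R, statistic, rng=None):
    """Return R replicates: each draws a fake population of size N with
    replacement from the sample, then a size-n sample without replacement
    from that fake population."""
    rng = rng or random.Random()
    n = len(sample)
    replicates = []
    for _ in range(R):
        population = rng.choices(list(sample), k=N)   # with replacement
        resample = rng.sample(population, n)          # without replacement
        replicates.append(statistic(resample))
    return replicates


# Hypothetical use: n = 10 observations, population size N = 49.
data = list(range(1, 11))
reps = superpopulation_bootstrap(data, N=49, R=20,
                                 statistic=lambda s: sum(s) / len(s),
                                 rng=random.Random(3))
```

Because the fake population is itself random, some sample values may be over- or under-represented in a given Y*, which adds variability that the population bootstrap does not have.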

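Example 3.15: City Population Data A Comparison of Confidence Intervals In this example, the normal approximation C.I. refers to the bias-corrected interval $t - b \pm z_{1-\alpha}\,v^{1/2}$, where $b$ and $v$ are the estimated bias and variance of the estimator $t$. The remaining intervals are Studentized confidence intervals: $\left(t - v^{1/2} z^*_{((R+1)(1-\alpha))},\ t - v^{1/2} z^*_{((R+1)\alpha)}\right)$, where $z^* = (t^*-t)/v^{*1/2}$ and $z^*_{(r)}$ is the $r$th smallest of the R resampled values.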
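For the Studentized intervals, a minimal sketch of the final step is given here, assuming the standard construction from the ordered values of z* = (t* − t)/√v*: the limits are t − √v · z*₍(R+1)(1−α)₎ and t − √v · z*₍(R+1)α₎. How t*, v*, and v are obtained depends on the estimator and resampling scheme and is not shown; the function name and inputs are hypothetical.

```python
def studentized_interval(t, v, z_star, alpha=0.05):
    """Equal-tailed 1 - 2*alpha Studentized bootstrap interval from the
    R resampled values z* (assumed standard order-statistic construction)."""
    R = len(z_star)
    z_sorted = sorted(z_star)
    lower_rank = round((R + 1) * alpha)        # 1-based rank
    upper_rank = round((R + 1) * (1 - alpha))  # 1-based rank
    if not 1 <= lower_rank <= upper_rank <= R:
        raise ValueError("R is too small for this alpha")
    se = v ** 0.5
    # The upper quantile of z* sets the lower limit, and vice versa.
    return t - se * z_sorted[upper_rank - 1], t - se * z_sorted[lower_rank - 1]


# Hypothetical inputs: R = 199 values of z*, so ranks 10 and 190 are used.
z_values = [float(i) for i in range(1, 200)]
interval = studentized_interval(t=0.0, v=1.0, z_star=z_values, alpha=0.05)
```

Choosing R so that (R + 1)α is a whole number, as with R = 199 and α = 0.05, avoids interpolating between order statistics.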

Example 3.15: City Population Data Table 3.7 Columns: Resampling Scheme, Ratio, Regression. Rows: Normal; Modified Size, n' = (one entry NA); Modified Size, n' = ; Mirror Match, m = ; Population; Superpopulation. (Table values not preserved.)

Example 3.15: City Population Data Table 3.8 Two versions of the table are shown: one recreated in R and one from BMA pg 96. Columns: Coverage (Lower, Upper, Overall) and Length (Average, SD). Rows: Normal; Modified Size, n' = ; Modified Size, n' = ; Mirror Match, m = ; Population; Superpopulation. (Table values not preserved.)

Example 3.15: City Population Data Figure 3.6

How Well Does the Normal Approximation Fit the Distribution of $t_{\mathrm{reg}}$ and $t_{\mathrm{rat}}$?

Conclusions About $t_{\mathrm{rat}}$ and $t_{\mathrm{reg}}$ The normal approximation for the ratio and regression estimators performs poorly. The estimated expected length of confidence intervals based on the normal approximation is very short relative to that of the resampling methods. The estimated variance of the regression estimator is unstable, potentially causing huge swings in z* and ultimately affecting the bounds of the Studentized confidence intervals.

Stratified Sampling Suppose the population of interest is divided into k strata. Then the total population size is $N = N_1 + \cdots + N_k$. Each stratum now has its own sampling fraction, $f_i = n_i/N_i$. Each stratum represents a proportion $w_i = N_i/N$ of the population.
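The stratum-level quantities are simple to compute. The sketch below assumes the natural definitions, sampling fraction n_i/N_i and population proportion N_i/N for stratum i, and combines stratum means into an overall mean using those proportions. The sizes mirror a k = 3 design with 18 units and 6 sampled per stratum; the stratum means are made up.

```python
def stratum_summaries(N_sizes, n_sizes):
    """Return (N, sampling fractions, population proportions) for strata
    with population sizes N_sizes and sample sizes n_sizes."""
    N = sum(N_sizes)
    fractions = [n_i / N_i for n_i, N_i in zip(n_sizes, N_sizes)]
    weights = [N_i / N for N_i in N_sizes]
    return N, fractions, weights


def stratified_mean(weights, stratum_means):
    """Overall mean as the weighted sum of stratum means."""
    return sum(w * m for w, m in zip(weights, stratum_means))


N_total, f_by_stratum, w_by_stratum = stratum_summaries([18, 18, 18], [6, 6, 6])
overall = stratified_mean(w_by_stratum, [1.0, 2.0, 3.0])  # hypothetical means
```

With equal stratum sizes every weight is 1/k, so the overall mean reduces to the plain average of the stratum means.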

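$t_{\mathrm{rat}}$ for a Stratified Sample Of interest is the overall mean: $\theta = \sum_{i=1}^{k} w_i \bar{X}_i$, where $w_i = N_i/N$ and $\bar{X}_i$ is the population mean of $x$ in stratum $i$. The ratio estimator for a stratified population becomes: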

Example 3.17: Stratified Ratio Here, Davison and Hinkley drop the regression estimator, due to the potential instability of the variance affecting the bootstrapped confidence intervals. They also drop the Modified Sample, because they felt it was a “less promising” finite population resampling scheme.

Example 3.17: Methodology Simulate N pairs (u, x) divided into k strata of sizes “small-k”: k = 3, N i = 18, n i = 6 “small-k”:k = 5, N i = 72, n i = 24 “large-k”: k = 20, N i = 18, n i = different samples of size were taken from the dataset(s) produced above. For each sample, R=199 resamples were used to compute confidence intervals for θ.

Example 3.17: Methodology All methods were applied to the sample as described in Example 3.15, with the exception of superpopulation resampling, which was conducted within each stratum. BMA Table 3.9 Column groups: k = 20, N = 18; k = 5, N = 72; k = 3, N = 18; each with columns L, U, O. Rows: Normal; Modified Sample Size; Mirror-match; Population; Superpopulation. (Table values not preserved.)

Conclusions: Stratified Sample The estimated coverages for the Normal, Modified Sample Size, and Population resampling methods are all close to the nominal 90% desired, and the “tail” probabilities are each roughly 5%. Neither the Mirror-match (estimated coverage of 83%) nor the Superpopulation (estimated coverage of 95%) method performed very well. Davison and Hinkley conclude that, because of their ease of calculation, the Population and Modified Sample Size methods perform the best.