Sampling Theory and Surveys GV917

Presentation transcript:

Sampling Theory and Surveys GV917

Introduction to Sampling
In statistics, the population refers to the total universe of objects being studied. Examples include:
All voters in the UK
All graduate students at the University of Essex
These are finite populations, but we also meet infinite populations such as:
All possible rolls of a six-sided die
All possible turns of a roulette wheel

The Purpose of Sampling
We take samples in order to:
Estimate population characteristics or parameters, e.g. the average age of all voters in the UK
Test hypotheses about a population, e.g. did 60 per cent of women turn out to vote in a general election?

Hypothetical Population
Suppose we have a population consisting of five numbers: 3, 5, 7, 9, 11.
The sum of this population is 35 and the mean is 7 (denoted µ).
Now suppose we are trying to infer this population mean from a single random sample of size 2. How likely is it that we will infer the population mean correctly?

Samples of Size Two From a Population of Size Five
Sample Number   Sample   Sum   Mean
1               3, 5       8      4
2               3, 7      10      5
3               3, 9      12      6
4               3, 11     14      7
5               5, 7      12      6
6               5, 9      14      7
7               5, 11     16      8
8               7, 9      16      8
9               7, 11     18      9
10              9, 11     20     10

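The Sampling Distribution – The likelihood of different samples occurring
Sample Mean (X̄)   Probability p(X̄)   Sample × Probability p(X̄)·X̄
4                  0.10               0.40
5                  0.10               0.50
6                  0.20               1.20
7                  0.20               1.40
8                  0.20               1.60
9                  0.10               0.90
10                 0.10               1.00
Σ                  1.00               7.00
Mean of the means: E(X̄) = Σ p(X̄)·X̄ = 7.00 (the expected value)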
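The sampling distribution for this small population can be checked by brute force. The sketch below assumes the samples are unordered pairs drawn without replacement (ten equally likely samples), which is consistent with the expected value of 7.00 quoted on the slide; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 5, 7, 9, 11]

# Assumption: unordered samples of size 2, drawn without replacement.
samples = list(combinations(population, 2))

# Every sample is equally likely, so p(mean) is the share of samples with that mean.
counts = Counter(Fraction(sum(s), 2) for s in samples)
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

# Expected value of the sample mean: sum of mean * probability.
expected_value = sum(mean * p for mean, p in sampling_distribution.items())

for mean, p in sampling_distribution.items():
    print(f"mean = {float(mean):4.1f}   p = {float(p):.2f}")
print("E(X) =", float(expected_value))
```

Exact fractions are used so that the probabilities sum to exactly 1 and the expected value equals the population mean without rounding noise.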

A Simple Confidence Interval Estimate of the Population Mean
Point estimate: µ = X̄ (probability of being correct = 0.20)
Interval estimate: µ = X̄ ± 1.0 (probability of being correct = 0.60)
µ = X̄ ± 2.0 (probability of being correct = 0.80)
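The probabilities attached to these intervals can be reproduced by counting, for each possible sample, whether the interval around its mean contains µ. This is a minimal sketch, again assuming unordered samples of size two without replacement; `coverage` is a hypothetical helper name.

```python
from itertools import combinations

population = [3, 5, 7, 9, 11]
mu = sum(population) / len(population)  # population mean, 7.0

# Assumption: the ten equally likely unordered samples of size 2.
sample_means = [sum(s) / 2 for s in combinations(population, 2)]

def coverage(half_width):
    """Share of samples whose interval (mean ± half_width) contains mu."""
    hits = sum(1 for m in sample_means if abs(m - mu) <= half_width)
    return hits / len(sample_means)

for half_width in (0.0, 1.0, 2.0):
    print(f"X ± {half_width}: probability of being correct = {coverage(half_width):.2f}")
```

A half-width of zero corresponds to the point estimate, which is only correct when the sample mean lands exactly on µ.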

The Standard Deviation of the Sampling Distribution
Sample Mean (X̄)   Probability p(X̄)   Deviation (X̄ − E(X̄))   Deviation Squared (X̄ − E(X̄))²   Squared × Probability p(X̄)·(X̄ − E(X̄))²
4                  0.10               −3                      9                               0.90
5                  0.10               −2                      4                               0.40
6                  0.20               −1                      1                               0.20
7                  0.20                0                      0                               0.00
8                  0.20                1                      1                               0.20
9                  0.10                2                      4                               0.40
10                 0.10                3                      9                               0.90
Σ                                                                                             3.00
Standard Error (σ_X̄) = √[Σ p(X̄)·(X̄ − E(X̄))²] (average error) = √3.0 = 1.73
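The standard error of 1.73 follows directly from the definition: it is the standard deviation of the sample means across all possible samples. The sketch below assumes unordered samples of size two without replacement. The cross-check against the finite population formula is an addition that does not appear on the slides.

```python
from itertools import combinations
from math import sqrt

population = [3, 5, 7, 9, 11]
N, n = len(population), 2

# Assumption: the ten equally likely unordered samples of size 2.
sample_means = [sum(s) / n for s in combinations(population, n)]

expected = sum(sample_means) / len(sample_means)  # E(X) = 7.0
variance = sum((m - expected) ** 2 for m in sample_means) / len(sample_means)
standard_error = sqrt(variance)

# Cross-check (not on the slides): sigma^2 / n with the finite population correction.
mu = sum(population) / N
sigma_squared = sum((x - mu) ** 2 for x in population) / N
variance_from_formula = sigma_squared / n * (N - n) / (N - 1)

print(f"variance of sample means = {variance:.2f}")
print(f"standard error           = {standard_error:.2f}")
print(f"variance from formula    = {variance_from_formula:.2f}")
```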

Using the Standard Error in a Confidence Interval
µ = X̄ ± standard error
µ = X̄ ± 1.73 (probability of being correct = 0.60)
A multiple of the standard error:
µ = X̄ ± 1.73 × 1.5
µ = X̄ ± 2.6 (probability of being correct = 0.80)

The Sampling Distribution with Large Samples – The Normal Distribution

Confidence intervals with the Normal Distribution
µ = X̄ ± σ_X̄ [probability of being correct of 0.68]
µ = X̄ ± 1.96 × σ_X̄ [probability of being correct of 0.95]
µ = X̄ ± 2.58 × σ_X̄ [probability of being correct of 0.99]
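The multipliers 1, 1.96 and 2.58 come from the standard normal distribution. A short sketch using Python's `statistics.NormalDist` (Python 3.8 or later) shows how each multiplier maps to a coverage probability and back; the helper names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(multiplier):
    """Probability that a normal variable lies within ± multiplier standard errors."""
    return standard_normal.cdf(multiplier) - standard_normal.cdf(-multiplier)

def multiplier_for(confidence):
    """Number of standard errors needed for a given two-sided confidence level."""
    return standard_normal.inv_cdf(0.5 + confidence / 2)

for z in (1.0, 1.96, 2.58):
    print(f"± {z:.2f} standard errors covers {coverage(z):.4f}")
print(f"95% confidence needs ± {multiplier_for(0.95):.3f} standard errors")
```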

But how can we know the standard error with only one sample?
In practical applications we cannot calculate the sampling distribution directly, because there are millions of possible samples of size, say, 1,000 which can be taken from a population of 60 million (the approximate size of the UK population).
A powerful theorem in statistics called the Central Limit Theorem enables us to infer the standard error from one sample only.
The intuition behind this is that a large enough sample provides a measure of the variability of all samples taken from a given population, provided that any sample can be chosen.
Thus if a random sample is very variable, then different random samples taken from that population are going to be quite variable too. If a random sample is not very variable, it suggests that samples taken from the population will not vary much either.

Calculating the Standard Error
The theorem shows that: σ_X̄ = s/√n
where σ_X̄ is the standard error of the mean, s is the sample standard deviation and n is the sample size.
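A small simulation can illustrate why s/√n from a single sample is a reasonable stand-in for the spread of all possible sample means. Everything in this sketch is an assumption made for illustration: the population is synthetic (normally distributed scores, not BES data), and the sample size, number of repetitions and seed are arbitrary.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(917)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 scores (synthetic, not survey data).
population = [random.gauss(5.0, 2.6) for _ in range(100_000)]
n = 400

# What we can do in practice: take one sample and compute s / sqrt(n).
one_sample = random.sample(population, n)
estimated_se = stdev(one_sample) / sqrt(n)

# What we normally cannot do: draw many samples and see how their means vary.
sample_means = [sum(random.sample(population, n)) / n for _ in range(2_000)]
empirical_se = stdev(sample_means)

print(f"s / sqrt(n) from one sample  : {estimated_se:.4f}")
print(f"spread of 2,000 sample means : {empirical_se:.4f}")
```

The two figures should be close but not identical, since the single-sample estimate is itself subject to sampling error.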

A Confidence Interval from the 2005 BES: Feelings about Labour
µ = X̄ ± 1.96 × σ_X̄ [probability of being correct of 0.95]
µ = […] ± 1.96 × (2.6017/√3517)
µ = […] ± […]
µ = […] to […] (probability of being correct = 0.95)
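The arithmetic of this interval can be sketched from the two figures that survive in the transcript, the sample standard deviation (2.6017) and the sample size (3517). The sample mean is not preserved, so `hypothetical_mean` below is a placeholder chosen purely for illustration and should not be read as the BES result.

```python
from math import sqrt

def confidence_interval(sample_mean, sample_sd, n, multiplier=1.96):
    """Return (lower, upper, standard_error, margin) for mean ± multiplier * s / sqrt(n)."""
    standard_error = sample_sd / sqrt(n)
    margin = multiplier * standard_error
    return sample_mean - margin, sample_mean + margin, standard_error, margin

sample_sd = 2.6017  # from the slide
n = 3517            # from the slide

# ASSUMPTION: placeholder value only; the actual sample mean is missing from the transcript.
hypothetical_mean = 5.0

lower, upper, standard_error, margin = confidence_interval(hypothetical_mean, sample_sd, n)
print(f"standard error = {standard_error:.4f}")
print(f"margin (1.96 standard errors) = {margin:.4f}")
print(f"interval around the placeholder mean: {lower:.3f} to {upper:.3f}")
```

Whatever the true sample mean was, the margin of roughly 0.09 either side depends only on the standard deviation and the sample size.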

Complications
The calculation assumed that the BES is a simple random sample of the UK voting population, that is, every adult in the country has an equal chance of being selected for the sample. But if we used a simple random sample, respondents would be evenly spread across the country, involving a lot of travel time and costs for the interviewers.
Costs can be reduced by 'clustering' the sample, that is, choosing people who live relatively close together. This is done by sampling in stages: first constituencies, then wards and finally individuals.
The accuracy of the sample can be improved by stratifying it, ensuring that groups appear in the sample in exactly the proportions in which they appear in the population. In the 2005 general election 26.6 per cent of the seats had majorities of less than 10 per cent; these were the marginal seats that decided the election. In a new sample there is an advantage in making sure that exactly 26.6 per cent of the constituencies are marginal seats. A simple random sample would not necessarily deliver this: it might deliver 25 per cent by chance. So we improve accuracy by replicating the known characteristics of the population. This is called stratifying by marginality.
We might want to over-sample some groups because they have interesting political characteristics and a simple random sample would provide too few cases for analysis. This was done in Scotland: Scots make up about 9 per cent of the British population, but just over 25 per cent of the BES sample in 2005 came from Scotland, because we wanted enough cases to analyse Scottish politics, which is rather different from that of England. Of course any analysis of voters in Britain as a whole has to weight the sample, to make sure that the Scots are represented accurately.

Sampling in Practice – the BES 2005
The survey was designed to yield a representative sample of adults aged 18 or above living in private households in Britain (excluding the area north of the Caledonian Canal). Adults living in Northern Ireland were excluded from the study.
The sample was drawn from the Postcode Address File, a list of addresses (or postal delivery points) compiled by the Post Office. For practical reasons, samples are confined to those living in private households. People living in institutions (though not in private households at such institutions) are excluded, as are households whose addresses are not on the Postcode Address File.
The sampling method involved a clustered multi-stage design, with three separate stages of selection. In the first instance, 128 constituencies were sampled at random: 77 in England, 29 in Scotland and 22 in Wales, using stratification on marginality of election results, geographic regions and population density. (In Wales, percent Welsh-speakers was used instead of geographic region.) Scottish and Welsh constituencies were over-sampled to achieve Scottish and Welsh boost samples. In England, marginal constituencies were slightly over-sampled.
Within each constituency, two wards were sampled at random, giving 256 sample points. At each sample point (ward), addresses were selected with equal probability across the sample point. More addresses were selected in Scottish and Welsh sample points than in English ones (27 compared with 24), again in order to achieve Scottish and Welsh boost samples.
Using random methods, the interviewer then selected one person for interview at each address.
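The stage-by-stage counts in this design can be tallied with a few lines of arithmetic. The sketch uses only the figures quoted above (constituencies per nation, two wards per constituency, 24 or 27 addresses per ward); the dictionary layout is an illustrative choice.

```python
# Stage 1: constituencies sampled per nation (from the slide).
constituencies = {"England": 77, "Scotland": 29, "Wales": 22}

# Stage 2: two wards (sample points) per constituency.
wards_per_constituency = 2

# Stage 3: addresses selected per ward; more in the boost-sample nations.
addresses_per_ward = {"England": 24, "Scotland": 27, "Wales": 27}

sample_points = {
    nation: count * wards_per_constituency for nation, count in constituencies.items()
}
addresses = {
    nation: points * addresses_per_ward[nation] for nation, points in sample_points.items()
}
total_addresses = sum(addresses.values())

for nation in constituencies:
    share = addresses[nation] / total_addresses
    print(f"{nation:9s} sample points = {sample_points[nation]:3d}  "
          f"addresses = {addresses[nation]:4d}  share = {share:.1%}")
print("total sample points =", sum(sample_points.values()))
print("total addresses     =", total_addresses)
```

The total of 6,450 addresses agrees with the number of addresses issued in the response-rate figures for the pre-election survey.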

Sample Precision
The sample precision is measured by the size of the standard errors.
If we stratify the sample this increases precision (reduces the size of the standard error). If we cluster, this decreases it.
Non-response can decrease precision if the non-respondents differ from the respondents, which they generally do. They tend to be less interested in politics and less likely to vote, so we need to weight the sample to correct for this source of bias.

Response Rates in the 2005 BES pre-election survey
                                                                        N        %
Addresses issued                                                        6,450
Out of scope (e.g. derelict building)                                   515
Eligible                                                                5,[…]    […]%
Interview achieved                                                      3,[…]    […]%
Interview not achieved because:                                         2,[…]    […]%
  Refused                                                               1,[…]    […]%
  Not contacted (e.g. someone who moved without a forwarding address)   […]      […]%
  Other unproductive (e.g. too ill to talk to interviewers)             285      4.8%
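Percentages in a response-rate table of this kind are normally expressed as a share of eligible addresses, that is, addresses issued minus those out of scope. The sketch below assumes that convention and uses only the figures that survive intact in the transcript; the 4.8 per cent quoted for "other unproductive" is consistent with it.

```python
addresses_issued = 6450
out_of_scope = 515

# Assumption: percentages are taken over eligible addresses (issued minus out of scope).
eligible = addresses_issued - out_of_scope

def share_of_eligible(count):
    """Express an outcome count as a percentage of eligible addresses."""
    return 100 * count / eligible

other_unproductive = 285
print("eligible addresses =", eligible)
print(f"other unproductive = {share_of_eligible(other_unproductive):.1f}% of eligible")
```

The same helper would give the response rate itself once the number of achieved interviews is known.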

Weighting in the BES
The Scots are over-represented in the sample, so if we want to analyse Britain as a whole they have to be reduced in numbers, or weighted.
On average, if each Scot in the sample counts as only […] of a person, this corrects their over-representation.
Thus […] × 933 = 318 Scots in the weighted sample, which is 8.8 per cent of the total of […]. This is the correct proportion of Scots in Britain.
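The logic of the weight can be sketched from the figures that remain on the slide: 933 Scots before weighting, 318 after, and a target share of 8.8 per cent. The weight and the total computed below are implied by those numbers rather than quoted from the slide, and the total is approximate because 8.8 per cent is rounded. `design_weight` is a hypothetical helper name.

```python
def design_weight(population_share, sample_share):
    """Weight that scales an over-sampled group back to its population share."""
    return population_share / sample_share

scots_unweighted = 933   # from the slide
scots_weighted = 318     # from the slide
target_share = 0.088     # from the slide (8.8 per cent)

# Implied by the slide's figures, not quoted from it.
implied_weight = scots_weighted / scots_unweighted
implied_total = scots_weighted / target_share  # approximate: 8.8% is rounded

# The same weight, derived from the shares instead of the counts.
sample_share = scots_unweighted / implied_total
weight_from_shares = design_weight(target_share, sample_share)

print(f"implied weight per Scottish respondent = {implied_weight:.3f}")
print(f"implied weighted total (approximate)   = {implied_total:.0f}")
print(f"unweighted Scottish share of sample    = {sample_share:.1%}")
```

The unweighted share comes out at just over a quarter of the sample, in line with the over-sampling of Scotland described in the survey design.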

Unweighted Party Voting in 2010

Weighted Party Voting in 2010 (weighted for post-election analysis)

The Effects of Weighting
The actual party vote shares in 2010 were:
Labour 29.0%
Conservatives 36.1%
Liberal Democrats 23.0%
Others 11.9%
The weighted Conservative and Liberal Democrat vote shares are clearly more accurate than the unweighted ones.

Conclusions
Statistical theory helps us to make inferences about populations from much smaller samples.
Inferences are possible because everyone in the population has a (small) chance of ending up in the sample, and therefore the sample is representative.
In practice the calculation of sampling errors is complicated by various sample design factors aimed at making surveys less costly.