Understanding sample survey data


Underlying Concept
A sample statistic is our best estimate of a population parameter. If we took 100 different samples from the same population to measure, for example, the mean height of men, we would get 100 different estimates of mean height. The mean of these means would be very close to the real population mean.

Population of men (Population Size = N)

Sample of men from population (Sample size = n)
We take a sample from the population and measure the heights of all our sample members. The mean height from this sample is 174.4 cm.

Sample of men from population (Sample size = n)
We take another sample from the same population and measure the heights of all sample members. The mean height from this sample is 165.9 cm.

Sample of sample means
We take another 100 samples from the same population and measure the heights of all sample members.
Sample 1 mean was 174.4 cm
Sample 2 mean was 165.9 cm
Sample 3 mean was 171.0 cm
Sample 4 mean was 175.2 cm
Sample 5 mean was 162.8 cm
Etc.

Sample of sample means
In practice we never take hundreds of samples; we take just one. Even so, the idea of the distribution of sample means is central to all survey statistics. The central limit theorem says that if we took many samples, each of a sufficiently large size, the sample means would be approximately normally distributed around the population mean. This is true even if the thing we are measuring is not itself normally distributed. The central limit theorem can be proved mathematically. It is the basis of how we calculate our required sample size and how we calculate confidence intervals around our estimates.
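The central limit theorem can be illustrated with a small simulation. The sketch below is not from the slides: the skewed exponential population, the sample size of 50, the 1,000 repetitions and the random seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the sketch is repeatable


def sample_mean(n):
    # Draw n values from a skewed (exponential) population whose true mean is 1.0
    values = [rng.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(values)


# 1,000 samples of 50 values each (both numbers are illustrative assumptions)
sample_means = [sample_mean(50) for _ in range(1000)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within 1.96 standard deviations of the mean of means;
# for a normal distribution this share is about 95%
share_within = sum(
    abs(m - mean_of_means) <= 1.96 * sd_of_means for m in sample_means
) / len(sample_means)

print(round(mean_of_means, 2), round(sd_of_means, 2), round(share_within, 2))
```

Although the individual values are far from normally distributed, the mean of the sample means should land close to the population mean of 1.0, and roughly 95% of the sample means should fall within 1.96 standard deviations of it.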

Variance, standard deviation and standard error

Sample values (n=5)   Difference from mean   Squared difference from the mean
172                   171 - 172 = -1         -1 x -1 = 1
169                   2                      4
168                   3                      9
175                   -4                     16
171                   0                      0
Mean = 171            Sum = 0                Sum of squares = 30

Variance = the sum of squared differences from the mean divided by n-1
Variance = 30 / 4 = 7.5
Standard deviation = the square root of the variance
SD = √ variance = √7.5 = 2.74
Standard error = the square root of (the variance divided by the sample size)
SE = √ (variance / n) = √ (7.5 / 5) = 1.22
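The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using the five sample values from the slide; the variable names are illustrative.

```python
import math
import statistics

heights = [172, 169, 168, 175, 171]  # the five sample values from the slide
n = len(heights)

mean = statistics.mean(heights)          # arithmetic mean of the sample
variance = statistics.variance(heights)  # sample variance (divides by n - 1)
sd = math.sqrt(variance)                 # standard deviation
se = math.sqrt(variance / n)             # standard error of the mean

print(mean, variance, round(sd, 2), round(se, 2))  # prints: 171 7.5 2.74 1.22
```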

Standard Error
The standard error is our best estimate of the standard deviation of the sample means. In other words, if we took 100 samples from the same population and got 100 estimates of men's mean height, the standard deviation of those 100 means would be approximately equal to the standard error.

Confidence Intervals
Because sample means are approximately normally distributed, we can use the characteristics of the normal distribution to interpret our mean and standard error. We know that in a normal distribution 68.3% of values fall within one standard deviation of the mean and 95% fall within 1.96 standard deviations of the mean. So 1.96 times the standard error gives us the 95% confidence limits.
Our standard error is 1.22.
1.96 x 1.22 = 2.4
Our sample mean is 171.0.
171.0 – 2.4 = 168.6
171.0 + 2.4 = 173.4
So, if we took 100 samples and calculated an interval like this from each one, about 95 of those intervals would contain the true population mean. Put another way, we can be 95% confident that the true mean (the population mean) lies between 168.6 and 173.4.
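The interval calculation can be sketched as a small helper. The function name is hypothetical; the inputs are the sample mean (171.0) and standard error (1.22) from the slides.

```python
def confidence_interval_95(estimate, standard_error, z=1.96):
    # The margin of error is z standard errors either side of the estimate
    margin = z * standard_error
    return estimate - margin, estimate + margin


lower, upper = confidence_interval_95(171.0, 1.22)
print(round(lower, 1), round(upper, 1))  # prints: 168.6 173.4
```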

It works the same for proportions
The 95% confidence interval around a proportion is the estimate plus or minus 1.96 times the standard error of the estimate. The standard error of a proportion is √ ( (p x (100-p)) / n ), where p is the percentage and n is the sample size.
So if we estimate that 75% of people prefer dogs from a sample of 45, p = 75 and n = 45.
SE = √ ( (75 x (100-75)) / 45 ) = √ ( (75 x 25) / 45 ) = √ ( 1875 / 45 ) = √ 41.7 = 6.5
1.96 x 6.5 = 12.7
75 – 12.7 = 62.3 and 75 + 12.7 = 87.7
So, if we took 100 samples and calculated an interval like this from each one, about 95 of those intervals would contain the true percentage. Put another way, we can be 95% confident that the true percentage of people who prefer dogs (the population percentage) lies between 62.3 and 87.7.
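For the dog-preference example, the standard error is about 6.5 percentage points, so the 95% margin (1.96 standard errors) is about 12.7 points either side of 75%. A minimal sketch of that calculation follows; the function name is hypothetical.

```python
import math


def proportion_ci_95(p_percent, n, z=1.96):
    # Standard error of a percentage: sqrt(p * (100 - p) / n)
    se = math.sqrt(p_percent * (100 - p_percent) / n)
    margin = z * se
    return se, p_percent - margin, p_percent + margin


se, lower, upper = proportion_ci_95(75, 45)
print(round(se, 1), round(lower, 1), round(upper, 1))  # prints: 6.5 62.3 87.7
```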

Sample Size
We can use our understanding of confidence intervals to decide how big we need our sample to be.
First we think through:
what inferences about the population we are going to be making
what level of uncertainty we can live with

For example
We decide to conduct a survey to find out how many people in Scotland believe in the Loch Ness monster.
We define the population and source an appropriate sampling frame from which we will take a simple random sample.
We decide we want to be 95% confident that our estimate will be accurate to within 3 percentage points.
The confidence intervals for proportions are widest for a 50% estimate.
We have no good reason to expect the proportion of people believing in the Loch Ness monster to be much more or less than 50%, so we will use that as our basis.
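The required sample size follows from rearranging the margin-of-error formula for a proportion: margin = 1.96 x √ ( p (100-p) / n ), so n = 1.96² x p (100-p) / margin². The sketch below applies this to the Loch Ness example, assuming a simple random sample and no finite population correction; the function name is hypothetical.

```python
import math


def sample_size_for_proportion(margin_points, p_percent=50, z=1.96):
    # Solve margin = z * sqrt(p * (100 - p) / n) for n, rounding up
    return math.ceil(z**2 * p_percent * (100 - p_percent) / margin_points**2)


n = sample_size_for_proportion(3)  # 95% confidence, +/- 3 percentage points
print(n)  # prints: 1068
```

Using p = 50 gives the largest possible sample size for a given margin, which is why it is the safe default when there is no prior expectation.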

Excel spreadsheet example

Design effects
If the sample is not a simple random sample then an adjustment will need to be made to the standard error.
Proportionate stratification will usually decrease the standard error.
Disproportionate stratification will usually increase the standard error.
Clustering will usually increase the standard error.
See the PEAS website for information about design effects: http://www2.napier.ac.uk/depts/fhls/peas/index.htm
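One common way to make this adjustment, not spelled out on the slide, is to multiply the simple-random-sample standard error by the square root of the design effect (the ratio of the variance under the actual design to the variance under simple random sampling). The sketch below assumes a design effect of 1.5 purely for illustration; it is not a figure from the slides.

```python
import math


def design_adjusted_se(se_srs, deff):
    # deff > 1 inflates the standard error; deff < 1 shrinks it
    return se_srs * math.sqrt(deff)


# 1.22 is the standard error from the height example; 1.5 is a hypothetical deff
adjusted = design_adjusted_se(1.22, 1.5)
print(round(adjusted, 2))  # prints: 1.49
```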

Finite Population Correction
If the sample size is a large proportion of the population size (more than 5%), then applying the finite population correction will reduce the standard error.
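The slide does not give the formula; the usual finite population correction is √ ( (N - n) / (N - 1) ), applied as a multiplier to the standard error. The sketch below uses a hypothetical population of 1,000 and sample of 100 (10% of the population), which are not figures from the slides.

```python
import math


def finite_population_correction(population_size, sample_size):
    # Standard multiplier: sqrt((N - n) / (N - 1))
    return math.sqrt((population_size - sample_size) / (population_size - 1))


# Hypothetical figures: N = 1,000 and n = 100
fpc = finite_population_correction(1000, 100)
corrected_se = 1.22 * fpc  # 1.22 is the standard error from the height example
print(round(fpc, 3), round(corrected_se, 2))  # prints: 0.949 1.16
```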

Weighting and Grossing Factors
They indicate how many people in the population each sample respondent represents.
Weights are used to alter proportions (e.g. to adjust for unequal selection probabilities or non-response).
Grossing factors gross up to the total population number.
The two are often combined.

Weighting & grossing factor
For example

         Achieved sample   Known population   Weighting factor   Grossing factor      Weighting & grossing factor
Men      150 (36%)         4,500 (45%)        45 / 36 = 1.25                          4,500 / 150 = 30
Women    270 (64%)         5,400 (55%)        55 / 64 = 0.86                          5,400 / 270 = 20
Total    420               9,900                                 9,900 / 420 = 23.6
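The factors in this example can be reproduced as follows. This is a sketch with illustrative variable names; it uses unrounded shares, so the weighting factors come out as 1.27 and 0.85 rather than the 1.25 and 0.86 obtained from the rounded percentages on the slide.

```python
sample = {"Men": 150, "Women": 270}         # achieved sample
population = {"Men": 4500, "Women": 5400}   # known population

n_total = sum(sample.values())       # 420
N_total = sum(population.values())   # 9,900

# Grossing factor: scales the whole sample up to the population total
grossing = N_total / n_total

weights = {}
combined = {}
for group in sample:
    sample_share = sample[group] / n_total
    population_share = population[group] / N_total
    # Weighting factor: corrects the group's share of the sample
    weights[group] = population_share / sample_share
    # Combined factor: people in the population represented by each respondent
    combined[group] = population[group] / sample[group]

for group in sample:
    print(group, round(weights[group], 2), round(combined[group], 1))
print("Grossing factor:", round(grossing, 1))
```

With unrounded shares, the weighting factor multiplied by the grossing factor equals the combined factor exactly (for men, 1.27 x 23.6 = 30).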