From Sample to Population: Often we want to understand the attitudes, beliefs, opinions or behaviour of some population, but only have data on a sample from that population.

Presentation transcript:

From Sample to Population Often we want to understand the attitudes, beliefs, opinions or behaviour of some population, but only have data on a sample from that population. E.g., we want to know the proportion of U.S. adults confident in President Obama's handling of the economy, but only have survey data from n=1027 respondents. Gallup April 2009 data: 71% of those surveyed have a “great deal/fair amount of confidence”. How do we move from the known sample statistic, call it s, to the unknown population parameter, call it p? Can we say p=71%? How accurate? How reliable? How confident? Error margin? What sorts of errors may be involved?

From Sample to Population We discuss:  Parameter versus Statistic  Bias and Variability  Margin of Error  Confidence Statements  Error types/sources (sampling errors and non-sampling errors); Sampling designs.

From Sample to Population Parameter: a fixed, unknown number that describes some characteristic of the population. Statistic: a known value calculated from a sample; a statistic is used to estimate a parameter. Two major issues in estimating p from s: Bias: in repeated samples, the sample statistic consistently misses the population parameter in the same direction (e.g. wrong sampling frame, undercoverage). Variability: different samples from the same population may yield different values of the sample statistic. We want to minimize both.

Bias and Variability Figure 3.3 Bias and variability in shooting arrows at a target. Bias means the archer systematically misses in the same direction. Variability means that the arrows are scattered.

Bias and Variability

To reduce bias, use random sampling. We've seen how “bad” samples can result from convenience sampling and voluntary response samples, leading to bias in estimation results. To reduce variability, use larger samples. An estimate from a random sample tends to be closer to the true population parameter if the sample is larger. (In the limit it would be a census.) Estimates from larger samples differ less from one another. (In the limit there is no variation.)
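The difference between the two remedies can be sketched with a small simulation. Everything here is a hypothetical illustration, not data from the slides: the population is assumed to consist of a group A (70% of people, 60% of whom answer "yes") and a group B (30% of people, 20% "yes"), and the biased design is assumed to cover group A only.

```python
import random
import statistics

# Hypothetical population (assumption for illustration only).
SHARE_A, P_YES_A, P_YES_B = 0.7, 0.6, 0.2
TRUE_P = SHARE_A * P_YES_A + (1 - SHARE_A) * P_YES_B  # about 0.48

def simulate(n, reps, covered_only, seed):
    """Return `reps` sample proportions, each from a sample of size n.

    covered_only=True mimics undercoverage: only group A can be sampled.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        yes = 0
        for _ in range(n):
            in_a = covered_only or rng.random() < SHARE_A
            p_yes = P_YES_A if in_a else P_YES_B
            yes += rng.random() < p_yes
        estimates.append(yes / n)
    return estimates

random_small = simulate(n=100, reps=500, covered_only=False, seed=1)
random_large = simulate(n=1000, reps=500, covered_only=False, seed=2)
biased_large = simulate(n=1000, reps=500, covered_only=True, seed=3)

for label, est in [("random, n=100", random_small),
                   ("random, n=1000", random_large),
                   ("undercoverage, n=1000", biased_large)]:
    print(f"{label:22s} mean={statistics.mean(est):.3f} "
          f"sd={statistics.stdev(est):.3f} (truth={TRUE_P:.2f})")
```

In this sketch the larger random sample shrinks the spread of the estimates, while the undercovered design stays centered on the wrong value no matter how large the sample is.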

The Effect of Sample Size: Sampling Distribution for n=100 Figure 3.1 The results of many SRSs have a regular pattern. Here, we draw 1000 SRSs of size 100 from the same population. The population proportion is p = 0.5. The sample proportions vary from sample to sample, but their values center at the truth about the population.

The Effect of Sample Size: Sampling Distribution for n=2527 Figure 3.2 Draw 1000 SRSs of size 2527 from the same population as in Figure 3.1. The 1000 values of the sample proportion are much less spread out than was the case for smaller samples.
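The pattern in the two figures can be reproduced approximately with a short simulation. This is a sketch under one simplifying assumption: instead of drawing SRSs from a finite population, each respondent is modeled as an independent draw with probability p = 0.5 of "yes". The function name `sample_proportions` is made up for this example.

```python
import random
import statistics

def sample_proportions(n, p=0.5, reps=1000, seed=0):
    """Simulate `reps` samples of size n and return their sample proportions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

small = sample_proportions(100, seed=1)   # mirrors the n=100 figure
large = sample_proportions(2527, seed=2)  # mirrors the n=2527 figure

for label, props in (("n=100", small), ("n=2527", large)):
    print(f"{label:7s} center={statistics.mean(props):.3f} "
          f"spread(sd)={statistics.stdev(props):.4f} "
          f"range=({min(props):.3f}, {max(props):.3f})")
```

Both sets of sample proportions should center near 0.5, with the n=2527 values spread over a much narrower range.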

Margin of Error The sample statistic is unlikely to be identical to the population parameter. What's the error margin? (Two elements: the error, and the confidence.)

Margin of Error Assuming random sampling, two components: (1) Variation in the population, σ (we'll come back to this later on). The larger σ is, the less accurate the sample statistic is as an estimate for a given sample size. (2) Sample size, n. Relation: error margin proportional to σ/sqrt(n). (The quick approximate method below is nearly exact for p=1/2.)
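A worked calculation for the Gallup example (n=1027) may help. The sketch assumes that the "quick approximate method" is the common 1/sqrt(n) rule for 95% confidence, which the transcript does not spell out, and compares it with the usual normal-approximation formula 1.96·sqrt(p(1−p)/n). The function names are made up for this example.

```python
import math

def quick_margin(n):
    """Quick 95% margin of error for a proportion: 1/sqrt(n) (assumed rule)."""
    return 1 / math.sqrt(n)

def normal_margin(p_hat, n, z=1.96):
    """Normal-approximation margin of error: z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

n = 1027  # Gallup April 2009 sample size from the first slide
print(f"quick method:           {quick_margin(n):.4f}")
print(f"normal approx, p=0.50:  {normal_margin(0.50, n):.4f}")
print(f"normal approx, p=0.71:  {normal_margin(0.71, n):.4f}")

# Quadrupling the sample size halves the margin of error.
print(f"quick method, 4n:       {quick_margin(4 * n):.4f}")
```

At p=1/2 the two formulas differ only by the factor 1.96 × 0.5 = 0.98, which is why the quick method is nearly exact there and slightly conservative elsewhere.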

Confidence Statement 95% confidence is the standard level, but other levels, such as 99%, can also be used. (What can we do, in terms of error margin and sample size, to increase the confidence level?) Exactly how do we get the confidence statement? We need knowledge of the sampling distribution. (More later.)
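The trade-off between confidence level, error margin and sample size can be sketched numerically. This assumes the normal approximation to the sampling distribution of a sample proportion; the helper names `z_for`, `margin` and `n_for_margin` are made up for this example.

```python
import math
from statistics import NormalDist

def z_for(level):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def margin(p_hat, n, level=0.95):
    """Margin of error for a proportion under the normal approximation."""
    return z_for(level) * math.sqrt(p_hat * (1 - p_hat) / n)

def n_for_margin(p_hat, target, level=0.95):
    """Smallest n whose margin of error is at most `target`."""
    return math.ceil((z_for(level) / target) ** 2 * p_hat * (1 - p_hat))

# Same sample (Gallup, n=1027, 71%): higher confidence means a wider margin.
for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: 0.71 +/- {margin(0.71, 1027, level):.3f}")

# Same margin (3 points, worst case p=0.5): higher confidence needs a larger sample.
for level in (0.95, 0.99):
    print(f"{level:.0%} confidence, margin 0.03: n >= {n_for_margin(0.5, 0.03, level)}")
```

So a higher confidence level can be bought either by accepting a wider margin of error or by increasing the sample size.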

What the Margin of Error Doesn't Say Undercoverage, convenience sampling bias and voluntary response bias are examples of sampling errors. Non-response, problems in survey question construction and response errors are examples of non-sampling errors.

Non-response

Some Issues in Survey Design Induced bias: “If you found a wallet with $20 in it, would you do the right thing and return the money?” Question ordering: “How often do you normally go out on a date? about ___ times a month” followed by “How happy are you with life in general?” (Induces association of the questions.) Complex question: “Do you sometimes find that you have arguments with your family members and co-workers?” (If one has arguments only with family members, should he answer “yes” or “no”?)

Who carried out the survey? What was the population? How was the sample selected? How large was the sample and what was the margin of error? What was the response rate? How were the subjects contacted? When was the survey conducted? What were the exact questions asked? See, e.g. Pew Research Center: Questions to Ask Before You Believe a Poll

Example: “University Students are Healthier than You Think” “Random undergraduate classroom survey of n=810 students was conducted by the Office of Health Promotion within the University Student Health Services, Division of Student Affairs. Statistics from this survey led to the following conclusions: - most students (67%) have 0-4 drinks when they go out - most (69%) have had 0-1 sex partners in the past year - most (76%) either don’t drink, or use designated drivers if they do” What questions should you ask to help you assess the credibility of these results?

Probability Sampling Plans So far we've been focusing on simple random sampling. In the real world, many surveys use more complex sampling designs (in order to save resources, ensure representation of certain groups, etc.). E.g., stratify on race for a survey on racial relations on campus (you might draw 10% of black students and 1% of white students). Simple random sampling and stratified sampling are both examples of probability sampling, in which the probability of each individual being selected is known, even though the probabilities may not be equal. Weighting may be used to make the sample from a complex plan mimic a simple random sample.
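Weighting can be illustrated with the 10% / 1% stratified design mentioned above. The sketch uses inverse-probability weights (each respondent counts for 1/fraction people), which is one standard way to weight; the transcript does not say which scheme is meant. The sample sizes and "yes" counts are invented purely for illustration.

```python
# Hypothetical survey results (made-up numbers, for illustration only).
strata = {
    "black students": {"fraction": 0.10, "sampled": 100, "yes": 60},
    "white students": {"fraction": 0.01, "sampled": 200, "yes": 60},
}

def unweighted_proportion(strata):
    """Pool all respondents as if they came from one simple random sample."""
    yes = sum(s["yes"] for s in strata.values())
    total = sum(s["sampled"] for s in strata.values())
    return yes / total

def weighted_proportion(strata):
    """Weight each respondent by 1 / (selection probability of their stratum)."""
    yes = sum(s["yes"] / s["fraction"] for s in strata.values())
    total = sum(s["sampled"] / s["fraction"] for s in strata.values())
    return yes / total

print(f"unweighted estimate: {unweighted_proportion(strata):.3f}")
print(f"weighted estimate:   {weighted_proportion(strata):.3f}")
```

Because the oversampled stratum answers differently in this made-up data, the unweighted figure overstates the population proportion, while the weighted figure counts each stratum in proportion to its size in the population.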

Multistage Sample Another example of probability sampling. Divide the population of interest into groups. Randomly select some of those groups. Divide the resulting collection of individuals into smaller groups. Randomly select some of those groups. Continue dividing the resulting collection of individuals into groups and randomly selecting some of those groups until you can simply list all of the resulting individuals and randomly select n of them for your sample.

Example: Selecting 1500 registered U.S. voters [Use multistage sampling since we don't have a sampling frame (list) of all registered U.S. voters.] (1) randomly select five U.S. states; (2) obtain a list of all counties/cities in those states; (3) randomly select 20 of those counties/cities; (4) obtain a list of all registered voters in those 20 counties/cities; (5) randomly select 1500 voters from that list.
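The five steps can be sketched in code. The lookups `list_counties` and `list_voters` are hypothetical stand-ins for obtaining real lists, and the synthetic sizes (30 counties per state, 200 voters per county) are assumptions chosen only so the example runs. Note that lists are built only for the states and counties actually selected, which is the point of the design.

```python
import random

def list_counties(state):
    """Hypothetical lookup: all counties/cities in one state."""
    return [f"{state}-county{c}" for c in range(30)]

def list_voters(county):
    """Hypothetical lookup: all registered voters in one county/city."""
    return [f"{county}-voter{v}" for v in range(200)]

def multistage_sample(states, n_states=5, n_counties=20, n_voters=1500, seed=0):
    rng = random.Random(seed)
    # Stage 1: randomly select states.
    chosen_states = rng.sample(states, n_states)
    # Stage 2: list counties only in the chosen states, then select some.
    counties = [c for s in chosen_states for c in list_counties(s)]
    chosen_counties = rng.sample(counties, n_counties)
    # Stage 3: list voters only in the chosen counties, then select the sample.
    voters = [v for c in chosen_counties for v in list_voters(c)]
    return rng.sample(voters, n_voters)

states = [f"state{i}" for i in range(50)]
sample = multistage_sample(states)
print(len(sample), "voters selected, e.g.", sample[:3])
```

A full voter list is never needed: only 5 state-level county lists and 20 county-level voter lists are consulted.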