

Sample surveys and polls

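| Year | Sample size | Winner | Gallup prediction | Election result | Error |
|------|-------------|--------|-------------------|-----------------|-------|
| 1936 | ~50,000 | Roosevelt | 55.7% ↑ | 62.5% | -6.8% |
| 1940 | ~50,000 | Roosevelt | 52.0% ↑ | 55.0% | -3.0% |
| 1944 | ~50,000 | Roosevelt | 51.5% ↑ | 53.8% | -2.3% |
| 1948 | ~50,000 | Truman | 44.5% ↓ | 49.5% | -5.0% |
| 1952 | 5,385 | Eisenhower | 51.0% ↑ | 55.4% | -4.4% |
| 1956 | 8,144 | Eisenhower | 59.5% ↑ | 57.8% | +1.7% |
| 1960 | 8,015 | Kennedy | 51.0% ↑ | 50.1% | +0.9% |
| 1964 | 6,625 | Johnson | 64.0% ↑ | 61.3% | +2.7% |
| 1968 | 4,414 | Nixon | 43.0% ↑ | 43.5% | -0.5% |
| 1972 | 3,689 | Nixon | 62.0% ↑ | 61.8% | +0.2% |
| 1976 | 3,439 | Carter | 48.0% ↓ | 50.1% | -2.1% |
| 1980 | 3,500 | Reagan | 47.0% ↑ | 50.8% | -3.8% |
| 1984 | 3,456 | Reagan | 59.0% ↑ | 59.2% | -0.2% |
| 1988 | 4,089 | Bush | 56.0% ↑ | 53.9% | +2.1% |
| 1992 | 2,019 | Clinton | 49.0% ↑ | 43.3% | +5.7% |
| 1996 | ,417 | Clinton | 52.0% ↑ | 50.1% | +1.9% |
| 2000 | 3,129 | Bush | 48.0% ↑ | 47.9% | +0.1% |
| 2004 | 1,866 | Bush | 49.0% ↔ | 51.0% | -2.0% |

Error = Gallup prediction minus election result, in percentage points.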
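The Error column can be recomputed from the prediction and result columns. The sketch below assumes the error is defined as prediction minus result, and it splits the elections at 1948 because the later slides say quota sampling was abandoned after that year. The variable names are illustrative only, and the year 1996 is inferred from the row's position in the table.

```python
# (year, Gallup prediction %, election result %) copied from the table.
polls = [
    (1936, 55.7, 62.5), (1940, 52.0, 55.0), (1944, 51.5, 53.8),
    (1948, 44.5, 49.5), (1952, 51.0, 55.4), (1956, 59.5, 57.8),
    (1960, 51.0, 50.1), (1964, 64.0, 61.3), (1968, 43.0, 43.5),
    (1972, 62.0, 61.8), (1976, 48.0, 50.1), (1980, 47.0, 50.8),
    (1984, 59.0, 59.2), (1988, 56.0, 53.9), (1992, 49.0, 43.3),
    (1996, 52.0, 50.1), (2000, 48.0, 47.9), (2004, 49.0, 51.0),
]

def mean_abs(errors):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(e) for e in errors) / len(errors)

# Assumed sign convention: error = prediction - result.
quota_era = [pred - result for year, pred, result in polls if year <= 1948]
later = [pred - result for year, pred, result in polls if year > 1948]

print("All quota-era errors negative:", all(e < 0 for e in quota_era))
print(f"Mean absolute error, 1936-1948: {mean_abs(quota_era):.2f} points")
print(f"Mean absolute error, 1952-2004: {mean_abs(later):.2f} points")
```

The later polls use far smaller samples than the ~50,000 of the early years, yet their average miss is smaller.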

Some classic mistakes
The Literary Digest Poll
1936 presidential election: Franklin Delano Roosevelt vs. Alf Landon
Literary Digest had correctly called every presidential election since 1916
Sample size: 2.4 million!
They predicted Roosevelt would lose, with only 43% of the vote
In fact it was a landslide for Roosevelt at 62%

Literary Digest poll
Context
–Midst of the Great Depression
–9 million unemployed; real income down 1/3
–Landon: “Cut spending”
–Roosevelt: “Balance people’s budgets before government’s budget”
How the polling was done
–Survey sent to 10 million people
–2.4 million responded (huge!)

Literary Digest poll was biased
Sampling frame not representative
–Phone numbers, subscription lists, drivers’ registrations, country club memberships
–Lists not representative
–Telephones were a luxury
–Biased toward better-off groups (and more Republican)
–Selection bias and non-response bias
Voluntary response bias
–Main issue was the economy
–The anti-Roosevelt forces were angry, and had a higher response rate!

Young pollster George Gallup used a sample of 3,000 of the 2.4 million responses to reproduce the Literary Digest’s prediction
Then, by using a completely different sample of 50,000, Gallup predicted 56% for Roosevelt and 44% for Landon
Roosevelt received 62% of the vote
Gallup used random sampling methods
Despite the improvement, note the bias against the Democratic candidates from 1936 to 1948
This had disastrous consequences in 1948
Beginning of the Gallup Poll and scientific sampling methods
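A small simulation can illustrate why a huge sample from an unrepresentative frame loses to a small random sample. Everything below is a hypothetical sketch: the two groups, their support rates, and their shares of the population and of the frame are invented for illustration and are not 1936 data.

```python
import random

# Hypothetical support for the incumbent within each group.
SUPPORT = {"well_off": 0.40, "other": 0.70}
POPULATION_SHARE_WELL_OFF = 0.30   # assumed share in the real electorate
FRAME_SHARE_WELL_OFF = 0.85        # assumed share in the biased mailing lists

def poll(n, share_well_off, rng):
    """Simulate n respondents and return the fraction backing the incumbent."""
    yes = 0
    for _ in range(n):
        group = "well_off" if rng.random() < share_well_off else "other"
        yes += rng.random() < SUPPORT[group]
    return yes / n

rng = random.Random(1936)  # fixed seed so the run is repeatable
true_support = (POPULATION_SHARE_WELL_OFF * SUPPORT["well_off"]
                + (1 - POPULATION_SHARE_WELL_OFF) * SUPPORT["other"])

big_biased = poll(100_000, FRAME_SHARE_WELL_OFF, rng)
small_random = poll(3_000, POPULATION_SHARE_WELL_OFF, rng)

print(f"True support:            {true_support:.3f}")
print(f"100,000 from biased frame: {big_biased:.3f}")
print(f"3,000 chosen at random:    {small_random:.3f}")
```

Making the biased sample larger only makes it more precisely wrong; the bias comes from the frame, not from the sample size.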

The Year the Polls Elected Dewey
1948 Election: Harry Truman versus Thomas Dewey
Every major poll (including Gallup) predicted Dewey would win by 5 percentage points

What went wrong?
Pollsters chose their samples using quota sampling
Each interviewer was assigned a fixed quota of subjects in certain categories (race, sex, age)
E.g., a Gallup Poll interviewer in St. Louis was required to interview 13 people, of whom
–6 lived in the suburbs, 7 in the central city
–7 were men and 6 were women
Of the 7 men (similar for women):
–3 under 40 years old, 4 over 40
–1 Black, 6 white
Even monthly rentals paid by the subjects were specified
Within each category, interviewers were free to choose
Left room for human choice and inevitable bias
Republicans were easier to reach
–Had telephones, permanent addresses, “nicer” neighborhoods
Interviewers ended up with too many Republicans
Quota sampling abandoned for random sampling

How surveys can get it wrong
Sampling error
–Errors caused by taking a sample (versus census)
Random sampling error
–Deviation between statistic and parameter
–Error due to chance inevitable with random sample
–Margin of error in confidence statement includes only random sampling error
Non-sampling error
–Errors not related to act of selecting a sample
–Could happen in a census
Distinction between sampling error and non-sampling error: could it happen in a census?
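To get a feel for the size of random sampling error, the sketch below uses the standard 95% normal-approximation margin of error for a proportion, z·sqrt(p(1−p)/n). This formula is not given on the slides, and it assumes a simple random sample; real polls use more complex designs, so treat the numbers as rough guides only. The function name is illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple
    random sample of size n. p=0.5 is the worst case (widest margin)."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes taken from the Gallup table.
for n in (50_000, 8_144, 1_866):
    print(f"n = {n:>6,}: about +/- {100 * margin_of_error(n):.1f} points")
```

This margin covers only chance variation. Undercoverage, nonresponse, and question wording are not reflected in it at all.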

Sampling error
Most common form is undercoverage
Sampling frame leaves out parts of the population
Using telephone directories for phone survey
–Half the households in large cities are unlisted
–About 5% of households without phones
Random digit dialing
–Misses students in dorms, inmates in prison, soldiers in the military, homeless people
–Too expensive to call Hawaii and Alaska

Nonsampling error
From the Gannett News Service, Lafayette Journal and Courier, Nov. 24, 1983
Initial release of income data from 1980 census showed Stumpy Point, North Carolina (pop. 205) with median household income $84,413
Income from census forms is entered in tens of dollars: $8,000 is entered as “0800”
Many incomes were incorrectly entered as “8000”, which the computer read as $80,000
Example of processing error
Response error

Nonsampling error: nonresponse
Serious problem facing sample surveys
Common for opinion polls and market research studies to have 75% to 80% nonresponse rate
Current Population Survey (US Bureau of Labor Statistics and Census Bureau): 6-7% nonresponse rate
General Social Survey (U of Chicago):
–Run by university
–Contacts people in person, goes house to house
–Many advantages
–24% nonresponse rate

Wording the question
Do you agree? (From The New York Times, April, 1982)
–(1) “A freeze in nuclear weapons should be opposed because it would do nothing to reduce the danger of thousands of nuclear weapons already in place and would leave the Soviet Union in a position of nuclear superiority.”
–(2) “A freeze in nuclear weapons should be favored because it would begin a much-needed process to stop everyone in the world from building nuclear weapons now and reduce the possibility of nuclear war in the future.”
Results: 58% agreed with (1), 56% agreed with (2), and 27% agreed with both!

Open versus closed questions
Open: “What do you think is the most important problem facing the country today?”
Closed: “Which of the following do you think is the most important problem facing the country today: the energy shortage, the quality of public schools, legalized abortion, or pollution; or, if you prefer, you may name a different problem as most important.”
–From “Problems in the use of survey questions to measure public opinion,” Science, Volume 236 (1987)

Open versus closed questions
Results of 171 responses to open question and 178 responses to closed question

| Problem | Open | Closed |
|---------|------|--------|
| Energy | 0.0% | 5.6% |
| Schools | 1.2% | 32.0% |
| Abortion | 0.0% | 8.4% |
| Pollution | 1.2% | 14.0% |
| Others | 93.0% | 39.3% |
| Don’t know | 4.7% | 0.6% |

Response bias
People’s responses can differ from what they actually believe
Deliberate bias
–“Do you agree that abortion, the murder of innocent beings, should be outlawed?”
Unintentional bias
–“Do you or do you not use drugs?”
People often want to please the interviewer
–“Do you think your professor is doing a good job teaching statistics?”
Affected by sex, attire, race, behavior of interviewer
Wording, ordering, complexity of questions

Another type of response bias
“Some people say that the 1975 Public Affairs Act should be repealed. Do you agree or disagree that it should be repealed?”
Washington Post, Feb
Results: For repeal: 24%, Against repeal: 19%, No opinion: 57%
No such thing as the Public Affairs Act!

How to cope with errors: weighting the sample
“The sample first was weighted to take into account unequal probabilities of selection from sampling: Weighting accounts for the number of telephones going into the household, and household size. It then was weighted for age, gender, and education to take care of minor fluctuations in the sample, and align it with the findings of the 2000 Census of the adult population. It is assumed to be representative of all Minnesota households with telephones, within the margin of sampling error.”
– How the Poll was Conducted, Minneapolis Star Tribune

Weighting responses in a sample
Weighting responses is a common method to deal with nonresponse
Example for a telephone poll:
Suppose women are twice as likely to answer the phone as men
Then weight survey results by multiplying women’s responses by ½
For instance: “Will you vote for X?”
–Responses from 150 men: 90 Yes, 60 No
–Responses from 300 women: 100 Yes, 200 No
After weighting:
–150 men: 90 Yes, 60 No
–150 women: 50 Yes, 100 No
Report sample proportion of (90 + 50)/300 = 46.67%
In practice, it’s very complicated
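The arithmetic of this example can be sketched in a few lines. The counts and the ½ weight come from the slide; the dictionary layout and variable names are just one illustrative way to organize them, not how any polling organization actually implements weighting.

```python
# Raw responses to "Will you vote for X?" and the weight for each group.
# Women are assumed twice as likely to answer, so their weight is 1/2.
responses = {
    "men":   {"yes": 90,  "no": 60,  "weight": 1.0},
    "women": {"yes": 100, "no": 200, "weight": 0.5},
}

raw_yes = sum(g["yes"] for g in responses.values())
raw_total = sum(g["yes"] + g["no"] for g in responses.values())

weighted_yes = sum(g["weight"] * g["yes"] for g in responses.values())
weighted_total = sum(g["weight"] * (g["yes"] + g["no"])
                     for g in responses.values())

unweighted_share = raw_yes / raw_total          # 190 / 450
weighted_share = weighted_yes / weighted_total  # (90 + 50) / 300

print(f"Unweighted Yes share: {unweighted_share:.2%}")
print(f"Weighted Yes share:   {weighted_share:.2%}")
```

Without the weights, the over-represented women would pull the reported share down to 190/450, about 42%.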

Stratified sampling
More complex sampling methods to ensure better representation
Goal: Random sample of 240 Carleton students
To ensure discipline representation, divide into strata according to their share of the population
–Arts and Literature 20%
–Humanities 15%
–Social Sciences 30%
–Mathematics and Natural Sciences 35%
Within each discipline, choose at random
–240 × 0.20 = 48 Arts and Literature students
–240 × 0.15 = 36 Humanities students
–240 × 0.30 = 72 Social Sciences students
–240 × 0.35 = 84 Mathematics and Natural Sciences students
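A minimal sketch of the same allocation, followed by a random draw within each stratum. The shares and the target of 240 are from the slide; the student rosters are made up (a hypothetical population of 2,000 with placeholder IDs), and `stratified_sample` is an illustrative helper, not a library function.

```python
import random

SHARES = {
    "Arts and Literature": 0.20,
    "Humanities": 0.15,
    "Social Sciences": 0.30,
    "Mathematics and Natural Sciences": 0.35,
}

def stratified_sample(rosters, total, shares, rng):
    """Draw a simple random sample inside each stratum, sized in
    proportion to that stratum's share of the population."""
    sample = {}
    for stratum, share in shares.items():
        k = round(total * share)
        sample[stratum] = rng.sample(rosters[stratum], k)
    return sample

# Hypothetical rosters: 2,000 students split according to SHARES.
rosters = {
    name: [f"{name} #{i}" for i in range(round(2000 * share))]
    for name, share in SHARES.items()
}

sample = stratified_sample(rosters, 240, SHARES, random.Random(0))
sizes = {name: len(students) for name, students in sample.items()}
print(sizes)
```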

Stratified sampling
Advantages:
–Sample will be representative for the strata
–Can gain precision of estimate
Disadvantages:
–Logistically difficult
–Must know about the population
–May not be possible
Note that technically a stratified sample is not a simple random sample
Not every possible group of 240 students is equally likely to be selected

Cluster sampling
Warehouse contains 10,000 window frames stored on pallets
Each pallet contains 20 to 30 window frames
Goal: Estimate how many window frames have wood rot
Would like to sample about 500 frames
Cluster sample
–Sample pallets, not windows. Choose, say, 20.
–Include in sample all the windows on each pallet
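The warehouse example can be sketched as a simulation. The 10,000 frames, 20 to 30 frames per pallet, and 20 sampled pallets are from the slide; the 8% rot rate, the random seed, and the way pallets are filled are assumptions made up for this illustration.

```python
import random

rng = random.Random(7)
TOTAL_FRAMES = 10_000
ASSUMED_ROT_RATE = 0.08  # hypothetical; unknown in the real problem

# Build the warehouse: each pallet is a list of True/False "has rot" flags.
pallets = []
remaining = TOTAL_FRAMES
while remaining > 0:
    size = min(rng.randint(20, 30), remaining)  # last pallet may be smaller
    pallets.append([rng.random() < ASSUMED_ROT_RATE for _ in range(size)])
    remaining -= size

# Cluster sample: pick 20 pallets at random, inspect every frame on them.
chosen = rng.sample(pallets, 20)
inspected = [has_rot for pallet in chosen for has_rot in pallet]

rot_share = sum(inspected) / len(inspected)
estimated_rotten = rot_share * TOTAL_FRAMES

print(f"Inspected {len(inspected)} frames on {len(chosen)} pallets")
print(f"Estimated frames with rot: {estimated_rotten:.0f}")
```

In this sketch rot is scattered independently across frames. If rot tended to concentrate on particular pallets, the estimate would vary more from sample to sample.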

Cluster sampling
Door-to-door surveys
–City blocks are the clusters
Survey farms throughout the Midwest on pesticide use
–Counties are the clusters
Airlines get customer opinions
–Individual flights are the clusters
Advantage: Much easier to implement, depending on context
Disadvantage: Greater sampling variability; less statistical accuracy

Current Population Survey: Multistage cluster sampling
Country divided into 2,007 Primary Sampling Units (PSUs)
Stage 1: 792 PSUs chosen (but not quite at random)
–432 highly populated PSUs (like Chicago and LA) are automatically in the sample
PSUs divided into smaller census blocks
Blocks grouped into strata
Households in each block grouped into clusters of about 4 households each
Final sample consists of clusters, and interviewers go to all households in the chosen clusters
Offers some of the advantages of quota sampling but with no selection bias

How to evaluate a poll or survey
Who carried out and funded the survey?
What is the population?
How was the sample selected?
–Random methods?
How large was the sample?
–What’s the margin of error?
What was the response rate?
How were subjects contacted?
When was the survey conducted?
What are the exact questions asked?