THE EFFECT OF SAMPLING BIAS ON BIG DATA: USING THE LITERARY DIGEST POLL OF THE 1936 ELECTION AS A CASE STUDY, WE EXAMINE HOW THE SAMPLE OF DATA USED AFFECTS THE OUTCOME OF THE SURVEY. Rachel Heymach CS46 Jennifer Widom Akash Das Sarma

BACKGROUND HISTORY Before the 1936 poll, the Literary Digest was known for its unprecedented accuracy in predicting presidential elections, including the election of 1932, where its prediction was correct and within 1% of the recorded results. During the 1930s, the US was in the middle of the Great Depression.

HOW THE SURVEY WAS CONDUCTED The Literary Digest sent out 10 million mock ballots to a randomized sample drawn from lists such as phone directories, club membership lists, and magazine subscriptions. This sample was very large and contained almost a tenth of the population (although the magazine claimed that it represented 1 in 4). From the 2.4 million ballots that were sent back, the magazine predicted that Landon would win over Roosevelt, 57% to 43%.

THE DATA Chart from

DOES ANYONE ELSE SEE FLAWS??? Flaw #1: The sample was selected from phone directories and magazine subscriptions during the Depression. Many people were homeless during this time, and telephones were considered luxuries even for the people who managed to keep their homes. This type of sample bias comes into play when the sample selected (mainly middle- and upper-class citizens in this case) does not properly represent the population you are predicting the outcome for (the entire US population).

DOES ANYONE ELSE SEE FLAWS??? Flaw #2: Out of 128 million people in the US in 1936, 10 million were asked to respond, and of those 10 million only 2.4 million mailed back their surveys. This flaw in the poll is called nonresponse bias, and it arose for two reasons. The first is that, with mail surveys, the mock ballot could have been seen as another random piece of junk mail and thrown away. The second, which brings a more substantial bias, is that people with stronger opinions are often the ones to share them, while people who are shy or unsure won't respond as often.
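The figures on this slide can be turned into rates with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted above (128 million, 10 million, 2.4 million); the variable names are ours, chosen for illustration.

```python
# Rounded figures as quoted on the slide.
population = 128_000_000       # US population in 1936
ballots_mailed = 10_000_000    # mock ballots sent out
ballots_returned = 2_400_000   # ballots mailed back

# Share of the population that was asked to respond.
mailed_share = ballots_mailed / population
# Share of the people asked who actually responded.
response_rate = ballots_returned / ballots_mailed
# Share of the population whose answers ended up in the poll.
respondent_share = ballots_returned / population

print(f"Asked to respond: {mailed_share:.2%} of the population")
print(f"Response rate:    {response_rate:.2%} of those asked")
print(f"In the results:   {respondent_share:.2%} of the population")
```

Roughly three out of four people who received a ballot never answered, so the results describe only the people who chose to respond.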

THE RESULT As expected, since the sample was not representative of the population, the poll did not accurately reflect how the national population voted. The Literary Digest had a record-high error of 19%, the greatest error of any major public opinion poll.

HOW TO AVOID SAMPLE BIAS When creating your sample for big data projects, it is important that the sample is as close to unbiased as possible. Phone surveys are cheaper but less effective than in-person surveys. Surveys of convenience occur when the sample is not random but simply taken in the easiest way possible. As mentioned earlier, call-in and mail surveys often result in nonresponse bias and show only the strongest opinions instead of a more evenly distributed spread. Larger samples give more precise estimates with smaller standard errors, but only when the sample is unbiased. It is often better to have a small unbiased sample than a large biased one: a larger size does not remove the sampling bias, it only makes the biased estimate look more certain.
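The last point can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions, not a reconstruction of the 1936 data: the group sizes and support levels below are hypothetical, and `simulate` is a name invented for this example. It compares a small simple random sample of the whole population with a much larger sample drawn only from a "listed" group, standing in for phone and magazine lists.

```python
import random


def simulate(seed=0):
    rng = random.Random(seed)

    # Hypothetical population (assumed numbers, for illustration only):
    # 30% of people appear on the lists, 70% do not.
    listed_share = 0.30
    p_listed = 0.40     # assumed support for candidate A among listed people
    p_unlisted = 0.70   # assumed support for candidate A among everyone else
    true_support = listed_share * p_listed + (1 - listed_share) * p_unlisted

    def draw_voter(listed_only):
        # A biased frame only ever reaches listed people.
        listed = True if listed_only else rng.random() < listed_share
        p = p_listed if listed else p_unlisted
        return rng.random() < p  # True means the voter supports candidate A

    def estimate(n, listed_only):
        return sum(draw_voter(listed_only) for _ in range(n)) / n

    small_random = estimate(1_000, listed_only=False)   # small, unbiased
    large_biased = estimate(100_000, listed_only=True)  # large, biased frame
    return true_support, small_random, large_biased


true_support, small_random, large_biased = simulate()
print(f"True support:            {true_support:.3f}")
print(f"Small random sample:     {small_random:.3f}")
print(f"Large biased sample:     {large_biased:.3f}")
```

Under these assumptions the large sample settles very tightly around the support level of the listed group, not around the true value, while the small random sample lands much closer to the truth despite having a hundredth of the responses.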

THANK YOU INTERNET (SOURCES)