1 Chapter 12 Sample Surveys Richard Dong & Stanley Chen
2 Vocabulary
Population - the entire group of individuals or instances about whom we hope to learn
Sample - a (representative) subset of a population, examined in the hope of learning about the population
Sample survey - a study that asks questions of a sample drawn from some population in the hope of learning about the entire population; polls taken to assess voter preferences are common examples
Bias - any systematic failure of a sampling method to represent its population; biased sampling methods tend to over- or underestimate parameters; it is almost impossible to recover from bias, so efforts to avoid it are well spent
Randomization - the best defense against bias, in which each individual is given a fair, random chance of selection
Sample size - the number of individuals in a sample; the sample size, not the fraction of the population sampled, determines how well the sample represents the population
Census - a sample that consists of the entire population
Population parameter - a numerically valued attribute of a model for a population
3 Vocabulary
Statistic - a value calculated from sampled data
Representative - a sample is representative if the statistics computed from it accurately reflect the corresponding population parameters
Simple random sample (SRS) - a sample of size n in which each set of n elements in the population has an equal chance of selection
Sampling frame - a list of individuals from which the sample is drawn
Sampling variability - the natural tendency of randomly drawn samples to differ, one from another
Stratified random sampling - the population is divided into subpopulations, or strata, and random samples are then drawn from each stratum; works best if each stratum is homogeneous but the strata differ from each other
Cluster sampling - entire groups, or clusters, are chosen at random, usually as a matter of convenience, practicality, or cost; each cluster should be representative of the population, and therefore heterogeneous within and similar to the other clusters
Multistage sampling - a sampling scheme that combines several different sampling methods
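The three basic designs defined on this slide can be sketched in a few lines of Python. This is only an illustration: the school, the grade names, the homerooms, and the helper function names are all made up for the example, and the standard-library `random` module stands in for whatever randomization device a real survey would use.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every set of n individuals in the frame is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: draw a separate SRS from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: pick whole clusters at random and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(12)  # fixed seed so the sketch is repeatable

# Hypothetical strata: 40 students in each of four grades.
strata = {grade: [f"{grade}-{i:02d}" for i in range(40)]
          for grade in ("freshman", "sophomore", "junior", "senior")}

# The sampling frame is the full list of 160 students.
frame = [person for members in strata.values() for person in members]

# Hypothetical clusters: 8 mixed-grade homerooms of 20 students each.
shuffled = frame[:]
rng.shuffle(shuffled)
clusters = {f"room-{k}": shuffled[k * 20:(k + 1) * 20] for k in range(8)}

srs = simple_random_sample(frame, 20, rng)   # 20 students from the whole school
strat = stratified_sample(strata, 5, rng)    # 5 students from every grade
clus = cluster_sample(clusters, 1, rng)      # everyone in one random homeroom
```

All three samples contain 20 students, but only the stratified design guarantees that every grade is represented, and only the cluster design lets the surveyor visit a single room.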
4 Vocabulary
Systematic sample - individuals are selected systematically from a sampling frame, for example every 10th person; the starting individual must be chosen at random
Pilot - a small trial run of a survey
Voluntary response bias - bias that arises when individuals can choose on their own whether or not to participate in the sample
Convenience sample - a sample taken from individuals who are conveniently available
Undercoverage - some portion of the population is not sampled at all or is represented less than it should be
Nonresponse bias - bias that arises when a large fraction of those sampled fail to respond; those who do respond are not likely to represent the whole population; ex: a telephone survey
Response bias - anything in the survey design, such as the wording of questions, that influences a respondent's answer; ex: "How do you feel about the cost cuts to local zoos that are making animals starve to death?"
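A systematic sample with a random start might look like the following sketch. The frame of 100 people and the function name `systematic_sample` are hypothetical, chosen only to mirror the "every 10th person" example.

```python
import random

def systematic_sample(frame, k, rng):
    """Take every k-th individual, starting at a random position among the first k."""
    start = rng.randrange(k)  # random start: an index from 0 to k-1
    return frame[start::k]

rng = random.Random(5)  # fixed seed so the sketch is repeatable
frame = [f"person-{i:03d}" for i in range(100)]  # hypothetical sampling frame
sample = systematic_sample(frame, 10, rng)       # every 10th person
```

Only the starting position is random; once it is fixed, the rest of the sample is determined. That is why the method is cheap, and also why it can fail if the order of the frame is related to what is being measured.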
5 Formulas
There are no formulas for this chapter: the unit covers survey design, so no calculations are required.
6 Concepts
Representative samples can offer us important insights about a population.
The size of the sample, not the fraction of the population sampled, is what matters most.
A simple random sample (SRS) is the standard: every possible sample of a given size has an equal chance of being selected.
Stratified samples reduce sampling variability by identifying homogeneous subgroups and then sampling randomly within each one.
Cluster samples randomly select among heterogeneous subgroups, each of which resembles the population on a smaller scale.
Systematic samples are often the least expensive and can work in some situations, provided the starting point is chosen at random.
Multistage samples combine several of the methods above.
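The claim that sample size matters more than the fraction of the population sampled can be checked with a small simulation. The sketch below is an illustration under made-up assumptions: a yes/no question with 60% "yes" in the population, two hypothetical population sizes, and 500 repeated simple random samples for each setting.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, n, trials, rng, p=0.6):
    """Standard deviation of the sample proportion over repeated SRSs of size n."""
    n_yes = int(pop_size * p)
    population = [1] * n_yes + [0] * (pop_size - n_yes)  # 1 = "yes", 0 = "no"
    props = [sum(rng.sample(population, n)) / n for _ in range(trials)]
    return statistics.stdev(props)

rng = random.Random(12)  # fixed seed so the sketch is repeatable

# Same sample size (100), very different fractions of the population (1% vs 0.01%).
small_pop = spread_of_sample_proportions(10_000, 100, 500, rng)
large_pop = spread_of_sample_proportions(1_000_000, 100, 500, rng)

# Same large population, four times the sample size.
bigger_n = spread_of_sample_proportions(1_000_000, 400, 500, rng)

print(f"n=100 from 10,000:    spread {small_pop:.3f}")
print(f"n=100 from 1,000,000: spread {large_pop:.3f}")
print(f"n=400 from 1,000,000: spread {bigger_n:.3f}")
```

The first two spreads should come out close to each other even though the populations differ by a factor of 100, while the third should be noticeably smaller: a larger sample, not a larger fraction, is what tightens the results.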
7 Concepts
Bias can destroy our insights through poor sampling methods.
Nonresponse bias arises when sampled individuals will not or cannot respond.
Response bias arises when respondents' answers might be influenced by question wording or interviewer behavior.
Voluntary response samples are almost always biased and should be avoided.
Convenience samples are likely to be flawed for similar reasons.
Even with a reasonable design, sampling frames may not be representative. Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.
Always report your sampling methods and any known sources of bias when presenting survey results, so that others can evaluate the fairness and accuracy of your results.
8 Problem Example #21
Question: Examine each of the following questions for possible bias. If you think the question is biased, indicate how and propose a better question.
a) Should companies that pollute the environment be compelled to pay the costs of cleanup?
b) Given that 18-year-olds are old enough to vote and to serve in the military, is it fair to set the drinking age at 21?
Answers
a) The question is biased toward "yes" because of the word "pollute". A better question: "Should companies be responsible for any costs of environmental cleanup?"
b) The question is biased toward "no" because of the preamble about being old enough to vote and serve in the military. A better question: "Do you think the drinking age should be lowered from 21?"
9 Problem Example #23
Question: Anytime we conduct a survey we must take care to avoid undercoverage. Suppose we plan to select 500 names from the city phone book, call their homes between noon and 4 p.m., and interview whoever answers, anticipating contacts with at least 200 people.
a) Why is it difficult to use a simple random sample here?
b) Describe a more convenient, but still random, sampling strategy.
c) What kinds of households are likely to be included in the eventual sample of opinion? Who will be excluded?
d) Suppose, instead, that we continue calling each number, perhaps in the morning or evening, until an adult is contacted and interviewed. How does this improve the sampling design?
e) Random digit dialing machines can generate the phone calls for us. How would this improve our design? Is anyone still excluded?
Answers
a) People with unlisted numbers, people without phones, and people who are at work cannot be reached. As a result, not everyone has an equal chance of being selected.
b) Generate random numbers and call at random times, so that everyone has an equal chance of being reached.
c) Under the original plan, households with someone at home between noon and 4 p.m. are more likely to be included.
d) Many more people can be included under the second plan. However, people without phones are still excluded.
e) Random digit dialing randomizes the phone numbers called, but the time of day is still an issue, and people without phones are still excluded.
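The effect described in answers c) and d) can be illustrated with a simulation. Every number below is invented for the sketch and does not come from the problem: it assumes 30% of listed households have someone home between noon and 4 p.m., and that support for some proposal is 70% in those households and 40% in the rest.

```python
import random

rng = random.Random(23)  # fixed seed so the sketch is repeatable

# Hypothetical city of listed households: (home_in_afternoon, supports_proposal).
households = []
for _ in range(20_000):
    home = rng.random() < 0.30                          # assumed: 30% home noon-4 p.m.
    supports = rng.random() < (0.70 if home else 0.40)  # assumed support rates
    households.append((home, supports))

true_support = sum(s for _, s in households) / len(households)

# Select 500 households at random from the list, as in the problem.
called = rng.sample(households, 500)

# Original plan: one call between noon and 4 p.m.; only those at home answer.
answered = [s for home, s in called if home]
one_call_estimate = sum(answered) / len(answered)

# Callback plan: keep calling at other times until every household is reached.
callback_estimate = sum(s for _, s in called) / len(called)

print(f"true support:       {true_support:.2f}")
print(f"single afternoon:   {one_call_estimate:.2f} from {len(answered)} answers")
print(f"with callbacks:     {callback_estimate:.2f} from {len(called)} answers")
```

Under these assumptions the single afternoon call should overstate support, because it hears only from households that are home in the afternoon, while the callback plan should land close to the true value. Neither plan reaches people who are not in the phone book at all.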