Presentation on theme: "AP Statistics Section 5.1 B More on Sampling." Presentation transcript:
Methods for sampling from large populations spread out over a wide area are usually more complex than an SRS.
To select a stratified random sample, 1. Divide the population into groups of individuals (strata) that are similar in some way important to the response variable. 2. Choose an SRS in each stratum. 3. Combine the SRSs to form a full sample.
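The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name stratified_sample, its parameters, and the student population are hypothetical, and it takes the same number of individuals from every stratum, which is one possible choice among several.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Return an SRS of n_per_stratum individuals from each stratum, combined."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Step 2: choose an SRS within each stratum.
    # Step 3: combine the SRSs to form the full sample.
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: 30 students labelled by school type.
students = [("public", i) for i in range(20)] + [("private", i) for i in range(10)]
sample = stratified_sample(students, stratum_of=lambda s: s[0], n_per_stratum=3, seed=1)
print(sample)
```

Every stratum is guaranteed to appear in the sample, which an SRS of the whole population does not promise.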
When defining strata, be sure that the individuals in each group are likely to be similar. Example: Define some strata for the following populations: high schools: public vs private; DI, DII, etc. music: jazz, country, rap.
A stratified sample can give good information about each stratum separately as well as about the overall population.
If the individuals in each stratum are less varied than the population as a whole, a stratified sample can produce better information about the population than an SRS of the same size. In the extreme case, all individuals in each stratum would be the same, and we would need only one individual from each stratum to completely represent the population.
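The extreme case can be checked with a small made-up population. The strata and values below are hypothetical, and weighting each chosen individual by its stratum's share of the population is an assumption added here so that the estimate targets the population mean.

```python
import random

# Hypothetical population: every individual within a stratum has the same value.
strata = {"A": [10] * 50, "B": [20] * 30, "C": [40] * 20}

population = [x for values in strata.values() for x in values]
true_mean = sum(population) / len(population)

rng = random.Random(0)
# Pick one individual per stratum and weight it by the stratum's share
# of the population (an assumption of this sketch).
estimate = sum(
    rng.choice(values) * len(values) / len(population)
    for values in strata.values()
)
print(true_mean, estimate)
```

Because nothing varies inside a stratum, it does not matter which individual is drawn; three individuals reproduce the mean of all 100.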
Cluster sampling involves 1. Dividing the population into groups (clusters). 2. Randomly selecting some of the clusters. 3. Using all individuals in the chosen clusters to form the sample.
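A minimal sketch of these steps, assuming the population has already been divided into clusters. The function name cluster_sample and the restaurant and server labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep every individual in them."""
    rng = random.Random(seed)
    # Step 2: randomly select some of the clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: use all individuals in the chosen clusters.
    sample = []
    for label in chosen:
        sample.extend(clusters[label])
    return chosen, sample

# Step 1 (already done here): hypothetical clusters of 4 servers in each of 6 restaurants.
clusters = {
    f"restaurant_{k}": [f"server_{k}_{j}" for j in range(4)]
    for k in range(6)
}
chosen, sample = cluster_sample(clusters, n_clusters=2, seed=1)
print(chosen)
print(sample)
```

The randomness enters only when clusters are chosen; no sampling happens inside a chosen cluster.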
Example: Suppose you work for the tax department for Cuyahoga County and you want to investigate whether waitresses/waiters are honestly claiming the tips they make. Explain a method of creating a stratified sample and a method of creating a cluster sample.
Stratified: 1. Group restaurants by price range. 2. Choose an SRS from each group of restaurants. 3. Combine the SRSs to make a sample.
Cluster: 1. Group restaurants by price range. 2. Randomly select some of the groups of restaurants. 3. Combine the chosen groups to make a sample.
Stratified sampling versus cluster sampling: In stratified sampling, we study a random sample from every stratum. In cluster sampling, by contrast, we study all individuals in the chosen clusters and none of the individuals in the other clusters.
Undercoverage occurs when some groups in the population are left out of the process used to choose a sample.
Example: What people would be left out in each of the following surveys? 1. Surveys conducted by randomly selecting telephone numbers: people without telephones. 2. Surveys conducted by randomly selecting households: the homeless, college students.
Nonresponse occurs when an individual chosen for a sample can’t be contacted or does not respond.
Nonresponse often reaches 30% or more, even with careful planning and several callbacks. One research center found that, of 2879 households called, 1658 were never at home, refused, or would not finish the interview. That’s a nonresponse rate of 58%.
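The 58% figure is simply the number of nonresponding households divided by the number called. The variable names below are illustrative.

```python
households_called = 2879
nonrespondents = 1658  # never at home, refused, or would not finish

nonresponse_rate = nonrespondents / households_called
print(f"{nonresponse_rate:.0%}")  # 58%
```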
There are other details that can affect the results.
The behavior of the respondent or of the interviewer can cause response bias in sample results. Respondents may lie, especially about illegal or unpopular behavior. The race or sex of the interviewer can influence responses to questions about race relations or attitudes toward feminism. Answers to questions that ask respondents to recall past events are often inaccurate because of faulty memory.
Example: One of the most frequently observed survey measurement errors is the overreporting of voting behavior. In a typical sample of 663 people after an election, 478 people (72%) said that they voted, but only 371 people (56%) actually did.
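The percentages in this example, and the size of the gap between them, can be recomputed from the counts. The variable names are illustrative.

```python
sample_size = 663
said_voted = 478
actually_voted = 371

claimed_rate = said_voted / sample_size
actual_rate = actually_voted / sample_size
overreporters = said_voted - actually_voted  # people who claimed a vote they did not cast

print(f"claimed: {claimed_rate:.0%}, actual: {actual_rate:.0%}")
print(f"overreported by {overreporters} people, or {overreporters / sample_size:.0%} of the sample")
```

The gap of 107 people, roughly 16% of the sample, is response bias: drawing a larger sample would not remove it.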
The wording of questions is the most important influence on the answers given to a sample survey. Confusing or leading questions can introduce strong bias and even minor changes in wording can change a survey’s outcome.
Example: A survey conducted in 1992 for the American Jewish Committee asked the following question: Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened? 22% of the sample said “possible”.
When considering reports based on surveys of large human populations, insist on knowing the exact questions asked, the rate of nonresponse, and the date and method of the survey before you trust a poll result.
You can get more precise results from surveys by using larger random samples. The Current Population Survey’s sample of 50,000 households estimates the national unemployment rate very accurately whereas Nightline’s voluntary response sample of 186,000 people is worthless. Using a probability sampling method and taking care to deal with practical difficulties reduce bias in a sample.
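To see why larger random samples are more precise, here is a rough sketch using the standard approximate 95% margin of error for a sample proportion, 1.96 × sqrt(p(1 − p)/n). It assumes an SRS and the worst case p = 0.5. The Current Population Survey uses a more complex design, so these numbers are only indicative, and the formula says nothing about a voluntary response sample, whose bias does not shrink as n grows.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from an SRS of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000, 50_000):
    print(n, round(margin_of_error(n), 4))
```

Because the margin shrinks with the square root of n, cutting it in half takes four times as many individuals.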