Presentation on theme: "Chapter 7: Data for Decisions Lesson Plan Sampling Bad Sampling Methods Simple Random Samples Cautions About Sample Surveys Experiments Thinking About."— Presentation transcript:
Chapter 7: Data for Decisions
Sampling
Statistics – The science of collecting, organizing, and interpreting data.
How are the data produced? By sampling and by experiments.
Sampling – Gathering information about a large group of individuals. Time, cost, and inconvenience forbid contacting every individual, so we gather information about only part of the group in order to draw conclusions about the whole.
Population – The entire group of individuals about which we want information.
Sample – The part of the population from which we actually collect information, used to draw conclusions about the whole.
Bad Sampling Methods
If personal choice is involved in selecting the sample, the results can become biased and the sample may not be a true representation of the population.
Bias – The design of a statistical study is biased if it systematically favors a certain outcome.
1. Convenience Sample: The interviewer chooses the sample from individuals close at hand (those easiest to reach). Example: mall surveys.
2. Voluntary Response Sample: People choose themselves by responding to a general appeal. People with strong opinions are the most likely to respond, which can cause bias. Examples: opinion polls, call-ins.
Simple Random Samples
Simple Random Sample (SRS) – An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
Choosing a sample by chance avoids bias by giving all individuals an equal chance to be chosen, which makes it a good sampling method.
Examples of SRS:
1. Draw names from a hat: Place the names of all the people in the population into a hat and draw out a handful (the sample). This is slow and inconvenient.
2. Use a table of random digits: A more efficient way of randomly selecting the sample without bias. For smaller samples, tables of random digits are used. For larger samples, computers do the random digit sampling.
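The remark that computers do the sampling for larger samples can be illustrated with a short Python sketch. It is only one possible way to draw an SRS, not the method the slides prescribe. The function name `simple_random_sample`, the population of 70 numbered individuals, and the seed are all assumptions made for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals chosen so that every set of n
    individuals is equally likely to be the sample (an SRS)."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population: 70 individuals numbered 1 through 70.
population = list(range(1, 71))
sample = simple_random_sample(population, 5, seed=2024)
print(sample)
```

Here `random.sample` draws without replacement, so no individual can appear twice in the sample.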
Simple Random Sample
Two Steps in Choosing a Simple Random Sample
1. Give each member of the population a numerical label of the same length. Example: 100 items can be labeled with two digits: 01, 02, …, 99, 00.
2. To choose the random sample, select a line in the random digits table. For a sample of size n, read off successive groups of digits of the same length as the labels until n individuals have been selected from the population.
When selecting the n individuals for the sample from the random digits table:
1. Skip any group of digits that was not used as a label.
2. Skip any repeats of a label already selected.
A table of random digits – A list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties:
1. Each entry in the table is equally likely to be any of the 10 digits from 0 through 9.
2. The entries are independent of one another. That is, knowledge of one part of the table gives no information about any other part.
Simple Random Sample
Using the Random Digit Table
Section Taken From a Random Digits Table
Line  Digits
101   19223 95034 05756 28713 96409 12531 42544 82853
102   73676 47150 99400 01927 27754 42648 82425 36290
103   45467 71709 77558 00095 32863 29485 82226 90056
104   52711 38889 93074 60227 40011 85848 48767 52573
Example: A group of 70 people was labeled 01, 02, 03, …, 69, 70. Line 104 of the random digits table was selected to pick two lucky winners. Reading off two-digit labels from line 104 gives 52, 71, 13, …: 52 was selected first, 71 was skipped (it is not in the range of labels), and 13 was chosen next.
Answer: 52 and 13
Cautions About Sample Surveys
Sample surveys of large populations require the following:
1. A good sampling design (an SRS can provide this)
2. An accurate and complete list of the population
3. Participation of all individuals selected for the sample
4. A question that is posed neutrally and clearly
Bias can occur due to the following:
1. Problems with obtaining an accurate and complete population list. Undercoverage – Occurs when some groups in the population are left out of the process of choosing the sample. Examples: homeless people, prison inmates, students in dormitories.
2. Problems with getting 100% participation of the sampled people. Nonresponse – Occurs when an individual chosen for the sample cannot be contacted or refuses to participate.
3. Problems with posing a misleading or confusing question.
Experiments
Observation versus Experiments
Observational Study – Observes individuals and measures the variable of interest but does not attempt to influence the response. The purpose is to describe some group or situation. Example: a sample survey.
Experiment – Deliberately imposes some treatment on individuals in order to observe their responses. The purpose is to study whether the treatment causes a change in the response.
Examining Cause and Effect Between Variables
Experiments are the preferred method for examining the effect of one variable on another. By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect.
Experiments
Uncontrolled Experiment – An experiment in which it is not possible to control outside factors that can influence the outcome.
[Diagram: apply a treatment; outside effects influence the outcome; observe or measure the response.]
Confounding – Variables, whether part of a study or not, are said to be confounded when their effects on the outcome cannot be distinguished from each other.
Randomized Comparative Experiment (helps control confounding) – An experiment to compare two or more treatments in which people, animals, or things are assigned to treatments by chance. The outside effects and confounding variables then act on all groups.
Randomized – The subjects are assigned to treatments by chance.
Comparative – The experiment compares two or more treatments.
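Assigning subjects to treatments by chance can be sketched in Python as below. This is a minimal illustration, not a full experimental design. The function name `randomize`, the 20 made-up subject identifiers, the two group names, and the seed are all assumptions.

```python
import random


def randomize(subjects, treatments, seed=None):
    """Assign subjects to treatments by chance: shuffle the subjects,
    then deal them out to the treatment groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical subjects and treatment names, for illustration only.
subjects = [f"subject_{k:02d}" for k in range(1, 21)]
groups = randomize(subjects, ["treatment", "control"], seed=7)
for name, members in groups.items():
    print(name, len(members), members)
```

Because chance alone decides who lands in each group, outside influences tend to be spread across the groups rather than concentrated in one.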
Thinking About Experiments
Statistical Significance – An observed effect is statistically significant if it is so large that it is unlikely to occur just by chance in the absence of a real effect in the population from which the data were drawn. Example: The connection between smoking and lung cancer is statistically significant.
Control Group – A group of experimental subjects who are given a standard treatment or no treatment at all (such as a placebo).
Placebo Effect – The effect of a dummy treatment (such as an inert pill in a medical experiment) on the response of the subjects; the tendency to respond favorably to any treatment.
Double-Blind Experiment – An experiment in which neither the experimental subjects nor the persons who interact with them know which treatment each subject received. This helps to eliminate possible influences or biases between the subjects and workers: everyone is kept “blind.”
Inference: From Sample to Population
Statistical Inference – When the sample was chosen at random from a population, we can infer conclusions about the wider population from the sample data. Statistical inference works only if the data come from a random sample or a randomized comparative experiment.
Parameter – A number that describes the population. A parameter is a fixed number, but in practice we do not know its value.
Statistic – A number that describes a sample. The value of a statistic is known once we have taken a sample, but it can change from sample to sample.
Example: A random sample of 2500 people was chosen from the population and asked: “Do you like getting new clothes but find shopping for clothes frustrating and time consuming?” Of these, 1650 people agreed.
Sample statistic: $\hat{p} = 1650/2500 = 0.66 = 66\%$
We infer that about 66% of the population agrees.
Inference: From Sample to Population
Sampling Distribution – The distribution of values taken by the statistic in all possible samples of the same size from the same population.
For a fixed number of samples, the distribution for a larger sample size has less variation, and the values lie closer to the mean.
[Figure: results of 1000 SRSs of size 100 and of 1000 SRSs of size 2500 from the same population; the samples of size 2500 are less variable than the samples of size 100.]
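The comparison of 1000 samples of size 100 with 1000 samples of size 2500 can be approximated by simulation. The Python sketch below rests on stated assumptions: the population proportion is taken to be 0.6 (the value used in the transcript's standard deviation example), and each individual is modeled as an independent draw, which approximates an SRS only when the population is much larger than the sample. The function name and seed are hypothetical.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=None):
    """Simulate num_samples samples of size n from a large population
    with proportion p of successes; return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


# Assumed population proportion of successes (illustrative).
P = 0.6
props_100 = simulate_sample_proportions(P, 100, 1000, seed=1)
props_2500 = simulate_sample_proportions(P, 2500, 1000, seed=1)

sd_100 = statistics.stdev(props_100)
sd_2500 = statistics.stdev(props_2500)
print(f"n = 100:  mean {statistics.mean(props_100):.3f}, sd {sd_100:.4f}")
print(f"n = 2500: mean {statistics.mean(props_2500):.3f}, sd {sd_2500:.4f}")
```

Both sets of sample proportions should center near 0.6, with the size 2500 samples clustered far more tightly around it.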
Inference: From Sample to Population
Sample Proportion
Choose an SRS of size n from a large population that contains population proportion p of successes. The sample proportion of successes is
$\hat{p} = \dfrac{\text{count of successes in the sample}}{n}$
Then:
Shape: For large sample sizes, the sampling distribution of $\hat{p}$ is approximately Normal.
Center: The mean of the sampling distribution is $p$.
Spread: The standard deviation of the sampling distribution is $\sqrt{\dfrac{p(1-p)}{n}}$.
For the shopping example, with a mean p = 0.6 and n = 2500, the standard deviation is $\sqrt{\dfrac{0.60\,(1-0.60)}{2500}} \approx 0.0098$.
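The standard deviation arithmetic for the shopping example can be checked with a few lines of Python. The function name `sd_of_sample_proportion` is hypothetical. The inputs p = 0.6 and n = 2500 are the values used on the slide.

```python
import math


def sd_of_sample_proportion(p, n):
    """Standard deviation of the sample proportion for an SRS of size n
    from a large population with proportion p: sqrt(p * (1 - p) / n)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


sd = sd_of_sample_proportion(0.6, 2500)
print(round(sd, 4))  # 0.0098
```

Because n sits under the square root, the sample must be four times as large to halve the standard deviation.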
Confidence Intervals
The 68-95-99.7 Rule
In a Normal distribution:
68% of the observations fall within ± 1 standard deviation of the mean.
95% of the observations fall within ± 2 standard deviations of the mean.
99.7% of the observations fall within ± 3 standard deviations of the mean.
95% Confidence Interval – An interval obtained from the sample data by a method that in 95% of all samples will produce an interval containing the true population parameter.
A 95% confidence interval for p is
$\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
where $\hat{p}$ is the proportion of successes in the sample and the margin of error is $2\sqrt{\hat{p}(1-\hat{p})/n}$.
This recipe is only approximately correct, but it is quite accurate when the sample size n is large.
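As a worked illustration, the recipe can be applied to the shopping survey counts given earlier in the transcript (1650 of 2500 agreeing). The Python sketch below is a minimal version under that assumption. The function name `approx_95_ci` is hypothetical, and it uses the multiplier 2 from the slides rather than the more precise 1.96.

```python
import math


def approx_95_ci(successes, n):
    """Approximate 95% confidence interval for a population proportion:
    p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = approx_95_ci(1650, 2500)
print(f"{low:.3f} to {high:.3f}")  # 0.641 to 0.679
```

The margin of error here is about 0.019, so under this recipe the survey supports an estimate of roughly 64% to 68% of the population agreeing.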