The Basic Practice of Statistics

The Basic Practice of Statistics
Chapter 5:

Discuss how to design a statistical study
Objectives Discuss how to design a statistical study Discuss data collection techniques Discuss how to design an experiment Discuss sampling techniques 2

Types of Sampling Random Systematic Convenience Stratified Cluster
Voluntary response

Population and Sample Sampling and Surveys
The distinction between population and sample is basic to statistics. To make sense of any sample result, you must know what population the sample represents Sampling and Surveys Definition: The population in a statistical study is the entire group of individuals about which we want information. A sample is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population. Population Collect data from a representative Sample... Sample Make an Inference about the Population.

The individuals in an experiment are the experimental units
The individuals in an experiment are the experimental units. If they are human, we call them subjects. In an experiment, we do something to the subject and measure the response. The “something” we do is a called a treatment, or factor. The factor may be the administration of a drug. One group of people may be placed on a diet/exercise program for 6 months (treatment), and their blood pressure (response variable) would be compared with that of people who did not diet or exercise.

If the experiment involves giving two different doses of a drug, we say that we are testing two levels of the factor. A response to a treatment is statistically significant if it is larger than you would expect by chance (due to random variation among the subjects). We will learn how to determine this later. In a study of sickle cell anemia, 150 patients were given the drug hydroxyurea, and 150 were given a placebo (dummy pill). The researchers counted the episodes of pain in each subject. Identify: The subjects (patients, all 300) The factors/treatments (hydroxyurea and placebo) And the response variable (episodes of pain)

Principles of comparative experiments
Experiments are comparative in nature: We compare the response to a treatment versus to: another treatment no treatment (a control) a placebo or any combination of the above A control is a situation in which no treatment is administered. It serves as a reference mark for an actual treatment (e.g., a group of subjects does not receive any drug or pill of any kind). A placebo is a fake treatment, such as a sugar pill. It is used to test the hypothesis that the response to the treatment is due to the actual treatment and not to how the subject is being taken care of.

About the placebo effect
The “placebo effect” is an improvement in health due not to any treatment but only to the patient’s belief that he or she will improve. The placebo effect is not understood, but it is believed to have therapeutic results on up to a whopping 35% of patients. It can sometimes ease the symptoms of a variety of ills, from asthma to pain to high blood pressure and even to heart attacks. An opposite, or “negative placebo effect,” has been observed when patients believe their health will get worse. The most famous and perhaps most powerful placebo is the “kiss,” blow, or hug—whatever your technique. Unfortunately, the effect gradually disappears once children figure out that they sometimes get better without help, and vice versa.

placebo effect: a patient responds favorably to being treated, not to the treatment itself.
It is much more useful to do comparative experiments using a control group. A control group is a set of experimental units that receive no active treatment. If the units are subjects, then the control group usually receives a placebo and any effect of the treatment on this group is called the placebo effect. In experiments we hope to see a difference in the response that is too large to happen merely due to random chance. In other words, we are looking for a statistically significant result.

Designing “controlled” experiments
Fisher found the data from experiments that had been going on for decades to be basically worthless because of poor experimental design. Fertilizer had been applied to a field one year and not in another in order to compare the yield of grain produced in the two years. BUT It may have rained more, or been sunnier, in different years. The seeds used may have differed between years as well. Or fertilizer was applied to one field and not to a nearby field in the same year. BUT The fields might have different soil, water, drainage, and history of previous use.  Too many factors affecting the results were “uncontrolled.”

Randomized comparative experiments
Fisher’s solution: Randomized comparative experiments In the same field and same year, apply fertilizer to randomly spaced plots within the field. Analyze plants from similarly treated plots together. This minimizes the effect of variation of drainage and soil composition within the field on yield as well as controlling for weather. F

Parameter a numerical measurement describing some characteristic of a population Statistic a numerical measurement describing some characteristic of a sample.

The Idea of a Sample Survey
We often draw conclusions about a whole population on the basis of a sample. Choosing a sample from a large, varied population is not that easy. Sampling and Surveys Step 1: Define the population we want to describe. Step 2: Say exactly what we want to measure. A “sample survey” is a study that uses an organized plan to choose a sample that represents some specific population. Step 3: Decide how to choose a sample from the population.

Designing a Statistical Study
Identify the variable(s) of interest (the focus) and the population of the study. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population. Collect the data. Describe the data using descriptive statistics techniques. Interpret the data and make decisions about the population using inferential statistics. Identify any possible errors. 17

Data Collection Observational study
A researcher observes and measures characteristics of interest of part of a population. Researchers observed and recorded the mouthing behavior on nonfood objects of children up to three years old. (Source: Pediatric Magazine) 18

Data Collection Experiment
A treatment is applied to part of a population and responses are observed. An experiment was performed in which diabetics took cinnamon extract daily while a control group took none. After 40 days, the diabetics who had the cinnamon reduced their risk of heart disease while the control group experienced no change. (Source: Diabetes Care) 19

Data Collection Simulation Uses a mathematical or physical model to reproduce the conditions of a situation or process. Often involves computers Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.

Observational Study versus Experiment
In contrast to observational studies, experiments don’t just observe individuals or ask them questions. They actively impose some treatment in order to measure the response. Experiments Definition: An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. NO TREATMENT IMPOSED !!! An experiment deliberately imposes some treatment on individuals to measure their responses. When our goal is to understand cause and effect, experiments are the only source of fully convincing data. The distinction between observational study and experiment is one of the most important in statistics.

Observational Study versus Experiment
Observational studies of the effect of one variable on another often fail because of confounding between the explanatory variable and one or more lurking variables. Experiments Definition: A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable. Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. Well-designed experiments take steps to avoid confounding.

The Language of Experiments
An experiment is a statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response. Here is the basic vocabulary of experiments. Experiments Definition: A specific condition applied to the individuals in an experiment is called a treatment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables. The experimental units are the smallest collection of individuals to which treatments are applied. When the units are human beings, they often are called subjects. Sometimes, the explanatory variables in an experiment are called factors. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a level) of each of the factors.

What experimental design?
A researcher wants to see if there is a significant difference in resting pulse rates for men and women. Twenty-eight men and twenty-four women had their pulse rate measured at rest in the lab. One factor, two levels (male and female) Stratified/block design (by gender) Many dairy cows now receive injections of BST, a hormone intended to spur greater milk production. The milk production of 60 Ayrshire dairy cows was recorded before and after they received a first injection of BST. SRS of 60 cows Match pair design (before and after)

Experiment Vocabulary Terms
Experimental Unit: individual or units on which the experiment is done. Subjects: Experimental units that are human beings. Treatments: A specific experimental condition applied to the units. Factors: The explanatory variable(s) being studied. Factor level: A specific level (option) for the factor.

A sports engineer is interested in determining the effects that speed
Example: A sports engineer is interested in determining the effects that speed and air pressure have on throwing distance for his mechanical trainer football throwing machine. Two speeds (40 mph, 55 mph) and three air pressures (175 psi, 200 psi, 230 psi) were chosen for the study. Treatments were randomly assigned to thirty footballs. Experimental Unit: a football Treatments: 40 mph and 175 psi 40 mph and 200 psi 40 mph and 230 psi 55 mph and 175 psi 55 mph and 200 psi 55 mph and 230 psi Factors: speed and air pressure Levels: For speed - 40 mph & 55 mph. For air pressure psi, 200 psi, 230 psi.

For example, suppose that we wanted to measure the effect that going to college has on your future income, but we cannot measure your ability, motivation, and other factors. We talk about three types of variables. Type of Variable Definition Example Response Variable The variable that we measure to assess the significance of the results income of a sample of people Explanatory Variables Variables whose impact on the response variable we plan to measure years of schooling completed for people in the sample Lurking or Unobserved Variables that may affect the response variable but which we cannot measure individual ability; individual motivation

How to Experiment Badly
Experiments are the preferred method for examining the effect of one variable on another. By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect. Good designs are essential for effective experiments, just as they are for sampling. Experiment A high school regularly offers a review course to prepare students for the SAT. This year, budget cuts will allow the school to offer only an online version of the course. Over the past 10 years, the average SAT score of students in the classroom course was The online group gets an average score of That’s roughly 10% higher than the long- time average for those who took the classroom review course. Is the online course more effective? Students -> Online Course -> SAT Scores

A group of subjects (students) were exposed to a treatment (on-line course)
The outcome was observed A closer look shows that the studnets in the on-line course were quite different from the students who took the classroom course. They had higher GPA’s and more AP classes. The effect of on-line versus in-class instruction is mixed up with the effect of these lurking variables. Maybe on-line students earned higher SAT scores because they were smarter to begin with not because the online course prepared them better. This confounding prevents us from concluding the online course is more effective

How to Experiment Badly
Many laboratory experiments use a design like the one in the online SAT course example: Experiment Treatment Measure Response Experimental Units In the lab environment, simple designs often work well. Field experiments and experiments with animals or people deal with more variable conditions. Outside the lab, badly designed experiments often yield worthless results because of confounding.

An investigation of one or more characteristics of a population.
Data Collection Survey An investigation of one or more characteristics of a population. Commonly done by interview, mail, or telephone. A survey is conducted on a sample of female physicians to determine whether the primary reason for their career choice is financial stability. 31

Example: Methods of Data Collection
Consider the following statistical studies. Which method of data collection would you use to collect data for each study? A study of the effect of changing flight patterns on the number of airplane accidents. Solution: Simulation (It is impractical to create this situation) 32

A study of the effect of eating oatmeal on lowering blood pressure. Solution: Experiment (Measure the effect of a treatment – eating oatmeal) 33

A study of how fourth grade students solve a puzzle. Solution: Observational study (observe and measure certain characteristics of part of a population) 34

A study of U.S. residents’ approval rating of the U.S. president. Solution: Survey (Ask “Do you approve of the way the president is handling his job?”) 35

Key Elements of Experimental Design
Control Randomization Replication . 36

Key Elements of Experimental Design: Control
Control for effects other than the one being measured. Confounding variables Occurs when an experimenter cannot tell the difference between the effects of different factors on a variable. A coffee shop owner remodels her shop at the same time a nearby mall has its grand opening. If business at the coffee shop increases, it cannot be determined whether it is because of the remodeling or the new mall. 37

The Randomized Comparative Experiment
Experiments Definition: In a completely randomized design, the treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group that receives an inactive treatment or an existing baseline treatment. Group 1 Group 2 Treatment 1 Treatment 2 Compare Results Experimental Units Random Assignment

Principles of Experimental Design
Three Principles of Experimental Design Randomized comparative experiments are designed to give good evidence that differences in the treatments actually cause the differences we see in the response. Experiments Principles of Experimental Design Control for lurking variables that might affect the response: Use a comparative design and ensure that the only systematic difference between the groups is the treatment administered. Random assignment: Use impersonal chance to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units by balancing the effects of lurking variables that aren’t controlled on the treatment groups. Replication: Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.

Example: The Physicians’ Health Study
Read the description of the Physicians’ Health Study on page 241. Explain how each of the three principles of experimental design was used in the study. Experiments A placebo is a “dummy pill” or inactive treatment that is indistinguishable from the real treatment.

Key Elements of Experimental Design: Control
Placebo effect A subject reacts favorably to a placebo(no active ingredient drug) when in fact he or she has been given no medical treatment at all. Blinding is a technique where the subject does not know whether he or she is receiving a treatment or a placebo. Double-blind experiment neither the subject nor the experimenter knows if the subject is receiving a treatment or a placebo. 41

Experiments: What Can Go Wrong?
The logic of a randomized comparative experiment depends on our ability to treat all the subjects the same in every way except for the actual treatments being compared. Good experiments, therefore, require careful attention to details to ensure that all subjects really are treated identically. Experiments A response to a dummy treatment is called a placebo effect. The strength of the placebo effect is a strong argument for randomized comparative experiments. Whenever possible, experiments with human subjects should be double-blind. Definition: In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.

Inference for Experiments
In an experiment, researchers usually hope to see a difference in the responses so large that it is unlikely to happen just because of chance variation. We can use the laws of probability, which describe chance behavior, to learn whether the treatment effects are larger than we would expect to see if only chance were operating. If they are, we call them statistically significant. Experiments Definition: An observed effect so large that it would rarely occur by chance is called statistically significant. A statistically significant association in data from a well-designed experiment does imply causation.

Key Elements of Experimental Design: Randomization
Randomization is a process of randomly assigning subjects to different treatment groups. Completely randomized design Subjects are assigned to different treatment groups through random selection. Randomized block design Divide subjects with similar characteristics into blocks, and then within each block, randomly assign subjects to treatment groups. 44

Blocking Completely randomized designs are the simplest statistical designs for experiments. But just as with sampling, there are times when the simplest method doesn’t yield the most precise results. Experiments Definition A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a randomized block design, the random assignment of experimental units to treatments is carried out separately within each block. Form blocks based on the most important unavoidable sources of variability (lurking variables) among the experimental units. Randomization will average out the effects of the remaining lurking variables and allow an unbiased comparison of the treatments. Control what you can, block on what you can’t control, and randomize to create comparable groups.

Blocks have a homogeneous characteristic ie (men women) but within the block you must discuss how to randomly assign the subjects into groups within the blocks like before.

Block designs In a block or stratified design, subjects are divided into groups, or blocks, prior to the experiment to test hypotheses about differences between the groups. The blocking, or stratification, here is by gender. Discuss the number of subjects and women and men Discuss the type of random assignment to the groups ie. Flipping a coin or using the random digits table. As discussed before.

Randomized block design An experimenter testing the effects of a new weight loss drink may first divide the subjects into age categories. Then within each age group, randomly assign subjects to either the treatment group or control group. 48

Matched Pairs Design Subjects are paired up according to a similarity. One subject in the pair is randomly selected to receive one treatment while the other subject receives a different treatment. 49

Matched-Pairs Design Experiments
A common type of randomized block design for comparing two treatments is a matched pairs design. The idea is to create blocks by matching pairs of similar experimental units. Experiments Definition A matched-pairs design is a randomized blocked experiment in which each block consists of a matching pair of similar experimental units. Chance is used to determine which unit in each pair gets each treatment. Sometimes, a “pair” in a matched-pairs design consists of a single unit that receives both treatments. Since the order of the treatments can influence the response, chance is used to determine with treatment is applied first for each unit.

Matched pairs designs Matched pairs: Choose pairs of subjects that are closely matched— e.g., same sex, height, weight, age, and race. Within each pair, randomly assign who will receive which treatment. There is no random allocation here you are matching. It is also possible to just use a single person and give the two treatments to this person over time in random order. In this case, the “matched pair” is just the same person at different points in time. The most closely matched pair studies use identical twins.

A matched pairs design is a special case of the randomized block design. It is used when the experiment has only two treatment conditions; and participants can be grouped into pairs, based on some blocking variable. Then, within each pair, participants are randomly assigned to different treatments. In a matched pairs design, experimental units within each pair are assigned to different treatment levels.

Matched Pairs Design: A design that compares just two treatments.
This can occur in one of two ways: · Choose pairs of subjects that are closely matched. Within each pair, randomly assign one individual to each treatment. · All subjects receive both treatments. The order of the treatments may be randomly assigned to see if the order of the treatments matters. Example - Do students retain information better if there is noise in the background while they are studying? A group of 20 students are randomly selected to participate in an experiment. Each will be put in a room where they will read a list of 50 statements. After a break they will be given quiz to see how much of the information they retained. Ten of the students will read the statements in a silent room, ten in a room with background noise piped in. The groups will switch places and repeat the process using the opposite environment. Students are randomly assigned to a starting group.

be two women, both age 22; and so on.
Matched pairs example For example: say 1000 subjects are grouped into 500 matched pairs. Each pair is matched on gender and age. For example, Pair 1 might be two women, both age 21. Pair 2 might be two men, both age 21. Pair 3 might be two women, both age 22; and so on. For this hypothetical example, the matched pairs design is an improvement over a completely randomized design. Like the completely randomized design, the matched pairs design uses randomization to control for confounding. However, unlike the other design, the matched pairs design explicitly controls for two potential lurking variables - age and gender.

Key Elements of Experimental Design: Replication
Replication is the repetition of an experiment using a large group of subjects. To test a vaccine against a strain of influenza, 10,000 people are given the vaccine and another 10,000 people are given a placebo. Because of the sample size, the effectiveness of the vaccine would most likely be observed. 55

Example: Experimental Design
A company wants to test the effectiveness of a new gum developed to help people quit smoking. Identify a potential problem with the given experimental design and suggest a way to improve it. The company identifies one thousand adults who are heavy smokers. The subjects are divided into blocks according to gender. After two months, the female group has a significant number of subjects who have quit smoking. 56

Solution: Experimental Design
Problem: The groups are not similar. The new gum may have a greater effect on women than men, or vice versa. Correction: The subjects can be divided into blocks according to gender, but then within each block, they must be randomly assigned to be in the treatment group or the control group. 57

The Randomized Comparative Experiment
Experiments Definition: In a completely randomized design, the treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group that receives an inactive treatment or an existing baseline treatment. Group 1 Group 2 Treatment 1 Treatment 2 Compare Results Experimental Units Random Assignment

Completely randomized designs
In a completely randomized experimental design, individuals are randomly assigned to groups, then the groups are randomly assigned to treatments. Discuss the type of random assignment to the groups ie. Flipping a coin or using the random digits table.

This has a control group
Discuss the type of random assignment to the groups ie. Flipping a coin or using the random digits table. As discussed on a previous slide.

Discuss the type of random assignment to the groups ie. Flipping
a coin or using the random digits table line ??? Taking ?? numbers at a time. Making sure to discuss throw out doubles and ignore duplicates. Be sure to be detailed on the Random assignment discussion. Labeling of subjects 01,02…50 for example.

Getting rid of sampling biases
The best way to exclude biases in an experiment is to randomize the design. Both the individuals and treatments are assigned randomly. A double-blind experiment is one in which neither the subjects nor the experimenter know which individuals got which treatment until the experiment is completed. Another way to make sure your conclusions are robust is to replicate your experiment—do it over. Replication ensures that particular results are not due to uncontrolled factors or errors of manipulation.

How do we obtain low variability?
1. Increase our sample size. 2. Use a sampling design that enables us to get a sample that best reflects our population. Variability is related to the spread of the sampling distribution. The variability of a statistic from a random sample does not depend on the size of the population, as long as the population is at least 100 times larger than the sample.

Types of Sampling Sampling Techniques Simple Random Sample
Every possible sample of the same size has the same chance of being selected. x 66

Definitions Random Sample Simple Random Sample (of size n)
members of the population are selected in such a way that each individual member has an equal chance of being selected Simple Random Sample (of size n) subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen

Random Sampling - selection so that each has an equal chance of being selected

Simple Random Sample Random numbers can be generated by a random number table, a software program or a calculator. Assign a number to each member of the population. Members of the population that correspond to these numbers become members of the sample. 69

Sampling and Surveys How to Sample Well: Random Sampling
The statistician’s remedy is to allow impersonal chance to choose the sample. A sample chosen by chance rules out both favoritism by the sampler and self-selection by respondents. Random sampling, the use of chance to select a sample, is the central principle of statistical sampling. Sampling and Surveys Definition: A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. In practice, people use random numbers generated by a computer or calculator to choose samples. If you don’t have technology handy, you can use a table of random digits.

selection so that each has an
Random Sampling selection so that each has an equal chance of being selected

Simple Random Samples Every individual or item from the frame has an equal chance of being selected Selection may be with replacement or without replacement Samples obtained from table of random numbers or computer random number generators

How to Choose an SRS Using Table D
Sampling and Surveys Definition: A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties: • Each entry in the table is equally likely to be any of the 10 digits • The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part. Step 1: Label. Give each member of the population a numerical label of the same length. Step 2: Table. Read consecutive groups of digits of the appropriate length from Table D. Your sample contains the individuals whose labels you find. How to Choose an SRS Using Table D 73

Example: How to Choose an SRS
Problem: Use Table D at line 130 to choose an SRS of 4 hotels. Sampling and Surveys 01 Aloha Kai 08 Captiva 15 Palm Tree 22 Sea Shell 02 Anchor Down 09 Casa del Mar 16 Radisson 23 Silver Beach 03 Banana Bay 10 Coconuts 17 Ramada 24 Sunset Beach 04 Banyan Tree 11 Diplomat 18 Sandpiper 25 Tradewinds 05 Beach Castle 12 Holiday Inn 19 Sea Castle 26 Tropical Breeze 06 Best Western 13 Lime Tree 20 Sea Club 27 Tropical Shores 07 Cabana 14 Outrigger 21 Sea Grape 28 Veranda Our SRS of 4 hotels for the editors to contact is: 05 Beach Castle, 16 Radisson, 17 Ramada, and 20 Sea Club. 74

Example: Simple Random Sample
There are 731 students currently enrolled in statistics at your school. You wish to form a sample of eight students to answer some survey questions. Select the students who will belong to the simple random sample. Assign numbers 001 to 731 to each student taking statistics. On the table of random numbers, choose a starting place at random (suppose you start in the third row, second column.) 75

Solution: Simple Random Sample
Read the digits in groups of three Ignore numbers greater than 731 The students assigned numbers 719, 662, 650, 004, 053, 589, 403, and 129 would make up the sample. 76

Stratified Sample Divide a population into groups (strata) and select a random sample from each group. To collect a stratified sample of the number of people who live in West Ridge County households, you could divide the households into socioeconomic levels and then randomly select households from each level. 77

Other Sampling Methods
The basic idea of sampling is straightforward: take an SRS from the population and use your sample results to gain information about the population. Sometimes there are statistical advantages to using more complex sampling methods. One common alternative to an SRS involves sampling important groups (called strata) within the population separately. These “sub-samples” are combined to form one stratified random sample. Sampling and Surveys Definition: To select a stratified random sample, first classify the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.

subdivide the population into at
Stratified Sampling subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum)

Stratified Samples Population divided into two or more groups according to some common characteristic Simple random sample selected from each group The two or more samples are combined into one

Stratified Sampling - subdivide population and draw sample from each stratum

Examples of Stratified Sampling
The population is first broken down into groups or strata, and then sampling is done within the different groups. Splitting up a population by gender and sampling Separating by religious preference and then looking at those differences

Cluster Sample Divide the population into groups (clusters) and select all of the members in one or more, but not all, of the clusters. In the West Ridge County example you could divide the households into clusters according to zip codes, then select all the households in one or more, but not all, zip codes. 83

divide the population into sections
Cluster Sampling divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters

Cluster Samples Population divided into several “clusters,” each representative of the population Simple random sample selected from each The samples are combined into one Population divided into 4 clusters.

Cluster Sampling - divide into sections; choose a few of those sections; choose all from selected sections

Examples of Cluster Sampling
Remember, the main assumption here is that each cluster accurately represents the characteristics of the entire population. Every element from that cluster is then sampled Instead of surveying all heart surgeons, Make one hospital a cluster and survey all heart surgeons from that hospital. Randomly select one dorm (cluster) from all dorms on campus, and then sample every student from that dorm.

Cluster sampling Sampling and Surveys
Although a stratified random sample can sometimes give more precise information about a population than an SRS, both sampling methods are hard to use when populations are large and spread out over a wide area. In that situation, we’d prefer a method that selects groups of individuals that are “near” one another. Sampling and Surveys Definition: To take a cluster sample, first divide the population into smaller groups. Ideally, these clusters should mirror the characteristics of the population. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample.

Voluntary response sample
one in which the respondents themselves decide whether to be included. In this case, valid conclusions can be made only about the specific group of people who agree to participate.

How to Sample Badly Convenience samples are almost guaranteed to show bias. So are voluntary response samples, in which people decide whether to join the sample in response to an open invitation. Sampling and Surveys Definition: A voluntary response sample consists of people who choose themselves by responding to a general appeal. Voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond.

use results that are easy to get
Convenience Sampling use results that are easy to get

Convenience Sampling - use readily available results
Hey! Do you believe in the death penalty?

Systematic Sample Choose a starting value at random. Then choose every kth member of the population. In the West Ridge County example you could assign a different number to each household, randomly choose a starting number, then select every 100th household. 93

Systematic Sampling Select some starting point and then
select every K th element in the population

Systematic Samples Decide on sample size: n
Divide frame of N individuals into groups of k individuals: k=n/n N = 64 n = 8 k = 8 First Group

Systematic Sampling Every K th element
Randomly select one individual from the 1st group Select every k-th individual thereafter Every K th element

How to Sample Badly How can we choose a sample that we can trust to represent the population? There are a number of different methods to select samples. Sampling and Surveys Definition: Choosing individuals who are easiest to reach results in a convenience sample. Convenience samples often produce unrepresentative data…why? Definition: The design of a statistical study shows bias if it systematically favors certain outcomes.

Advantages and Disadvantages
Simple random sample and systematic sample Simple to use May not be a good representation of the population’s underlying characteristics Stratified sample Ensures representation of individuals across the entire population Cluster sample More cost effective Less efficient (need larger sample to acquire the same level of precision)

Example: Identifying Sampling Techniques
You are doing a study to determine the opinion of students at your school regarding stem cell research. Identify the sampling technique used. You divide the student population with respect to majors and randomly select and question some students in each major. Solution: Stratified sampling (the students are divided into strata (majors) and a sample is selected from each major) 99

Example: Identifying Sampling Techniques
You assign each student a number and generate random numbers. You then question each student whose number is randomly selected. Solution: Simple random sample (each sample of the same size has an equal chance of being selected and each student has an equal chance of being selected.) 100

Example: Sampling at a School Assembly
Describe how you would use the following sampling methods to select 80 students to complete a survey. (a) Simple Random Sample (b) Stratified Random Sample (c) Cluster Sample Sampling and Surveys

Types of Survey Errors undercoverage error Non response error
Sampling error Measurement error Excluded from frame. Follow up on non responses. Chance differences from sample to sample. Bad Question!

Definitions Sampling Error Nonsampling Error
the difference between a sample result and the true population result; such an error results from chance sample fluctuations Nonsampling Error sample data that are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly)

Controlling Effects of Variables
Blinding subject does not know he or she is receiving a treatment or placebo Blocks groups of subjects with similar characteristics Completely Randomized Experimental Design subjects are put into blocks through a process of random selection Rigorously Controlled Design subjects are very carefully chosen

Potential Sources of Bias in Sampling:
Response bias: When the behavior of the respondent or the interviewer changes the sample result. Examples of this include the respondent lying, poor interviewing techniques, wording of questions, race or sex of interviewer influencing respondent, etc. Nonresponse Bias: Nonresponse occurs when a selected unit either cannot be contacted or refuses to cooperate. If the non-respondent systematically differs with respect to the variable of interest compared to those who respond, then sampling bias will be introduced into the study. Example – If you were able to find phone numbers for everyone in your sample group, what might happen if you called each one and asked if they would take part in your survey?

Potential Sources of Bias in Sampling:
Sampling Bias occurs when the sample systematically favors certain parts of the population over others. Undercoverage bias: Occurs when the sample systematically excludes a portion of the population. Note: If the excluded portion systematically differs with respect to the response from those units that are available for sampling, sampling bias will be introduced into the study. Example – If you wanted to take a survey of people in Lafayette, could you use the phone book to select your sample? Why not?

Using Studies Wisely The Challenges of Establishing Causation
A well-designed experiment tells us that changes in the explanatory variable cause changes in the response variable. Lack of realism can limit our ability to apply the conclusions of an experiment to the settings of greatest interest. In some cases it isn’t practical or ethical to do an experiment. Consider these questions: Does texting while driving increase the risk of having an accident? Does going to church regularly help people live longer? Does smoking cause lung cancer? It is sometimes possible to build a strong case for causation in the absence of experiments by considering data from observational studies. Using Studies Wisely

The Challenges of Establishing Causation
When we can’t do an experiment, we can use the following criteria for establishing causation. Using Studies Wisely The association is strong. The association is consistent. Larger values of the explanatory variable are associated with stronger responses. The alleged cause precedes the effect in time. The alleged cause is plausible. Discuss how each of these criteria apply to the observational studies of the relationship between smoking and lung cancer.

The Basic Practice of Statistics

Similar presentations

Presentation on theme: "The Basic Practice of Statistics"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

The Basic Practice of Statistics

Similar presentations

Presentation on theme: "The Basic Practice of Statistics"— Presentation transcript:

Similar presentations

About project

Feedback