# The Practice of Statistics

Daniel S. Yates, The Practice of Statistics, Third Edition. Chapter 5: Producing Data. Copyright © 2008 by W. H. Freeman & Company

Chapter 5 – Producing Data
Sampling – a technique used to study a part, or sample, of a larger group in order to gain information about the entire group. The sample must be chosen carefully.

Experiment – involves more than observing or questioning individuals. A condition is imposed on the individuals in order to observe a response. Experiments must be designed carefully: confounding variables can disguise the effects of explanatory variables on response variables, so experiments must be designed to control these variables.

Section 5.1 – Designing Samples

Sample design is the method used to select a sample from a population. Poor sample design leads to bias and misleading conclusions.

Simple random sample (SRS) – a sampling design that attempts to eliminate bias.
The easiest way to construct an SRS is to place names, numbers, etc. in a “hat” and draw.

Using a random number table to generate a random sample.
1. Assign a numerical label to every individual in the population. Don’t scramble the labels as you assign them; the table will do the randomizing. All labels must have the same number of digits. Ex. If choosing 5 individuals out of 30, assign 01, 02, 03, 04, …, 30, not 1, 2, 3, 4, …, 30.
2. Use Table B to select labels at random. You can read Table B in any order and start anywhere. Standard practice is to read across rows.
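The labeling procedure can be sketched in Python. This is an illustration only: the function name `srs_from_digit_stream` is hypothetical, and a seeded pseudo-random generator stands in for a row of Table B. The sketch also assumes the usual convention of skipping labels that are out of range or already chosen, which the slide does not spell out.

```python
import random

def srs_from_digit_stream(digits, population_size, sample_size):
    """Read fixed-width labels from a string of digits, as with a random digit table."""
    width = len(str(population_size))      # 30 individuals -> two-digit labels 01..30
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Assumed convention: skip labels outside 01..population_size and repeats.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Stand-in for a row of Table B: pseudo-random digits from a seeded generator.
rng = random.Random(5)
row = "".join(str(rng.randrange(10)) for _ in range(400))
sample = srs_from_digit_stream(row, population_size=30, sample_size=5)
print([f"{label:02d}" for label in sample])
```

Reading the digits in groups of two is what makes the equal-width labels necessary: every label 01 to 30 then has the same chance of turning up in any group.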

Other sampling designs
Multistage Sample – each stage is selected by an SRS.
Ex. Want to personally interview 60,000 people in the U.S.
Stage 1 – Take an SRS of the 3000 counties in the U.S.
Stage 2 – Take an SRS of the towns within each chosen county.
Stage 3 – Select an SRS of streets within each chosen town.
Stage 4 – Select an SRS of households on each street.
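The four stages can be sketched in Python on a small made-up sampling frame. The counts here (20 counties, 5 towns per county, and so on) and the name `multistage_sample` are assumptions for illustration, not figures from the text.

```python
import random

# Hypothetical frame: 20 counties, each with 5 towns, each with 4 streets of 10 households.
frame = {
    f"county{c}": {
        f"town{t}": {
            f"street{s}": [f"c{c}-t{t}-s{s}-h{h}" for h in range(10)]
            for s in range(4)
        }
        for t in range(5)
    }
    for c in range(20)
}

def multistage_sample(frame, n_counties, n_towns, n_streets, n_households, rng):
    """Take an SRS at each stage, narrowing from counties down to households."""
    households = []
    for county in rng.sample(sorted(frame), n_counties):                 # Stage 1
        towns = frame[county]
        for town in rng.sample(sorted(towns), n_towns):                  # Stage 2
            streets = towns[town]
            for street in rng.sample(sorted(streets), n_streets):        # Stage 3
                households += rng.sample(streets[street], n_households)  # Stage 4
    return households

chosen = multistage_sample(frame, 4, 2, 2, 3, random.Random(1))
print(len(chosen))  # 4 * 2 * 2 * 3 = 48 households
```

Interviewers only need to visit the chosen counties, towns, and streets, which is what makes this design practical for in-person surveys.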

Sample bias may be introduced by the following:
Response Bias – Respondents may lie or be influenced by the race, sex, attitude, or questioning techniques of the interviewer. The wording of the question can also introduce bias.

Even if great care is taken to design and carry out a sample survey, it is highly unlikely that the sample reflects the population exactly. However, because of random sampling, the results do obey the laws of probability, so we can determine the margin of error. Drawing conclusions about the population from a sample in this way is called statistical inference. Large samples tend to give more accurate results than smaller samples.
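A short simulation can illustrate the claim about sample size. The sketch assumes a very large population in which 60% would answer "yes" (an invented value) and compares how much the sample proportion varies for samples of 50 and of 1000.

```python
import random
import statistics

def sample_proportions(p, n, repetitions, rng):
    """Proportion of 'yes' answers in each of many random samples of size n."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]

rng = random.Random(0)
true_p = 0.6   # assumed population proportion, for illustration only
small = sample_proportions(true_p, 50, 500, rng)
large = sample_proportions(true_p, 1000, 500, rng)

# The spread of the estimates shrinks as the sample size grows.
print(statistics.stdev(small), statistics.stdev(large))
```

The smaller spread for the larger samples is what a smaller margin of error means in practice.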

Observational Study – observing and measuring specific characteristics without attempting to modify the subjects being studied.

Experiment – apply some treatment and then observe its effects on the subjects or experimental units.

Simple Random Sample – a sample of n subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen.

Probability Sample – selecting members from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected.

Systematic Sampling – select some starting point and then select every kth element in the population.
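A minimal Python sketch of systematic sampling follows. It assumes the starting point is chosen at random among the first k elements, which is one common choice; the definition only says "some starting point".

```python
import random

def systematic_sample(population, k, rng):
    """Pick a starting point, then take every kth element after it."""
    start = rng.randrange(k)       # assumed: random start among the first k elements
    return population[start::k]

population = list(range(1, 101))   # hypothetical population labeled 1..100
sample = systematic_sample(population, 10, random.Random(2))
print(sample)
```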

Convenience Sampling – use results that are easy to get.

Stratified Sampling – subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup.
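Stratified sampling can be sketched as follows. The strata (students grouped by class year) are invented for illustration, and the sketch assumes an SRS of equal size from each stratum; other allocations are possible.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the same size from every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
}
sample = stratified_sample(strata, 5, random.Random(4))
for name, members in sample.items():
    print(name, members)
```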

Cluster Sampling – divide the population area into sections; randomly select some of those sections; choose all members from selected sections.
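For contrast with stratified sampling, here is a comparable sketch of cluster sampling. The clusters (city blocks of households) are made up for the example.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: 12 city blocks with 8 households each.
clusters = {f"block{b}": [f"b{b}-h{h}" for h in range(8)] for b in range(12)}
chosen_blocks, households = cluster_sample(clusters, 3, random.Random(6))
print(chosen_blocks, len(households))
```

The key difference is where the randomness enters: stratified sampling draws some members from every subgroup, while cluster sampling draws every member from some subgroups.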

Multistage Sampling – collect data by using some combination of the basic sampling methods. Pollsters select a sample in different stages, and each stage might use different methods of sampling.

Randomization is used when subjects are assigned to different groups through a process of random selection. The logic is to use chance as a way to create two groups that are similar.
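Random assignment to two groups can be sketched in a few lines of Python. The subject labels and the function name `randomize_groups` are hypothetical.

```python
import random

def randomize_groups(subjects, rng):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    shuffled = subjects[:]          # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject{i}" for i in range(1, 21)]
treatment, control = randomize_groups(subjects, random.Random(7))
print(treatment)
print(control)
```

Because chance alone decides who lands in which group, neither the experimenter nor the subjects can systematically favor one group.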

Replication is the repetition of an experiment on more than one subject. Samples should be large enough so that the erratic behavior that is characteristic of very small samples will not disguise the true effects of different treatments.

Blinding is a technique in which the subject doesn’t know whether he or she is receiving a treatment or a placebo. Blinding allows us to determine whether the treatment effect is significantly different from a placebo effect, which occurs when an untreated subject reports improvement in symptoms.

Double-Blind – blinding occurs at two levels:
(1) The subject doesn’t know whether he or she is receiving the treatment or a placebo.
(2) The experimenter does not know whether he or she is administering the treatment or a placebo.

Confounding occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors.

Controlling Effects of Variables
Completely Randomized Experimental Design – assign subjects to different treatment groups through a process of random selection.
Randomized Block Design – a block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment.
Rigorously Controlled Design – carefully assign subjects to different treatment groups, so that those given each treatment are similar in ways that are important to the experiment.
Matched Pairs Design – compare exactly two treatment groups using subjects matched in pairs that are somehow related or have similar characteristics.
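As one illustration of these designs, a randomized block design can be sketched by randomizing separately inside each block. The blocks and treatment names below are invented for the example.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Within each block, shuffle the subjects and deal them out to the treatments in turn."""
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks: subjects grouped by a characteristic that might affect the outcome.
blocks = {
    "men": [f"M{i}" for i in range(6)],
    "women": [f"W{i}" for i in range(6)],
}
treatments = ["treatment", "placebo"]
assignment = randomized_block_assignment(blocks, treatments, random.Random(8))
for subject, group in sorted(assignment.items()):
    print(subject, group)
```

Each block ends up with the same number of subjects on each treatment, so differences between blocks cannot be mistaken for a treatment effect.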

Summary
Three very important considerations in the design of experiments are the following:
1. Use randomization to assign subjects to different groups.
2. Use replication by repeating the experiment on enough subjects so that the effects of treatments or other factors can be clearly seen.
3. Control the effects of variables by using such techniques as blinding and a completely randomized experimental design.

Section 5.3 – Simulating Experiments
Simulation – the imitation of chance behavior, based on a model that accurately reflects the experiment under consideration. Ex. Flipping a coin to simulate the birth of a baby: Heads -> Boy, Tails -> Girl.

Basic Simulation procedure
1. State the problem or describe the experiment. Ex. What is the likelihood of a run of 3 consecutive heads or 3 consecutive tails when a coin is tossed 10 times?
2. State the assumptions. A head or a tail is equally likely to occur on each toss. Tosses are independent of each other.
3. Assign digits to represent outcomes. Use a random number table or calculator. One digit represents one toss of the coin. Odd digits represent heads; even digits represent tails.
4. Simulate many repetitions.
5. State the conclusion.
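The procedure can be carried out in Python, with a pseudo-random generator standing in for the random number table. The sketch assumes that "a run of 3" means at least 3 identical outcomes in a row; the function names are hypothetical.

```python
import random

def has_run_of_three(tosses):
    """Return True if the sequence contains at least 3 identical outcomes in a row."""
    run = 1
    for previous, current in zip(tosses, tosses[1:]):
        run = run + 1 if current == previous else 1
        if run >= 3:
            return True
    return False

def estimate_run_probability(repetitions, rng):
    """Simulate many sets of 10 tosses and return the fraction that contain a run."""
    hits = 0
    for _ in range(repetitions):
        digits = [rng.randrange(10) for _ in range(10)]   # one digit per toss
        # Odd digits represent heads; even digits represent tails.
        tosses = ["H" if d % 2 == 1 else "T" for d in digits]
        hits += has_run_of_three(tosses)
    return hits / repetitions

print(estimate_run_probability(10_000, random.Random(3)))
```

Five of the ten digits are odd and five are even, so the digit assignment matches the assumption that heads and tails are equally likely. More repetitions give a more stable estimate.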