Chapter 25: Paired Samples and Blocks
Paired Data Paired data arise in a number of ways. Compare subjects with themselves before and after treatment. Blocking: pairs arise from an experiment. Matching: pairs arise from an observational study.
Matched Pairs t-test Treat the differences as if they were the data, ignoring the original columns of data. Since we have one column of values, we will use a simple one-sample t-test. A matched pairs t-test is just a one-sample t-test for the mean of these pairwise differences.
Assumptions and Conditions Paired data assumption: The data must be paired; the samples can’t be independent. Justify your claim that the data are paired. Independence assumption: Randomization condition: What we want to know usually focuses our attention on where the randomness should be. 10% condition: When the inference is about a population from which the paired individuals are drawn, we must be sure that we have sampled no more than 10% of that population. Normal population assumption: Nearly Normal condition: Check with a histogram or Normal probability plot of the differences.
A Paired t-test for the Mean Differences Between Two Groups When the conditions are met, test whether the mean of the paired differences differs significantly from zero. We test the hypothesis $H_0: \mu_d = \Delta_0$, where the d’s are the pairwise differences and $\Delta_0$ is almost always zero. We use the statistic $t_{n-1} = \dfrac{\bar{d} - \Delta_0}{SE(\bar{d})}$, where n is the number of pairs and $SE(\bar{d}) = \dfrac{s_d}{\sqrt{n}}$. When the conditions are met and the null hypothesis is true, this statistic follows a Student’s t-model on n – 1 degrees of freedom, so we can use that model to obtain a P-value.
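The statistic above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `paired_t_statistic` and the two data columns are hypothetical and do not come from the slides. It returns the t statistic and its degrees of freedom; the P-value would then be read from a t-table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(first, second, delta0=0.0):
    """Return (t, df) for a paired t-test of H0: mean difference = delta0."""
    if len(first) != len(second):
        raise ValueError("paired data need two columns of equal length")
    # Treat the pairwise differences as the data.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)                        # number of pairs
    d_bar = statistics.mean(diffs)        # mean of the differences
    s_d = statistics.stdev(diffs)         # sample SD (n - 1 denominator)
    se = s_d / math.sqrt(n)               # SE of the mean difference
    return (d_bar - delta0) / se, n - 1


# Hypothetical measurements on the same five subjects (made up for illustration).
first = [12.0, 15.0, 11.0, 14.0, 13.0]
second = [10.0, 14.0, 10.0, 11.0, 12.0]

t_stat, df = paired_t_statistic(first, second)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

For these made-up numbers the differences are 2, 1, 1, 3, 1, so the mean difference is 1.6, the standard error is 0.4, and the statistic is 4.0 on 4 degrees of freedom.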
Health Dept. Workers and Mileage Hypothesis: $H_0$: The mileage driven by each health dept. worker during a four-day work week is the same as his mileage for a five-day work week. The mean difference is zero. $H_A$: The mean difference is different from zero.
Health Dept. Workers and Mileage Paired data assumption: The data are paired because they are measurements on the same individuals before and after a change in work schedule. Independence assumption: The behavior of any individual is independent of the behavior of others, so the differences are mutually independent. Randomization: The measured values are the sums of individual trips, each of which experienced random events that arose while driving. 10% condition: Our inference is about driving amounts, not about the workers, so we don’t need to check this condition.
Health Dept. Workers and Mileage Nearly Normal Condition: The histogram of the differences is unimodal and symmetric. Under these conditions the sampling distribution of the differences can be modeled by a Student’s t-model with (n – 1) = 10 degrees of freedom. We will use a paired t-test.
Health Dept. Workers and Mileage Find from the data: We know STAT TESTS T-Test
Health Dept. Workers and Mileage Interpretation: With a P-value this small, we can reject the null hypothesis. We conclude that the change in the work week did lead to a change in driving mileage. We should look at the confidence interval. If the difference in mileage proves to be large in a practical sense, then we might recommend a change in schedule for the rest of the department.
Confidence Intervals for Matched Pairs Paired t-interval When the conditions are met, find the confidence interval for the mean of the paired differences. Since the standard error of the mean difference is $SE(\bar{d}) = \dfrac{s_d}{\sqrt{n}}$, the confidence interval is $\bar{d} \pm t^*_{n-1} \times SE(\bar{d})$. The critical value $t^*$ depends on the particular confidence level C that is specified and on the degrees of freedom (n – 1), which is based on the number of pairs, n.
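As a rough sketch of the interval calculation, the Python below computes a paired t-interval from a column of differences. The function name `paired_t_interval` and the data are hypothetical. The critical value is passed in by the caller, as it would be read from a t-table; 2.776 is the standard table value for 95% confidence with 4 degrees of freedom.

```python
import math
import statistics


def paired_t_interval(diffs, t_star):
    """Return (low, high) for the mean paired difference.

    t_star is the critical value for the chosen confidence level
    and n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(diffs)                                   # number of pairs
    d_bar = statistics.mean(diffs)                   # mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)      # s_d / sqrt(n)
    margin = t_star * se                             # margin of error
    return d_bar - margin, d_bar + margin


# Hypothetical pairwise differences (made up for illustration).
diffs = [2.0, 1.0, 1.0, 3.0, 1.0]

# Table value of t* for 95% confidence and df = 4.
T_STAR_95_DF4 = 2.776

low, high = paired_t_interval(diffs, T_STAR_95_DF4)
print(f"95% CI for the mean difference: ({low:.2f}, {high:.2f})")
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; a statistics package could supply it directly for any confidence level and degrees of freedom.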
Husbands and Wives We wish to find an interval that we can be 95% confident contains the true mean difference in ages of husbands and wives. Histogram: (first 16 pairs) Conditions: Paired data assumption: The data are paired because they are members of married couples. Randomization: These couples were randomly selected. 10% condition: The sample is less than 10% of the population of married couples in Britain. Nearly Normal condition: The histogram of the differences is unimodal and symmetric.
Husbands and Wives Under these conditions, the sampling distribution of the differences can be modeled by a Student’s t-model with (n – 1) = 169 degrees of freedom. We will find a paired t-interval. We know STAT TESTS TInterval
Husbands and Wives Interpretation: We are 95% confident that in married couples in Britain, the husband is, on average, between 1.6 and 2.8 years older than his wife.
Blocking Pairing removes the extra variation and allows us to focus on the individual differences. In experiments, we block to separate the variability between the experimental units from the variability in the response.
Caution!! Don’t use a two-sample t-test for paired data. Don’t use a paired-t method when the samples aren’t paired. Don’t forget outliers. Don’t look for the difference in side-by-side boxplots. A scatterplot of the two variables can sometimes be helpful.
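To illustrate why a two-sample t-test is the wrong tool for paired data, the sketch below computes both statistics on the same hypothetical measurements. The data and function names are made up for illustration, and the two-sample statistic shown is the unpooled (Welch) form. The subjects differ widely from one another, but each changes by a similar small amount, so pairing removes the subject-to-subject variation that swamps the two-sample comparison.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic: one-sample t on the pairwise differences."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def two_sample_t(x, y):
    """Unpooled (Welch) two-sample t statistic, which ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical paired measurements on five very different subjects.
x = [50.0, 62.0, 71.0, 85.0, 94.0]
y = [48.0, 61.0, 70.0, 82.0, 93.0]

print(f"paired t     = {paired_t(x, y):.2f}")
print(f"two-sample t = {two_sample_t(x, y):.2f}")
```

With these numbers the paired statistic is 4.0, while the two-sample statistic is well below 1, because its standard error is dominated by the spread between subjects.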