Lecture 6 Outline – Thur. Jan. 29
Review from Lecture 5
Sampling with vs. without replacement
Confidence intervals
Case Study 2.1.1
Two-sample t-test and confidence intervals (Chapter 2.3)
Levene’s test for equality of variances (Chapter 4.5.3)
Terminology Review
A statistic is any quantity computed from the sample, e.g., the sample mean, the sample standard deviation, or the sample minimum.
The sampling distribution of a statistic for a sample of size n is the probability distribution of the statistic over repeated random samples of size n.
The sampling distribution of the value for a sample of size 1 is called the population distribution.
Standard Deviations and Standard Errors
The standard deviation of a statistic is the standard deviation of the statistic’s probability distribution, i.e., the square root of the average squared distance of the statistic from its mean over repeated samples.
The standard error of a statistic is an estimate of the statistic’s standard deviation.
Example: For sampling with replacement, $SD(\bar{Y}) = \sigma/\sqrt{n}$ and $SE(\bar{Y}) = s/\sqrt{n}$, where $\sigma$ is the population standard deviation and $s$ is the sample standard deviation.
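The distinction between the standard deviation and the standard error of the sample mean can be checked by simulation. The sketch below is illustrative only: the population, its mean and spread, and the sample size are all hypothetical choices, not values from the lecture.

```python
import random
import statistics

random.seed(0)

# Hypothetical population (values are made up for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation
n = 25

# Repeated random samples WITH replacement; record each sample mean.
means = []
for _ in range(5000):
    sample = random.choices(population, k=n)
    means.append(statistics.fmean(sample))

sd_of_mean = statistics.pstdev(means)  # empirical SD of the sample mean
theory = sigma / n ** 0.5              # sigma / sqrt(n)

# The standard error uses ONE sample: s / sqrt(n) estimates sigma / sqrt(n).
one_sample = random.choices(population, k=n)
se = statistics.stdev(one_sample) / n ** 0.5

print(f"empirical SD of mean: {sd_of_mean:.3f}")
print(f"sigma / sqrt(n):      {theory:.3f}")
print(f"SE from one sample:   {se:.3f}")
```

The first two numbers should agree closely; the third varies from sample to sample because it is an estimate.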
Sampling with vs. without replacement
For a sample of size n from a population of size N without replacement, $SD(\bar{Y}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$.
The factor $\sqrt{(N-n)/(N-1)}$ is called the finite population correction (FPC).
Note that the FPC is near 1 if N/n > 50, so we regard sampling with replacement and sampling without replacement as essentially equivalent if N/n > 50.
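To see how quickly the finite population correction approaches 1, the following sketch evaluates it for a few ratios N/n. It assumes the common form sqrt((N - n)/(N - 1)); the sample size of 100 is an arbitrary illustrative choice.

```python
import math

def fpc(N, n):
    """Finite population correction, assuming the form sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

n = 100  # arbitrary sample size for illustration
for ratio in (2, 10, 50, 100):
    N = ratio * n
    print(f"N/n = {ratio:>3}: FPC = {fpc(N, n):.4f}")
```

At N/n = 50 the correction changes the standard deviation by about 1%, which is why the two sampling schemes are treated as equivalent beyond that point.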
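One-sample t-tools and paired t-test
Testing hypotheses about the mean difference in pairs is equivalent to testing hypotheses about the mean of a single population (the population of differences).
Probability model: simple random sample with replacement from the population.
Test statistic: $t = \frac{\bar{Y} - \mu_0}{SE(\bar{Y})} = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}$, where $\mu_0$ is the value of the mean under H0.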
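A minimal sketch of the paired analysis as a one-sample t statistic on the differences. The two columns below are hypothetical numbers invented for illustration (they are not the schizophrenia data), and `one_sample_t` is a helper name introduced here.

```python
import math
import statistics

# Hypothetical paired measurements (made-up values).
unaffected = [1.9, 1.3, 1.7, 2.0, 1.5, 1.6]
affected = [1.6, 1.2, 1.8, 1.7, 1.3, 1.4]

# A paired test reduces to a one-sample test on the within-pair differences.
diffs = [u - a for u, a in zip(unaffected, affected)]

def one_sample_t(values, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n))."""
    n = len(values)
    ybar = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (ybar - mu0) / se

t_stat = one_sample_t(diffs)
print(f"t = {t_stat:.3f} on {len(diffs) - 1} degrees of freedom")
```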
p-value
Fact: If H0 is true, then t has the Student’s t-distribution with n-1 degrees of freedom. Quantiles of the t-distribution can be looked up in Table A.2.
The (2-sided) p-value is the proportion of random samples whose |t| is at least as large as the observed |t_o|, if H0 is true.
Schizophrenia example: t_o = 3.23, p-value = Prob > |t| =
The reliability of the p-value (as the probability of observing as extreme a test statistic as the one actually observed if H0 is true) is only guaranteed if the probability model of random sampling is correct. If the data are collected haphazardly rather than through random sampling, the p-value is not reliable.
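As an alternative to a table lookup, a two-sided p-value can be approximated numerically. The sketch below integrates the t density with Simpson's rule using only the standard library. The degrees of freedom are an assumption: 14 is used here, which corresponds to n = 15 pairs, a sample size not stated on the slide. `t_pdf` and `two_sided_p` are helper names introduced for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|), via Simpson's rule on [0, |t_obs|] (steps must be even)."""
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-|t_obs| < T < |t_obs|)
    return 1 - central

# t_o = 3.23 is from the slide; df = 14 is an ASSUMPTION (n = 15 pairs).
p_value = two_sided_p(3.23, df=14)
print(f"two-sided p-value: {p_value:.4f}")
```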
Matched pairs t-test in JMP
Click Analyze, Matched Pairs, and put the two columns (e.g., affected and unaffected) into Y, Paired Response.
Can also use the one-sample t-test: click Analyze, Distribution, and put the difference into Y, Columns. Then click the red triangle under the difference and click Test Mean.
Confidence Intervals
Point estimate: a single number used as the best estimate of a population parameter, e.g., $\bar{Y}$ for $\mu$.
Interval estimate (confidence interval): a range of values used as an estimate of a population parameter.
Uses of a confidence interval:
Provides a range of values that is “likely” to contain the true parameter. The confidence interval can be thought of as the range of values for the parameter that are “plausible” given the data.
Conveys the precision of the point estimate as an estimate of the population parameter.
Confidence interval construction
A confidence interval typically takes the form: point estimate ± margin of error.
The margin of error depends on two factors: the standard error of the estimate and the degree of “confidence” we want.
Margin of error = multiplier for degree of confidence × SE of estimate.
For a 95% confidence interval, the multiplier for degree of confidence is about 2 in most cases.
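CI for population mean
If the population distribution of Y is normal and the sample is a random sample, the 100(1-$\alpha$)% CI for the mean of a single population is $\bar{Y} \pm t_{n-1}(1-\alpha/2) \cdot SE(\bar{Y})$, where $t_{n-1}(1-\alpha/2)$ is the $1-\alpha/2$ quantile of the t-distribution with n-1 degrees of freedom and $SE(\bar{Y}) = s/\sqrt{n}$.
For the schizophrenia data, 95% CI: (0.067 cm³, 0.331 cm³).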
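The interval can be computed directly once the t multiplier is available. The sketch below finds the multiplier by bisection on a numerically integrated t density (standard library only) and applies it to a small set of hypothetical differences. The data values and the helper names `central_prob`, `t_multiplier`, and `mean_ci` are illustrative assumptions.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def central_prob(b, df, steps=2000):
    """P(-b < T < b) by Simpson's rule (steps must be even)."""
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (h / 3) * total

def t_multiplier(conf, df):
    """Value b with P(-b < T < b) = conf, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(values, conf=0.95):
    """Point estimate +/- multiplier * standard error."""
    n = len(values)
    ybar = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    margin = t_multiplier(conf, n - 1) * se
    return ybar - margin, ybar + margin

diffs = [0.3, 0.1, -0.1, 0.3, 0.2, 0.2]  # hypothetical differences
ci_low, ci_high = mean_ci(diffs)
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

With only 6 observations the multiplier is noticeably larger than 2; it shrinks toward 1.96 as n grows.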
Interpretation of CIs
A 95% confidence interval will contain the true parameter (e.g., the population mean) 95% of the time if repeated random samples are taken.
It is impossible to say whether the interval is successful in any particular case, i.e., we know that the CI will usually contain the true mean under random sampling, but we do not know for the schizophrenia data whether the CI (0.067 cm³, 0.331 cm³) contains the true mean difference.
A confidence interval only has guaranteed coverage if the assumptions about the probability model are correct; in particular, the sample must be a random sample.
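The "95% of the time" statement can be illustrated by simulation. In the sketch below the population mean, standard deviation, and sample size are hypothetical; the multiplier 2.145 is the 0.975 quantile of the t-distribution with 14 degrees of freedom, matching the chosen sample size of 15.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical normal population and sample size (illustrative values only).
MU, SIGMA, N = 0.2, 0.25, 15
T_MULT = 2.145  # 0.975 quantile of t with N - 1 = 14 degrees of freedom

def ci_covers(mu):
    """Draw one random sample and report whether its 95% CI contains mu."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    ybar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    return ybar - T_MULT * se <= mu <= ybar + T_MULT * se

reps = 10_000
coverage = sum(ci_covers(MU) for _ in range(reps)) / reps
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or does not; the 95% describes the long-run fraction over repeated samples.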
Confidence Intervals in JMP
For both methods of doing the paired t-test (Analyze, Matched Pairs or Analyze, Distribution), the 95% confidence intervals for the mean are shown on the output.
Factors determining width of confidence interval
Confidence interval for $\mu$ under random sampling with replacement from a normal population: $\bar{Y} \pm t_{n-1}(1-\alpha/2) \cdot s/\sqrt{n}$.
Factors determining width of confidence interval:
Population standard deviation (estimated by s)
Sample size n
Degree of confidence
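The effect of each factor on the width can be tabulated. This sketch deliberately swaps in the normal multiplier as a large-sample approximation to the t multiplier, so the numbers slightly understate the width for small n; `approx_width` and all inputs are illustrative.

```python
import math
from statistics import NormalDist

def approx_width(s, n, conf):
    """Approximate CI width 2 * z * s / sqrt(n), using the normal
    multiplier z as a large-sample stand-in for the t multiplier."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * s / math.sqrt(n)

# Vary one factor at a time (all values hypothetical).
print("n = 25 :", round(approx_width(1.0, 25, 0.95), 3))
print("n = 100:", round(approx_width(1.0, 100, 0.95), 3))  # 4x the n, half the width
print("s = 2  :", round(approx_width(2.0, 25, 0.95), 3))   # 2x the s, twice the width
print("99%    :", round(approx_width(1.0, 25, 0.99), 3))   # more confidence, wider
```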
Case Study 2.1.1
Background: During a severe winter storm in New England, 59 English sparrows were found freezing and brought to Bumpus’ laboratory; 24 died and 35 survived.
Broad question: Did those that perished do so because they lacked physical characteristics enabling them to withstand the intensity of this episode of selective elimination?
Specific questions: Do humerus (arm bone) lengths tend to be different for survivors than for those that perished? If so, how large is the difference?
Structure of Data
Two independent samples.
Observational study: cannot infer a causal relationship between humerus length and survival.
Sparrows were not collected randomly.
Fictitious probability model: independent simple random samples with replacement from two populations (sparrows that died and sparrows that survived). See Display 2.7.
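Two-sample t-test
Population parameters: $\mu_1, \sigma_1, \mu_2, \sigma_2$
H0: $\mu_1 = \mu_2$, H1: $\mu_1 \neq \mu_2$
Equal spread model: $\sigma_1 = \sigma_2$ (call it $\sigma$)
Statistics from samples of size n1 and n2 from pops. 1 and 2: $\bar{Y}_1, s_1, \bar{Y}_2, s_2$
For Bumpus’ data: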
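Sampling Distribution of $\bar{Y}_2 - \bar{Y}_1$ (equal spread model)
$SD(\bar{Y}_2 - \bar{Y}_1) = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Pooled estimate of $\sigma$: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, with $n_1 + n_2 - 2$ degrees of freedom.
$SE(\bar{Y}_2 - \bar{Y}_1) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
See Display 2.8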
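A short sketch of the pooled standard deviation and the resulting standard error of the difference in means under the equal spread model. The group sizes and standard deviations are hypothetical, and `pooled_sd` and `se_diff` are helper names introduced here.

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of the common sigma: a weighted average of the two
    sample variances, with weights equal to their degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def se_diff(n1, s1, n2, s2):
    """Standard error of the difference between the two sample means."""
    return pooled_sd(n1, s1, n2, s2) * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical summary statistics (not Bumpus' data).
n1, s1, n2, s2 = 10, 2.0, 15, 3.0
sp = pooled_sd(n1, s1, n2, s2)
se = se_diff(n1, s1, n2, s2)
print(f"pooled SD = {sp:.4f}, SE of difference = {se:.4f}, df = {n1 + n2 - 2}")
```

The pooled SD always lies between the two sample SDs and is pulled toward the one from the larger sample.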
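Confidence Interval for $\mu_2 - \mu_1$
Assume the population distributions of group 1 and group 2 are both normal.
100(1-$\alpha$)% confidence interval for $\mu_2 - \mu_1$: $(\bar{Y}_2 - \bar{Y}_1) \pm t_{n_1+n_2-2}(1-\alpha/2) \cdot SE(\bar{Y}_2 - \bar{Y}_1)$
For a 95% confidence interval, the multiplier is $t_{n_1+n_2-2}(0.975)$.
Bumpus’ data: 95% confidence interval (in inches):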
Two sample t-test
H0: $\mu_1 = \mu_2$, H1: $\mu_1 \neq \mu_2$
Test statistic: $t = \frac{\bar{Y}_2 - \bar{Y}_1}{SE(\bar{Y}_2 - \bar{Y}_1)}$. Values of t that are farther from zero are more implausible under H0.
If the population distributions are normal with equal $\sigma$, then if H0 is true, the test statistic t has a Student’s t distribution with $n_1 + n_2 - 2$ degrees of freedom.
The p-value equals the probability that |t| would be greater than the observed |t| under the random sampling model if H0 is true; it is calculated from the Student’s t distribution.
For Bumpus’ data, two-sided p-value = .0809, suggestive but inconclusive evidence of a difference.
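The pieces of the pooled two-sample t-test can be put together as follows. This is a sketch using only the standard library, with the p-value obtained by numerically integrating the t density; the two groups are hypothetical numbers, not Bumpus' measurements, and `pooled_t_test` is a name introduced for this example.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|) by Simpson's rule (steps must be even)."""
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

def pooled_t_test(y1, y2):
    """Equal-spread two-sample t-test; returns (t, df, two-sided p-value)."""
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(y1) + (n2 - 1) * statistics.variance(y2)) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.fmean(y2) - statistics.fmean(y1)) / se
    return t, df, two_sided_p(t, df)

# Hypothetical data for two independent groups.
group1 = [10, 12, 13, 14, 11]
group2 = [13, 15, 14, 16, 12, 15]
t_stat, df, p_value = pooled_t_test(group1, group2)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_value:.4f}")
```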
Two sample tests and CIs in JMP
Click on Analyze, Fit Y by X, put the group variable in X and the response variable in Y, and click OK.
Click on the red triangle next to Oneway Analysis and click Means/ANOVA/t-test (Means/ANOVA/pooled t in JMP version 5).
To see the means and standard deviations themselves, click on Means and Std Dev under the red triangle.
Bumpus’ Data Revisited
Bumpus concluded that sparrows were subjected to stabilizing selection: birds that were markedly different from the average were more likely to have died.
Bumpus (1898): “The process of selective elimination is most severe with extremely variable individuals, no matter in what direction the variations may occur. It is quite as dangerous to be conspicuously above a certain standard of organic excellence as it is to be conspicuously below the standard. It is the type that nature favors.”
Bumpus’ hypothesis is that the variance of physical characteristics in the survivor group should be smaller than the variance in the perished group.
Testing Equal Variances
Two independent samples from populations with variances $\sigma_1^2$ and $\sigma_2^2$.
H0: $\sigma_1^2 = \sigma_2^2$ vs. H1: $\sigma_1^2 \neq \sigma_2^2$
Levene’s Test: Section 4.5.3
In JMP, Fit Y by X, under the red triangle next to Oneway Analysis of humerus by group, click Unequal Variances. Use Levene’s test.
p-value = .4548: no evidence that the variances are not equal, thus no evidence for Bumpus’ hypothesis.
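For two groups, one common form of Levene's test amounts to a pooled two-sample t-test on the absolute deviations of each observation from its own group mean, with F equal to the square of that t statistic. The sketch below assumes this form; other variants use squared deviations or deviations from the median, and the version in a given text or software package may differ. The data are hypothetical and `levene_two_groups` is a name introduced here.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|) by Simpson's rule (steps must be even)."""
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

def levene_two_groups(y1, y2):
    """Levene-type test (ASSUMED form: absolute deviations from group means).
    Returns (F, df, p-value), where F = t**2 on (1, df) degrees of freedom."""
    m1, m2 = statistics.fmean(y1), statistics.fmean(y2)
    z1 = [abs(y - m1) for y in y1]  # spread of group 1 around its mean
    z2 = [abs(y - m2) for y in y2]  # spread of group 2 around its mean
    n1, n2 = len(z1), len(z2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(z1) + (n2 - 1) * statistics.variance(z2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.fmean(z2) - statistics.fmean(z1)) / se
    return t * t, df, two_sided_p(t, df)

# Hypothetical groups with visibly different spreads.
tight = [10, 12, 13, 14, 11]
spread = [5, 20, 9, 17, 12, 3]
F, df, p_value = levene_two_groups(tight, spread)
print(f"F = {F:.3f} on (1, {df}) df, p = {p_value:.4f}")
```

Because the test compares average deviations rather than variances directly, it is less sensitive to non-normality than the classical F-test for equal variances.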