1 This morning's programme 9:05am – 10:50am The t tests 10:50am – 11:20am A break for coffee. 11:20am – 12:30pm Approximate chi-square tests
2 SESSION 2 From description to inference: hypothesis testing
3 INDUCTIVE REASONING Traditional Aristotelian logic is DEDUCTIVE: it argues from the general to the particular. Statistical inference reverses this process by arguing INDUCTIVELY from the particular (the sample) to the general (the population). Statistical inference, therefore, is subject to error and inferences must be expressed in terms of probabilities.
5 Estimates There are two types of estimate: 1. POINT ESTIMATES. For example, we might use the sample mean as an estimate of the value of the population mean. 2. INTERVAL ESTIMATES. On the basis of sample data, we can specify a range of values within which we can assume with specified levels of CONFIDENCE that the population value lies. I discuss confidence intervals in the appendix to this talk.
6 Confirming our data Suppose that we have found, in our results, a pattern that we would like to confirm, such as a difference between means. Could this pattern have arisen merely through sampling error? Would another research team who collect data of this type obtain a similar result? Hypothesis testing can provide an answer to questions of this sort.
7 Statistical hypotheses A statistical hypothesis is a statement about a population, usually to the effect that a parameter, such as the mean, has a specified value, or that the means of two or more populations have the same value or different values. Here, by population is always meant a probability distribution, a hypothetical population of numbers.
8 Two hypotheses In hypothesis testing, as widely practised at present by researchers, a decision is made between two complementary hypotheses, which are set up in opposition to each other: the null hypothesis (H 0 ); the alternative hypothesis (H 1 ).
9 The null hypothesis The null hypothesis (H 0 ) is the statistical equivalent of the hypothesis of NO EFFECT, the negation of the scientific hypothesis. For example, if a researcher thinks a set of scores is from a population with a different mean than a control population, the null hypothesis will state that there is NO such difference. The alternative hypothesis (H 1 ) is that the null hypothesis is false. In traditional statistical testing, it is the null hypothesis, not the alternative hypothesis, that is tested.
10 Number of samples Tests of some hypotheses can be made by drawing a single sample of scores. Other hypotheses, however, can only be tested by drawing two or more samples. It is easiest to consider the elements of hypothesis testing by considering one-sample tests first.
12 Situation (a) The population standard deviation is known
13 An example Let us suppose that in the island of Erewhon, men's heights have an approximately normal distribution with a mean of 69 inches and an SD of 3.2 inches. A researcher wonders whether there might be a tendency for those in the north of the island to be taller than the general population. A sample of 100 northerners has a mean height of 69.8 inches. Remembering that this is merely a sample from the population of northerners, do we have evidence that northerners are taller?
14 Steps in testing a hypothesis Formulate the null and alternative HYPOTHESES. Decide upon a SIGNIFICANCE LEVEL. Decide upon an appropriate TEST STATISTIC. Decide upon the CRITICAL REGION, a range of unlikely values for the test statistic, that is, less probable than the significance level. If the value of the test statistic falls within the critical region, the null hypothesis is rejected.
15 The null and alternative hypotheses The null hypothesis is that, contrary to the researcher's speculation, the height of northerners is no different from that of the general population. The alternative hypothesis is that northerners are of different height.
16 Significance level The significance level is a small probability fixed by tradition. The significance level is commonly set at .05, but in some areas researchers insist upon a lower level, such as .01. We shall set the level at .05.
17 Revision We are talking about a situation in which a single sample has been drawn from a population. Here the reference set is the population or probability distribution of such samples, which is known as the SAMPLING DISTRIBUTION OF THE MEAN. Its SD is known as the STANDARD ERROR OF THE MEAN (σ M ).
19 The standard normal distribution Questions about ranges of values in any normal distribution can always be referred to questions about corresponding values in the STANDARD NORMAL DISTRIBUTION. We do this by transforming the original values to values of z, the STANDARD NORMAL VARIATE.
20 The standard normal distribution We transform the original value to z by subtracting the mean, then dividing by the standard deviation. In this case, we must divide by σ M, not σ.
21 The test statistic Since we know the SD σ, we can use as our test statistic z, where the denominator is the STANDARD ERROR OF THE MEAN, that is, the SD of the sampling distribution of the mean.
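The formula on this slide has not survived in the transcript. A reconstruction under the definitions already given (M for the sample mean, μ 0 for the mean specified by the null hypothesis, σ M for the standard error of the mean) would be the usual one-sample z statistic; the notation here is an assumption rather than a copy of the original slide:

```latex
z = \frac{M - \mu_0}{\sigma_M}, \qquad \sigma_M = \frac{\sigma}{\sqrt{n}}
```

With σ = 3.2 and n = 100, as in the Erewhon example, the denominator is 3.2 / 10 = 0.32.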
22 The critical region We want the total probability of a value in the critical region to be .05, that is, the significance level. We distribute that probability equally between the two tails of the distribution: .025 in each tail.
23 Calculate the value of z Since this value falls within the critical region, the null hypothesis is rejected. We have evidence that the northerners are taller.
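The calculation itself is missing from the transcript, so here is a minimal sketch of how it might be done for the Erewhon example, using only the figures quoted in the slides. The variable names are illustrative, and the use of Python's `statistics.NormalDist` is our choice, not the presenter's.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 69.0         # population mean under the null hypothesis (inches)
sigma = 3.2         # known population SD (inches)
n = 100             # number of northerners sampled
sample_mean = 69.8  # mean height of the sample (inches)
alpha = 0.05        # significance level

# Standard error of the mean: the SD of the sampling distribution of the mean.
standard_error = sigma / sqrt(n)

# Test statistic: distance of the sample mean from mu_0 in standard-error units.
z = (sample_mean - mu_0) / standard_error

# Two-tailed critical value: alpha / 2 in each tail of the standard normal.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_null = abs(z) > z_critical
print(f"SE = {standard_error:.2f}, z = {z:.2f}, critical = ±{z_critical:.2f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

The test is two-tailed because the alternative hypothesis stated earlier is simply that northerners are of different height.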
24 The p-value The p-value of a test statistic is the probability, assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as unlikely as the one obtained. The p-value must be clearly distinguished from the significance level (say .05): the significance level is fixed beforehand; but the p-value is determined by your own data.
25 Use of the p-value If the p-value is less than the significance level, the value of your test statistic must have fallen within the critical region. But the p-value tells you more than this. A high p-value means that the value of the test statistic is well short of being significant; whereas a low p-value means we are well over the line.
26 The one-tailed p-value The ONE-TAILED p-value is the probability of a value of the test statistic at least as extreme (in the same direction) as the value actually obtained.
27 The one-tailed p-value We obtain the one-tailed p-value by subtracting the cumulative probability of 2.5 from 1: 1 – .9938 = .0062
28 One-tailed and two-tailed p-values If the region of rejection is located in both tails of the sampling distribution, as in the present example, a TWO-TAILED p-value must be calculated. We must DOUBLE the one-tailed p-value. If we didn't do that, a value only marginally significant would seem to have a probability of only .025, not .05 as previously decided. So if the p-value in either direction is less than .025, the two-sided p-value is less than .05, and we have significance.
29 The two-tailed p-value of 2.5 We must now double the one-tailed p-value: .0062 × 2 = .0124.
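As a check on the arithmetic, the one-tailed and two-tailed p-values for z = 2.5 can be obtained from the standard normal cumulative distribution. This is a sketch using Python's standard library; the variable names are ours.

```python
from statistics import NormalDist

z = 2.5  # value of the test statistic from the Erewhon example

# One-tailed p-value: area of the standard normal curve to the right of z.
p_one_tailed = 1 - NormalDist().cdf(z)

# Two-tailed p-value: double it, because the critical region is split
# equally between the two tails.
p_two_tailed = 2 * p_one_tailed

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```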
31 Directional hypothesis Our researcher suspects that the northerners are TALLER, not simply that they are of DIFFERENT height. This is a DIRECTIONAL hypothesis. On this basis, it could be (and is) argued that the critical region, with a probability of .05, should be located entirely in the UPPER tail of the standard normal distribution.
33 Comparison of the critical regions If you are only interested in the possibility of a difference in ONE direction, you might decide to locate the critical region entirely in one tail of the distribution. (Figure: the two-tailed test places 0.025 (2.5%) in each tail; the one-tailed test places 0.05 (5%) in a single tail.)
34 Easier to get a significant result Note that, on a one-tail test, you only need a z-value of 1.64 for significance, rather than a value of 1.96 for a two-tail test. So, on a one-tail test, it's easier to get significance IN THE DIRECTION YOU EXPECT.
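The two critical values can be recovered from the inverse of the standard normal cumulative distribution. The sketch below assumes a significance level of .05, as in the rest of the talk; to three decimal places the one-tail boundary is 1.645, which the slide gives as 1.64.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()

# One-tail test: the whole of alpha sits in the upper tail.
z_one_tail = standard_normal.inv_cdf(1 - alpha)

# Two-tail test: alpha / 2 sits in each tail.
z_two_tail = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-tail critical z = {z_one_tail:.3f}")
print(f"two-tail critical z = {z_two_tail:.3f}")
```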
36 Type I errors Suppose the null hypothesis is true, but the value of z falls within the critical region. We shall reject the null hypothesis, but, in so doing, we shall have made a Type I or alpha (α) error. The probability of a Type I error is simply the chosen significance level and in our example its value is .05.
37 Probability of a Type I error Suppose H 0 is true. If the value of z falls within either tail, we shall reject H 0 and make a Type I error. The probability that we shall do this is the significance level, .05. Accordingly, the significance level is also referred to as the ALPHA-LEVEL.
38 Type II (beta) errors Suppose the null hypothesis is false. The value of the test statistic, however, does not fall within the critical region and the null hypothesis is accepted. We have made a Type II or beta (β) error.
39 Power The POWER of a statistical test is the probability that, if the null hypothesis is false, it will be rejected by the statistical test. When the power of a test is low, an insignificant test result is impossible to interpret: there may indeed be nothing to report; but the researcher has no way of knowing this.
40 Two distributions The following diagram shows the relationships among the Type I error rate (the significance level), the Type II error rate and Power when the null hypothesis is tested against a one-sided alternative hypothesis that the mean has a higher value. (This is a one-tailed test.) The overlapping curves represent the sampling distributions of the mean under the null hypothesis (left) and the alternative hypothesis (right). In the diagram, μ 0 and μ 1 are the values of the population mean according to the null and alternative hypotheses, respectively.
41 Power and the type I and type II error rates
42 Points Any value of M to the left of the grey area will result in the acceptance of H 0. If H 1 is true (distribution on the right), a Type II error will have been made. Notice that the Power and Type II error rates sum to unity.
44 Factors affecting the power of a statistical test
45 Significance level and power In the upper figure, the red area is the .05 significance level; the green area is the Type II error rate. The lower figure shows that a lower significance level (e.g. .01) reduces the probability of making a Type I error, but the probability of a Type II error (green) increases and the power (P) decreases. (In both figures, β marks the Type II error rate and P the power.)
46 Size of the difference between μ 1 and μ 0. The greater the difference between the real population mean and the value assumed by the null hypothesis, the less the overlap between the sampling distributions. The less the overlap, the greater will be the area of the H 1 (right) curve beyond the critical value under H 0 and the greater the power of the test to reject the null hypothesis. The researcher has no control over this determinant of power.
49 Sample size Now we come to another important determinant of the power of a statistical test: sample size. This is the factor over which the researcher usually has the most control.
50 Revision The larger the sample, the smaller the standard error of the mean and the taller and narrower will be the sampling distribution if drawn to the same scale as the original distribution.
51 Effect of increasing the sample size n (Figure: the IQ distribution, with mean μ, and the sampling distributions of the mean for n = 16 and n = 64.)
52 Sample size When there are two samples, therefore, larger samples will result in greater separation of the sampling distributions, reduction in the Type II error rate and more power.
53 Power and sample size Increasing the sample size reduces the overlap of the sampling distributions under H 0 and H 1 by making them taller and narrower. The beta-rate is reduced and the power (green area) increases. (Figure: two panels, one for small samples and one for large samples.)
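To make the effect of sample size concrete, here is a rough sketch of the power of an upper-tailed one-sample z test. The numbers are hypothetical: an IQ-like scale with μ 0 = 100 and σ = 15, a true mean μ 1 of 105, and the sample sizes 16 and 64. The function name and its parameters are ours, introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed_z(mu_0, mu_1, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z test when the true mean is mu_1."""
    standard_error = sigma / sqrt(n)
    # Smallest sample mean that falls within the critical region under H0.
    critical_mean = mu_0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Power is the area of the H1 sampling distribution beyond that boundary.
    return 1 - NormalDist(mu_1, standard_error).cdf(critical_mean)


# Hypothetical illustration: the same true difference, two sample sizes.
for n in (16, 64):
    power = power_upper_tailed_z(mu_0=100, mu_1=105, sigma=15, n=n)
    print(f"n = {n}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Because the power and the Type II error rate sum to unity, the beta-rate printed on each line is simply one minus the power. When μ 1 equals μ 0, the function returns the significance level itself.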
54 Reliability of measurement Greater precision or RELIABILITY of measurement also reduces the standard error of the mean and improves the separation of the null and alternative distributions. The more separation between the sampling distributions, the greater the power of the statistical test. Jeanette Jackson will discuss the topic of reliability.
55 Situation (b) The population standard deviation is unknown
56 Very rarely do we know the population standard deviation
57 Vocabulary A researcher suspects that a new intake to a college of further education may require extra coaching to enrich their vocabulary. The College has been using a vocabulary test, students' performance on which, over the years, has been found to have a mean of 50. The standard deviation is not known with certainty (estimates have varied and the records are incomplete); but the population distribution seems to be approximately normal. The 36 new students have vocabulary scores with a mean of 49 and a sample standard deviation of 2.4. Is this evidence that their vocabulary scores are not from the usual College student population?
62 Distribution of t Like the standard normal variate z, the distribution of t has a mean of zero. The t statistic, however, is not normally distributed. Although, like the normal distribution, it is symmetrical and bell-shaped, it has thicker tails: that is, large absolute values of t are more likely than large absolute values of z.
63 The family of t distributions There is only one standard normal distribution, to which any other normal distribution can be transformed; but there is a whole family of t distributions. A normal distribution has two parameters: the mean and the SD. A t distribution has ONE parameter, known as the DEGREES OF FREEDOM (df ).
64 Degrees of freedom The term is borrowed from physics. The degrees of freedom of a system is the number of constraints that must be placed upon it to determine its state completely. By analogy, the variance of n scores is calculated from the squares of n deviations from the mean; but deviations from the mean sum to zero, so if you know the values of (n – 1) deviations, you know the nth deviation. The degrees of freedom of the variance is therefore (n – 1).
65 Degrees of freedom of t The degrees of freedom of the one-sample t statistic is (n – 1), where n is the size of the sample. This is the degrees of freedom of the variance estimate from the sample. In our case, the degrees of freedom of the t statistic = (n – 1) = 36 – 1 = 35. As the size of n increases, the t distribution becomes more and more like the standard normal distribution.
66 Extreme values of t are more likely than extreme values of z
67 The critical region Arguably, since the administrator's concern is with low scores, we can justify a one-tailed test here and locate the critical region exclusively in the lower tail of the distribution of t on 35 degrees of freedom. We want the critical region to the left of the 5th percentile of the distribution in the lower tail.
69 Boundary of critical region for the t test lies further out in the tail Notice that the boundary (–1.69) of the critical region lies further out in the lower tail than does the 5th percentile of the standard normal distribution (–1.64).
70 A significant result Our value of t (–2.5) lies within the critical region. The null hypothesis is therefore rejected and we have evidence that our sample is from a population with a mean score of less than 50.
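The slides showing the calculation of t are not in the transcript. A minimal sketch, using only the figures given for the vocabulary example, might run as follows. The critical value of –1.69 is taken from the slides rather than computed, since the Python standard library has no t distribution; the variable names are ours.

```python
from math import sqrt

mu_0 = 50          # long-run College mean under the null hypothesis
n = 36             # number of new students
sample_mean = 49   # mean vocabulary score of the new intake
sample_sd = 2.4    # sample standard deviation

# Estimated standard error of the mean: the sample SD stands in for sigma.
estimated_se = sample_sd / sqrt(n)

# One-sample t statistic and its degrees of freedom.
t = (sample_mean - mu_0) / estimated_se
df = n - 1

# Boundary of the lower-tail critical region on 35 df, as quoted in the slides.
t_critical = -1.69

reject_null = t < t_critical
print(f"t({df}) = {t:.2f}; critical value = {t_critical}")
print("Reject H0" if reject_null else "Do not reject H0")
```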
73 Is this a repeatable result? The difference between the Caffeine and Placebo means is (11.90 – 9.25) = 2.65 hits. Could this difference have arisen merely from sampling error?
74 Independent samples The Caffeine experiment yielded two sets of scores: one set from the Caffeine group, the other from the Placebo group. There is NO BASIS FOR PAIRING THE SCORES. We have INDEPENDENT SAMPLES. We shall make an INDEPENDENT-SAMPLES t test.
75 The null hypothesis The null hypothesis states that, in the population, the Caffeine and Placebo means have the same value.
76 The alternative hypothesis The alternative hypothesis states that, in the population, the Caffeine and Placebo means do not have the same value.
77 Revision We are talking about a situation in which two samples have been drawn from identical normal populations and the difference between their means M 1 – M 2 has been calculated. Here the reference set is the population or probability distribution of such differences, which is known as the SAMPLING DISTRIBUTION OF THE DIFFERENCE (between means). Its SD is known as the STANDARD ERROR OF THE DIFFERENCE.
79 The standard normal distribution Questions about ranges of values in any normal distribution can always be referred to questions about corresponding values in the STANDARD NORMAL DISTRIBUTION. We do this by transforming the original values to values of z, the STANDARD NORMAL VARIATE.
80 If we knew the population standard deviation …
81 The standard normal distribution We could transform the original value to z by subtracting the mean, then dividing by the standard deviation. In this case, we would divide by the standard error of the difference.
82 The test statistic We could have calculated z in the usual way:
83 But we don't know σ! So we must estimate the standard error of the difference from the statistics of our samples:
The critical region We shall reject the null hypothesis if the value of t falls within EITHER tail of the t distribution on 38 degrees of freedom. To be significant beyond the .05 level, our value of t must be greater than +2.02 OR less than –2.02. Since our value for t (2.60) falls within the critical region, the null hypothesis is rejected.
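The formula for the estimated standard error is missing from the transcript. The sketch below uses the usual pooled-variance form of the independent-samples t test, which is consistent with the 38 degrees of freedom quoted. The group sizes of 20 are an assumption: the slides give only df = 38, which implies 40 participants in all. The function name is ours.

```python
from math import sqrt


def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-samples t from summary statistics (pooled variance)."""
    df = n_1 + n_2 - 2
    # Pooled estimate of the common population variance.
    pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df
    # Estimated standard error of the difference between the means.
    se_difference = sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / se_difference, df


# Caffeine vs Placebo; n = 20 per group is ASSUMED, not stated in the slides.
t, df = pooled_t(11.90, 3.28, 20, 9.25, 3.16, 20)

t_critical = 2.02  # two-tailed .05 boundary on 38 df, as quoted in the slides
print(f"t({df}) = {t:.2f}")
print("Reject H0" if abs(t) > t_critical else "Do not reject H0")
```

Under these assumptions the result agrees with the value of 2.60 given in the slides.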
89 The p-value The one-tailed p-value is (1 – the cumulative probability of the t value 2.60), that is, .0066. To obtain the 2-tailed p-value, we must double this value: 2 × .0066 = .0132.
90 Your report The scores of the Caffeine group (M = 11.90; SD = 3.28) were higher than those of the Placebo group (M = 9.25; SD = 3.16). With an alpha-level of 0.05, the difference is significant: t(38) = 2.60; p = .0132 (two-tailed). (The value in brackets after t, here 38, is the degrees of freedom.)
91 Representing very small p-values Suppose, in the caffeine experiment, that the p-value had been very small indeed. (Suppose t = 6.0). The computer would have given your p-value as .000. NEVER write, p = .000. This is unacceptable in a scientific article. Write, p < .01, or p < .001. You would have written the present result as t(38) = 6.0; p < .001.
92 Lisa DeBruine's guidelines Lisa DeBruine has compiled a very useful document describing the most important of the APA guidelines for the reporting of the results of statistical tests. I strongly recommend this document, which is readily available on the Web. http://www.facelab.org/debruine/Teaching/Meth_A/ Sometimes the APA manual is unclear. In such cases, Lisa has opted for what seems to be the most reasonable interpretation. If you follow Lisa's guidelines, your submitted paper won't draw fire on account of poor presentation of your statistics!
93 A one-tailed test? The null hypothesis states simply that, in the population, the Caffeine and Placebo means are equal. H 0 is refuted by a sufficiently large difference between the means in EITHER direction. But some argue that if our scientific hypothesis is that Caffeine improves performance, we should be looking at differences in only ONE direction.
94 Assumption In what follows, we shall assume that the researcher, on the basis of sound theory, has planned to make a one-tailed test. Accordingly, the critical region is located entirely in the upper tail of the distribution of t on 38 degrees of freedom.
95 The null hypothesis again The null and alternative hypotheses must be complementary: that is, they must exhaust the possibilities. If the alternative hypothesis says that the Caffeine mean is greater, the null hypothesis must say that it is not greater: that is, it is equal to OR LESS THAN the Placebo mean.
97 Direction of subtraction The direction of subtraction of one sample mean from the other is now crucial. You MUST subtract the Placebo mean from the Caffeine mean. Only a POSITIVE value of t can falsify the directional null hypothesis.
98 A smaller difference between the means Suppose that the mean score of the Caffeine group had been, not 11.90, but 10.99. The cell variances are the same as before. In other words, the Caffeine and Placebo means differ by only 1.74 points, rather than 2.65 points, as in the original example.
100 The result The value of t is now 1.71, which is greater than the critical value (1.69) on a one-tailed test. The null hypothesis that the Caffeine mean is no greater than the Placebo mean is rejected.
101 Report of the one-tailed test The scores of the Caffeine group (M = 10.99; SD = 3.28) were significantly higher than those of the Placebo group (M = 9.25; SD = 3.16): t(38) = 1.71; p = .0477 (one-tailed).
102 Advantage of the one-tailed test Our t value of 1.71 would have failed to achieve significance on the two-tailed test, since the critical value there was +2.02. On the one-tailed test, however, t lies in the critical region and the null hypothesis is rejected.
103 More power In locating the entire critical region in the upper tail of the H 0 distribution, we increase the light-grey area and reduce the dark-grey area, which is the beta rate. In other words, we increase the POWER of the test to reject H 0.
104 An unexpected result Now suppose that, against expectation, the Placebo group had outperformed the Caffeine group. The mean for the Caffeine group is 9.25 and that for the Placebo group is 10.20. If we subtract the Placebo mean from the Caffeine mean as before, we obtain t = –2.02. On a two-tailed test, this would have been in the critical region (p < .05) and we should have rejected the null hypothesis.
105 One-sided p-value We cannot, however, change horses and declare this unexpected result to be significant. In the one-tailed test, the null hypothesis is also one-sided. Accordingly, the p-value is also one-sided, that is, it is the probability that the (Caffeine – Placebo) difference would have been at least as LARGE in the positive direction as the one we obtained.
106 The one-sided p-value The one-sided p-value is the entire area under the curve TO THE RIGHT of your value of t. That area is 0.975. You have nothing to report.
107 Correct report of the one-tail test The scores of the Caffeine group (M = 9.25; SD = 3.16) were not significantly higher than those of the Placebo group (M = 10.20; SD = 3.28): t(38) = –2.02; p = 0.975 (one-tailed).
108 Why you can't change horses Having decided upon a one-tailed test, you cannot change to a two-tailed test when you get a result in the opposite direction to that expected. If you do, the Type I error rate increases.
109 The true Type I error rate. If you switch to a two-tailed test, your true Type I error rate is now the black area (0.05) PLUS the green area in the lower tail (0.025). This is 0.05 + 0.025 = 0.075, a level many would feel is too high. (See the OR rule in the appendix to my first talk.)
111 A preoccupation with significance For many years, following R. A. Fisher, the first to develop a system of testing, there was a preoccupation with significance and insufficient regard for the MAGNITUDE of the effect one was investigating. Fisher himself observed that, on a sufficiently powerful test, even the most minute difference will be statistically significant, however insubstantial it may be.
112 A substantial difference? We obtained a difference between the Caffeine and Placebo means of (11.90 – 9.25) = 2.65 score points. This difference, as we have seen, is significant in the statistical sense; but is it SUBSTANTIAL, that is, worth reporting?
115 Levels of effect size On the basis of scrutiny of a large number of studies, Jacob Cohen proposed that we regard a d of .2 as a SMALL effect size, a d of .5 as a MEDIUM effect size and a d of .8 as a LARGE effect size. So our experimental result is a large effect. When you report the results of a statistical test, you are now expected to provide a measure of the size of the effect you are reporting.
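The slides defining d are not in the transcript. A common definition divides the difference between the means by the pooled standard deviation; the sketch below assumes that definition and, as before, assumes 20 participants per group (the slides give only df = 38). The function name and the classification helper are ours.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d: difference between means in pooled-SD units."""
    pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_variance)


def effect_size_label(d):
    """Cohen's rough benchmarks: .2 small, .5 medium, .8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Caffeine vs Placebo; n = 20 per group is ASSUMED, not stated in the slides.
d = cohens_d(11.90, 3.28, 20, 9.25, 3.16, 20)
print(f"d = {d:.2f} ({effect_size_label(d)})")
```

Under these assumptions the sketch reproduces the d of .82 reported for the caffeine experiment.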
117 Complete report of your test The scores of the Caffeine group (M = 11.90; SD = 3.28) were higher than those of the Placebo group (M = 9.25; SD = 3.16). With an alpha-level of 0.05, the difference is significant: t(38) = 2.60; p = .0132 (two-tailed). Cohen's d = .82, a large effect.
120 Nominal data A NOMINAL data set consists of records of membership of the categories making up QUALITATIVE VARIABLES, such as gender or blood group. Nominal data must be distinguished from SCALAR, CONTINUOUS or INTERVAL data, which are measurements of QUANTITATIVE variables on an independent scale with units. Nominal data sets merely carry information about the frequencies of observations in different categories.
121 A set of nominal data A medical researcher wishes to test the hypothesis that people with a certain type of body tissue (Critical) are more likely to have a potentially harmful antibody. Data are obtained on 79 people, who are classified with respect to 2 attributes: 1. Tissue Type; 2. Presence/Absence of the antibody.
122 A question of association Do more of the people in the critical group have the antibody? We are asking whether there is an ASSOCIATION between the variables of category membership (tissue type) and presence/absence of the antibody. The SCIENTIFIC hypothesis is that there is such an association.
123 The null hypothesis The NULL HYPOTHESIS is the negation of the scientific hypothesis. The null hypothesis states that there is NO association between tissue type and presence of the antibody.
124 Contingency tables (cross-tabulations) When we wish to investigate whether an association exists between qualitative or categorical variables, the starting point is usually a display known as a CONTINGENCY TABLE, whose rows and columns represent the categories of the qualitative variables we are studying. Contingency tables are also known as CROSS-TABULATIONS, or CROSSTABS.
125 The equivalent of a scatterplot The contingency table is the equivalent, for use with nominal data, of the scatterplot that is used to display bivariate continuous data sets.
127 Interpretation Is there an association between Tissue Type and Presence of the antibody? The antibody is indeed more in evidence in the Critical tissue group. It looks as if there may be an association.
129 Observed and expected cell frequencies Let O be the frequency of observations in a cell of the contingency table. From the marginal totals, we calculate the cell frequencies E that we should expect if there were NO ASSOCIATION between the two attributes Tissue Type and Presence/Absence of the antibody.
130 Testing the null hypothesis We test the null hypothesis by comparing the values of O and E. Large (O – E ) differences cast doubt upon the null hypothesis of no association.
131 What cell frequencies can be expected? The pattern of the OBSERVED FREQUENCIES (O) would suggest that there is a greater incidence of the antibody in the Critical tissue group. But the marginal totals showing the frequencies of the various groups in the sample also vary. What cell frequencies would we expect under the independence hypothesis?
133 Expected cell frequencies (E) According to the null hypothesis, the joint occurrence of the antibody and a particular tissue type are independent events. The probability of the joint occurrence of independent events is the product of their separate probabilities. (See the appendix of the first talk.) On this basis, we find the expected frequencies (E) by multiplying together the marginal totals that intersect at the cells concerned and dividing by the total number of observations.
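The rule for expected frequencies can be sketched in a few lines. The table used here is hypothetical, chosen only to keep the arithmetic easy to follow; it is not the tissue-type data, which do not appear in the transcript. The function name is ours.

```python
def expected_frequencies(observed):
    """Expected cell frequencies under independence.

    Each E is (row total x column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


# HYPOTHETICAL 2 x 2 table (rows: groups; columns: antibody No, Yes).
observed = [
    [10, 20],
    [30, 40],
]
expected = expected_frequencies(observed)
for o_row, e_row in zip(observed, expected):
    print(o_row, e_row)
```

Note that the expected frequencies have the same marginal totals as the observed frequencies.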
136 Marked (O – E ) differences In both cells of the Critical group, there seem to be large differences between O and E: there are many fewer No's than expected and many more Yes's.
137 The chi-square (χ 2 ) statistic We need a statistic which compares the differences between the O and E, so that a large value will cast doubt upon the null hypothesis of independence. The approximate CHI-SQUARE (χ 2 ) statistic fits the bill.
138 Formula for chi-square The element of this summation expresses the square of the difference between O and E as a proportion of E. Add up these proportional squared differences for all the cells in the contingency table.
139 The value of chi-square There are 8 terms in the summation, but only the first two and the last are shown in the calculation below.
140 Degrees of freedom To decide whether a given value of chi-square is significant, we must specify the DEGREES OF FREEDOM df of the chi-square statistic. If a contingency table has R rows and C columns, the degrees of freedom is given by df = (R – 1)(C – 1) In our example, R = 4, C = 2 and so df = (4 – 1)(2 – 1) = 3.
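The formula and the worked calculation are missing from the transcript. Assuming the usual Pearson form, in which each cell contributes (O – E) squared divided by E, the statistic and its degrees of freedom might be computed as in the sketch below. The first table is hypothetical and is not the tissue-type data; the function name is ours.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under the null hypothesis of independence.
            e = row_totals[i] * column_totals[j] / grand_total
            # Squared (O - E) difference as a proportion of E.
            statistic += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# HYPOTHETICAL 2 x 2 table, for illustration only.
statistic, df = chi_square([[10, 20], [30, 40]])
print(f"chi-square({df}) = {statistic:.3f}")
```

A table with 4 rows and 2 columns, like the one in the talk, gives df = 3 with the same function.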
141 Significance The p-value of a chi-square with a value of 10.655 in the chi-square distribution with three degrees of freedom is .014. We should write this result as: χ 2 (3) = 10.66; p = .014. Since the result is significant beyond the .05 level, we have evidence against the null hypothesis of independence and evidence for the scientific hypothesis.
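The p-value quoted here can be checked without statistical tables. For three degrees of freedom, and only for that case, the upper-tail probability of chi-square has a closed form in terms of the complementary error function. The sketch below uses it; the function name is ours, and a general-purpose routine from a statistics package would normally be used instead.

```python
from math import erfc, exp, pi, sqrt


def chi_square_p_value_df3(x):
    """Upper-tail probability of a chi-square variate on 3 degrees of freedom.

    Closed form valid for df = 3 only.
    """
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p = chi_square_p_value_df3(10.655)
print(f"chi-square(3) = 10.655; p = {p:.3f}")
print("Significant at .05" if p < 0.05 else "Not significant at .05")
```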
150 Confidence intervals A CONFIDENCE interval is a range of values centred on the value of the sample statistic and which one can assume with a specified level of confidence includes the true value of the parameter.
152 Equivalent probability statement An expression with terms such as < is known as an INEQUALITY. There are special rules for manipulating inequalities.
153 Inference about the population mean Notice that the population mean is now at the centre of the inequality and the sample mean is in the terms denoting the lower and upper limits of the interval. We have changed a statement about the sample mean to one about the population mean.
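The inequalities themselves are missing from the transcript. A reconstruction in the notation used earlier (M for the sample mean, μ for the population mean, σ M for the standard error of the mean) would run as follows; it assumes a 95% interval and a known σ, and is a sketch of the standard argument rather than a copy of the original slides:

```latex
P\left(-1.96 < \frac{M - \mu}{\sigma_M} < 1.96\right) = .95
\quad\Longrightarrow\quad
P\left(M - 1.96\,\sigma_M < \mu < M + 1.96\,\sigma_M\right) = .95
```

The second statement follows from the first by multiplying through by σ M, subtracting M, and then multiplying by –1, which reverses the direction of the inequalities.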
154 The 95% confidence interval on the sample mean You can be 95% confident that the value of the population mean lies within this range.
155 Example A sample of 100 people has a mean height of 69.8 inches. Suppose (very unrealistically) that we know that the population SD is 3.2 inches, but we don't know the value of the population mean. Construct the 95% confidence interval on the mean.
156 The first step Calculate the standard error of the mean.
157 The 95% confidence interval You can be 95% confident that the population mean lies within this range.
158 Using the confidence interval to test the null hypothesis Notice that the 95% confidence interval on the mean, that is, [69.17, 70.43], does not include the value 69. If the confidence interval does not include the value specified by the null hypothesis, the hypothesis can be rejected. The two approaches lead to exactly the same decision about the null hypothesis.
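The interval quoted above can be reproduced from the figures in the example. This is a minimal sketch using Python's standard library; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 69.8  # mean height of the sample (inches)
sigma = 3.2         # population SD, assumed known (inches)
n = 100
confidence = 0.95
mu_0 = 69.0         # value specified by the null hypothesis

# First step: the standard error of the mean.
standard_error = sigma / sqrt(n)

# Multiplier that cuts off (1 - confidence) / 2 in each tail: about 1.96.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - z_star * standard_error
upper = sample_mean + z_star * standard_error
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")

# The interval excludes mu_0 exactly when the two-tailed z test rejects H0.
print("Reject H0" if not (lower <= mu_0 <= upper) else "Do not reject H0")
```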
159 Interpretation of a confidence interval The 95% confidence interval on our sample mean is [69.17, 70.43]. We cannot say, "The probability that the mean lies between 69.17 and 70.43 is .95." A confidence interval is not a sample space. (See the appendix to my first talk.) A classical probability refers to a hypothetical future. Here, the die has already been cast and either the interval fell over the population mean or it didn't. In view of the manner in which the interval was constructed, however, we can be 95% confident that it fell over the true value of the population mean.