Confidence intervals and hypothesis testing Petter Mostad 2005.10.03.


1 Confidence intervals and hypothesis testing Petter Mostad 2005.10.03

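2 Confidence intervals (repetition) Assume $\mu$ and $\sigma^2$ are some real numbers, and assume the data $X_1, X_2, \ldots, X_n$ are a random sample from $N(\mu, \sigma^2)$.
–Then $\bar{X} \sim N(\mu, \sigma^2/n)$,
–thus $P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$,
–so $P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$,
and we say that $\left[\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right]$ is a confidence interval for $\mu$ with 95% confidence, based on the statistic $\bar{X}$.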
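A minimal Python sketch of this interval, assuming the standard form (sample mean plus or minus a normal quantile times $\sigma/\sqrt{n}$) with $\sigma$ treated as known. The function name and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(data, sigma, confidence=0.95):
    """Confidence interval for the mean of a normal sample with known sigma."""
    n = len(data)
    # Upper quantile of N(0, 1); about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    center = mean(data)
    return center - half_width, center + half_width


# Hypothetical measurements, with sigma assumed known to be 0.25.
sample = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2]
low, high = z_confidence_interval(sample, sigma=0.25)
print(f"95% confidence interval for mu: [{low:.3f}, {high:.3f}]")
```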

3 Confidence intervals, general idea We have a model with an unknown parameter. We find a "statistic" (a function of the sample) with a known distribution, depending only on the unknown parameter. This distribution is used to construct an interval with the following property: if you repeat many times selecting a parameter and simulating the statistic, then about (say) 95% of the time, the confidence interval will contain the parameter.
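The repeated-sampling property can be checked by simulation. The sketch below is one possible illustration, not part of the original slides: it keeps the parameter fixed rather than re-selecting it each time, assumes a normal model with known $\sigma$, and uses a fixed seed so the run is reproducible.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(mu, sigma, n, repetitions=2000, confidence=0.95, seed=1):
    """Fraction of simulated samples whose interval contains the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # Simulate a sample and compute the statistic (the sample mean).
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repetitions


# The result should be close to 0.95, up to simulation noise.
print(coverage(mu=3.0, sigma=2.0, n=10))
```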

4 Hypothesis testing Selecting the most plausible model for the data, among those suggested. Example: Assume $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known, but $\mu$ is not; we want to select a $\mu$ fitting the data. One possibility is to look at the probability of observing the data given different values for $\mu$. (We will return to this.) Another is to do a hypothesis test.

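5 Example We select two alternative hypotheses:
–$H_0: \mu = \mu_0$
–$H_1: \mu \ne \mu_0$
Use the value of $\bar{X}$ to test $H_0$ versus $H_1$: if $\bar{X}$ is far from $\mu_0$, it will indicate $H_1$. Under $H_0$, we know that $\bar{X} \sim N(\mu_0, \sigma^2/n)$. Reject $H_0$ if $\bar{X}$ is outside $\left[\mu_0 - 1.96\,\sigma/\sqrt{n},\ \mu_0 + 1.96\,\sigma/\sqrt{n}\right]$.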

6 General outline for hypothesis testing The possible hypotheses are divided into $H_0$, the null hypothesis, and $H_1$, the alternative hypothesis. A hypothesis can be
–Simple, so that it is possible to compute the probability of data (e.g., $\mu = \mu_0$)
–Composite, i.e., a collection of simple hypotheses (e.g., $\mu \ne \mu_0$)

7 General outline (cont.) A test statistic is selected. It must:
–Have a higher probability for "extreme" values under $H_1$ than under $H_0$
–Have a known distribution under $H_0$ (when simple)
If the value of the test statistic is "too extreme", then $H_0$ is rejected. The probability, under $H_0$, of observing the given data or something more extreme is called the p-value. Thus we reject $H_0$ if the p-value is small. The threshold below which the p-value leads us to reject $H_0$ is called the significance level.
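As an illustration of the p-value, here is a small Python sketch for a two-sided test of a normal mean with known $\sigma$. It assumes the test statistic is the standardized sample mean; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(xbar, mu0, sigma, n):
    """P-value for H0: mu = mu0 against H1: mu != mu0, sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least as extreme as |z|.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: reject H0 at the 5% level if the p-value is below 0.05.
p = two_sided_p_value(xbar=10.6, mu0=10.0, sigma=1.5, n=25)
print(f"p-value = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```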

8 Note: There is an asymmetry between $H_0$ and $H_1$: in fact, if the data are inconclusive, we end up not rejecting $H_0$. If $H_0$ is true, the probability of rejecting $H_0$ is (say) 5%. That DOES NOT MEAN we are 95% certain that $H_0$ is true! How much evidence we have for choosing $H_1$ over $H_0$ depends entirely on how much more probable rejection is if $H_1$ is true.

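9 Errors of types I and II The above can be seen as a decision rule for choosing between $H_0$ and $H_1$. For any such rule we can compute (if both $H_0$ and $H_1$ are simple hypotheses):
–$H_0$ true, accept $H_0$: $P(\text{accept} \mid H_0)$
–$H_0$ true, reject $H_0$: $P(\text{reject} \mid H_0)$, TYPE I error, equal to the significance
–$H_1$ true, accept $H_0$: $P(\text{accept} \mid H_1)$, TYPE II error, equal to 1 - power
–$H_1$ true, reject $H_0$: $P(\text{reject} \mid H_1)$, equal to the power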

10 Significance and power If $H_0$ is composite, we compute the significance from the simple hypothesis that gives the largest probability of rejecting $H_0$. If $H_1$ is composite, we compute a power value for each simple hypothesis. Thus we get a power function.
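A sketch of a power function in Python, under assumptions not stated on the slide: a one-sided test of $H_0: \mu \le \mu_0$ against $H_1: \mu > \mu_0$ for a normal sample with known $\sigma$, rejecting when the sample mean exceeds $\mu_0 + z_\alpha \sigma/\sqrt{n}$. The significance is attained at the boundary $\mu = \mu_0$, the simple hypothesis in $H_0$ with the largest rejection probability.

```python
from math import sqrt
from statistics import NormalDist


def power(mu, mu0, sigma, n, alpha=0.05):
    """P(reject H0 | true mean is mu) for the one-sided z test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Under mean mu, the standardized statistic is shifted by this amount.
    shift = (mu - mu0) * sqrt(n) / sigma
    return 1 - std_normal.cdf(z_alpha - shift)


# Power function evaluated at a few hypothetical values of mu.
for mu in (0.0, 0.2, 0.5, 1.0):
    print(f"mu = {mu:.1f}: power = {power(mu, mu0=0.0, sigma=1.0, n=25):.3f}")
```

At $\mu = \mu_0$ the function returns $\alpha$, and it increases as $\mu$ moves further into $H_1$.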

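11 Example 1: Normal distribution with unknown variance Assume $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$. Then $\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$. Thus $P\left(-t_{n-1,\alpha/2} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t_{n-1,\alpha/2}\right) = 1 - \alpha$, where $t_{n-1,\alpha/2}$ is the upper $\alpha/2$ point of the $t_{n-1}$ distribution. So a confidence interval for $\mu$, with significance $\alpha$, is given by $\bar{X} \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$.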

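12 Example 1 (Hypothesis testing) Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$. Test statistic: $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$ under $H_0$. Reject $H_0$ if $T < -t_{n-1,\alpha/2}$ or if $T > t_{n-1,\alpha/2}$. Alternatively, the p-value for the test can be computed (if $T > 0$) as the $\alpha$ such that $T = t_{n-1,\alpha/2}$.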
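The one-sample t statistic can be computed with the Python standard library alone; only the critical value needs a table or a statistics package. The sketch below assumes the usual statistic (sample mean minus $\mu_0$, divided by $s/\sqrt{n}$). The data are made up, and the critical value 2.262 is the tabulated upper 2.5% point of the t distribution with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return the t statistic and its degrees of freedom for H0: mu = mu0."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1


# Hypothetical sample of 10 observations, testing H0: mu = 50.
sample = [51.2, 49.8, 52.5, 50.9, 48.7, 53.1, 51.6, 50.2, 52.0, 49.5]
t, df = one_sample_t(sample, mu0=50.0)
critical = 2.262  # table value of t_{9, 0.025}
print(f"T = {t:.3f} with {df} degrees of freedom; reject H0: {abs(t) > critical}")
```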

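13 Example 1 (cont.) Hypotheses: $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$. Test statistic: $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$ assuming $\mu = \mu_0$. Reject $H_0$ if $T > t_{n-1,\alpha}$. Alternatively, the p-value for the test can be computed as the $\alpha$ such that $T = t_{n-1,\alpha}$.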

14 Example 1 (cont.) Assume that you want to analyze, as above, the data in some column of an SPSS table. Use "Analyze" => "Compare means" => "One-sample T Test". You get as output a confidence interval, and a test like the one described above. You may adjust the confidence level using "Options…".

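15 Example 2: Differences between means Assume $X_1, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$. We would like to study the difference $\mu_x - \mu_y$. Four different cases:
–Matched pairs
–Known population variances
–Unknown but equal population variances
–Unknown and possibly different population variances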

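16 Known population variances We get $\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$. Confidence interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$.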

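17 Unknown but equal population variances We get $\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{s_p \sqrt{1/n_x + 1/n_y}} \sim t_{n_x+n_y-2}$, where $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$. Confidence interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm t_{n_x+n_y-2,\alpha/2}\, s_p \sqrt{1/n_x + 1/n_y}$.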

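18 Hypothesis testing: Unknown but equal population variances Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$. Test statistic: $T = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{1/n_x + 1/n_y}}$. Reject $H_0$ if $T < -t_{n_x+n_y-2,\alpha/2}$ or if $T > t_{n_x+n_y-2,\alpha/2}$. "T test with equal variances"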

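19 Unknown and possibly unequal population variances We get, approximately, $\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_\nu$, where $\nu = \frac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}}$. Confidence interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm t_{\nu,\alpha/2} \sqrt{s_x^2/n_x + s_y^2/n_y}$.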

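20 Hypothesis test: Unknown and possibly unequal population variances Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$. Test statistic: $T = \frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}}$. Reject $H_0$ if $T < -t_{\nu,\alpha/2}$ or if $T > t_{\nu,\alpha/2}$. "T test with unequal variances"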
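The two-sample statistics can be sketched in Python as follows. This assumes the standard pooled-variance statistic for the equal-variance case and the Welch statistic with the Satterthwaite approximation for the degrees of freedom in the unequal-variance case. Function names and data are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """T statistic and degrees of freedom assuming equal population variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """T statistic and approximate degrees of freedom, variances may differ."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical measurements from two groups.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
y = [10.8, 11.9, 10.2, 11.1, 12.4, 10.5, 11.0]
print("equal variances:  ", pooled_t(x, y))
print("unequal variances:", welch_t(x, y))
```

The resulting statistic is compared with the t quantile for the returned degrees of freedom, taken from a table or a statistics package.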

21 Practical examples:
–The heights of children in a class are measured at age 8 and at age 10. Use the data to find an estimate, with confidence limits, of how much children grow between these ages.
–You want to determine whether a costly operation is generally done more cheaply in France than in Norway. Your data are the actual costs of 10 such operations in Norway and 20 in France.

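22 Example 3: Population proportions Assume $X \sim \mathrm{Bin}(n, p)$, so that $\hat{p} = X/n$ is a frequency. Then, approximately for large $n$, $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1)$. Thus, approximately, $\frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \sim N(0, 1)$. Confidence interval for $p$ (approximately, for large $n$): $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$.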

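23 Example 3 (Hypothesis testing) Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$. Test statistic: $Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0, 1)$ under $H_0$, for large $n$. Reject $H_0$ if $Z < -z_{\alpha/2}$ or if $Z > z_{\alpha/2}$.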
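A short Python sketch of the large-sample interval and test for a proportion, assuming the usual normal approximation to the binomial. The function names and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Approximate confidence interval for p, valid for large n."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def proportion_z(successes, n, p0):
    """Z statistic for H0: p = p0, using the variance implied by H0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical data: 58 successes in 100 trials, testing H0: p = 0.5.
print(proportion_ci(58, 100))
z = proportion_z(58, 100, p0=0.5)
print(f"Z = {z:.3f}, two-sided p-value = {2 * (1 - NormalDist().cdf(abs(z))):.4f}")
```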

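24 Example 4: Differences between population proportions Assume $X \sim \mathrm{Bin}(n_x, p_x)$ and $Y \sim \mathrm{Bin}(n_y, p_y)$, so that $\hat{p}_x = X/n_x$ and $\hat{p}_y = Y/n_y$ are frequencies. Then, approximately, $\frac{\hat{p}_x - \hat{p}_y - (p_x - p_y)}{\sqrt{\hat{p}_x(1-\hat{p}_x)/n_x + \hat{p}_y(1-\hat{p}_y)/n_y}} \sim N(0, 1)$. Confidence interval for $p_x - p_y$ (approximately): $\hat{p}_x - \hat{p}_y \pm z_{\alpha/2} \sqrt{\hat{p}_x(1-\hat{p}_x)/n_x + \hat{p}_y(1-\hat{p}_y)/n_y}$.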

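25 Example 4 (Hypothesis testing) Hypotheses: $H_0: p_x = p_y$, $H_1: p_x \ne p_y$. Test statistic: $Z = \frac{\hat{p}_x - \hat{p}_y}{\sqrt{\hat{p}_0(1-\hat{p}_0)\left(1/n_x + 1/n_y\right)}}$, where $\hat{p}_0 = \frac{n_x \hat{p}_x + n_y \hat{p}_y}{n_x + n_y}$. Reject $H_0$ if $|Z| > z_{\alpha/2}$.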
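The two-proportion test can be sketched in Python as below, assuming the common form in which the variance is estimated from the pooled proportion of both samples. The counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, nx, y, ny):
    """Z statistic for H0: px = py using the pooled proportion estimate."""
    px_hat, py_hat = x / nx, y / ny
    p0_hat = (x + y) / (nx + ny)  # pooled estimate under H0
    se = sqrt(p0_hat * (1 - p0_hat) * (1 / nx + 1 / ny))
    return (px_hat - py_hat) / se


# Hypothetical counts: 45 of 120 in one group, 30 of 110 in the other.
z = two_proportion_z(45, 120, 30, 110)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 5% two-sided test
print(f"Z = {z:.3f}, reject H0: {abs(z) > z_crit}")
```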

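26 Example 5: The variance of a normal distribution Assume $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$. Then $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$. Thus $P\left(\chi^2_{n-1,1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha$, with subscripts denoting upper tail points. Confidence interval for $\sigma^2$: $\left[\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right]$.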

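27 Example 6: Comparing variances for normal distributions Assume $X_1, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$. We get $\frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n_x-1,n_y-1}$, where $F_{n_x-1,n_y-1}$ is an F distribution with $n_x - 1$ and $n_y - 1$ degrees of freedom. We can use this exactly as before to obtain a confidence interval for $\sigma_x^2/\sigma_y^2$, and for testing, for example, whether $\sigma_x^2 = \sigma_y^2$. Note: The assumption of normality is crucial!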

28 Sample size computations For a sample from a normal population with known variance, the width of the confidence interval for the mean (at a given confidence level) depends only on the sample size. So we can compute the sample size necessary to match a required accuracy. Note: If the variance is unknown, it must somehow be estimated beforehand to do the computation. This also works for population proportion estimation, giving an inequality for the required sample size.
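A sketch of the sample size computation in Python. It assumes the required accuracy is stated as a maximum half-width $L$ of the interval, so that $n \ge (z_{\alpha/2}\,\sigma/L)^2$ for a mean. For a proportion it uses the bound $p(1-p) \le 1/4$, giving $n \ge z_{\alpha/2}^2/(4L^2)$. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, half_width, confidence=0.95):
    """Smallest n so that the interval for the mean is mean +/- half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / half_width) ** 2)


def sample_size_proportion(half_width, confidence=0.95):
    """Conservative n for a proportion, using p(1 - p) <= 1/4."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 / (4 * half_width ** 2))


print(sample_size_mean(sigma=10.0, half_width=2.0))  # known sigma = 10
print(sample_size_proportion(half_width=0.03))       # +/- 3 percentage points
```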

29 Power computations If you reject $H_0$, you know very little about the evidence for $H_1$ versus $H_0$ unless you study the power of the test. The power is the probability of rejecting $H_0$ given that a hypothesis in $H_1$ is true, i.e., 1 minus the probability of a type II error. Thus it is a function of the possible hypotheses in $H_1$. We would like our tests to have as high power as possible.

