EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © 2003-2005 Dr. John Lipp
Session 2 Outline
Part 1: Correlation and Independence.
Part 2: Confidence Intervals.
Part 3: Hypothesis Testing.
Part 4: Linear Regression.
Today's Topics
Hypothesis Testing
– Null and Alternate Hypothesis.
– Test Statistic and Reference (Null) Distribution.
– Acceptance and Rejection Regions.
– Type I Error.
– Type II Error.
– Significance.
– Power.
– P-value.
Confidence Intervals vs. Hypothesis Testing
Contingency Tables.
Hypothesis Testing
Consider the question: "Is one dart gun better than another?"
– This is an example of a hypothesis statement.
– In statistics, a hypothesis statement is generally made regarding population or model parameters.
– However, any statistical question can be a legitimate hypothesis statement!
Hypothesis Testing (cont.)
The procedure for determining which hypothesis is true is a test of hypothesis (or hypothesis test).
In the case of a dart gun, a statistical translation of "is one dart gun better than another" might be: are the variances (or standard deviations) equal? That is,
H₀: σ₁² − σ₂² = 0
This is known as the null hypothesis. In simple hypothesis tests, the null hypothesis involves parameter equality.
Hypothesis Testing (cont.)
The "opposite" hypothesis in this example is
H₁: σ₁² − σ₂² ≠ 0
and is known as the alternate hypothesis.
Two other alternate hypotheses are possible:
H₁: σ₁² − σ₂² > 0
H₁: σ₁² − σ₂² < 0
These are referred to as one-sided alternate hypotheses. The alternate hypothesis H₁: σ₁² − σ₂² ≠ 0 is called a two-sided hypothesis and is equivalent to the two one-sided alternate hypotheses combined.
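For the dart-gun question, one standard statistic for the null hypothesis of equal variances is the ratio of the two sample variances, which follows an F distribution under H₀. The sketch below is illustrative only: the sample sizes, random seed, and spreads are assumptions, not the course data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical dart-gun data: miss distances for two guns (assumed normal).
gun1 = rng.normal(0.0, 1.0, size=30)
gun2 = rng.normal(0.0, 1.5, size=30)

# Under H0: sigma1^2 = sigma2^2, the ratio of sample variances follows
# an F distribution with (n1 - 1, n2 - 1) degrees of freedom.
f_stat = np.var(gun1, ddof=1) / np.var(gun2, ddof=1)
df1, df2 = len(gun1) - 1, len(gun2) - 1

# Two-sided p-value: double the smaller tail probability.
p_lower = stats.f.cdf(f_stat, df1, df2)
p_value = 2 * min(p_lower, 1 - p_lower)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

The same ratio statistic also serves the one-sided alternates; only the tail used for the p-value changes.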
Hypothesis Testing (cont.)
All hypothesis tests involve evaluating a test statistic, appropriate to the particular situation, computed from the sample data.
Since the test statistic is computed from random data, it is a random variable with a conditional distribution that is either
– consistent with the null hypothesis being true (known as the reference distribution or null distribution), or
– consistent with the alternate hypothesis being true.
Hypothesis Testing (cont.)
For example: [figure: the two conditional densities of the test statistic, f_Z|H₀(z|H₀) and f_Z|H₁(z|H₁), plotted versus z]
Hypothesis Testing (cont.)
Rejecting the null hypothesis when it is true is called a type I error. The probability of a type I error is denoted α and is also known as the significance level of the test.
The range of test statistic values under which the null hypothesis is determined to be valid and accepted is called the acceptance region.
Logically, the acceptance region should be chosen to achieve the desired α while being as compact as possible.
Hypothesis Testing (cont.)
[figure: the null distribution f_Z|H₀(z|H₀) with a central acceptance region of probability 1 − α between critical values z₁ and z₂, and tail probabilities Φ(z₁) = α/2 and 1 − Φ(z₂) = α/2]
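As a minimal sketch of choosing a symmetric two-sided acceptance region for a standard-normal reference distribution (the α value here is an assumed example):

```python
from scipy import stats

alpha = 0.05
# Put alpha/2 probability in each tail; the central 1 - alpha
# of the null distribution is the acceptance region.
z1 = stats.norm.ppf(alpha / 2)      # lower critical value
z2 = stats.norm.ppf(1 - alpha / 2)  # upper critical value
print(f"Accept H0 if {z1:.3f} <= Z <= {z2:.3f}")
```

For α = 0.05 this gives the familiar ±1.96 bounds.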
Hypothesis Testing (cont.)
The critical region (or rejection region) is where the null hypothesis is determined to be invalid and is not accepted. It is the complement of the acceptance region.
Failing to reject the null hypothesis when it is false is called a type II error. The probability of a type II error is denoted β.
The power of a hypothesis test is the probability of accepting the alternate hypothesis when the alternate hypothesis is true, that is, 1 − β. An alternate view of power is the ability of the test to correctly reject a false null hypothesis.
Hypothesis Testing (cont.)
Because the alternate hypothesis often involves a range of parameter values, β (and hence the power) can be hard to determine without some knowledge of the legitimate variations in the population's statistical parameters.
[figure: the null distribution f_Z|H₀(z|H₀) overlaid with alternate distributions f_Z|H₁(z|H₁)]
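When the alternate distribution is pinned down to a single value, β and power can be computed directly. A hedged sketch for a one-sided z-test of a mean with known σ (all numbers, including the assumed true mean under H₁, are illustrative):

```python
import math
from scipy import stats

# One-sided z-test of H0: mu = 0 vs H1: mu > 0, known sigma.
alpha, sigma, n = 0.05, 1.0, 25
mu_true = 0.5  # assumed true mean under H1; power depends on this choice

z_crit = stats.norm.ppf(1 - alpha)  # reject H0 when Z > z_crit
# Under H1 the test statistic is normal with mean mu_true*sqrt(n)/sigma.
shift = mu_true * math.sqrt(n) / sigma
beta = stats.norm.cdf(z_crit - shift)  # P(accept H0 | H1 true)
power = 1 - beta
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Sweeping mu_true over a range of plausible values produces the test's power curve.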
Hypothesis Testing (cont.)
Because of this difficulty in finding β, in most hypothesis tests it is the α value that is chosen, not the β value.
The P-value is the smallest value of α resulting in rejection of the null hypothesis with the available data.
– Unlike α, β, and power, the P-value is a statement about the quality of the data, not the test or hypothesis.
NOTE: When the sample size gets large, the null hypothesis is almost certain to be rejected because the data won't exactly fit the model being assumed!
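A quick illustration of a P-value in practice, using a one-sample t-test; the data and null mean below are made up for illustration:

```python
from scipy import stats

data = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4]
# Two-sided one-sample t-test of H0: mu = 5.0.  The returned p-value
# is the smallest alpha at which this data would reject H0.
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Here the P-value exceeds any conventional α, so H₀ would not be rejected at, say, α = 0.05.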
Hypothesis Testing – Procedure
1. What are the (population) parameters of interest? (Confirm the data's distribution with probability paper, theory, etc.)
2. Form the null hypothesis.
3. Form the alternate hypothesis. (One-sided or two-sided?)
4. Choose an acceptable level of significance, α.
5. Determine the appropriate test statistic.
6. Find the acceptance and critical regions.
7. Gather the data and compute the test statistic.
8. Conclusion! (Consider the hypothesis result's statistical significance vs. its practical significance.)
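The eight steps might be sketched in code for a simple two-sided z-test of a mean with known σ; every number here is an assumed example, not course data:

```python
import math
from scipy import stats

# Steps 1-3: parameter of interest is the mean mu of (assumed) normal data
# with known sigma; H0: mu = 10, two-sided H1: mu != 10.
mu0, sigma = 10.0, 2.0
# Step 4: choose the significance level.
alpha = 0.05
# Steps 5-6: Z = (xbar - mu0) / (sigma / sqrt(n)) is standard normal under
# H0; the acceptance region is the central 1 - alpha of that distribution.
z_crit = stats.norm.ppf(1 - alpha / 2)
# Step 7: gather the data and compute the test statistic.
data = [11.2, 9.8, 10.5, 12.1, 10.9, 9.5, 11.4, 10.7]
n = len(data)
xbar = sum(data) / n
z = (xbar - mu0) / (sigma / math.sqrt(n))
# Step 8: conclusion.
reject = abs(z) > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

With this made-up sample the statistic falls inside the acceptance region, so H₀ is not rejected; whether that matters practically is a separate judgment (step 8's caveat).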
Hypothesis Testing (cont.)
Question: Is one dart gun better than the other?
1.
2.
3.
4.
5.
6.
7.
8.
Confidence Interval vs. Hypothesis Test
Confidence intervals and hypothesis tests are often cousins:
– A two-sided hypothesis test for a mean of 3, for normally distributed data with unknown variance.
– A confidence interval for normally distributed data with unknown variance that bounds 3.
There are some questions for which a confidence interval does not make sense.
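The correspondence can be checked numerically: a 95% t confidence interval contains 3 exactly when the two-sided t-test of H₀: μ = 3 fails to reject at α = 0.05. A sketch with made-up data:

```python
import numpy as np
from scipy import stats

data = np.array([2.8, 3.4, 3.1, 2.9, 3.3, 3.0, 3.2])
n = len(data)
xbar, s = data.mean(), data.std(ddof=1)

# 95% t confidence interval for the mean (variance unknown).
lo, hi = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=s / np.sqrt(n))

# Two-sided t-test of H0: mu = 3; rejects at alpha = 0.05
# exactly when 3 falls outside the interval above.
t_stat, p_value = stats.ttest_1samp(data, popmean=3.0)
print(f"CI = ({lo:.3f}, {hi:.3f}), p = {p_value:.3f}")
```

Here 3 lies inside the interval and the P-value exceeds 0.05, so the two views agree.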
Contingency Tables
Classify a set of n observations using two methods. Each method should have a discrete range of values:
– r = number of values (levels) for the first method.
– c = number of values (levels) for the second method.
Build a table of the frequency of each descriptor combination:
– The rows are the first classification method.
– The columns are the second classification method.
Calculate the relative frequency of each descriptor:
– the row relative frequencies, i = 1, …, r.
– the column relative frequencies, j = 1, …, c.
Contingency Tables (cont.)
The result is known as a contingency table. The purpose of the table is to test a hypothesis on the statistical independence of the two classification methods.
[table: observed counts O_ij, with columns for card ranks A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K]
Contingency Tables (cont.)
Build a second table by estimating what should have been in the first table had independence been true. Use the formula
E_ij = (row i total) × (column j total) / n
[table: expected counts E_ij, with columns for card ranks A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K]
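The expected-count construction can be sketched directly; the observed table below is made up for illustration (not the card data):

```python
import numpy as np

# Hypothetical 2x3 table of observed counts O_ij.
observed = np.array([[20, 30, 25],
                     [30, 20, 25]])

n = observed.sum()
row = observed.sum(axis=1)  # row totals
col = observed.sum(axis=0)  # column totals

# Expected counts under independence: E_ij = (row_i * col_j) / n.
expected = np.outer(row, col) / n
print(expected)
```

Row and column totals of the expected table match those of the observed table by construction.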
Contingency Tables (cont.)
Compute the test statistic
χ² = Σᵢ Σⱼ (O_ij − E_ij)² / E_ij
which is approximately chi-square distributed with (r − 1)(c − 1) degrees of freedom. Accept independence if χ² < χ²_α,(r−1)(c−1).
[table: the terms (O_ij − E_ij)²/E_ij, with columns for card ranks A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K]
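The whole test can be run with SciPy's chi2_contingency, which returns the statistic, P-value, degrees of freedom, and expected table in one call. The observed counts below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 table of observed counts O_ij.
observed = np.array([[20, 30, 25],
                     [30, 20, 25]])

# chi2_contingency computes E_ij and the chi-square statistic with
# dof = (r - 1)(c - 1).  (A continuity correction applies only to 2x2.)
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Accept independence if chi2 is below the critical value.
alpha = 0.05
crit = stats.chi2.ppf(1 - alpha, dof)
print(f"chi2 = {chi2:.3f}, dof = {dof}, critical = {crit:.3f}, p = {p_value:.3f}")
```

Here the statistic falls below the critical value, so independence would not be rejected at α = 0.05.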
Homework
Use Excel (or your favorite number-crunching program) to:
1. Finish the hypothesis test for "Is one dart gun better than the others?" by filling in the numbers.
2. Complete the contingency table for the card data.