
1 The Chi-Square Distribution

2 The student will be able to:
- Perform a goodness-of-fit hypothesis test
- Perform a test-of-independence hypothesis test

3 The chi-square distribution is used to answer three kinds of questions:
- Does our data fit a certain distribution? (goodness-of-fit test)
- Are two factors independent? (test of independence)
- Does a population variance differ from a hypothesized value? (test of a single variance)

4
- Notation
  - New random variable: $X^2 \sim \chi^2_{df}$
  - Mean: $\mu = df$
  - Variance: $\sigma^2 = 2\,df$, so $\sigma = \sqrt{2\,df}$
- Facts about chi-square
  - Nonsymmetrical and skewed right
  - Values are always greater than or equal to zero
  - The curve looks different for different degrees of freedom; as df gets larger, the curve approaches the normal distribution (for df > 90)
  - The mean is located to the right of the peak
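The mean and variance facts can be checked numerically. The sketch below is a minimal illustration, assuming the standard construction of a chi-square value with df degrees of freedom as the sum of df squared standard normal draws (that construction is not stated on the slides); the function name `simulate_chi_square`, the sample size, and the seed are arbitrary, hypothetical choices.

```python
import random


def simulate_chi_square(df, n_samples, seed=1):
    """Draw n_samples values, each the sum of df squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_samples)]


df = 4
samples = simulate_chi_square(df, 100_000)
sample_mean = sum(samples) / len(samples)                                 # theory: df
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)  # theory: 2 * df
print(f"mean ~ {sample_mean:.2f} (theory {df}); variance ~ {sample_var:.2f} (theory {2 * df})")
print("smallest value:", min(samples))  # a sum of squares is never negative
```

Because the values are simulated, the printed mean and variance only approximate df and 2df; larger samples tighten the agreement.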

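5 Hypothesis-test steps are the same as always, with the following changes:
- The test is always right-tailed.
- The null and alternative hypotheses are stated in words rather than equations.
- Degrees of freedom = number of categories - 1
- The test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is the observed frequency and $E$ the expected frequency in each category.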
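The goodness-of-fit statistic, the sum of (O - E)²/E over all categories, is simple enough to compute directly. A minimal sketch in plain Python; the function name `goodness_of_fit` is hypothetical, and the data are the die counts from the example that follows, with an expected count of 120/6 = 20 per face.

```python
def goodness_of_fit(observed, expected):
    """Return the chi-square statistic sum((O - E)^2 / E) and df = categories - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


observed = [15, 29, 16, 15, 30, 15]   # die faces 1-6, 120 rolls
expected = [120 / 6] * 6              # fair die: 20 per face
stat, df = goodness_of_fit(observed, expected)
print(round(stat, 4), df)  # → 13.6 5
```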

6 A six-sided die is rolled 120 times. The results are in the table below. Conduct a hypothesis test to determine whether the die is fair.

Face value | Frequency
1 | 15
2 | 29
3 | 16
4 | 15
5 | 30
6 | 15

7
- Hypotheses
  - $H_0$: the observed data fit a uniform distribution (the die is fair)
  - $H_a$: the observed data do not fit a uniform distribution (the die is not fair)
- Determine the distribution
  - Chi-square goodness-of-fit
  - Right-tailed test
- Perform calculations to find the p-value
  - Enter the observed frequencies into L1
  - Enter the expected frequencies into L2 (120/6 = 20 for each face)

8
- Perform calculations (cont.)
  - TI-83: access LIST, MATH, SUM and enter sum((L1 - L2)²/L2). This is the test statistic; for our problem, $\chi^2 = 13.6$.
  - Access DISTR, χ²cdf. The syntax is χ²cdf(test statistic, 1E99, df), with df = 6 - 1 = 5. For our problem, p-value = 0.0184.
- Make a decision
  - Reject the null hypothesis for any α > 0.0184 (for example, α = 0.05).
- Concluding statement
  - There is sufficient evidence to conclude that the observed data do not fit a uniform distribution. (The die is not fair.)
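The χ²cdf(13.6, 1E99, 5) step can be reproduced without a calculator. This sketch assumes the standard identity that the right-tail area of a chi-square distribution with df degrees of freedom is 1 - P(df/2, x/2), where P is the regularized lower incomplete gamma function, evaluated here with its power series. The names `regularized_lower_gamma` and `chi_square_sf` are hypothetical.

```python
import math


def regularized_lower_gamma(a, x):
    """P(a, x) via the series x^a e^(-x) / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16 and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi_square_sf(statistic, df):
    """Right-tail area P(X^2 > statistic), the quantity chi2cdf(stat, 1E99, df) returns."""
    return 1.0 - regularized_lower_gamma(df / 2, statistic / 2)


p_value = chi_square_sf(13.6, 5)
print(round(p_value, 4))  # → 0.0184
```

The series converges quickly when x/2 is not much larger than df/2, which covers typical classroom problems; very large test statistics would call for a continued-fraction evaluation instead.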

9 Hypothesis-testing steps for a test of independence are the same, with the following changes:
- The null and alternative hypotheses are in words.
- The data are in a contingency table.
- Expected values are calculated from the table: $E = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$
- The test statistic is the same.
- df = (number of columns - 1)(number of rows - 1)
- Always a right-tailed test
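The expected-value rule, (row total)(column total)/sample size, and the rule df = (columns - 1)(rows - 1) can be sketched as two small functions. The function names and the 2 × 2 table of counts are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_df(table):
    """df = (number of columns - 1)(number of rows - 1)."""
    return (len(table[0]) - 1) * (len(table) - 1)


observed = [[10, 20], [30, 40]]       # hypothetical 2 x 2 table, n = 100
print(expected_counts(observed))      # → [[12.0, 18.0], [28.0, 42.0]]
print(independence_df(observed))      # → 1
```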

10
- Conduct a hypothesis test to determine whether there is a relationship between an employee's performance in a company's training program and his or her ultimate success on the job. Use a 1% level of significance.
- $H_0$: Performance in training and success on the job are independent.
- $H_a$: Performance in training and success on the job are not independent (they are dependent).

11 Performance on the job versus performance in training

Performance in training \ Performance on job | Below Average | Average | Above Average | TOTAL
Poor | 23 | 60 | 29 | 112
Average | 28 | 79 | 60 | 167
Very Good | 9 | 49 | 63 | 121
TOTAL | 60 | 188 | 152 | 400

12
- Determine the distribution
  - Chi-square, right-tailed
- Perform calculations to find the p-value
  - The calculator will calculate the expected values. We must enter the contingency table as a matrix (ack!).
  - Access MATRIX and edit matrix [A], entering the observed counts (without the totals).
  - Access the χ²-Test: Observed = [A]; the calculator places the expected values in matrix [B].

13
- Perform calculations (cont.)
  - df = (3 - 1)(3 - 1) = 4
  - p-value = 0.0005
- Make a decision
  - α = 0.01 > p-value = 0.0005, so reject the null hypothesis.
- Concluding statement
  - There is sufficient evidence to conclude that performance in training and success on the job are dependent.
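The calculator's χ²-Test on the training/job table can be cross-checked in plain Python. This is a sketch, not the TI-83's algorithm: it computes the expected counts with (row total)(column total)/sample size and, because df = 4 is even, uses the closed-form right-tail area exp(-x/2) · Σ (x/2)^k / k! for k = 0 … df/2 - 1, a standard identity that holds only for even df. The function names are hypothetical.

```python
import math

# Observed counts (rows: training Poor / Average / Very Good;
# columns: job Below Average / Average / Above Average), totals excluded.
observed = [
    [23, 60, 29],
    [28, 79, 60],
    [9, 49, 63],
]


def chi_square_independence(table):
    """Return (statistic, df) for a test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count for cell (i, j)
            statistic += (o - e) ** 2 / e
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return statistic, df


def chi_square_sf_even_df(x, df):
    """Right-tail area for EVEN df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df % 2:
        raise ValueError("this closed form only holds for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


stat, df = chi_square_independence(observed)
p_value = chi_square_sf_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
# → chi-square = 20.18, df = 4, p-value = 0.0005
```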

14 Linear Regression and Correlation: Chapter Objectives

15 The student should be able to:
- Discuss basic ideas of linear regression and correlation.
- Create and interpret a line of best fit.
- Calculate and interpret the correlation coefficient.
- Find outliers.

16
- A method for finding the “best fit” line through a scatterplot of paired data
  - independent variable (x) versus dependent variable (y)
- Recall from algebra: the equation of a line is y = a + bx, where
  - a is the y-intercept
  - b is the slope of the line
  - if b > 0, the line slopes upward to the right
  - if b < 0, the line slopes downward to the right
  - if b = 0, the line is horizontal

17
- The eyeball method
  - Draw what looks to you like the best-fitting straight line.
  - Pick two points on the line and find the equation of the line.
- The calculated method (least squares)
  - Using calculus, we find the line that minimizes the sum of the squared vertical distances (residuals) between the data points and the line.
  - Let the calculator do the work using LinRegTTest.
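The least-squares line can be computed from the standard deviation formulas b = Sxy/Sxx and a = ȳ - b·x̄ (these formulas are textbook results, not shown on the slides). A minimal sketch with hypothetical paired data; `least_squares_line` is a hypothetical name, and LinRegTTest should report the same a and b for the same data.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y = a + b x minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept: the line passes through (x_bar, y_bar)
    return a, b


xs = [1, 2, 3, 4, 5]   # hypothetical paired data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")   # → y = 2.20 + 0.60x
```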

18 The correlation coefficient is used to determine whether the regression line is a “good fit”.
- ρ is the population correlation coefficient.
- r is the sample correlation coefficient.
- The formula is formidable (see text); the calculator does the work.
- r positive: the points trend upward to the right.
- r negative: the points trend downward to the right.
- r = 0: no linear correlation.
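For readers who want to see what the calculator computes, here is a sketch of r in the deviation form r = Sxy / √(Sxx·Syy), which is algebraically equivalent to the usual computational formula. The function name and both data sets are hypothetical.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))   # → 0.7746
print(round(correlation([1, 2, 3], [6, 4, 2]), 4))               # → -1.0
```

The second call shows a perfect downward linear pattern, so r is exactly -1.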

19 Determining whether there is a “good fit”
- Gut method: if the calculated r is close to 1 or -1, there is a good fit.
- Hypothesis test (LinRegTTest): $H_0: \rho = 0$, $H_a: \rho \neq 0$
  - $H_0$ means there IS NOT a significant linear relationship (correlation) between x and y in the population.
  - $H_a$ means there IS a significant linear relationship (correlation) between x and y in the population.
  - Rejecting $H_0$ means there is a linear relationship between x and y in the population. It does not mean that one CAUSES the other.
- Comparison to a critical value
  - Use the table at the end of the chapter.
  - Degrees of freedom: df = n - 2
  - If r < the negative critical value, then r is significant and we have a good fit.
  - If r > the positive critical value, then r is significant and we have a good fit.
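The critical values in such a table can be related to the t distribution: solving t = r√df / √(1 - r²) for r gives r_crit = t_crit / √(t_crit² + df). The sketch below assumes that relationship and takes t_crit = 2.306 (two-tailed, α = 0.05, df = 8) from a standard t table; n = 10 and r = 0.80 are hypothetical inputs, and the function names are hypothetical.

```python
import math


def critical_r(t_critical, df):
    """Critical |r| implied by a two-tailed critical t value with df = n - 2."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


def is_significant(r, r_critical):
    """Significant if r < -r_critical or r > r_critical."""
    return r < -r_critical or r > r_critical


n = 10          # hypothetical sample size
df = n - 2
t_crit = 2.306  # two-tailed t critical value for alpha = 0.05, df = 8 (from a t table)
r_crit = critical_r(t_crit, df)
print(round(r_crit, 3))              # → 0.632
print(is_significant(0.80, r_crit))  # → True
```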

20
- If the line is determined to be a good fit, the equation can be used to predict y values from x values.
  - Plug the numbers into the equation.
  - The equation is valid only within the DOMAIN of the paired data.
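A prediction helper can enforce the domain rule by refusing x values outside the observed range. A minimal sketch; `predict` is a hypothetical name, and the line ŷ = 2.2 + 0.6x fitted to x values between 1 and 5 is hypothetical.

```python
def predict(x, a, b, x_min, x_max):
    """Predict y-hat = a + b x, refusing x outside the observed domain [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the data domain [{x_min}, {x_max}]")
    return a + b * x


a, b = 2.2, 0.6   # hypothetical fitted line from data with 1 <= x <= 5
print(round(predict(3, a, b, 1, 5), 2))   # → 4.0
try:
    predict(10, a, b, 1, 5)
except ValueError as err:
    print("refused:", err)   # → refused: x = 10 is outside the data domain [1, 5]
```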

21
- Compare 1.9s with $|y - \hat{y}|$ for each (x, y) pair: if $|y - \hat{y}| > 1.9s$, the point could be an outlier.
- LinRegTTest gives us s.
- The residuals $y - \hat{y}$ are put into the RESID list when LinRegTTest is run.
- To see the RESID list: go to STAT, Edit, move the cursor to a blank list name, and type RESID; the residuals will show up.
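The 1.9s rule can be sketched in plain Python, assuming s is the standard error of the estimate, √(SSE/(n - 2)), which is the s that LinRegTTest reports. The function name `find_outliers` is hypothetical, and the data (y = 2x except for the point (5, 20)) are made up so that one point clearly stands out.

```python
import math


def find_outliers(xs, ys):
    """Return indices of points with |y - y_hat| > 1.9 s."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # same quantities as the RESID list
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard error of the estimate
    return [i for i, e in enumerate(residuals) if abs(e) > 1.9 * s]


xs = list(range(1, 11))     # hypothetical data: y = 2x ...
ys = [2 * x for x in xs]
ys[4] = 20                  # ... except the point (5, 20)
print(find_outliers(xs, ys))   # → [4]
```

Index 4 is the point (5, 20). Note that the outlier also pulls the fitted line and inflates s, which is why the rule says the point *could* be an outlier.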

22 F Distribution and ANOVA

23 The student should be able to:
- Interpret the F distribution as the number of groups and the sample size change.
- Discuss two uses for the F distribution and ANOVA.
- Conduct and interpret ANOVA.

24
- What is it good for?
  - It determines whether there are statistically significant differences among several group means.
- Basic assumptions
  - Each population from which a sample is taken is assumed to be normal.
  - Each sample is randomly selected and independent.
  - The populations are assumed to have equal standard deviations (or variances).
  - The factor is the categorical variable.
  - The response is the numerical variable.
- The hypotheses
  - $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
  - $H_a$: At least two of the group means are not equal.
- Always a right-tailed test

25
- Named after Sir Ronald Fisher
- The F statistic is a ratio (a fraction).
  - It has two sets of degrees of freedom (numerator and denominator): $F \sim F_{df(\text{num}),\,df(\text{denom})}$
- Two estimates of the variance $\sigma^2$ are made:
  - Variation between samples: an estimate of $\sigma^2$ based on the variance of the sample means; variation due to treatment (explained variation)
  - Variation within samples: an estimate of $\sigma^2$ that is the average of the sample variances; variation due to error (unexplained variation)

26
- The curve is skewed right.
- There is a different curve for each set of degrees of freedom.
- As the dfs for the numerator and denominator get larger, the curve approximates the normal distribution.
- The F statistic is greater than or equal to zero.
- Other uses
  - Comparing two variances
  - Two-way analysis of variance

27
- Formula: $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$
  - $MS_{\text{between}}$: mean square explained by the differences among the groups
  - $MS_{\text{within}}$: mean square that is due to chance
  - $SS_{\text{between}}$: sum of squares that represents the variation among the different samples
  - $SS_{\text{within}}$: sum of squares that represents the variation within samples that is due to chance
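For reference, the pieces of the F ratio can be written out for k groups, with $n_j$ values in group j, group means $\bar{x}_j$, grand mean $\bar{x}$, and N values in total. These are the standard one-way ANOVA definitions, which the slide describes only in words; the degrees of freedom match the Factor and Error df the calculator reports.

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{x})^2, &
MS_{\text{between}} &= \frac{SS_{\text{between}}}{k - 1}, \\
SS_{\text{within}} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, &
MS_{\text{within}} &= \frac{SS_{\text{within}}}{N - k}, \\
F &= \frac{MS_{\text{between}}}{MS_{\text{within}}} \sim F_{k-1,\,N-k} \text{ under } H_0.
\end{aligned}
```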

28
- Enter the table data by columns into L1, L2, L3, …
- Run the ANOVA test: ANOVA(L1, L2, …)
- What the calculator gives:
  - F: the F statistic
  - p: the p-value
  - Factor (the between-groups values): df = number of groups - 1 = k - 1, SS between, MS between
  - Error (the within-groups values): df = total number of data values - number of groups = N - k, SS within, MS within

29 Four sororities took a random sample of sisters regarding their grade averages for the past term. The results are shown below. Using a significance level of 1%, is there a difference in grade averages among the sororities?

Sorority 1 | Sorority 2 | Sorority 3 | Sorority 4
2.17 | 2.63 | 2.63 | 3.79
1.85 | 1.77 | 3.78 | 3.45
2.83 | 3.25 | 4.00 | 3.08
1.69 | 1.86 | 2.55 | 2.26
3.33 | 2.21 | 2.45 | 3.18
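To check the calculator's ANOVA( output for the four sororities without a TI-83, the sketch below computes F and its right-tail p-value in plain Python. It assumes the standard identity P(F > f) = I_{d2/(d2 + d1·f)}(d2/2, d1/2), where I is the regularized incomplete beta function, evaluated with the continued-fraction (modified Lentz) method described in Numerical Recipes. All function names are hypothetical, and the group lists are the grade averages from this example.

```python
import math

groups = [  # grade averages, one list per sorority
    [2.17, 1.85, 2.83, 1.69, 3.33],
    [2.63, 1.77, 3.25, 1.86, 2.21],
    [2.63, 3.78, 4.00, 2.55, 2.45],
    [3.79, 3.45, 3.08, 2.26, 3.18],
]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each iteration applies the even and then the odd coefficient of the fraction.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def regularized_beta(x, a, b):
    """I_x(a, b), using I_x(a, b) = 1 - I_{1-x}(b, a) where that converges faster."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1) / (a + b + 2):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_stat, d1, d2):
    """Right-tail area P(F > f_stat) for an F(d1, d2) distribution."""
    return regularized_beta(d2 / (d2 + d1 * f_stat), d2 / 2, d1 / 2)


f_stat, d1, d2 = one_way_anova(groups)
p_value = f_sf(f_stat, d1, d2)
print(f"F = {f_stat:.2f}, df = ({d1}, {d2}), p-value = {p_value:.4f}")
```

The decision then follows the usual rule: compare the printed p-value with α = 0.01.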

30
- What's fair game
  - Chapters 1 through 12
  - 42 multiple-choice questions
  - Do problems from each chapter
- What to bring with you
  - Scantron (#2052), pencil, eraser, calculator, and 2 sheets of notes (8.5 × 11 inches, both sides)

31
- Prepare for the final exam.
- It has been a pleasure having you in class. Good luck and Godspeed with whatever path you take in life.
