STAT 312 Introduction Z-Tests and Confidence Intervals for a

Presentation transcript:

STAT 312
Chapter 9 - Inferences Based on Two Samples
Introduction
9.1 - Z-Tests and Confidence Intervals for a Difference Between Two Population Means
9.2 - The Two-Sample T-Test and Confidence Interval
9.3 - Analysis of Paired Data
9.4 - Inferences Concerning a Difference Between Population Proportions
9.5 - Inferences Concerning Two Population Variances
(Categorical Data)

Binary Response: π = P(Success)

“Test of Independence” (POPULATION): Two random binary variables I and J
  I: “Do you like olives?”  π1 = P(Yes to olives)
  J: “Do you like anchovies?”  π2 = P(Yes to anchovies)
  Null Hypothesis H0: π1 = π2  “No association exists between liking olives and anchovies.”
  Alternative Hypothesis HA: π1 ≠ π2  “An association exists between liking olives and anchovies.”

“Test of Homogeneity” (TWO POPULATIONS): Random binary variable I
  I: “Do you like Brussels sprouts?”  π = P(Yes to Brussels sprouts), with value π1 in population 1 and π2 in population 2
  Null Hypothesis H0: π1 = π2  “No difference in liking Brussels sprouts between the two populations.”
  Alternative Hypothesis HA: π1 ≠ π2  “There is a difference in liking Brussels sprouts between the two populations.”

Binary Response: π = P(Success)

“Test of Independence” (POPULATION): Two random binary variables I and J
  I: “Do you like olives?”  π1 = P(Yes to olives)  Sample, size n1
  J: “Do you like anchovies?”  π2 = P(Yes to anchovies)  Sample, size n2

“Test of Homogeneity” (TWO POPULATIONS): Random binary variable I
  I: “Do you like Brussels sprouts?”  π = P(Yes to Brussels sprouts)
  Population 1: Sample, size n1
  Population 2: Sample, size n2

(Assume “large” sample sizes.)

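Sampling Distribution of $\hat{\pi}_1 - \hat{\pi}_2$

Sample 1, size n1: X1 = # Successes, $\hat{\pi}_1 = X_1 / n_1$
Sample 2, size n2: X2 = # Successes, $\hat{\pi}_2 = X_2 / n_2$

Recall… Sampling Distribution of $\hat{\pi} = X / n$ (one sample):
If nπ ≥ 15 and n(1 – π) ≥ 15, then via the Normal Approximation to the Binomial, $\hat{\pi}$ is approximately normal with mean π and standard error $\sqrt{\pi (1 - \pi) / n}$.

Problem: s.e. depends on π !!
Solution: Use an estimate of π in the standard error ($\hat{\pi}$ for a confidence interval, the null value for a test).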

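Sampling Distribution of $\hat{\pi}_1 - \hat{\pi}_2$

Sample 1, size n1: X1 = # Successes. If n1π1 ≥ 15 and n1(1 – π1) ≥ 15, then via Normal Approximation to the Binomial, $\hat{\pi}_1 = X_1 / n_1$ is approximately normal with mean π1 and standard error $\sqrt{\pi_1 (1 - \pi_1) / n_1}$.

Sample 2, size n2: X2 = # Successes. If n2π2 ≥ 15 and n2(1 – π2) ≥ 15, then via Normal Approximation to the Binomial, $\hat{\pi}_2 = X_2 / n_2$ is approximately normal with mean π2 and standard error $\sqrt{\pi_2 (1 - \pi_2) / n_2}$.

Recall from section 4.1 (Discrete Models): Mean(X – Y) = Mean(X) – Mean(Y), and if X and Y are independent, Var(X – Y) = Var(X) + Var(Y).

Hence $\hat{\pi}_1 - \hat{\pi}_2$ is approximately normal with mean π1 – π2 and standard error $\sqrt{\pi_1 (1 - \pi_1) / n_1 + \pi_2 (1 - \pi_2) / n_2}$.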

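Sampling Distribution of $\hat{\pi}_1 - \hat{\pi}_2$

Sample 1, size n1: X1 = # Successes. Sample 2, size n2: X2 = # Successes.

$\hat{\pi}_1 - \hat{\pi}_2$ is approximately normal with mean π1 – π2 and standard error $\sqrt{\pi_1 (1 - \pi_1) / n_1 + \pi_2 (1 - \pi_2) / n_2}$.

“Null Distribution”: under the Null Hypothesis H0: π1 = π2, the mean π1 – π2 = 0.

Similar problem as “one proportion” inference: the s.e. depends on the unknown π1 and π2!
  For confidence interval, replace π1 and π2 respectively by $\hat{\pi}_1$ and $\hat{\pi}_2$.
  For critical region and p-value, replace π1 and π2 respectively by… ???? Under H0 they are equal, so replace their common value by a “pooled” estimate, $\hat{\pi}_{pooled} = (X_1 + X_2) / (n_1 + n_2)$.

Standard error estimate under H0: $\sqrt{\hat{\pi}_{pooled} (1 - \hat{\pi}_{pooled}) (1/n_1 + 1/n_2)}$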


Example: Two Proportions (of “Success”)

Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”

Test of Homogeneity or Independence? Test of Homogeneity (between two populations).

Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender sample, and use these data to test…

Null Hypothesis H0: P(“Yes” among Males) = P(“Yes” among Females), i.e., H0: π1 = π2, or π1 – π2 = 0, where π = P(Success) in each gender population. “No association exists.”

Data: Sample 1) n1 = 60 males, X1 = 42. Sample 2) n2 = 40 females, X2 = 16.

Analysis via Z-test: Point estimates are 42/60 = 0.70 for males and 16/40 = 0.40 for females, a difference of 0.30. NOTE: This is > 0. The pooled estimate is (42 + 16)/(60 + 40) = 0.58, so the standard error estimate under H0 is √[(0.58)(0.42)(1/60 + 1/40)] ≈ 0.1007 and the Z-score is 0.30/0.1007 ≈ 2.978, with two-sided p-value .0029 < .05. Therefore, REJECT H0.

Interpretation: A significant association exists at the .05 level between “liking Bruce Willis movies” and gender, with the proportion of males who like them estimated to be 30 percentage points higher than the proportion of females.
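The Z-test arithmetic for this example can be reproduced in a few lines of R. This is only a sketch under the pooled-standard-error approach described above; the variable names (p_pooled, se0, se_ci) are illustrative and do not come from the slides, and the confidence interval uses the unpooled standard error as an assumption consistent with the earlier remark about confidence intervals.

```r
# Two-proportion Z-test for the Bruce Willis data (sketch; names are illustrative)
x1 <- 42; n1 <- 60   # males: number of "Yes" responses, sample size
x2 <- 16; n2 <- 40   # females: number of "Yes" responses, sample size

p1 <- x1 / n1                       # point estimate for males
p2 <- x2 / n2                       # point estimate for females
p_pooled <- (x1 + x2) / (n1 + n2)   # common value of pi under H0

# Standard error under H0 uses the pooled estimate
se0 <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z <- (p1 - p2) / se0
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value

# A 95% confidence interval uses the unpooled standard error instead
se_ci <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci <- (p1 - p2) + c(-1, 1) * qnorm(0.975) * se_ci

print(c(z = z, p.value = p_value))
print(ci)
```

The interval lying entirely above zero points to the same conclusion as the test: reject H0 at the .05 level.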


Binary Response: π = P(Success)

“Test of Independence” (POPULATION): Two random binary variables I and J
  I: “Do you like olives?”  π1 = P(Yes to olives)
  J: “Do you like anchovies?”  π2 = P(Yes to anchovies)
  Null Hypothesis H0: π1 = π2  “No association exists between liking olives and anchovies.”
  Alternative Hypothesis HA: π1 ≠ π2  “An association exists between liking olives and anchovies.”

“Test of Homogeneity” (TWO POPULATIONS: Males, Females): Random binary variable I
  I: “Do you like Bruce Willis?”  π = P(Yes to Bruce Willis)
  Null Hypothesis H0: π1 = π2  “No difference in liking Bruce Willis between the two populations.”
  Alternative Hypothesis HA: π1 ≠ π2  “There is a difference in liking Bruce Willis between the two populations.”

Binary Response: π = P(Success)

“Test of Independence” (POPULATION): Two random binary variables I and J
  I: “Gender: Male?”  π1 = P(Yes to Male)
  J: “Do you like Bruce Willis?”  π2 = P(Yes to Bruce)
  Null Hypothesis H0: π1 = π2  “No association exists between gender and liking Bruce.”
  Alternative Hypothesis HA: π1 ≠ π2  “An association exists between gender and liking Bruce.”

“Test of Homogeneity” (TWO POPULATIONS: Males, Females): Random binary variable I
  I: “Do you like Bruce Willis?”  π = P(Yes to Bruce Willis)
  Null Hypothesis H0: π1 = π2  “No difference in liking Bruce Willis between the two populations.”
  Alternative Hypothesis HA: π1 ≠ π2  “There is a difference in liking Bruce Willis between the two populations.”

~ ALTERNATE METHOD ~

Example: Two Proportions (of “Success”)

Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”

Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender sample, and use these data to test…

Null Hypothesis H0: P(“Yes” among Males) = P(“Yes” among Females), i.e., H0: π1 = π2, or π1 – π2 = 0, where π = P(Success) in each gender population. “No association exists.”

Data: Sample 1) n1 = 60 males, X1 = 42. Sample 2) n2 = 40 females, X2 = 16.

Observed
           Males   Females   Total
  Yes        42       16       58
  No         18       24       42
  Total      60       40      100

Expected (under H0)
           Males     Females    Total
  Yes      E11 = ?   E12 = ?      58
  No       E21 = ?   E22 = ?      42
  Total      60        40        100


Recall Probability Tables from Chapter 3….

           J = 1        J = 2
  I = 1    π11          π12          π11 + π12
  I = 2    π21          π22          π21 + π22
           π11 + π21    π12 + π22    1

Under the null hypothesis, the binary variable I is statistically independent of the binary variable J, e.g., P(“I = 1” ∩ “J = 1”) = P(“I = 1”) P(“J = 1”).

Contingency Table (expected counts, with row totals R, column totals C, and grand total n):

           J = 1    J = 2
  I = 1    E11      E12      R1
  I = 2    E21      E22      R2
           C1       C2       n

Probability Table (divide every entry by n):

           J = 1    J = 2
  I = 1    E11/n    E12/n    R1/n
  I = 2    E21/n    E22/n    R2/n
           C1/n     C2/n     1

Therefore, under independence, E11/n = (R1/n)(C1/n), i.e., $E_{11} = R_1 C_1 / n$, etc.
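The rule “expected count = row total × column total / n” can be sketched in R for the Bruce Willis table. The object names (observed, expected) are illustrative, not taken from the slides.

```r
# Expected counts under independence from the table margins (sketch)
observed <- matrix(c(42, 18, 16, 24), nrow = 2,
                   dimnames = list(Response = c("Yes", "No"),
                                   Gender = c("Males", "Females")))

R <- rowSums(observed)   # row totals: 58, 42
C <- colSums(observed)   # column totals: 60, 40
n <- sum(observed)       # grand total: 100

# E_ij = R_i * C_j / n for every cell
expected <- outer(R, C) / n
print(expected)
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the calculation.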

“Chi-squared” Test Statistic

Example: Two Proportions (of “Success”)

Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”

Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?”

Null Hypothesis H0: P(“Yes” among Males) = P(“Yes” among Females), i.e., H0: π1 = π2, or π1 – π2 = 0. “No association exists.”

Data: Sample 1) n1 = 60 males, X1 = 42. Sample 2) n2 = 40 females, X2 = 16.

Observed
           Males   Females   Total
  Yes        42       16       58
  No         18       24       42
  Total      60       40      100

Expected (under H0)
           Males   Females   Total
  Yes       34.8     23.2      58
  No        25.2     16.8      42
  Total      60       40      100

Check: Is the null hypothesis true? Compare Observed with Expected using the “Chi-squared” Test Statistic

$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \sim \chi^2_{df}$

where “degrees of freedom” df = (# rows – 1)(# cols – 1), = 1 for a 2 × 2 table.

“Chi-squared” Test Statistic

Example: Two Proportions (of “Success”)

Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”

Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?”

Observed
           Males   Females   Total
  Yes        42       16       58
  No         18       24       42
  Total      60       40      100

Expected (under H0)
           Males   Females   Total
  Yes       34.8     23.2      58
  No        25.2     16.8      42
  Total      60       40      100

χ² = (42 – 34.8)²/34.8 + (16 – 23.2)²/23.2 + (18 – 25.2)²/25.2 + (24 – 16.8)²/16.8 = 8.867 on 1 df

p = ?????

Because 8.867 is much greater than the α = .05 critical value of 3.841, it follows that p << .05. More precisely, 7.879 < 8.867 < 9.141; hence .0025 < p < .005.

    Yes = c(42, 16)
    No = c(18, 24)
    Bruce = rbind(Yes, No)
    chisq.test(Bruce, correct = F)

            Pearson's Chi-squared test

    data:  Bruce
    X-squared = 8.867, df = 1, p-value = 0.002904

The actual p-value = .0029, the same as that found using the Z-test!


Recall… Example: Two Proportions (of “Success”)

Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”

Null Hypothesis H0: P(“Yes” in Male population) = P(“Yes” in Female population), i.e., H0: π1 = π2, or π1 – π2 = 0, where π = P(Success) in each gender population. “No association exists.”

Data: Sample 1) n1 = 60 males, X1 = 42. Sample 2) n2 = 40 females, X2 = 16.

Analysis via Z-test: Point estimates are 42/60 = 0.70 for males and 16/40 = 0.40 for females, a difference of 0.30. NOTE: This is > 0. With the pooled estimate 58/100 = 0.58, the Z-score is approximately 2.978 and the two-sided p-value is .0029 < .05. Therefore, REJECT H0.

Interpretation: A significant association exists at the .05 level; “liking Bruce Willis movies” and gender are dependent, with the proportion of males who like them estimated to be 30 percentage points higher than the proportion of females.

“Chi-squared” Test Statistic

Example: Two Proportions (of “Success”)

Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”

Observed
           Males   Females   Total
  Yes        42       16       58
  No         18       24       42
  Total      60       40      100

Expected (under H0)
           Males   Females   Total
  Yes       34.8     23.2      58
  No        25.2     16.8      42
  Total      60       40      100

χ² = 8.867 on 1 df. The α = .05 critical value is 3.841. p = .0029

Connection between Z-test and Chi-squared test!
NOTE: (Z-score)² = (2.9778)² = 8.867, the Chi-squared statistic on 1 df, so the two tests give the same p-value.
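The connection can be checked numerically. The sketch below recomputes the pooled Z-score by hand and compares its square with the statistic returned by chisq.test (without continuity correction); the helper names (p_hat, p_pooled) are illustrative only.

```r
# Check that (Z-score)^2 equals the 2 x 2 chi-squared statistic (sketch)
Yes <- c(42, 16)
No  <- c(18, 24)
Bruce <- rbind(Yes, No)
n <- Yes + No                      # sample sizes: 60 males, 40 females

chi <- chisq.test(Bruce, correct = FALSE)

p_hat <- Yes / n                   # sample proportions 0.70 and 0.40
p_pooled <- sum(Yes) / sum(n)      # pooled estimate 0.58
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pooled * (1 - p_pooled) * sum(1 / n))

print(c(z_squared = z^2, chi_squared = unname(chi$statistic)))
print(c(p_from_z = 2 * pnorm(-abs(z)), p_from_chi = chi$p.value))
```

The agreement holds only for the two-sided test of H0: π1 – π2 = 0; a one-sided alternative or a nonzero null value has no Chi-squared counterpart.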

STAT 312 Categorical Data Analysis
Chapter 14 - Goodness-of-Fit Tests and Categorical Data Analysis
Introduction
14.1 - Goodness-of-Fit Tests When Category Probabilities Are Completely Specified
14.2 - Goodness-of-Fit Tests for Composite Hypotheses
14.3 - Two-Way Contingency Tables

“Chi-squared” Test Statistic for Categorical Data

“degrees of freedom” df = (# rows – 1)(# cols – 1)

2 × 2 Chi-squared Test is only valid if:
  Null Hypothesis H0: π1 – π2 = 0. One-sided or nonzero null value ⇒ Z-test!
  Expected Values ≥ 5, in order to avoid “spurious significance” due to a possibly inflated Chi-squared value. Use “Fisher’s Exact Test” otherwise.

Paired version of 2 × 2 Chi-squared Test = “McNemar’s Test”

Categorical data – contingency table with any number of rows and columns (“Test of Independence”, “Test of Homogeneity”, “Goodness-of-Fit” Test):
  Formal Null Hypothesis difficult to write mathematically in terms of π1, π2, …
  Informal H0: “No association exists between rows and columns.”
  At least 80% of Expected Values ≥ 5. Can use “Fisher’s Exact Test” otherwise.
  CATEGORIES MUST MAKE SENSE! BEWARE OF OVER-LUMPING / OVER-SPLITTING!
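One way to apply the “Expected Values ≥ 5” rule in R is to inspect the expected counts before choosing a test. The fallback logic below is a sketch of that rule of thumb applied to the Bruce Willis table, not a procedure given in the slides; fisher.test (and mcnemar.test, for paired data) are standard functions in R's stats package.

```r
# Choose between the chi-squared test and Fisher's exact test (sketch)
Bruce <- rbind(Yes = c(42, 16), No = c(18, 24))

# Expected counts under H0, taken from the chi-squared test object
expected <- chisq.test(Bruce, correct = FALSE)$expected

if (all(expected >= 5)) {
  # Rule of thumb satisfied: chi-squared approximation is reasonable
  result <- chisq.test(Bruce, correct = FALSE)
} else {
  # Small expected counts: fall back to Fisher's exact test
  result <- fisher.test(Bruce)
}
print(result)
```

For this table every expected count is well above 5, so the Chi-squared test is used.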

Categorical data – contingency table with any number of rows and columns
“Goodness-of-Fit” Test

“Goodness-of-Fit” Test (Numerical: Continuous) Categorical data – contingency table with any number of rows and columns “Goodness-of-Fit” Test (Numerical: Continuous) Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student. > x 0.10 0.15 0.29 0.29 0.40 0.48 0.61 0.69 0.70 0.92 0.92 0.93 0.96 0.96 1.31 1.35 1.36 1.59 1.60 1.63 1.67 1.72 1.79 1.81 1.86 1.86 1.87 1.88 1.90 2.03 2.14 2.33 2.47 2.56 2.59 2.59 2.64 2.67 2.68 2.69 2.71 2.73 2.81 2.90 3.01 3.19 3.34 3.37 3.40 3.40 3.48 3.52 3.54 3.55 3.60 3.63 3.68 3.68 3.71 3.72 3.74 3.77 3.78 3.79 3.97 4.00 4.02 4.02 4.03 4.03 4.04 4.06 4.06 4.08 4.09 4.10 4.10 4.10 4.10 4.10 4.12 4.13 4.16 4.16 4.20 4.21 4.21 4.22 4.24 4.25 4.25 4.25 4.25 4.26 4.27 4.28 4.28 4.28 4.29 4.29 4.29 4.30 4.30 4.32 4.33 4.36 4.37 4.37 4.38 4.39 4.44 4.44 4.44 4.47 4.49 4.49 4.50 4.52 4.55 4.55 4.57 4.59 4.59 4.59 4.60 4.61 4.61 4.61 4.61 4.62 4.62 4.62 4.62 4.63 4.63 4.71 4.72 4.75 4.75 4.76 4.77 4.77 4.78 4.81 4.81 4.82 4.82 4.82 4.82 4.85 4.86 4.86 4.88 4.92 4.93 4.93 4.94 4.94 4.96 4.96 4.99 5.00 5.03 5.05 5.06 5.08 5.10 5.12 5.17 5.17 5.18 5.19 5.22 5.22 5.22 5.23 5.27 5.28 5.29 5.30 5.32 5.35 5.38 5.40 5.40 5.41 5.42 5.43 5.46 5.46 5.46 5.47 5.49 5.50 5.50 5.52 5.52 5.54 5.55 5.55 5.55 5.56 5.56 5.59 5.59 5.62 5.63 5.67 5.68 5.68 5.70 5.72 5.74 5.75 5.76 5.76 5.78 5.80 5.82 5.86 5.88 5.88 5.88 5.89 5.90 5.92 5.94 5.95 5.96 5.97 5.98 6.00 6.01 6.02 6.02 6.03 6.03 6.05 6.08 6.10 6.12 6.12 6.12 6.12 6.15 6.17 6.19 6.21 6.23 6.25 6.25 6.26 6.27 6.28 6.28 6.29 6.30 6.31 6.32 6.33 6.34 6.35 6.37 6.38 6.38 6.41 6.41 6.46 6.48 6.49 6.50 6.50 6.51 6.53 6.55 6.56 6.57 6.61 6.63 6.65 6.65 6.67 6.68 6.70 6.71 6.73 6.75 6.76 6.78 6.79 6.80 6.82 6.82 6.84 6.86 6.87 6.89 6.90 6.91 6.92 7.00 7.01 7.03 7.05 .... 10.62 10.64 10.70 10.77 10.81 10.87 10.95 10.9 11.68 11.68 11.70 11.75 11.84 11.93 11.94

“Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypotheses, where Pop 1 = Score ≥ 90 and Pop 2 = Score < 90.

But suppose H0: “All students get the same amount of sleep.” Written as a comparison of populations, this is meaningless, since there are not 425 populations, only one value of X for each student in the sample!

“Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypotheses, where Pop 1 = Score ≥ 90 and Pop 2 = Score < 90.

But suppose H0: “All students get the same amount of sleep.”

Categorical (but not binary):

  I =          1       2           3           4           5           6
  Category     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  Observed     65      97          70          69          69          55      425
  Expected     ?       ?           ?           ?           ?           ?       425


“Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

H0: “All students get the same amount of sleep.”

Categorical (but not binary):

  Category     X < 4     4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8     Total
  Observed     65        97          70          69          69          55        425
  Expected     70.8333   70.8333     70.8333     70.8333     70.8333     70.8333   425

Under H0 each of the k categories is equally likely, so each expected count is 425/6 = 70.8333. In this case, k = 6, so df = 6 – 1 = 5.

“Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

H0: “All students get the same amount of sleep.”

  Category     X < 4     4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8     Total
  Observed     65        97          70          69          69          55        425
  Expected     70.8333   70.8333     70.8333     70.8333     70.8333     70.8333   425

p-value:

    > pchisq(13.7906, 5, lower.tail = F)
    [1] 0.01699572

REJECT H0 - Not all students get the same amount of sleep! More precisely, there is a statistically significant difference between the proportion of students in at least one of these sleep categories and the others.

    > obs = c(65, 97, 70, 69, 69, 55)
    > chisq.test(obs, correct = F)

            Chi-squared test for given probabilities

    data:  obs
    X-squared = 13.7906, df = 5, p-value = 0.01700

    > G.of.Fit = chisq.test(obs, correct = F)
    > G.of.Fit$obs
    [1] 65 97 70 69 69 55
    > G.of.Fit$exp
    [1] 70.83333 70.83333 70.83333 70.83333 70.83333 70.83333
    > G.of.Fit$stat
    X-squared
     13.79059
    > G.of.Fit$p.value
    [1] 0.01699580
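The same statistic and p-value can be reproduced directly from the formula, without chisq.test. This is a sketch using the observed counts from the slides; the variable names (expected, X2) are illustrative.

```r
# Goodness-of-fit statistic by hand for the equal-proportions model (sketch)
obs <- c(65, 97, 70, 69, 69, 55)   # observed counts in the six sleep categories
n <- sum(obs)                      # 425 students
k <- length(obs)                   # 6 categories

expected <- rep(n / k, k)          # 70.8333 in every category under H0
X2 <- sum((obs - expected)^2 / expected)
df <- k - 1                        # no parameters estimated from the data
p_value <- pchisq(X2, df, lower.tail = FALSE)

print(c(X2 = X2, df = df, p.value = p_value))
```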


What if…? (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Is there any association between “Number of hours of sleep” and “Final Exam score?”

Pop 1 = Score ≥ 90, Pop 2 = Score < 90

Null Hypothesis H0: “No such association exists.”

Observed
  Score     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  ≥ 90
  < 90
  Total     65      97          70          69          69          55      425

Expected
  Score     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  ≥ 90
  < 90
  Total     65      97          70          69          69          55      425

Chi-squared “Test of Homogeneity” (between two populations)

What if…? (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student. Let Y = “Final Exam score.”

Is there any association between “Number of hours of sleep” and “Final Exam score?”

Null Hypothesis H0: “No such association exists.”

[Figure: Scatterplot, n = 425]

Correlation and Regression

What if…? (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Pop 1 = Score ≥ 90, Pop 2 = Score < 90

Is there any significant difference in mean number of hours of sleep between students with Score ≥ 90 and Score < 90?

Null Hypothesis H0: μ1 = μ2 (no difference in mean hours of sleep between the two populations). T-test!

Back to… “Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypothesis H0: “All students get the same amount of sleep.”

  Category     X < 4     4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8     Total
  Observed     65        97          70          69          69          55        425
  Expected     70.8333   70.8333     70.8333     70.8333     70.8333     70.8333   425

One way to interpret this is that the Uniform Distribution is a good fit to the data. But what about fitting other models?


Back to… “Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypothesis H0: π1 = .20, π2 = .30, π3 = .20, π4 = .15, π5 = .10, π6 = .05

  Category     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  Observed     65      97          70          69          69          55      425
  Expected     85.00   127.50      85.00       63.75       42.50       21.25   425

    > obs = c(65, 97, 70, 69, 69, 55)
    > H0 = c(.20, .30, .20, .15, .10, .05)
    > chisq.test(obs, correct = F, p = H0)

            Chi-squared test for given probabilities

    data:  obs
    X-squared = 85.2078, df = 5, p-value < 2.2e-16

Back to… “Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypothesis (new model) H0: “These sample proportions come from a normally-distributed population.”

Sample mean and standard deviation of the raw data x:

    > c(mean(x), sd(x))
    [1] 5.827529 2.351930

  Category     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  Observed     65      97          70          69          69          55      425
  Expected                                                                     425

Back to… “Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypothesis (new model) H0: “These sample proportions come from a normally-distributed population.”

For each bin, calculate the corresponding normal probabilities (pnorm). Multiply each probability by n = 425 to obtain the expected values…

  Category     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  Observed     65      97          70          69          69          55      425
  Expected                                                                     425
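The two steps just described (bin probabilities from pnorm, then multiplication by n) might look as follows in R. This is a sketch only: it plugs in the sample mean and standard deviation reported in the slides, and the degrees of freedom used at the end (k – 1 – 2, subtracting one for each estimated parameter) are an assumption in the spirit of the composite-hypothesis setting of section 14.2, not something the slides state.

```r
# Expected counts for the six sleep bins under a fitted normal model (sketch)
obs <- c(65, 97, 70, 69, 69, 55)
n <- sum(obs)                          # 425

xbar <- 5.827529                       # sample mean reported in the slides
s <- 2.351930                          # sample sd reported in the slides

breaks <- c(-Inf, 4, 5, 6, 7, 8, Inf)  # bin boundaries
probs <- diff(pnorm(breaks, mean = xbar, sd = s))   # normal probability of each bin
expected <- n * probs

X2 <- sum((obs - expected)^2 / expected)
# ASSUMPTION: two parameters (mean, sd) were estimated, so df = k - 1 - 2
df <- length(obs) - 1 - 2
p_value <- pchisq(X2, df, lower.tail = FALSE)

print(round(rbind(Observed = obs, Expected = expected), 2))
print(c(X2 = X2, df = df, p.value = p_value))
```

Because the mean and standard deviation come from the ungrouped data, the Chi-squared reference distribution here is only approximate, so the resulting p-value should be read with some caution.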

Back to… “Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypothesis (new model) H0: “These sample proportions come from a normally-distributed population.”

  Category     X < 4   4 ≤ X < 5   5 ≤ X < 6   6 ≤ X < 7   7 ≤ X < 8   X ≥ 8   Total
  Observed     65      97          70          69          69          55      425
  Expected                                                                     425

But what if the categories (i.e., bins) are unspecified? In practice, the bins are often chosen to have equal areas under the model density, so that an observation is equally likely to fall in any one of them (in this case, with probability 1/6).

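    # Standard normal density with the five cutoffs that split it into six equal areas
    plot(dnorm, from = -3, to = 3, lwd = 2)
    lines(c(-3, 3), c(0, 0))                 # horizontal axis
    for (i in 1:5) {
      z = qnorm(i/6)                         # cutoff with area i/6 to its left
      print(z)
      y = dnorm(z)
      points(z, 0, pch = 19, col = "red")    # mark the cutoff on the axis
      lines(c(z, z), c(0, y), lty = 2)       # dashed line up to the curve
    }

    Standard normal cutoffs z:          -0.9674216  -0.4307273  0         0.4307273  0.9674216
    Corresponding cutoffs for X:         3.552221    4.814489   5.827529  6.840569   8.102837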

Back to… “Goodness-of-Fit” Test (Numerical: Continuous)

Categorical data – contingency table with any number of rows and columns

Let X = “Hours of sleep before Final Exam” in a population of Stat 312 students. Random sample of n = 425 students; X measured for each student.

Null Hypothesis (new model) H0: “These sample proportions come from a normally-distributed population.”

But what if the categories (i.e., bins) are unspecified? In practice, the bins are often chosen to have equal areas under the model density, so that an observation is equally likely to fall in any one of them (in this case, with probability 1/6).

  Bin     1         2         3         4         5         6         Total
  Obs     O1        O2        O3        O4        O5        O6        425
  Exp     70.8333   70.8333   70.8333   70.8333   70.8333   70.8333   425
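The equal-probability bin boundaries can be obtained by rescaling standard normal quantiles. The sketch below uses the sample mean and standard deviation reported in the slides; the names z_cuts and x_cuts are illustrative. Since the full data vector x is not listed in this transcript, the step that would tabulate the observed counts is shown only as a comment.

```r
# Cutoffs that split a fitted normal model into k = 6 equal-probability bins (sketch)
xbar <- 5.827529                   # sample mean reported in the slides
s <- 2.351930                      # sample sd reported in the slides
k <- 6
n <- 425

z_cuts <- qnorm((1:(k - 1)) / k)   # standard normal cutoffs
x_cuts <- xbar + s * z_cuts        # cutoffs on the hours-of-sleep scale
expected <- rep(n / k, k)          # 70.8333 expected in each bin

print(round(z_cuts, 7))
print(round(x_cuts, 6))

# With the raw data vector x available, the observed counts O1, ..., O6 would be:
#   table(cut(x, breaks = c(-Inf, x_cuts, Inf)))
```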