Lecture #8 Thursday, September 15, 2016 Textbook: Section 4.4


1 Lecture #8 Thursday, September 15, 2016 Textbook: Section 4.4
Statistics 200
Objectives (for two categorical variables and their relationship):
• Understand Simpson’s paradox: what causes it, why it’s surprising
• Formulate null and alternative hypotheses in a testing scenario
• Calculate chi-square statistic for a 2x2 table
• Interpret p-value derived from a chi-square statistic; make a decision based on this interpretation
• Contrast statistical significance and practical significance
• Understand how sample size affects statistical significance
• Also: Discuss Exam #1

2 On Monday we introduced a LOT of terms:
2×2 Table:

           yes    no    Total
Group 1
Group 2

One Group (from table):
• Individual Risk: risk for one group
• Odds: compares two possible outcomes within one group
Comparing Two Groups (single number):
• Relative Risk (ratio)
• Increased Risk (percent)
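As a rough illustration of the terms on this slide, the sketch below computes each one from a 2×2 table. The counts are made up for the example and the function names are hypothetical. The definitions assumed are the usual ones: risk = yes / total, odds = yes / no, relative risk = ratio of the two risks, increased risk = (relative risk - 1) × 100%.

```python
def risk(yes, no):
    """Individual risk: proportion of one group in the 'yes' category."""
    return yes / (yes + no)


def odds(yes, no):
    """Odds: 'yes' count compared with 'no' count within one group."""
    return yes / no


def relative_risk(group1, group2):
    """Ratio of the risk in group 1 to the risk in group 2."""
    return risk(*group1) / risk(*group2)


def increased_risk_percent(group1, group2):
    """Percent by which group 1's risk exceeds group 2's risk."""
    return (relative_risk(group1, group2) - 1) * 100


# Hypothetical (yes, no) counts, chosen only to keep the arithmetic simple.
group1 = (50, 50)
group2 = (25, 75)

print("Risk, group 1:", risk(*group1))
print("Risk, group 2:", risk(*group2))
print("Odds, group 1:", odds(*group1))
print("Relative risk:", relative_risk(group1, group2))
print("Increased risk (%):", increased_risk_percent(group1, group2))
```

With these made-up counts, group 1's risk is twice group 2's, so the relative risk is 2 and the increased risk is 100%.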

3 Today’s objectives: Introduce hypothesis testing!
Assess the statistical significance of a two-way table using the chi-square test.

4 Statistical Significance
A statistically significant relationship or difference is one that is large enough to be unlikely to have occurred in the sample if there was no relationship or difference in the population.

5 Statistical significance: Hypotheses
We determine statistical significance by performing something called a hypothesis test. For hypothesis testing, we first create two hypotheses, known as the null and alternative hypotheses. State null and alternative hypotheses before looking at data.

6 Null Hypothesis: H0
Starting position: nothing is happening.
Null, with 2 × 2 tables: in the population there is:
• no relationship between the two variables
• no difference in the two groups
Statistical hypotheses never include statements about the sample.
We are never trying to prove the null is true.

7 Alternative Hypothesis: Ha
Challenging position: something is happening.
Goal: hope the data supports the alternative.
Alternative, with 2 × 2 tables: in the population, there is:
• a relationship between the two variables
• a difference in the two groups

8 Selfie Example: State Hypotheses
           Yes    No
Female      90   110
Male        30    70

Research question: Do the data suggest that there is a relationship between sex and whether or not a Stat 200 student likes to take “selfies”?
When considering these two variables, in the population:
Null (H0): there is no relationship (i.e., there is no difference in the two groups when comparing “sex”)
Alternative (Ha): there is a relationship (i.e., there is a difference in the two groups when comparing “sex”)

9 We quantify evidence for the alternative using…
The chi-square statistic. It is used to determine whether there is a significant relationship between two categorical variables. Next step: find the chi-square statistic using the sample data.

10 Chi-Square Statistic:
The chi-square statistic quantifies the amount of difference between the actual (observed) counts and the expected counts.

           Yes    No
Female      90   110
Male        30    70

Expected counts:
• are hypothetical counts
• are calculated using a formula based on the statement found in the null hypothesis
• take the position of no relationship (no difference)

11 Statistical significance: Chi-Square
This statistic measures the difference between the observed counts and the expected counts in a contingency table. When we talk about expected counts, we’re talking about the counts that we would expect to see if there were no relationship between the variables (that is, if the null hypothesis is true).

12 Calculating expected counts:
               Yes    No    Row Total
Female                      200
Male                        100
Column Total   120    180   300 = n

Expected count = $\frac{\text{row total} \times \text{column total}}{n}$
Don’t need: actual counts

13 Expected count = $\frac{\text{row total} \times \text{column total}}{n}$

               Yes                     No                      Row Total
Female         200 × 120 / 300 = 80    200 × 180 / 300 = 120   200
Male           100 × 120 / 300 = 40    100 × 180 / 300 = 60    100
Column Total   120                     180                     300 = n
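The expected-count rule for this slide (row total times column total, divided by n) can be sketched in a few lines of Python. The function name `expected_counts` is hypothetical; the totals are the ones from the selfie table.

```python
def expected_counts(row_totals, col_totals):
    """Expected count for each cell: row total * column total / n."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


row_totals = [200, 100]  # Female, Male
col_totals = [120, 180]  # Yes, No

expected = expected_counts(row_totals, col_totals)
for label, row in zip(["Female", "Male"], expected):
    print(label, row)
```

Note that only the row and column totals are needed here; the actual cell counts never enter the calculation.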

14 Expected counts in Minitab from raw data: Stat > Tables > Cross Tabulation and Chi-Square
[Screenshot of the Minitab dialog and output, labeled: explanatory variable, response variable, actual data, expected counts]

15 Minitab Output
Rows: Sex   Columns: Selfie

           yes    no    All
female      90   110    200
            80   120
male        30    70    100
            40    60
All        120   180    300

Cell Contents: Count
               Expected count

In each cell, the top number is the actual count and the bottom number is the expected count. The difference between ‘actual’ and ‘expected’ is 10. This is also called the ‘residual’.

16 Use counts to calculate Chi-Square
The chi-square statistic lets us use a number to describe how different the actual counts are from the expected counts. Each cell in the table contributes: $\frac{(\text{actual count} - \text{expected count})^2}{\text{expected count}}$. Finally, we add all the contributions together to get the chi-square statistic.

17 Example Calculation: Chi-Square Statistic
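Chi-square statistic formula: $\chi^2 = \sum \frac{(\text{actual count} - \text{expected count})^2}{\text{expected count}}$
$\chi^2 = \frac{(90-80)^2}{80} + \frac{(110-120)^2}{120} + \frac{(30-40)^2}{40} + \frac{(70-60)^2}{60} = 1.25 + 0.83 + 2.50 + 1.67 = 6.25$
Each cell contributes to chi-square. We can tell which one contributes most (Male/Yes, 2.50) and least (Female/No, 0.83).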
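The per-cell contributions for the selfie example can be reproduced with a short Python sketch. The variable names are illustrative; the observed counts and the expected counts (80, 120, 40, 60) are the ones from the earlier slides.

```python
observed = [[90, 110], [30, 70]]   # rows: Female, Male; columns: Yes, No
expected = [[80, 120], [40, 60]]   # expected counts under "no relationship"

# Each cell contributes (actual - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

labels = [["Female/Yes", "Female/No"], ["Male/Yes", "Male/No"]]
for label_row, contrib_row in zip(labels, contributions):
    for label, value in zip(label_row, contrib_row):
        print(f"{label}: {value:.2f}")
print(f"Chi-square statistic: {chi_square:.2f}")
```

Every cell has the same residual size (10), so the cells with the smallest expected counts contribute the most.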

18 Chi-Square Statistic
• Single number: combines information from all four cells
• Possible values: 0 to infinity
• If the null is true, we know how this value should behave. We can use this knowledge to learn how unusual our sample would be under the null.
• The larger the value of chi-square, the more unusual our sample would be under the null, and the more evidence we have for our alternative hypothesis.

19 Chi-square to p-value
How large a chi-square statistic do we need to declare significance? Convert to a different question: What is the likelihood that the chi-square statistic could be this large or larger if there is actually no relationship in the population? We can answer this question using a p-value.

20 P-value Definition (Interpretation)
Box definition: page 126 of the textbook.
The p-value is the likelihood of getting our result or any value more extreme in the direction stated in the alternative when assuming the null hypothesis is true.

21 Use Minitab to calculate P-value
Example 1: Chi-Sq = 6.25
Use Minitab to calculate the p-value: p-value = 0.012
P-value interpretation: The likelihood of getting our chi-square statistic of 6.25 or any value more extreme (larger) when assuming there is no relationship in the population is 0.012.
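The lecture gets this p-value from Minitab. As an alternative check, here is a minimal Python sketch that assumes 1 degree of freedom (which is what a 2×2 table has) and uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The function name is hypothetical.

```python
import math


def chi_square_p_value_df1(chi_square):
    """Probability of a chi-square value this large or larger, 1 degree of freedom.

    For 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))


p_value = chi_square_p_value_df1(6.25)
print(round(p_value, 3))  # 0.012
```

This shortcut only applies to 2×2 tables; larger tables have more degrees of freedom and need a general chi-square distribution function.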

22 When is a p-value small enough to declare statistical significance?
The typical rule is to declare significance when the p-value is less than 0.05.
Guideline to make decision:
• p-value < .05: result is “statistically significant”
  Common language: unlikely due to random chance
  Statistical statement: can reject H0 in favor of Ha
• p-value > .05: result is “not statistically significant”
  Common language: possibly due to random chance
  Statistical statement: cannot reject H0 in favor of Ha

23 Example Conclusion: Statistical Significance is Found?
p-value = 0.012 (from output) < 0.05
Our conclusion: Since our p-value of 0.012 is less than 0.05:
• We can reject H0 in favor of Ha.
• In the population, we can claim that there is a statistically significant relationship between sex and whether or not a Stat 200 student likes to take “selfies”.
• There is a difference between the two groups.
• We can rule out random chance as an explanation for our result.

24 Summary: Hypothesis Tests
• State: the hypotheses (null and alternative)
• Complete the necessary calculations: expected counts, chi-square statistic, p-value
• A one-sentence statement: what does the p-value mean?
• Compare: the result to an established guideline (.05) and reject or fail to reject H0

25 What if the p-value >.05?
Then the relationship between the two variables is not statistically significant. When you are interpreting this, you should say: “We fail to reject the null hypothesis” or “The p-value is too large to declare that there is a statistically significant relationship.” We never ‘accept’ H0; we ‘fail to reject’ H0.

26 What influences statistical significance?
The significance of a relationship between two variables can be affected by two factors:
• The strength of the observed relationship. How different are the row percents?
• The number of people that were studied (sample size).

27 Example: Are women more likely to have dogs?
           Has Dog       No Dog        Total
Female     89 (56.7%)    68 (43.3%)    157
Male       66 (50.8%)    64 (49.2%)    130
Total      155           132           287

Your class data

28 Example: Are women more likely to have dogs?
           Has Dog       No Dog        Total
Female     89 (56.7%)    68 (43.3%)    157
Male       66 (50.8%)    64 (49.2%)    130
Total      155           132           287

Is there evidence here for a relationship between a person’s gender and whether or not they have a dog?

29 Example: Are women more likely to have dogs?
Is there evidence here for a relationship between a person’s gender and whether or not they have a dog?
Formulate and test statistical hypotheses:
H0: There is no relationship in the population between gender and dog ownership.
Ha: There is a relationship in the population between gender and dog ownership.

30 Example: Are women more likely to have dogs?
Is there evidence here for a relationship between a person’s gender and whether or not they have a dog?
Formulate and test statistical hypotheses
Calculate the chi-square statistic and p-value:
Chi-square statistic: 0.779
P-value: 0.378

31 Example: Are women more likely to have dogs?
Is there evidence here for a relationship between a person’s gender and whether or not they have a dog?
Formulate and test statistical hypotheses
Calculate the chi-square statistic and p-value
Interpretation of p-value: The likelihood of getting a chi-square statistic of 0.779 or larger is 0.378, assuming that there is truly no relationship in the population.

32 Example: Are women more likely to have dogs?
Is there evidence here for a relationship between a person’s gender and whether or not they have a dog?
Formulate and test statistical hypotheses
Calculate the chi-square statistic and p-value
Make a decision: Based on the p-value of 0.378, we fail to reject the null hypothesis. We cannot conclude that there is a relationship between gender and dog ownership in the population.
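The dog-ownership numbers can be checked with a short Python sketch. This is not the lecture's Minitab procedure: `chi_square_2x2` is a hypothetical helper that assumes 1 degree of freedom (true for any 2×2 table). Its `continuity_correction` flag applies the Yates correction, which subtracts 0.5 from each |actual - expected| before squaring. The reported 0.779 and 0.378 appear to match that corrected version, while the plain per-cell formula gives a statistic of about 1.003; the decision is the same either way.

```python
import math


def chi_square_2x2(table, continuity_correction=False):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    adjust = 0.5 if continuity_correction else 0.0

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates correction shrinks each residual by 0.5 (never below 0).
            residual = max(0.0, abs(observed - expected) - adjust)
            chi_square += residual ** 2 / expected

    # With 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


dogs = [[89, 68], [66, 64]]  # rows: Female, Male; columns: Has Dog, No Dog

plain = chi_square_2x2(dogs)
corrected = chi_square_2x2(dogs, continuity_correction=True)

for label, (stat, p) in [("plain", plain), ("continuity-corrected", corrected)]:
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: chi-square = {stat:.3f}, p-value = {p:.3f}, {decision}")
```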

33 Now, what if we had a larger sample with the same row percentages?
Original sample:

           Has Dog       No Dog        Total
Female     89 (56.7%)    68 (43.3%)    157
Male       66 (50.8%)    64 (49.2%)    130
Total      155           132           287

P-value: 0.378

Larger sample, same row percentages:

           Has Dog       No Dog        Total
Female     890 (56.7%)   680 (43.3%)   1570
Male       660 (50.8%)   640 (49.2%)   1300
Total      1550          1320          2870

P-value: 0.0053

34 Moral of the example:
If the row percentages stay the same but the sample size is increased, then we have more evidence against the null hypothesis. This results in a smaller p-value.
Sample size up, p-value down.
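To see the effect of sample size directly, the sketch below multiplies every count in the dog-ownership table by the same factor, which keeps the row percentages fixed, and recomputes the p-value each time. The helper name and the multipliers are assumptions made for illustration. It uses the plain chi-square formula with no continuity correction, so the values it prints differ from the 0.378 and 0.0053 shown on the previous slide; the pattern (larger n, smaller p-value) is the point.

```python
import math


def p_value_2x2(table):
    """P-value of the plain (uncorrected) chi-square test for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


base = [[89, 68], [66, 64]]  # rows: Female, Male; columns: Has Dog, No Dog

p_values = []
for multiplier in (1, 2, 5, 10):
    scaled = [[count * multiplier for count in row] for row in base]
    n = sum(sum(row) for row in scaled)
    p = p_value_2x2(scaled)
    p_values.append(p)
    print(f"n = {n}: p-value = {p:.4f}")
```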

35 Another issue: Practical vs. statistical significance
Remember from one of the morals of statistics that statistical significance does not always mean there is practical significance. For example, if drug A is two times as likely to cure a cold as drug B, there will probably be statistical significance in a large enough study. However, if drug B only works 0.001% of the time, and drug A works 0.002% of the time, the difference isn’t practically significant.
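The drug example can be put into numbers with a small Python sketch. The cure rates (0.002% and 0.001%) come from the slide; the study size of 10 million people per drug is purely hypothetical, chosen to be large enough for the difference to register, and the variable names are illustrative.

```python
import math

n_per_group = 10_000_000  # hypothetical study size for each drug
cured_a = 200             # 0.002% of 10 million
cured_b = 100             # 0.001% of 10 million

rate_a = cured_a / n_per_group
rate_b = cured_b / n_per_group
relative_risk = rate_a / rate_b
extra_cures_per_million = (rate_a - rate_b) * 1_000_000

# Plain chi-square test on the 2x2 table (rows: drug A, drug B).
table = [[cured_a, n_per_group - cured_a], [cured_b, n_per_group - cured_b]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (table[i][j] - expected) ** 2 / expected
# With 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"Relative risk: {relative_risk:.1f}")
print(f"Extra cures per million people: {extra_cures_per_million:.0f}")
print(f"Statistically significant at .05: {p_value < 0.05}")
```

Under these assumptions the test is statistically significant, yet drug A cures only about 10 more colds per million people, which is the gap between statistical and practical significance.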

36 Review: If you understood today’s lecture, you should be able to solve
4.41, 4.43, 4.45, 4.47, 4.51, 4.53, 4.55abd, 4.61
Recall objectives:
• Formulate null and alternative hypotheses in a testing scenario
• Calculate chi-square statistic for a 2x2 table
• Interpret p-value derived from a chi-square statistic; make a decision based on this interpretation
• Contrast statistical significance and practical significance
• Understand how sample size affects statistical significance

