The binomial applied: absolute and relative risks, chi-square


Probability speak (just shorthand!)…
P(X) = “the probability of event X”
P(D) = “the probability of disease”
P(E) = “the probability of exposure”
P(~D) = “the probability of not getting the disease”
P(~E) = “the probability of not being exposed”
P(D/E) = “the probability of disease given exposure” or “the probability of disease among the exposed”
P(D/~E) = “the probability of disease given unexposed” or “the probability of disease among the unexposed”

Things that follow a binomial distribution…
Cohort study (or cross-sectional):
- The number of exposed individuals in your sample that develop the disease
- The number of unexposed individuals in your sample that develop the disease
Case-control study:
- The number of cases that have had the exposure
- The number of controls that have had the exposure

Cohort study example: You sample 100 smokers and 100 non-smokers and follow them for 5 years to see who develops heart disease.

Seeing it as a binomial…
- The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=P(D/E)
- The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=P(D/~E)

A possible outcome:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No Disease (~D)          79             87
Total                   100            100

Statistics for these data
1. Risk ratio (relative risk)
2. Difference in proportions (absolute risk)
3. Chi-square test of independence. For 2x2 tables, this is mathematically equivalent to the difference in proportions Z test.

1. Risk ratio (relative risk)
                     Exposure (E)   No Exposure (~E)
Disease (D)               a               b
No Disease (~D)           c               d
Total                    a+c             b+d
$$ RR = \frac{a/(a+c)}{b/(b+d)} = \frac{\text{risk to the exposed}}{\text{risk to the unexposed}} $$

In probability terms…
$$ RR = \frac{P(D/E)}{P(D/\sim E)} = \frac{\text{risk of disease in the exposed}}{\text{risk of disease in the unexposed}} $$

Risk ratio calculation:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No Disease (~D)          79             87
Total                   100            100
$$ RR = \frac{21/100}{13/100} = \frac{21}{13} \approx 1.615 $$
Interpretation: there is about a 61% increase in risk of heart disease in smokers vs. nonsmokers.
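The calculation above can be sketched in a few lines of Python. The function name `risk_ratio` and the argument order (a, b, c, d, matching the cell labels of the 2x2 table) are illustrative choices, not something from the slides.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2x2 table laid out as in the slides.

    a = exposed with disease,    b = unexposed with disease,
    c = exposed without disease, d = unexposed without disease.
    """
    risk_exposed = a / (a + c)      # P(D/E)
    risk_unexposed = b / (b + d)    # P(D/~E)
    return risk_exposed / risk_unexposed


# Smoking and heart disease cohort from the slides
rr = risk_ratio(21, 13, 79, 87)
print(f"risk ratio = {rr:.3f}")
```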

Inferences about risk ratio…
Is our observed risk ratio statistically different from 1.0? What is the p-value?
I’m going to present statistical inference for the odds ratio; the risk ratio is similar. So, for now, just get the answer from SAS:
95% confidence interval: 0.86 to 3.04
P-value > .05
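The slides take this interval from SAS without showing a formula. One common way to get such an interval is the log method, with standard error of ln(RR) equal to sqrt(1/a - 1/(a+c) + 1/b - 1/(b+d)). The sketch below assumes that method; whether SAS used exactly this calculation is an assumption, although the numbers it produces agree with the interval quoted above.

```python
import math


def risk_ratio_ci(a, b, c, d, z_crit=1.96):
    """Risk ratio with a 95% CI by the log method (assumed, not from the slides)."""
    rr = (a / (a + c)) / (b / (b + d))
    # Standard error of ln(RR) for two independent binomial samples
    se_log_rr = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(21, 13, 79, 87)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval includes 1.0, it is consistent with a p-value above .05.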

2. Difference in proportions
With a, b, c, d as in the 2x2 table for the risk ratio:
$$ \text{difference in proportions} = \frac{a}{a+c} - \frac{b}{b+d} $$

2. Difference in proportions
$$ \frac{21}{100} - \frac{13}{100} = 0.08 = 8\% $$
Absolute, rather than relative, risk difference!

Difference in proportions test
Null hypothesis: difference in proportions = 0
Under the null, the groups have the same risk of heart disease (= overall risk in the study):
- The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.17
- The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.17
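To make the null distribution concrete, here is a small sketch of the binomial with N=100 and p=.17 using only the standard library. The helper name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k events in n independent trials with risk p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.17  # one study arm under the null hypothesis
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

expected_cases = sum(k * pk for k, pk in enumerate(pmf))  # equals n * p
prob_21_or_more = sum(pmf[21:])  # chance one arm alone shows 21+ cases

print(f"expected cases per arm = {expected_cases:.1f}")
print(f"P(21 or more cases in one arm) = {prob_21_or_more:.3f}")
```

This looks at a single arm only; the test itself is about the difference between the two arms.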

Difference in proportions test
Null hypothesis: The difference in proportions is 0.
$$ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}} $$
Z follows a normal distribution because the binomial can be approximated with a normal.
Recall, the variance of a proportion is p(1-p)/n.
Use the average (or pooled) proportion $\bar{p}$ in the standard error formula, because under the null hypothesis, the groups have equal proportions.

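Z-test applied here…
$$ Z = \frac{0.21 - 0.13}{\sqrt{\frac{(0.17)(0.83)}{100} + \frac{(0.17)(0.83)}{100}}} = \frac{0.08}{0.053} \approx 1.5 $$
Corresponding two-sided p-value is .131.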
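A minimal sketch of this Z-test with the pooled standard error, using only the standard library. The function name is illustrative; the two-sided p-value uses the identity 2(1 - Φ(|z|)) = erfc(|z|/√2). Computed without intermediate rounding, the result lands close to the rounded figures on the slide.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Z-test for a difference in proportions, pooled SE under the null."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall risk in the study
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value


z, p_value = two_proportion_z_test(21, 100, 13, 100)
print(f"Z = {z:.2f}, two-sided p = {p_value:.3f}")
```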

Corresponding 95% confidence interval… If the 95% confidence interval crosses the null value (here=0), then p>.05
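The interval formula itself is not in the transcript. A common choice is the observed difference plus or minus 1.96 standard errors, with each group's own proportion in the standard error rather than the pooled one. The sketch below assumes that choice, so treat the exact endpoints as illustrative.

```python
import math


def difference_ci(x1, n1, x2, n2, z_crit=1.96):
    """95% CI for p1 - p2 using the unpooled standard error (an assumption)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


lower, upper = difference_ci(21, 100, 13, 100)
print(f"95% CI for the difference: ({lower:.3f}, {upper:.3f})")
print("crosses 0:", lower < 0 < upper)
```

The interval crosses 0, in line with p > .05.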

OR, use computer simulation to make inferences…
1. In SAS, assume an infinite population of smokers and non-smokers with equal disease risk, p=.17 (UNDER THE NULL!)
2. Use the random binomial function to randomly select n=100 smokers and n=100 non-smokers, each with p=.17
3. Calculate the observed difference in proportions.
4. Repeat this 1000 times (or some large number of times).
5. Observe the distribution of differences under the null hypothesis.

Computer Simulation Results Empirical standard error is about 5.3%

P-value from our simulation… When we ran this study 1000 times, by chance, we got 72 results as big or bigger than 8%. We also got 82 results as small or smaller than –8%.

P-value From our simulation, we estimate the p-value to be: 154/1000 or .154
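The slides run this simulation in SAS. The sketch below follows the same five steps in Python instead, so the function name, the seed, and the way binomial draws are generated are all assumptions. Individual runs will not reproduce the 72 and 82 counts exactly, but the empirical standard error and p-value should come out in the same neighborhood.

```python
import random


def simulate_null_differences(n=100, p=0.17, reps=1000, seed=0):
    """Differences in case counts (smokers minus non-smokers) under the null."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # One binomial draw per arm: count how many of n people get the disease
        smokers = sum(rng.random() < p for _ in range(n))
        nonsmokers = sum(rng.random() < p for _ in range(n))
        diffs.append(smokers - nonsmokers)
    return diffs


diffs = simulate_null_differences()
props = [d / 100 for d in diffs]  # differences in proportions
mean = sum(props) / len(props)
empirical_se = (sum((x - mean) ** 2 for x in props) / (len(props) - 1)) ** 0.5

# Two-sided p-value: results at least as extreme as the observed 8 points
# (counts are compared as integers to avoid floating-point edge cases)
sim_p_value = sum(abs(d) >= 8 for d in diffs) / len(diffs)

print(f"empirical SE = {empirical_se:.3f}")
print(f"simulated two-sided p-value = {sim_p_value:.3f}")
```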

3. Chi-square test of independence
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No Disease (~D)          79             87
Total                   100            100
Null hypothesis: smoking and heart disease are independent

What does it mean to be “independent” in stats?
Under independence, P(A&B)=P(A)*P(B)
In words: the “joint probability” equals the product of the “marginal probabilities.” OR: the probability of both A and B happening is equal to the probability of A times the probability of B.
If smoking and heart disease are independent, then P(smoker&heart disease)=P(smoker)*P(heart disease)

Calculate expected counts under independence…
IF smoking and heart disease are independent THEN: P(HeartDisease&Smoker)=P(HeartDisease)*P(Smoker)
P(HeartDisease)=34/200=17%
P(Smoker)=100/200=50%
IF INDEPENDENT, then P(HeartDisease&Smoker) should be 17%*50%=8.5%; 8.5% of 200 = 17

Fill in the expected table…
                     Smoker (E)   Non-smoker (~E)   Total
Heart disease (D)        17             17            34
No Disease (~D)          83             83           166
Total                   100            100           200
Marginals are fixed! Notice that the rest of the table is determined after you fill in 17 for cell A. There are no degrees of freedom left! (This table has only 1 degree of freedom.)
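The expected table can be computed directly from the margins: each expected count is (row total × column total) / grand total, which is the independence rule applied to counts. A minimal sketch, with an illustrative function name:

```python
def expected_counts(table):
    """Expected cell counts under independence, given observed counts.

    `table` is a list of rows, e.g. [[21, 13], [79, 87]].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[21, 13],   # heart disease: smokers, non-smokers
            [79, 87]]   # no disease:    smokers, non-smokers
expected = expected_counts(observed)
print(expected)
```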

Compare expected and observed counts…
Expected:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        17             17
No Disease (~D)          83             83
Observed:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No Disease (~D)          79             87

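Chi-Square test
$$ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{4^2}{17} + \frac{4^2}{17} + \frac{4^2}{83} + \frac{4^2}{83} \approx 2.27 $$
That is roughly 1.5 squared: the chi-square test produces exactly the square of the Z-test and the same p-value.
Degrees of freedom = (rows-1)*(columns-1)=(2-1)*(2-1)=1
Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here the statistic is not quite big enough (p=.131).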
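A short sketch of the chi-square calculation for a 2x2 table. The p-value line relies on the fact that a chi-square with 1 degree of freedom is a squared standard normal, so it is valid only for df = 1; the function name is illustrative. Without intermediate rounding the statistic and p-value come out near the slide's rounded values.

```python
import math


def chi_square_2x2(observed):
    """Chi-square statistic and p-value for a 2x2 table (df = 1 only)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            chi2 += (obs - exp) ** 2 / exp
    # For df = 1: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2([[21, 13], [79, 87]])
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```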

Bonus material: The Chi-Square distribution
A chi-square with df degrees of freedom is the sum of df squared independent standard normal deviates.
The expected value and variance of a chi-square: E(x)=df, Var(x)=2(df)
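These two properties are easy to check by simulation. The sketch below sums squared standard normal draws and compares the sample mean and variance with df and 2(df). The choice of df=3, the number of repetitions, and the seed are arbitrary.

```python
import random


def simulate_chi_square(df, reps=20000, seed=1):
    """Draw chi-square values as sums of df squared standard normal deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(reps)]


df = 3
draws = simulate_chi_square(df)
sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)

print(f"mean     = {sample_mean:.2f}  (theory: {df})")
print(f"variance = {sample_var:.2f}  (theory: {2 * df})")
```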

Case-control study example: You sample 50 stroke patients and 50 controls without stroke and ask about their smoking in the past.

Possible study results:
                     Smoker (E)   Non-smoker (~E)   Total
Stroke (D)               15             35            50
No Stroke (~D)            8             42            50

Statistics for these data
1. Odds ratio (relative risk)
2. Difference in proportions exposed (absolute risk)
3. Chi-square

What’s the risk ratio here?
                     Smoker (E)   Non-smoker (~E)   Total
Stroke (D)               15             35            50
No Stroke (~D)            8             42            50
Tricky: There is no risk ratio, because we cannot calculate the risk of disease!!

The odds ratio… We cannot calculate a risk ratio from a case-control study. BUT, we can calculate a measure called the odds ratio…

Odds vs. Risk
If the risk is…      Then the odds are…
½ (50%)              1:1
¾ (75%)              3:1
1/10 (10%)           1:9
1/100 (1%)           1:99
Note: An odds is always higher than its corresponding probability, unless the probability is 0 (at a probability of 100% the odds are undefined).
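The conversion behind this table is odds = risk / (1 - risk). A small sketch using exact fractions so the ratios print cleanly; the function name is illustrative.

```python
from fractions import Fraction


def odds(risk):
    """Convert a probability (0 <= risk < 1) to odds."""
    return risk / (1 - risk)


for risk in [Fraction(1, 2), Fraction(3, 4), Fraction(1, 10), Fraction(1, 100)]:
    o = odds(risk)
    print(f"risk {risk} -> odds {o.numerator}:{o.denominator}")
```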

The Odds Ratio (OR)
                     Exposure (E)   No Exposure (~E)   Total
Disease (D)               a               b             a+b = cases
No Disease (~D)           c               d             c+d = controls
The proportion of cases to controls is set by the investigator; therefore, it does not represent the risk (probability) of developing disease.
$$ OR = \frac{\text{odds of exposure in the cases}}{\text{odds of exposure in the controls}} = \frac{a/b}{c/d} = \frac{ad}{bc} $$

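The Odds Ratio (OR)
Odds of exposure in the cases over odds of exposure in the controls (backward from what we want…):
$$ OR = \frac{\frac{P(E/D)}{P(\sim E/D)}}{\frac{P(E/\sim D)}{P(\sim E/\sim D)}} $$
This expression is mathematically equivalent to the odds of disease in the exposed over the odds of disease in the unexposed (the direction of interest!):
$$ OR = \frac{\frac{P(D/E)}{P(\sim D/E)}}{\frac{P(D/\sim E)}{P(\sim D/\sim E)}} $$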

Proof via Bayes’ Rule (optional)
(Odds of exposure in the cases) / (Odds of exposure in the controls) = [apply Bayes’ Rule] = (Odds of disease in the exposed) / (Odds of disease in the unexposed). What we want!
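The slide's algebra did not survive in this transcript. The derivation below is a reconstruction of the standard argument, not the author's own working. It uses the same "/" shorthand for "given" as the rest of the slides.

```latex
% Bayes' Rule applied to each of the four conditional probabilities:
%   P(E/D)     = P(D/E) P(E) / P(D)
%   P(~E/D)    = P(D/~E) P(~E) / P(D)
%   P(E/~D)    = P(~D/E) P(E) / P(~D)
%   P(~E/~D)   = P(~D/~E) P(~E) / P(~D)
\begin{align*}
\frac{\frac{P(E/D)}{P(\sim E/D)}}{\frac{P(E/\sim D)}{P(\sim E/\sim D)}}
 &= \frac{\dfrac{P(D/E)\,P(E)/P(D)}{P(D/\sim E)\,P(\sim E)/P(D)}}
         {\dfrac{P(\sim D/E)\,P(E)/P(\sim D)}{P(\sim D/\sim E)\,P(\sim E)/P(\sim D)}} \\[6pt]
 &= \frac{\dfrac{P(D/E)\,P(E)}{P(D/\sim E)\,P(\sim E)}}
         {\dfrac{P(\sim D/E)\,P(E)}{P(\sim D/\sim E)\,P(\sim E)}}
  = \frac{P(D/E)\,P(\sim D/\sim E)}{P(D/\sim E)\,P(\sim D/E)} \\[6pt]
 &= \frac{\frac{P(D/E)}{P(\sim D/E)}}{\frac{P(D/\sim E)}{P(\sim D/\sim E)}}
\end{align*}
```

The P(D), P(~D), P(E), and P(~E) terms all cancel, which is why the exposure odds ratio from a case-control study equals the disease odds ratio.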

The odds ratio
                     Smoker (E)   Non-smoker (~E)   Total
Stroke (D)               15             35            50
No Stroke (~D)            8             42            50
$$ OR = \frac{15/35}{8/42} = \frac{15 \times 42}{35 \times 8} = 2.25 $$
Interpretation: there is a 2.25-fold higher odds of stroke in smokers vs. non-smokers.
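As a quick check of the arithmetic, here is the cross-product form of the odds ratio in Python. The function name and argument order (a, b, c, d as in the 2x2 table) are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table, using the cross-product form ad / bc.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return (a * d) / (b * c)


stroke_or = odds_ratio(15, 35, 8, 42)
print(f"odds ratio = {stroke_or}")
```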

Inferences about the odds ratio… Does the sampling distribution follow a normal distribution? What is the standard error?

Simulation…
1. In SAS, assume an infinite population of cases and controls with an equal proportion of smokers (exposure), p=.23 (UNDER THE NULL!)
2. Use the random binomial function to randomly select n=50 cases and n=50 controls, each with p=.23 chance of being a smoker.
3. Calculate the observed odds ratio for the resulting 2x2 table.
4. Repeat this 1000 times (or some large number of times).
5. Observe the distribution of odds ratios under the null hypothesis.

Properties of the OR (simulation) (50 cases/50 controls/23% exposed)
Under the null, this is the expected variability of the sample OR. Note the right skew.

Properties of the lnOR Normal!

Properties of the lnOR
From the simulation, we can get the empirical standard error (~0.5) and p-value (~.10)
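The slides run this in SAS. The sketch below mirrors the same steps in Python, so the function name, the seed, and the handling of tables with an empty cell (skipped, since the odds ratio is undefined there) are all assumptions. Results vary from run to run but should land near the values quoted above.

```python
import math
import random


def simulate_null_log_odds_ratios(n=50, p=0.23, reps=1000, seed=0):
    """ln(OR) values for case-control tables drawn under the null hypothesis."""
    rng = random.Random(seed)
    log_ors = []
    while len(log_ors) < reps:
        a = sum(rng.random() < p for _ in range(n))  # exposed cases
        c = sum(rng.random() < p for _ in range(n))  # exposed controls
        b, d = n - a, n - c                          # unexposed cases, controls
        if 0 in (a, b, c, d):
            continue  # OR undefined with an empty cell; very rare here
        log_ors.append(math.log((a * d) / (b * c)))
    return log_ors


log_ors = simulate_null_log_odds_ratios()
mean = sum(log_ors) / len(log_ors)
empirical_se = (sum((x - mean) ** 2 for x in log_ors) / (len(log_ors) - 1)) ** 0.5

# Two-sided p-value: simulated ln(OR) at least as extreme as ln(2.25)
threshold = math.log(2.25) - 1e-12  # small tolerance for float comparison
sim_p_value = sum(abs(x) >= threshold for x in log_ors) / len(log_ors)

print(f"empirical SE of ln(OR) = {empirical_se:.2f}")
print(f"simulated two-sided p-value = {sim_p_value:.3f}")
```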

Properties of the lnOR
Or, in general:
$$ \text{standard error of } \ln(OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} $$

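Inferences about the ln(OR)
                     Smoker (E)   Non-smoker (~E)   Total
Stroke (D)               15             35            50
No Stroke (~D)            8             42            50
$$ Z = \frac{\ln(2.25) - 0}{\sqrt{\frac{1}{15} + \frac{1}{35} + \frac{1}{8} + \frac{1}{42}}} = \frac{0.81}{0.494} \approx 1.64 $$
p=.10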

Confidence interval…
$$ 95\%\ \text{CI for } OR = e^{\,0.81 \pm 1.96(0.494)} = (0.85,\ 5.92) $$
Final answer: 2.25 (0.85, 5.92)
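The Z statistic, p-value, and confidence interval for the odds ratio can be sketched together. The function name is illustrative; the p-value uses the normal approximation for ln(OR), and the results should agree with the slide values up to rounding.

```python
import math


def log_odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """Z-test and 95% CI for an odds ratio via the normal approximation to ln(OR)."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = math.log(odds_ratio) / se                  # null value of ln(OR) is 0
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided
    lower = math.exp(math.log(odds_ratio) - z_crit * se)
    upper = math.exp(math.log(odds_ratio) + z_crit * se)
    return odds_ratio, se, z, p_value, (lower, upper)


odds_ratio, se, z, p_value, (lower, upper) = log_odds_ratio_inference(15, 35, 8, 42)
print(f"OR = {odds_ratio:.2f}, SE of ln(OR) = {se:.3f}")
print(f"Z = {z:.2f}, p = {p_value:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```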

2. Difference in proportions exposed
                     Smoker (E)   Non-smoker (~E)   Total
Stroke (D)               15             35            50
No Stroke (~D)            8             42            50
$$ \frac{15}{50} - \frac{8}{50} = 30\% - 16\% = 14\% $$

3. Chi-square test of independence
                     Smoker (E)   Non-smoker (~E)
Stroke (D)               15             35
No Stroke (~D)            8             42
Expected count for cell A:
proportion: 0.5*.23=.115
count: .115*100=11.5

Expected and observed counts…
Expected:
                     Smoker (E)   Non-smoker (~E)
Stroke (D)              11.5           38.5
No Stroke (~D)          11.5           38.5
Observed:
                     Smoker (E)   Non-smoker (~E)
Stroke (D)               15             35
No Stroke (~D)            8             42

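Chi-Square test
$$ \chi^2 = \frac{3.5^2}{11.5} + \frac{3.5^2}{11.5} + \frac{3.5^2}{38.5} + \frac{3.5^2}{38.5} \approx 2.77 $$
That is roughly 1.66 squared, the square of the Z statistic for the difference in proportions exposed. Not quite sufficient evidence to reject the null…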
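The claim that the chi-square statistic is the square of the Z statistic can be checked numerically on the case-control table. This is a minimal sketch with illustrative variable names; it computes both statistics without intermediate rounding.

```python
import math

# Case-control data: smokers among 50 stroke cases and 50 controls
cases_exposed, n_cases = 15, 50
controls_exposed, n_controls = 8, 50

# Z-test for the difference in proportions exposed (pooled SE under the null)
pooled = (cases_exposed + controls_exposed) / (n_cases + n_controls)  # 0.23
se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
z = (cases_exposed / n_cases - controls_exposed / n_controls) / se

# Chi-square test of independence on the same table
observed = [[15, 35], [8, 42]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

print(f"Z = {z:.3f}, Z squared = {z ** 2:.3f}, chi-square = {chi2:.3f}")
```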