The binomial applied: absolute and relative risks, chi-square.

Probability speak (just shorthand!)…
P(X) = "the probability of event X"
P(D) = "the probability of disease"
P(E) = "the probability of exposure"
P(~D) = "the probability of not getting the disease"
P(~E) = "the probability of not being exposed"
P(D/E) = "the probability of disease given exposure," or "the probability of disease among the exposed"
P(D/~E) = "the probability of disease given no exposure," or "the probability of disease among the unexposed"

Things that follow a binomial distribution…
Cohort study (or cross-sectional):
  The number of exposed individuals in your sample that develop the disease
  The number of unexposed individuals in your sample that develop the disease
Case-control study:
  The number of cases that have had the exposure
  The number of controls that have had the exposure

Cohort study example: You sample 100 smokers and 100 non-smokers and follow them for 5 years to see who develops heart disease. Let's say the "true" risk of developing heart disease in 5 years is 20% for smokers and 10% for non-smokers.
In probability symbols: P(D/E) = .20 and P(D/~E) = .10

Seeing it as a binomial…
The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.20.
The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.10.
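To make the "possible outcomes" on the next slides concrete, here is a minimal Python/numpy sketch (my addition, not from the slides) that draws one such study outcome using the true risks given above; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility

n = 100                                 # 100 smokers and 100 non-smokers, as in the example
p_smokers, p_nonsmokers = 0.20, 0.10    # "true" 5-year risks given above

# Each pair of draws is one possible study outcome: counts who develop heart disease
d_smokers = rng.binomial(n, p_smokers)        # ~ Binomial(100, .20)
d_nonsmokers = rng.binomial(n, p_nonsmokers)  # ~ Binomial(100, .10)

print(f"Heart disease: {d_smokers} of 100 smokers, {d_nonsmokers} of 100 non-smokers")
```

Re-running without a fixed seed gives a different 2x2 table each time, which is exactly the sampling variability the next few slides illustrate.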

One possible outcome:

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        22                9
No Disease (~D)          78               91
Total                   100              100

Another possible outcome:

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        17               15
No Disease (~D)          83               85
Total                   100              100

Another possible outcome:

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        21               13
No Disease (~D)          79               87
Total                   100              100

Let's say these are the data we found!

Statistics for these data
1. Risk ratio (relative risk)
2. Difference in proportions (absolute risk)
3. Chi-square test of independence (mathematically equivalent to the difference in proportions for 2x2 tables)

1. Risk ratio (relative risk)

                     Exposure (E)    No Exposure (~E)
Disease (D)               a                  b
No Disease (~D)           c                  d
Total                    a+c                b+d

Risk ratio = (risk to the exposed) / (risk to the unexposed) = [a/(a+c)] / [b/(b+d)]

In probability terms…

                     Exposure (E)    No Exposure (~E)
Disease (D)               a                  b
No Disease (~D)           c                  d
Total                    a+c                b+d

RR = (risk of disease in the exposed) / (risk of disease in the unexposed) = P(D/E) / P(D/~E)

Risk ratio calculation:

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        21               13
No Disease (~D)          79               87
Total                   100              100

RR = (21/100) / (13/100) = .21/.13 ≈ 1.6

Inferences about the risk ratio…
Is our observed risk ratio statistically different from 1.0? What is the p-value?
I'm going to present statistical inference for the odds ratio; the risk ratio is similar. So, for now, just get the answer from SAS:
Confidence interval: 0.86 to 3.04
P-value > .05
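To see where an interval like SAS's comes from, here is the usual log-based (Katz) confidence interval for a risk ratio worked out by hand. This calculation is an addition (the slides do not show the method), but it reproduces the reported interval:

$$
\widehat{RR} = \frac{21/100}{13/100} \approx 1.62, \qquad
SE\!\left(\ln\widehat{RR}\right) = \sqrt{\tfrac{1}{21} - \tfrac{1}{100} + \tfrac{1}{13} - \tfrac{1}{100}} \approx 0.323
$$

$$
95\%\ \text{CI} = \exp\!\left(\ln\tfrac{21}{13} \pm 1.96 \times 0.323\right) \approx (0.86,\ 3.04)
$$

Since the interval includes 1.0, it is consistent with the reported p-value > .05.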

2. Difference in proportions

                     Exposure (E)    No Exposure (~E)
Disease (D)               a                  b
No Disease (~D)           c                  d
Total                    a+c                b+d

Difference in proportions = P(D/E) - P(D/~E) = a/(a+c) - b/(b+d)

2. Difference in proportions

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        21               13
No Disease (~D)          79               87
Total                   100              100

Difference = 21/100 - 13/100 = .21 - .13 = .08 (8%)
An absolute, rather than relative, risk difference!

Is this statistically significant? This 8% difference could reflect a true association or it could be a fluke in this particular sample. The question: is 8% bigger or smaller than the expected sampling variability?

Difference in proportions test
Null hypothesis: the difference in proportions is 0.
The formula for the standard error follows directly from the binomial. Recall that the variance of a proportion is p(1-p)/n, so the variance of the difference between two independent proportions is p1(1-p1)/n1 + p2(1-p2)/n2.
Use the average (pooled) proportion p̄ in the standard error formula, because under the null hypothesis we assume the groups have the same proportion:
SE(null) = sqrt( p̄(1-p̄)(1/n1 + 1/n2) )
The difference follows a normal distribution because, with samples this large, the binomial is well approximated by the normal.

What is the standard error under the null hypothesis?
p̄ = (21 + 13)/200 = .17
SE = sqrt( .17 × .83 × (1/100 + 1/100) ) ≈ .053
Z = .08/.053 ≈ 1.5
The corresponding two-sided p-value is .131.

Corresponding confidence interval…
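The interval itself is not in the transcript; a reconstruction using the standard normal-approximation (Wald) interval for a difference in proportions, with the observed (unpooled) proportions, gives roughly:

$$
.08 \pm 1.96\sqrt{\frac{.21(.79)}{100} + \frac{.13(.87)}{100}} \approx .08 \pm .10 \approx (-.02,\ .18)
$$

Because the interval contains 0, it agrees with the two-sided p-value of .131.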

OR, use computer simulation to get the standard error…
1. In SAS, assume an infinite population of smokers and non-smokers with equal disease risk (UNDER THE NULL!)
2. Use the random binomial function to randomly select 100 smokers and 100 non-smokers.
3. Calculate the observed difference in proportions.
4. Repeat this 1000 times.
5. Observe the distribution of differences under the null hypothesis.
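The slides do this in SAS; below is a rough Python/numpy translation of the same five steps. The common null risk of .17 (the pooled proportion, 34/200) and the seed are my assumptions, not values stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, for reproducibility

n = 100          # 100 smokers and 100 non-smokers per simulated study
p_null = 0.17    # assumed common risk under the null (pooled proportion, 34/200)
n_sims = 1000

# Steps 2-4: draw both groups from the same binomial and take the difference in proportions
diffs = (rng.binomial(n, p_null, size=n_sims) -
         rng.binomial(n, p_null, size=n_sims)) / n

# Step 5: summarize the null distribution of the difference in proportions
print("simulated standard error:", diffs.std(ddof=1))          # ~0.053
print("two-sided p-value:", np.mean(np.abs(diffs) >= 0.08))    # ~0.15
```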

Computer Simulation Results
The standard error (the standard deviation of the 1000 simulated differences) is about 5.3%.

Difference in proportions test
We observed a difference of 8% between smokers and non-smokers.

Hypothesis Testing Step 4: Calculate a p-value

P-value from our simulation…
When we ran this study 1000 times, we got 72 results as big as or bigger than +8%, and 82 results as small as or smaller than -8%.

P-value from our simulation…

P-value
From our simulation, we estimate the p-value to be (72 + 82)/1000 = 154/1000, or .154.

The alternative hypothesis is that there is an association between smoking and heart disease. Because the p-value (.154) is greater than .05, we do not reject the null hypothesis of no association.

Finally, chi-square

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        21               13
No Disease (~D)          79               87
Total                   100              100

Null hypothesis: smoking and heart disease are independent.

Finally, chi-square

                     Smoker (E)    Non-smoker (~E)
Heart disease (D)        21               13
No Disease (~D)          79               87
Total                   100              100

Under independence, P(A&B) = P(A)*P(B), so the expected count in each cell is (row total × column total) / grand total.
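As an added check (not part of the slides), the expected counts and Pearson chi-square for this table can be computed with scipy; the continuity correction is switched off so the statistic matches the classic formula.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed 2x2 table: rows = heart disease yes/no, columns = smoker / non-smoker
observed = np.array([[21, 13],
                     [79, 87]])

# Pearson chi-square without Yates continuity correction
chi2, p, df, expected = chi2_contingency(observed, correction=False)

print("expected counts:\n", expected)   # [[17, 17], [83, 83]] under independence
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")  # about 2.27, df 1, p ~ .13
```

Note that 2.27 is about 1.5 squared, and the p-value (~.13) matches the two-sided p-value from the difference-in-proportions test, illustrating the equivalence mentioned earlier.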

Case-control example… (sampling diagram)

Statistics for these data
1. Odds ratio (relative risk)
2. Difference in proportions exposed (absolute risk)
3. Chi-square

Odds vs. Risk

If the risk is…     Then the odds are…
½ (50%)             1:1
¾ (75%)             3:1
1/10 (10%)          1:9
1/100 (1%)          1:99

Note: An odds is always higher than its corresponding probability, unless the probability is 100%.
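The conversion behind the table is the standard one (formula added here for reference; it is not spelled out on the slide):

$$
\text{odds} = \frac{p}{1-p}\,, \qquad \text{e.g. } p = 0.75 \ \Rightarrow\ \text{odds} = \frac{0.75}{0.25} = 3 \ (\text{i.e., } 3\!:\!1)
$$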

The Odds Ratio (OR)

The proportions of cases and controls are set by the investigator; therefore, they do not represent the risk (probability) of developing disease.

                     Exposure (E)    No Exposure (~E)
Disease (D)               a                  b          (a+b = cases)
No Disease (~D)           c                  d          (c+d = controls)

OR = (odds of exposure in the cases) / (odds of exposure in the controls) = (a/b) / (c/d)
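For reference (a standard identity rather than text from the slide), the exposure-odds ratio reduces to the cross-product ratio, which equals the disease-odds ratio:

$$
OR = \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{a/c}{b/d}
$$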

Inferences about the odds ratio…

Simulation…

Properties of the OR (simulation)
(50 cases / 50 controls / 20% exposed)
If the true odds ratio is 1.0, then with 50 cases and 50 controls, of whom 20% are exposed, this (the slide's histogram) is the expected variability of the sample OR; note the right skew.
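The histogram itself is not reproduced here; this numpy sketch (my own) mirrors the stated setup (50 cases, 50 controls, 20% exposure in each group, true OR = 1.0) and shows the right skew numerically.

```python
import numpy as np

rng = np.random.default_rng(seed=7)   # arbitrary seed

n_cases = n_controls = 50
p_exposed = 0.20      # same exposure probability in both groups, i.e. true OR = 1.0
n_sims = 1000

a = rng.binomial(n_cases, p_exposed, size=n_sims)      # exposed cases
c = rng.binomial(n_controls, p_exposed, size=n_sims)   # exposed controls

# Sample OR for each simulated study; drop the rare tables with a zero cell
ok = (a > 0) & (a < n_cases) & (c > 0) & (c < n_controls)
or_hat = (a[ok] / (n_cases - a[ok])) / (c[ok] / (n_controls - c[ok]))

print("median OR:", np.median(or_hat))                  # near 1.0
print("mean OR:  ", or_hat.mean())                      # pulled above 1.0 by the skew
print("99th percentile:", np.percentile(or_hat, 99))    # long right tail
```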

Properties of the lnOR
The lnOR is approximately normally distributed, with standard deviation = sqrt(1/a + 1/b + 1/c + 1/d).

Practice problem

Do the observed and expected counts differ by more than would be expected due to chance?

Chi-Square test
Degrees of freedom = (rows - 1)*(columns - 1) = (2 - 1)*(5 - 1) = 4

The Chi-Square distribution is the distribution of a sum of squared (standard) normal deviates.
The expected value and variance of a chi-square:
E(X) = df
Var(X) = 2(df)
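In symbols (standard facts, added here for reference):

$$
\text{If } Z_1,\dots,Z_{df} \overset{iid}{\sim} N(0,1), \text{ then } \sum_{i=1}^{df} Z_i^2 \sim \chi^2_{df}, \qquad
E\!\left[\chi^2_{df}\right] = df, \qquad Var\!\left(\chi^2_{df}\right) = 2\,df
$$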

Chi-Square test
Degrees of freedom = (rows - 1)*(columns - 1) = (2 - 1)*(5 - 1) = 4
Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here 85 >> 4.
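To attach a p-value to that rule of thumb (an addition on my part; the statistic of 85 with 4 df is taken from the slide), scipy's chi-square survival function can be used:

```python
from scipy.stats import chi2

# Survival function = P(chi-square with 4 df exceeds the observed statistic of 85)
p_value = chi2.sf(85, df=4)
print(f"p-value = {p_value:.1e}")   # far below .05 -> highly significant
```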

Chi-square example: brain tumor vs. cell phone ownership
(2x2 table: own a cell phone / don't own a cell phone, by brain tumor / no brain tumor; counts not shown)
A cell size of 3 tells us we should opt for the Fisher's exact result in SAS, but it doesn't turn out very different in this case.

Same data, but use the chi-square test
(same 2x2 table: own / don't own a cell phone, by brain tumor / no brain tumor; counts not shown)

Same data, but use the odds ratio
(same 2x2 table: own a cell phone / don't own a cell phone, by brain tumor / no brain tumor; counts not shown)