
EPI 809 / Spring 2008 Final Review

Ch 11 Regression and correlation

- Linear regression
  - Model, interpretation.
  - Model coefficient calculation: $b = L_{xy} / L_{xx}$ (slope), $b_0 = \bar{y} - b\bar{x}$ (intercept).
  - Assumptions, goodness of fit, validity: independent errors, Gaussian distribution, constant variance.
  - Test and inference (t-test).
  - Multiple regression: F-test vs. t-test.
- Pearson correlation
  - Interpretation and inference.
  - t-test and Fisher's z-test (transformation):
    1. $t = r\sqrt{n-2} / \sqrt{1-r^2} \sim t_{n-2}$
    2. $Z = \tfrac{1}{2}\ln[(1+r)/(1-r)] \sim$ Normal with mean $Z(r_0)$ and variance $1/(n-3)$
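The coefficient and test formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not course code: the function names and the sample data are assumptions, and $r = L_{xy}/\sqrt{L_{xx}L_{yy}}$ is the usual definition of the Pearson coefficient, which the slide does not spell out.

```python
import math

def simple_linear_regression(x, y):
    """Return slope b, intercept b0 and Pearson r from corrected sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_yy = sum((yi - y_bar) ** 2 for yi in y)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx                    # slope: b = L_xy / L_xx
    b0 = y_bar - b * x_bar             # intercept: b0 = y_bar - b * x_bar
    r = l_xy / math.sqrt(l_xx * l_yy)  # Pearson correlation (standard definition)
    return b, b0, r

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to t with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def fisher_z(r):
    """Fisher's transformation Z = 0.5 * ln[(1 + r) / (1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

def fisher_z_statistic(r, r0, n):
    """Standard normal statistic for H0: rho = r0, using var(Z) = 1 / (n - 3)."""
    return (fisher_z(r) - fisher_z(r0)) * math.sqrt(n - 3)

# Illustrative data (made up for this sketch).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, b0, r = simple_linear_regression(x, y)
print(b, b0, r, correlation_t_statistic(r, len(x)))
```

The t-test is used for H0: rho = 0; Fisher's z handles a nonzero null value r0.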

Learning Objectives
1. Describe the linear regression model
2. State the regression modeling steps
3. Explain ordinary least squares
4. Compute regression coefficients
5. Understand and check model assumptions
6. Predict the response variable
7. Comment on SAS output

Learning Objectives (continued)
8. Correlation models
9. Link between a correlation model and a regression model (one independent variable): $b = r S_y / S_x$, with $S_y^2 = L_{yy}/(n-1)$
10. Test of the correlation coefficient
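The link in objective 9 follows in one line from the definitions. The derivation below assumes $S_x^2 = L_{xx}/(n-1)$ (by analogy with the stated $S_y^2$) and the usual definition $r = L_{xy}/\sqrt{L_{xx}L_{yy}}$; neither is written out on the slide.

```latex
\[
r\,\frac{S_y}{S_x}
  = \frac{L_{xy}}{\sqrt{L_{xx}L_{yy}}}
    \cdot \sqrt{\frac{L_{yy}/(n-1)}{L_{xx}/(n-1)}}
  = \frac{L_{xy}}{\sqrt{L_{xx}L_{yy}}}
    \cdot \sqrt{\frac{L_{yy}}{L_{xx}}}
  = \frac{L_{xy}}{L_{xx}}
  = b
\]
```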

ANOVA
- Continuous response, categorical explanatory (independent) variable.
- Assumptions (Gauss-Markov conditions).
- Decomposition of the sum of squares:
  - $SS_{total} = SS_{trt} + SS_{error}$, or
  - $SS_{total} = SS_{trt} + SS_{blk} + SS_{error}$, or
  - $SS_{total} = SS_A + SS_B + SS_{AB} + SS_{error}$
- Estimation vs. prediction (different variances).
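The first decomposition (one-way layout) can be checked numerically. The sketch below is illustrative only: the function name and the three small groups are assumptions, and the F statistic uses the usual mean-square ratio with k - 1 and n - k degrees of freedom.

```python
def one_way_anova(groups):
    """Return SS_total, SS_trt, SS_error and the F statistic for a one-way layout."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-group (treatment) sum of squares.
    ss_trt = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares.
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    f_stat = (ss_trt / (k - 1)) / (ss_error / (n - k))
    return ss_total, ss_trt, ss_error, f_stat

# Illustrative data (made up): three treatment groups.
ss_total, ss_trt, ss_error, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_total, ss_trt, ss_error, f_stat)
```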

Multiple comparison
- Contrasts for multiple levels of a variable: construct the contrast according to the aim.
- Adjustment for multiple comparisons: LSD, Bonferroni, Scheffé.

Ch 9 Non-parametric tests
- Mainly interested in ranking (distribution); normality of the data may be violated.
- Sign test, rank-sum test, signed-rank test, Kruskal-Wallis test.

Summary

Nonparametric                                   Parametric
Sign test                                       One-sample t-test
Wilcoxon rank-sum test (Mann-Whitney U test)    Two-sample t-test
Wilcoxon signed-rank test                       Paired t-test
Kruskal-Wallis test                             Multiple-sample test (one-way ANOVA)

Ch 10 Categorical Data Analysis

Learning Objectives
1. Compare binomial proportions using the Z test and the $\chi^2$ test
2. Explain the $\chi^2$ test for independence of two variables
3. Explain Fisher's exact test for independence
4. McNemar's test for correlated data
5. Kappa statistic
6. Use of SAS PROC FREQ

Z Test for Difference in Two Proportions
1. Assumptions
   - Populations are independent.
   - Populations follow a binomial distribution.
   - Normal approximation can be used for large samples (all expected counts $\ge 5$).
2. Z-test statistic for two proportions
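The formula for the test statistic did not survive in this transcript. The sketch below assumes the standard pooled-variance form, $Z = (\hat p_1 - \hat p_2)/\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}$ with $\hat p$ the pooled proportion; the function name and the counts are illustrative.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test for H0: p1 = p2 (assumed standard form).

    x1, x2 are event counts; n1, n2 are sample sizes.
    Returns the z statistic and the two-sided p-value.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts (made up): 40/100 events vs. 20/100 events.
print(two_proportion_z_test(40, 100, 20, 100))
```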

Sampling Distribution for Difference Between Proportions

EPI 809 / Spring 2008  2 Test of Independence Hypotheses & Statistic 1.Hypotheses H 0 : Variables Are Independent H a : Variables Are Related (Dependent) 2. Test Statistic Degrees of Freedom: (r - 1)(c - 1) r Rows & C Columns O: Observed count E: Expected count

EPI 809 / Spring 2008 Fisher’s Exact Test  Hypergeometric distribution  Example: 2x2 table (cell counts a, b, c, d). Assuming fixed marginal totals: M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d. for convenience assume N1<N2, M1<M2. possible value of a are: 0, 1, …min(M1,N1).  Probability distribution of cell count a follows a hypergeometric distribution: N = a + b + c + d = N1 + N2 = M1 + M2 Pr (x=a) = N1! N2! M1! M2! / [N! a! b! c! d!] Pr (x=a) = N1! N2! M1! M2! / [N! a! b! c! d!] Mean (x) = M1 N1 / N Mean (x) = M1 N1 / N Var (x) = M1 M2 N1 N2 / [N 2 (N-1)] Var (x) = M1 M2 N1 N2 / [N 2 (N-1)] a b M1 c d M2 N1 N2 N

EPI 809 / Spring 2008 Fisher’s Exact Test  Fisher exact test is based on hypergeometric distr.  Probability of observing this specific table given fixed marginal totals is Pr (a=3,b=7, c=5, d=10) = 10!15!8!17!/[25!3!7!5!10!] =  Note the above is not the p-value. Why?  Not the accumulative probability, or not the tail probability.  Notice range of a: [0, min(M1, N1)] for M1<M2 and N1<N2  Tail prob = sum of all values (a = 3, 2, 1, 0).

Kappa ($\kappa$) Measure of Association
- Cohen's kappa ($\kappa$) measures the agreement between two variables and is defined by
  $\kappa = \dfrac{p_o - p_e}{1 - p_e}$
- $\kappa > 0.75$: excellent reproducibility; $0.4 \le \kappa \le 0.75$: good reproducibility; $\kappa < 0.4$: marginal reproducibility.
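A minimal Python sketch of the kappa calculation from a square agreement table. It takes $p_o$ as the observed proportion of agreement (the diagonal) and $p_e$ as the agreement expected by chance from the marginal totals; that is the usual reading of these symbols, which the slide does not define. The function name and the table are illustrative.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Observed agreement: proportion of counts on the diagonal.
    p_o = sum(table[i][i] for i in range(len(table))) / n
    # Chance-expected agreement from the marginal totals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

    return (p_o - p_e) / (1 - p_e)

# Illustrative table (made up): two raters, two categories.
print(cohens_kappa([[20, 5], [10, 15]]))
```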

EPI 809 / Spring 2008  H 0 :   =   : discordant probabilities.  H a :       Test Statistic: Chi-squares with df = 1.   B – C| - 1 } 2  2 = B + C McNemar’s Test for Correlated (Dependent) Proportions

Chapter 13 Design and Analysis Techniques for Epidemiologic Studies

Learning Objectives
1. Define study designs
2. Measures of effect for categorical data
3. Confounders and effect modification
4. Stratified analysis (Mantel-Haenszel statistic, multiple logistic regression)
5. Use of SAS PROC FREQ and PROC LOGISTIC

Experimental Study
- Randomization protects against bias in assignment to groups.
- Blinding protects against bias in outcome assessment or measurement.
- Controls for (major) sources of variability, although not necessarily reflecting real-life conditions.
- Expensive in terms of time and money.

Observational Study (the type most commonly used in epidemiology)
- Types of study:
  - Cross-sectional study: both exposure and outcome random.
  - Case-control study (retrospective): random exposure, fixed outcome.
  - Cohort study (prospective): fixed exposure, random outcome.

Measures of effect
- Depend on the study design:
  - Prospective study: incidence of disease (risk difference, relative risk, odds ratio of disease).
  - Cross-sectional: prevalence of disease (risk difference, relative risk, odds ratio of disease).
  - Case-control: study of exposure (odds ratio of exposure).

Risk difference
- Measures the attributable risk due to exposure.
- Only for cross-sectional and cohort studies.

Relative risk
- Only for cross-sectional and cohort studies.
- Ratio of the probability that the outcome characteristic is present for one group, relative to the other.
- The range of RR is $[0, \infty)$. Taking the logarithm gives $(-\infty, +\infty)$ as the range of ln(RR), and the estimated ln(RR) is better approximated by a normal distribution.

Odds Ratio - Disease
- The odds ratio is the odds of the event for the exposed divided by the odds of the event for the unexposed.
- Sample odds of the outcome for each group:

Odds Ratio - Exposure
In a case-control study we fix the number of cases and controls and then ascertain exposure status. The relative risk is therefore not estimable from these data alone. Instead of the relative risk we can estimate the exposure OR, which Cornfield (1951) showed to be equivalent to the disease OR. In other words, the odds ratio can be estimated regardless of the sampling scheme.
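The equivalence can be seen from the cross-product form of the odds ratio. The cell labels below are an assumed notation, since the slide's own formula was not preserved in the transcript.

```latex
% Assumed notation: a = exposed cases,   b = exposed controls,
%                   c = unexposed cases, d = unexposed controls.
\[
\mathrm{OR}_{\text{exposure}}
  = \frac{a/c}{b/d}
  = \frac{ad}{bc}
  = \frac{a/b}{c/d}
  = \mathrm{OR}_{\text{disease}}
\]
```

Both ratios reduce to the same cross-product $ad/bc$, so fixing the column totals (cases and controls) rather than the row totals does not change what is being estimated.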

Odds Ratio - Relative Risk
For rare diseases, the disease odds ratio approximates the relative risk. Since case-control data let us estimate the exposure odds ratio, we can equivalently estimate the disease odds ratio, which for rare diseases approximates the relative risk.

Odds Ratio
- The odds ratio has $[0, \infty)$ as its range. The log odds ratio has $(-\infty, +\infty)$ as its range, and the normal approximation works better for the estimated log odds ratio.
- Confidence intervals are therefore built on the log scale: a $(1 - \alpha)$ confidence interval for the odds ratio is obtained by exponentiating the lower and upper bounds of the interval for the log odds ratio.
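The interval formula itself is missing from the transcript. The sketch below assumes the common large-sample (Woolf) standard error for the log odds ratio, $\sqrt{1/a + 1/b + 1/c + 1/d}$, which may differ from what the slide showed. The function name, cell labels and counts are illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a normal-approximation CI computed on the log scale.

    a, b, c, d are the four cell counts of a 2x2 table (assumed layout:
    a, b in the first row; c, d in the second). z = 1.96 gives roughly 95%.
    The standard error is the Woolf formula, an assumption of this sketch.
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_hat)
    lower = math.exp(log_or - z * se_log_or)  # exponentiate the lower bound
    upper = math.exp(log_or + z * se_log_or)  # exponentiate the upper bound
    return or_hat, lower, upper

# Illustrative counts (made up).
print(odds_ratio_ci(20, 80, 10, 90))
```

The interval is symmetric around ln(OR) but not around OR itself, which is why the bounds are computed before exponentiating.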

Summary
- $RD = p_1 - p_2$ = risk difference (null: RD = 0)
  - also known as attributable risk or excess risk
  - measures absolute effect: the proportion of cases among the exposed that can be attributed to exposure
- $RR = p_1 / p_2$ = relative risk (null: RR = 1)
  - measures relative effect of exposure
  - bounded above by $1/p_2$
- $OR = [p_1(1-p_2)] / [p_2(1-p_1)]$ = odds ratio (null: OR = 1)
  - range is 0 to $\infty$
  - approximates RR for rare events
  - invariant to switching rows and columns
  - key parameter in logistic regression
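The three measures, and the rare-event approximation, can be illustrated with a few lines of Python. The function name and the probabilities are made up for the illustration.

```python
def effect_measures(p1, p2):
    """Return (RD, RR, OR) for risks p1 (exposed) and p2 (unexposed)."""
    rd = p1 - p2
    rr = p1 / p2
    odds_ratio = (p1 * (1 - p2)) / (p2 * (1 - p1))
    return rd, rr, odds_ratio

# Rare outcome: the OR stays close to the RR.
rd_rare, rr_rare, or_rare = effect_measures(0.002, 0.001)
# Common outcome: the same RR, but the OR is noticeably further from the null.
rd_common, rr_common, or_common = effect_measures(0.6, 0.3)

print(rr_rare, or_rare)
print(rr_common, or_common)
```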

Effect modifier
- Variation in the magnitude of a measure of effect across levels of a third variable.
- Effect modification is not a bias but useful information.
- Happens when the RR or OR differs between strata (subgroups of the population).

Confounding
- Distortion of a measure of effect because of a third factor.
- Should be prevented, or needs to be controlled for.

Confounding
[Diagram: exposure, outcome, and a third variable]
To be a confounder, the third variable must:
- be associated with exposure, without being a consequence of exposure;
- be associated with outcome, independently of exposure.

Confounding and Control
- Positive confounding: the confounder is related in the same direction (both positively or both negatively) to the disease and the exposure.
- Negative confounding: the confounder is positively related to the disease but negatively related to the exposure, or the reverse.
- Prevention (design stage): restriction to one stratum, or matching.
- Control (analysis stage):
  - Stratified analysis: Mantel-Haenszel.
  - Multivariable analysis: logistic regression.

Mantel-Haenszel Methods: common odds ratio
(1) The Mantel-Haenszel estimate of the odds ratio assumes there is a common odds ratio: $OR_{pool} = OR_1 = OR_2 = \dots = OR_K$. To estimate the common odds ratio we take a weighted average of the stratum-specific odds ratios (the MH estimate).
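The MH formula is missing from the transcript. The sketch below assumes the standard Mantel-Haenszel form, $\widehat{OR}_{MH} = \sum_i (a_i d_i / n_i) \,/\, \sum_i (b_i c_i / n_i)$, where each stratum i is a 2x2 table with cells a, b, c, d and total n. The function name and the strata are illustrative.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio (assumed standard form).

    strata is a list of (a, b, c, d) cell counts, one 2x2 table per stratum.
    This is a weighted average of the stratum odds ratios a*d / (b*c),
    with weights b*c / n.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Illustrative strata (made up); both have a stratum-specific OR of 4.
print(mantel_haenszel_or([(10, 20, 5, 40), (20, 10, 10, 20)]))
```

When every stratum has the same odds ratio, the MH estimate returns that common value, which is the situation the pooling assumption describes.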

Mantel-Haenszel Methods
(2) Test of the common odds ratio
    - $H_0$: common OR is 1.0 vs. $H_a$: common OR $\ne 1.0$.
    - A standard error is available for the MH common odds ratio; standard confidence intervals and test statistics are based on the standard normal distribution.
(3) Test of effect modification (heterogeneity, interaction)
    - $H_0$: $OR_1 = OR_2 = \dots = OR_K$
    - $H_a$: not all stratum-specific ORs are equal.
    - The Breslow-Day homogeneity test (available in SAS) can be used.

Multiple Logistic Regression

Multiple Logistic Regression - Formulation
- The relationship between $\pi$ and x is S-shaped.
- The logit (log-odds) transformation is the link function.
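The model equations were not preserved in the transcript. The standard formulation, written with assumed symbols $\beta_0, \dots, \beta_k$ for the coefficients and $x_1, \dots, x_k$ for the covariates, is:

```latex
\[
\operatorname{logit}(\pi)
  = \ln\!\left(\frac{\pi}{1-\pi}\right)
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
\pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}
           {1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}
\]
```

The right-hand expression is the S-shaped curve in $x$; the logit on the left makes it linear in the coefficients.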

Interpretation of the parameters
- If $\pi$ is the probability of an event and O is the odds for that event, then $O = \pi/(1-\pi)$ and $\pi = O/(1+O)$.
- The link function in logistic regression gives the log-odds.
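A small Python sketch of these relationships. It also shows why a logistic coefficient is read as a log odds ratio: raising x by one unit multiplies the odds by exp(beta). The function names and the coefficient values are illustrative assumptions, not fitted results.

```python
import math

def odds(p):
    """Odds of an event with probability p: O = p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds (the logistic link function)."""
    return math.log(odds(p))

def inverse_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical one-covariate model: logit(pi) = b0 + b1 * x.
b0, b1 = -2.0, 0.7
p_at_0 = inverse_logit(b0)       # probability when x = 0
p_at_1 = inverse_logit(b0 + b1)  # probability when x = 1

# The odds ratio for a one-unit increase in x equals exp(b1).
print(odds(p_at_1) / odds(p_at_0), math.exp(b1))
```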