Biostat Didactic Seminar Series Correlation and Regression Part 2 Robert Boudreau, PhD Co-Director of Methodology Core PITT-Multidisciplinary Clinical.

Presentation transcript:

Biostat Didactic Seminar Series
Correlation and Regression Part 2
Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH

Previous Biostat Didactics Fall 2009 – Spring
1. Descriptive Statistics: Examining Your Data
    Data types: Qualitative (Categorical), Ordinal, Quantitative
    Mean, SD, medians, quartiles, IQR, skewness, histograms, boxplots
2. Group Comparisons: Part 1
    Normal dist (mean, SD: 68%, 95%, 99% interpretation)
    t-dist, degrees of freedom (n-1)
    Confidence interval for the mean
3. Group Comparisons: Part 2
    Comparing means: Two-sample independent t-test
    pooled and unequal variance (Satterthwaite) versions
    interpretation of p-values, type I (false positive) and type II error

Previous Biostat Didactics Fall 2009 – Spring
Group Comparisons Part 3: Nonparametric Tests, Chi-squares and Fisher Exact
    Comparing groups having small sample sizes (< 20) or with non-normal distributions
    => Use Wilcoxon Rank-Sum Test (nonparametric) (based on rank-order when sorted rather than on actual numeric values)
    Comparing groups in the % falling into diff categories
    => Use Chi-square, Fisher’s Exact (if any cell n < 5)

Previous Biostat Didactics Fall 2009 – Spring
Correlation, Regression and Covariate-Adjusted Group Comparisons
    Pearson vs Spearman correlation => linear vs monotone association
    Regression: interpretation of beta coefficients
    Standard errors, p-values
    Continuous predictor => beta coeff is a slope
    Dichotomous (e.g. group “dummy” 0,1 valued variable) => beta coeff is difference in response vs “referent”
        treatment_group = 1: knockout mouse
        treatment_group = 0: wild mouse (referent)
    Adjusting for important covars when comparing groups

Flow chart for group comparisons
Measurements to be compared:
    continuous => Distribution approx normal or N ≥ 20?
        No => Non-parametrics
        Yes => T-tests
    discrete (binary, nominal, ordinal with few values) => Chi-square, Fisher’s Exact
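The branching in this flow chart can be written as a small decision function. The following is only an illustrative Python sketch: the function name, argument names and returned labels are made up here, and `n` is read as the sample size the slide calls N.

```python
def choose_group_comparison(measurement, approx_normal=False, n=0):
    """Return the family of tests suggested by the flow chart.

    measurement: "continuous" or "discrete" (binary, nominal, or ordinal
    with few values). approx_normal and n only matter for continuous data.
    """
    if measurement == "discrete":
        return "Chi-square / Fisher's Exact"
    if measurement == "continuous":
        # Flow chart question: "Distribution approx normal or N >= 20?"
        if approx_normal or n >= 20:
            return "t-test"
        return "nonparametric"
    raise ValueError("measurement must be 'continuous' or 'discrete'")


print(choose_group_comparison("continuous", approx_normal=False, n=12))
print(choose_group_comparison("continuous", approx_normal=False, n=50))
print(choose_group_comparison("discrete"))
```

The function only encodes the chart; judging whether a distribution is "approximately normal" still requires looking at the data.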

Flow chart for regression models (includes adjusted group comparisons)
Outcome variable continuous or dichotomous?
    dichotomous => Time-to-event available (or relevant)?
        No => Multiple logistic regression
        Yes => Cox proportional hazards regression
    continuous => Predictor variable categorical?
        No => Multiple linear regression
        Yes (e.g. groups) => ANCOVA (Multiple linear regression using dummy variable(s) for categorical var(s))

Analysis From Last Didactic …
In Health, Aging and Body Composition Knee-OA Substudy:
    Examine association between SxRxKOA (knee OA) and CRP, adjusted for BMI.
Motivation: Sowers M, Hochberg M, et al. C-reactive protein as a biomarker of emergent osteoarthritis. Osteoarthritis and Cartilage, Volume 10, Issue 8, August 2002.
Conclusion: “CRP is highly associated with Knee OA; however, its high correlation with obesity limits its utility as an exclusive marker for knee OA”

All White Females in HABC (N=844) [includes SxRxKOA (n=93); also rest of parent study cohort]
N=5 had CRP > 30 (max=63.2)

log CRP

White Females

                     Knee OA: No (n=752)   Knee OA: Yes (n=92)   P-value (Equal vars)   P-value (Unequal)
logCRP, mean (SD)    0.43 (0.83)           0.76 (0.58)           0.0002                 <
BMI, mean (SD)       25.4 (4.3)            28.8 (5.2)            <

Difference in average logCRP: 0.76 – 0.43 = 0.33
logCRP SDs were significantly different (p<0.0001) => Use Satterthwaite unequal variance test
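To make the pooled versus Satterthwaite distinction concrete, the two test statistics can be recomputed from the logCRP summary values on this slide. This is a rough Python sketch, not the SAS output: the function names are invented for illustration, and because the inputs are the rounded means and SDs shown above, the results will only approximate what the software reported from the raw data.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance (pooled) two-sample t statistic and its df."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance t statistic with Satterthwaite's approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard error per group
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Rounded logCRP summaries (mean, SD, n) from the slide: No OA, then Knee OA.
no_oa = (0.43, 0.83, 752)
knee_oa = (0.76, 0.58, 92)

t_pooled, df_pooled = pooled_t(*no_oa, *knee_oa)
t_welch, df_welch = welch_t(*no_oa, *knee_oa)
print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.1f}")
```

The unequal-variance version gives the smaller group (which here also has the smaller SD) its own standard error instead of borrowing the pooled one, which is why the two statistics differ. Converting either statistic to a p-value needs a t distribution, which the Python standard library does not provide.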

Two-Group Unadjusted Comparison of Means Using Regression with Dummy-coded Groups
Data columns: HABCID, logCRP, kneeOA, BMI
    * No OA is “referent” group (i.e. kneeOA=0);
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA;
      where female=1 and white=1;
    run;

White Females: 2-Group Comparison Using Dummy-coded Groups
    * No OA is “referent” group (KneeOA=0);
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA;
      where female=1 and white=1;
    run;
Note: Regression using Dummy (0, 1) for group variable (e.g. KneeOA=0,1). In regression, equal (pooled) variance is assumed.
Reading the output: the intercept is the “No OA” mean, the KneeOA coefficient is the difference from the referent, and their sum is the “kneeOA” mean. The p-value for KneeOA is the same as in the equal variance t-test.

Model: logCRP = b0 + b1*kneeOA (b0 = intercept)
    KneeOA=0 => logCRP = b0 + b1*0 = b0 (“No OA” mean)
    KneeOA=1 => logCRP = b0 + b1*1 = b0 + b1 (“kneeOA” mean)
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA;
      where female=1 and white=1;
    run;
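The claim that a 0/1 dummy regression reproduces the two group means can be checked numerically. A minimal Python sketch follows; the data are made-up values (not HABC data) and `simple_ols` is a helper written for this illustration, not a library function.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Made-up illustration values: group is the 0/1 dummy, y the response.
group = [0, 0, 0, 0, 1, 1, 1]
y = [0.2, 0.5, 0.4, 0.7, 0.9, 0.6, 0.8]

b0, b1 = simple_ols(group, y)
mean0 = sum(v for g, v in zip(group, y) if g == 0) / group.count(0)
mean1 = sum(v for g, v in zip(group, y) if g == 1) / group.count(1)

print("intercept vs referent mean:  ", b0, mean0)
print("slope vs difference in means:", b1, mean1 - mean0)
```

With a single dummy predictor the intercept is the referent-group mean and the slope is the difference in means, whatever the group sizes.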

ANCOVA (Analysis of Covariance): Compare logCRP adjusted for BMI

    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA bmi;
      where female=1 and white=1;
    run;
Note: Equal BMI slopes in each group are being modeled. The unadjusted difference was 0.33; BMI partially “explains” this difference.
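Why adjusting for BMI shrinks the group difference can be seen on a toy example. The sketch below is Python, not the SAS analysis: the data are synthetic and noise-free, with a true group effect of 0.2 and a BMI slope of 0.05 chosen purely for illustration, and `ols` is a small hand-written solver rather than a library call.

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(a[k][c]))
        a[c], a[piv] = a[piv], a[c]
        for k in range(p):
            if k != c:
                f = a[k][c] / a[c][c]
                a[k] = [u - f * v for u, v in zip(a[k], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Synthetic data: the knee OA group has higher BMI on average.
knee_oa = [0] * 5 + [1] * 5
bmi = [22, 24, 25, 26, 28, 26, 28, 29, 31, 33]
log_crp = [-0.8 + 0.2 * g + 0.05 * b for g, b in zip(knee_oa, bmi)]

unadjusted = ols([knee_oa], log_crp)[1]        # model logCRP = KneeOA
adjusted = ols([knee_oa, bmi], log_crp)[1]     # model logCRP = KneeOA bmi
print(f"unadjusted difference:   {unadjusted:.2f}")
print(f"BMI-adjusted difference: {adjusted:.2f}")
```

In this toy setup part of the raw gap between groups is simply the BMI gap multiplied by the BMI slope, so the adjusted coefficient is smaller than the unadjusted one.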

[Figure annotation: unadjusted mean difference]
Notice: At any BMI level, the mean logCRP difference between KneeOA vs Not is smaller than the unadjusted difference.

logCRP between KneeOA vs Not, Adjusted for BMI, Age and Anti-inflammatory Meds
Note: age is not significant (caveat: narrow HABC study age range: 69-80)


Pearson Correlation = a measure of linear association


Pearson vs Spearman Correlation
Spearman: A measure of rank order correlation
    Works for any general trend that is increasing or decreasing and not necessarily linear
    Equals Pearson Correlation using the ranks of the observations instead of actual values
    Heuristically: Spearman measures degree that low goes with low, middle with middle, high with high
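The statement that Spearman equals Pearson computed on ranks translates directly into code. The following Python sketch uses made-up data (a cubic trend) and helper names chosen for illustration; ties are given the average of their positions, which is the usual convention.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # strictly increasing, but not linear
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Because the trend is perfectly monotone, every low x pairs with a low y and every high x with a high y, so the rank correlation is perfect while the linear correlation is not.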

Effect of Centering BMI at 25
    proc reg data=kneeOA_vs_noOA;
      model logCRP = bmi_minus25;
      where female=1 and white=1 and kneeOA=1;
    run;
=> logCRP = b0 + b1*(BMI-25), which equals b0 at BMI=25 (see graphic)

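Effect of Centering BMI at 25
=> Model 2: logCRP = b0 + b1*(BMI-25) = b0 - 25*b1 + b1*BMI = (b0 - 25*b1) + b1*BMI
Centering leaves the BMI slope b1 unchanged; only the intercept changes, and it becomes the predicted logCRP at BMI=25 instead of at BMI=0.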
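The effect of centering can be verified on any data set: fit the line once on raw BMI and once on BMI minus 25, then compare. The Python sketch below uses made-up values (not HABC data) and a small helper, `simple_ols`, written for this illustration.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Made-up illustration values.
bmi = [21.0, 24.5, 26.0, 29.5, 33.0, 36.5]
log_crp = [0.10, 0.35, 0.30, 0.70, 0.80, 1.15]

b0_raw, b1_raw = simple_ols(bmi, log_crp)
b0_ctr, b1_ctr = simple_ols([b - 25 for b in bmi], log_crp)

print("slopes (raw, centered):", b1_raw, b1_ctr)
print("raw fit evaluated at BMI=25:", b0_raw + 25 * b1_raw)
print("centered intercept:         ", b0_ctr)
```

The raw intercept describes a hypothetical person with BMI of 0, which is outside the data; the centered intercept describes someone with BMI of 25, which is why centering makes the intercept interpretable.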

[Figure: unadjusted mean difference]

ANCOVA (Analysis of Covariance): Centering BMI at 25
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA bmi_minus25;
      where female=1 and white=1;
    run;
Note: Equal BMI slopes in each group are being modeled.

Check of ANCOVA Assumption: Equality of BMI slopes, KneeOA vs Not
    proc reg data=knee_vs_noOA;
      model logCRP = KneeOA bmi BMI_x_KneeOA;
      where female=1 and white=1;
    run;
BMI_x_KneeOA is the “interaction term”.
Data columns: HABCID, logCRP, kneeOA, BMI, BMI_x_KneeOA

Check of ANCOVA Assumption: Equality of BMI slopes, KneeOA vs Not
    proc reg data=knee_vs_noOA;
      model logCRP = KneeOA bmi BMI_x_KneeOA;
      where female=1 and white=1;
    run;
The “BMI” slopes are not significantly different (p=0.8019) => they can be treated as parallel
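What the interaction coefficient measures can be shown with a toy fit. In this Python sketch the interaction column is assumed to be the product BMI times KneeOA (the slide names the variable BMI_x_KneeOA but does not show how it was built), the data are synthetic and noise-free with group slopes of 0.05 and 0.08 chosen for illustration, and `ols` is a hand-written solver. It does not compute the p-value reported on the slide.

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(a[k][c]))
        a[c], a[piv] = a[piv], a[c]
        for k in range(p):
            if k != c:
                f = a[k][c] / a[c][c]
                a[k] = [u - f * v for u, v in zip(a[k], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]

knee_oa = [0] * 5 + [1] * 5
bmi = [22, 24, 25, 26, 28, 26, 28, 29, 31, 33]
bmi_x_knee_oa = [b * g for b, g in zip(bmi, knee_oa)]  # assumed construction

# Synthetic response: BMI slope 0.05 without knee OA, 0.08 with knee OA.
log_crp = [-0.8 + 0.2 * g + 0.05 * b + 0.03 * g * b
           for g, b in zip(knee_oa, bmi)]

b0, b_group, b_bmi, b_inter = ols([knee_oa, bmi, bmi_x_knee_oa], log_crp)
print(f"BMI slope, KneeOA=0: {b_bmi:.3f}")
print(f"BMI slope, KneeOA=1: {b_bmi + b_inter:.3f}")
print(f"interaction (difference in slopes): {b_inter:.3f}")
```

The interaction coefficient is the difference between the two group-specific BMI slopes, so testing whether it is zero is a test of whether the lines are parallel.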

Thank you
Questions, comments, suggestions or insights?
Remaining time: Open consultation …