Osteoarthritis Initiative Analytic Strategies for the OAI Data December 6, 2007 Charles E. McCulloch, Division of Biostatistics, Dept of Epidemiology and.

Presentation transcript:

Osteoarthritis Initiative Analytic Strategies for the OAI Data December 6, 2007 Charles E. McCulloch, Division of Biostatistics, Dept of Epidemiology and Biostatistics, UCSF

Outline
1. Introduction and examples.
2. General analysis considerations.
3. Accommodating correlations between knees within a person.
4. Accommodating correlations over time.
5. Analyzing change.
6. Questions from the participants.

Introduction
The analysis technique depends on the nature of the outcome variable and the research question.
- Binary: logistic regression (e.g., presence of osteophytes). Summaries: odds ratios, area under the ROC curve.
- Numeric: linear regression (e.g., WOMAC pain).
- Also: time to event (Cox model or pooled logistic regression) and count outcomes (Poisson regression).
Methods need to be modified if there are clustered data or repeated measures.

Prototypical examples
Example 1 (cross-sectional): Is KOOS quality of life related to BMI at baseline?
Example 2 (clustered by knee): Is the difference between men and women in the WOMAC pain score the same for those with and without symptomatic knee OA at baseline?

Prototypical examples
Example 3 (clustered data): Is the presence of osteophytes at baseline predicted by knee pain?
Example 4 (longitudinal/change): Is the 18-month change in WOMAC pain score the same for those with and without symptomatic knee OA at baseline?

Ex 1: Is KOOS QoL related to baseline BMI?

Ex 1: Is KOOS QoL related to baseline BMI?
Analysis: linear regression. Regression coefficient is with a SE of 0.08 and a p-value of < 
Not clustered data.
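As a rough illustration of the kind of analysis named on this slide, here is a minimal sketch of a simple linear regression with the slope's standard error. The function name `simple_ols` and all data values are hypothetical and are not OAI data; in practice a statistics package would be used.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, plus the slope's standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope

# Hypothetical values (not OAI data): BMI and a quality-of-life score for six people.
bmi = [22.0, 25.0, 27.0, 30.0, 33.0, 36.0]
qol = [80.0, 74.0, 71.0, 66.0, 58.0, 55.0]
intercept, slope, se_slope = simple_ols(bmi, qol)
print(f"slope = {slope:.3f}, SE = {se_slope:.3f}, t = {slope / se_slope:.2f}")
```

This standard error is only appropriate when each person contributes one outcome value, which is the "not clustered" situation the slide describes.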

Potential classes of research questions
- Classify diseased/non-diseased individuals [Diagnostic, Burden].
- Predict onset or progression of disease [Prognostic].
- Model/predict change over time [Efficacy of Intervention].
[Biomarker classification from Bauer et al., 2006]

Accommodating clustered or repeated measures data
- It is important to accommodate clustering and repeated measures. Otherwise SEs, p-values and confidence intervals can be incorrect, sometimes grossly so.
- It is not possible to predict exactly how the results will change when the proper analysis is used.

Efficiency of analyses of clustered data
- For between-person predictors (e.g., BMI), the proper clustered-data analysis (e.g., outcome measured on two knees) will usually have larger SEs. Intuition: for between-person predictors, an analysis that assumes all knees are independent over-represents the information content.
- For within-person predictors (e.g., knee-specific), the proper clustered-data analysis will usually have smaller SEs. Intuition: using each person as their own control increases efficiency.
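The direction of these two effects can be sketched with textbook variance formulas. The sketch below assumes equal cluster sizes, a common (exchangeable) within-person correlation `rho`, and a balanced within-person contrast; these are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import math

def se_ratio_between(rho, m=2):
    """SE(proper) / SE(naive) for a between-person predictor, with m knees per person.

    The variance of a person-level mean is inflated by the design effect 1 + (m - 1) * rho.
    """
    return math.sqrt(1 + (m - 1) * rho)

def se_ratio_within(rho):
    """SE(proper) / SE(naive) for a within-person contrast (e.g., one knee vs the other).

    The variance of a within-person difference is 2 * sigma^2 * (1 - rho),
    compared with 2 * sigma^2 if the knees are wrongly treated as independent.
    """
    return math.sqrt(1 - rho)

for rho in (0.0, 0.3, 0.5, 0.8):
    print(f"rho={rho:.1f}  between: {se_ratio_between(rho):.2f}  within: {se_ratio_within(rho):.2f}")
```

With `rho = 0` both ratios equal 1, so the naive and proper analyses agree; as the correlation between a person's knees grows, the between-person ratio rises above 1 and the within-person ratio falls below 1.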

Ex 2: Is there a sex by baseline SX OA interaction for the WOMAC pain score?
When analyzing knees, effect of failing to allow correlation between a person’s knees:

Accommodating clustered or repeated measures data
Many methods exist to accommodate clustering and repeated measures:
- Mixed models (e.g., SAS Proc MIXED, NLMIXED)
- Repeated measures ANOVA (e.g., SAS Proc GLM)
- Alternating logistic regression (in SAS Proc GENMOD)
- Generalized Estimating Equations (GEEs), invoked in SAS with the REPEATED statement of Proc GENMOD.

Accommodating clustered or repeated measures data
- Repeated measures/clustering is an issue for the outcome variable, not the predictor.
- Example: Are days missed from work predicted by knee pain (separate values for left and right knee)? This does not have repeated measures on the outcome.
- Can accommodate by including both left and right knee values as predictors or by calculating summary measure(s) (e.g., average knee pain).
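A minimal sketch of the second option, collapsing knee-specific pain into person-level predictors. The field names (`id`, `side`, `pain`) and the values are hypothetical, not the OAI variable names.

```python
from collections import defaultdict

def person_level_pain(rows):
    """Collapse one row per knee into one record per person."""
    by_person = defaultdict(dict)
    for row in rows:
        by_person[row["id"]][row["side"]] = row["pain"]
    summary = {}
    for pid, sides in by_person.items():
        values = list(sides.values())
        summary[pid] = {
            "pain_left": sides.get("L"),    # None if that knee was not measured
            "pain_right": sides.get("R"),
            "pain_mean": sum(values) / len(values),
            "pain_max": max(values),
        }
    return summary

# Hypothetical knee-level data: person 2 has only a left-knee value.
knee_rows = [
    {"id": 1, "side": "L", "pain": 2.0},
    {"id": 1, "side": "R", "pain": 6.0},
    {"id": 2, "side": "L", "pain": 3.0},
]
print(person_level_pain(knee_rows))
```

Either the two side-specific columns or one summary column can then serve as predictors in an ordinary, non-clustered regression of the person-level outcome.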

Desirable features for an analysis method
- Can accommodate a variety of outcome types (e.g., binary and numeric).
- Can accommodate clustering by knee, person (over time) and perhaps even different regions of interest (ROI) within a knee.
- Does not require extensive modeling of the correlation over time or between knees or between ROI in the knee.

Recommended analysis strategy - GEEs
- Works with many types of outcomes.
- Robust variance estimate: obviates the need to model the correlation structure.
- Works well when there are not too many repeated measures per subject and a large number of subjects, so it is ideal for analyses incorporating multiple knees and time points.
- Somewhat less good if there are also multiple ROI per knee treated as outcomes (e.g., tibial and femoral cartilage loss).
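To make the "robust variance estimate" idea concrete, here is a minimal sketch of the simplest case: a linear model with an independence working correlation, where the point estimates are ordinary least squares and only the standard errors change. The function name and data are hypothetical, no small-sample correction is applied, and statistical packages may use different defaults.

```python
import math

def gee_independence_linear(ids, x, y):
    """OLS estimates for y = b0 + b1 * x with cluster-robust (sandwich) SEs, clustered on ids."""
    n = len(y)
    # "Bread": inverse of X'X for the design matrix with columns [1, x].
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # "Meat": sum over clusters of the outer product of each cluster's score X_i' r_i.
    scores = {}
    for cid, xi, yi in zip(ids, x, y):
        resid = yi - b0 - b1 * xi
        s = scores.setdefault(cid, [0.0, 0.0])
        s[0] += resid
        s[1] += resid * xi
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for a in range(2):
            for b in range(2):
                meat[a][b] += s[a] * s[b]

    def matmul(p, q):
        return [[sum(p[a][k] * q[k][b] for k in range(2)) for b in range(2)] for a in range(2)]

    cov = matmul(matmul(inv, meat), inv)  # bread * meat * bread
    return (b0, b1), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))

# Hypothetical data: five people, two knees each, a person-level predictor (BMI).
ids = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
bmi = [24.0, 24.0, 28.0, 28.0, 31.0, 31.0, 35.0, 35.0, 27.0, 27.0]
pain = [2.0, 3.0, 4.0, 4.0, 6.0, 5.0, 8.0, 9.0, 3.0, 5.0]
(b0, b1), (se_b0, se_b1) = gee_independence_linear(ids, bmi, pain)
print(f"slope = {b1:.3f}, robust SE = {se_b1:.3f}")
```

Because residuals are summed within each person before being squared, correlation between a person's knees is reflected in the standard error without ever specifying what that correlation is.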

Recommended analysis strategy - GEEs
- Accommodates unbalanced data, e.g., some subjects contribute one knee while others contribute two.
- Accommodates unequally spaced data, e.g., missed visits.
- But always be wary of the pattern of missing data. If the fact that the data are missing is informative (e.g., those with missed visits are in extreme pain), virtually no standard statistical method will get the right answer.

Ex 2: Is there a sex by baseline SX OA interaction for the WOMAC pain score?
Effect of different analysis methods:

Ex 3: Does pain predict presence of osteophytes at baseline?
The odds of an osteophyte increase by 12.5% with each increase in pain score of 1 (0-10 scale). Odds ratio of 1.125 (95% CI 1.102, 1.149). Accounts for clustering by subject.
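The arithmetic linking a percent increase in odds, an odds ratio and a logistic regression coefficient can be sketched as follows. The 12.5% figure comes from the slide; the helper names and the choice of a 2-point change are illustrative assumptions.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_for_change(or_per_unit, delta):
    """Odds ratio for a `delta`-point change, given the odds ratio per 1-point change."""
    beta = math.log(or_per_unit)  # logistic coefficient: log odds per 1-point increase
    return math.exp(beta * delta)

or_per_point = 1.125  # a 12.5% increase in odds per pain point corresponds to OR = 1.125
print(percent_change_in_odds(or_per_point))
print(odds_ratio_for_change(or_per_point, 2))  # odds ratio for a 2-point pain difference
```

Odds ratios multiply rather than add, so a 2-point difference corresponds to 1.125 squared, not to a 25% increase.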

Analyzing change with longitudinal data
- Including a variable for time (or visit) describes the change over time, e.g., progression.
- Inclusion of time (or visit) interactions with baseline predictors allows analysis of whether baseline predictors are associated with change over time.
- Inclusion of a time-varying predictor (e.g., MRI findings at sequential visits) allows analysis of whether change in that predictor is associated with change in the outcome.

Analyzing change with longitudinal data
- Can use lagged variables to ask whether prior values of risk factors predict later onset of disease (is it prognostic?).
- Helps to strengthen inference of causation.
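A lagged variable can be built from long-format data by carrying each person's previous value forward one visit. This is a minimal sketch; the field names (`id`, `visit`, `bmi`) and the helper `add_lag` are hypothetical.

```python
def add_lag(rows, var, id_key="id", visit_key="visit"):
    """Return copies of the rows with `<var>_lag1`: the same person's value at their previous visit."""
    out = []
    previous = {}
    for row in sorted(rows, key=lambda r: (r[id_key], r[visit_key])):
        new_row = dict(row)
        # None marks a person's first visit, where no prior value exists.
        new_row[f"{var}_lag1"] = previous.get(row[id_key])
        previous[row[id_key]] = row[var]
        out.append(new_row)
    return out

# Hypothetical long-format rows: one row per person per visit (visit in months).
rows = [
    {"id": 1, "visit": 0, "bmi": 29.0},
    {"id": 1, "visit": 12, "bmi": 30.0},
    {"id": 1, "visit": 24, "bmi": 31.5},
    {"id": 2, "visit": 0, "bmi": 24.0},
    {"id": 2, "visit": 24, "bmi": 25.0},  # person 2 missed the 12-month visit
]
for r in add_lag(rows, "bmi"):
    print(r)
```

Note that this lags by the previous observed visit, so after a missed visit the lag spans a longer interval; whether that is acceptable depends on the research question.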

Ex 4: Does 18 month change in WOMAC pain depend on baseline SX K OA?
Include a SX K OA by visit interaction in the model.
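With two groups and two visits, the interaction coefficient in a model containing group, visit and their interaction equals a difference of differences in the four cell means. The sketch below computes that quantity directly; the field names (`sx_oa`, `visit`, `pain`) and all values are hypothetical.

```python
from statistics import mean

def difference_in_change(rows, baseline=0, follow_up=18):
    """(Change in the SX OA group) minus (change in the non-OA group), from cell means."""
    cells = {}
    for r in rows:
        cells.setdefault((r["sx_oa"], r["visit"]), []).append(r["pain"])
    m = {key: mean(values) for key, values in cells.items()}
    change_oa = m[(1, follow_up)] - m[(1, baseline)]
    change_non_oa = m[(0, follow_up)] - m[(0, baseline)]
    return change_oa - change_non_oa

# Hypothetical long-format data: two people per group, visits at 0 and 18 months.
rows = [
    {"id": 1, "sx_oa": 1, "visit": 0, "pain": 6.0}, {"id": 1, "sx_oa": 1, "visit": 18, "pain": 5.0},
    {"id": 2, "sx_oa": 1, "visit": 0, "pain": 5.0}, {"id": 2, "sx_oa": 1, "visit": 18, "pain": 3.0},
    {"id": 3, "sx_oa": 0, "visit": 0, "pain": 2.0}, {"id": 3, "sx_oa": 0, "visit": 18, "pain": 2.0},
    {"id": 4, "sx_oa": 0, "visit": 0, "pain": 1.0}, {"id": 4, "sx_oa": 0, "visit": 18, "pain": 2.0},
]
print(difference_in_change(rows))
```

This gives only the point estimate. A valid standard error still requires a method that accounts for the repeated measures on each person, such as a GEE.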

Analyzing change over time: What about analyzing change scores?
- An excellent and simple method when there are only two time points of interest and most subjects have complete data.
- Not as attractive with multiple time points or unbalanced data. Some loss of efficiency.
- If you do analyze change scores, be very wary of adjusting for the baseline value of the outcome. Doing so will usually bias estimates of change.

Analyzing change over time: What about analyzing change scores?

Analyzing change over time: What about analyzing change scores?
Using a longitudinal analysis, the difference in change (BL to 12 month visit) between the OA and non-OA groups is with a SE of and a p-value of 
Using the change score analysis, the difference is with a SE of and a p-value of 

Data layouts for longitudinal/clustered data
For longitudinal analyses: “long format”

Data layouts for longitudinal/clustered data
For change score analyses: “wide format”
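The two layouts can be illustrated with a small conversion: long format has one row per person per visit, wide format has one row per person with a column per visit. The field names, visit codes and helper functions below are hypothetical, not OAI variable names.

```python
def long_to_wide(rows, value="pain"):
    """One record per person, with one `<value>_v<visit>` column per visit."""
    wide = {}
    for r in rows:
        record = wide.setdefault(r["id"], {"id": r["id"]})
        record[f"{value}_v{r['visit']}"] = r[value]
    return list(wide.values())

def change_scores(wide_rows, start="pain_v0", end="pain_v12"):
    """Change score per person, for people with both measurements present."""
    return {
        r["id"]: r[end] - r[start]
        for r in wide_rows
        if start in r and end in r
    }

# Hypothetical long-format data: person 3 has no 12-month value.
long_rows = [
    {"id": 1, "visit": 0, "pain": 5.0}, {"id": 1, "visit": 12, "pain": 3.0},
    {"id": 2, "visit": 0, "pain": 2.0}, {"id": 2, "visit": 12, "pain": 4.0},
    {"id": 3, "visit": 0, "pain": 7.0},
]
wide_rows = long_to_wide(long_rows)
print(wide_rows)
print(change_scores(wide_rows))
```

Person 3 drops out of the change score analysis but would still contribute a baseline row to a long-format longitudinal analysis, which is one reason change scores are less attractive with unbalanced data.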

Questions and Answers?
Contact information: