Data Analysis in Practice-Based Research
Stephen Zyzanski, PhD, Department of Family Medicine, Case Western Reserve University School of Medicine, October 2008

Multilevel Data
Statistical analyses that fail to recognize the hierarchical structure of the data, or the dependence among observations within the same clinician, yield inflated Type I error rates when testing the effects of interventions.

Multilevel Data
An inflated Type I error rate means that intervention effects are more likely to be claimed than actually exist. Unless the intraclass correlation (ICC) is accounted for in the analysis, the Type I error rate will be inflated, often substantially.

Multilevel Data
An ICC > 0 violates the assumption of independent observations, so the usual analysis methods are not appropriate for group-randomized trials. Applying them yields a standard error that is too small and a p-value that overstates the significance of the results.
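
A small Monte Carlo sketch makes the inflation concrete. It rests on assumptions the slides do not state: no true intervention effect, 16 clinicians per arm with 129 patients each, normally distributed outcomes with total variance 1 split into a clinician part (ICC) and a patient part (1 - ICC), and a naive test that treats every patient as independent (unpooled SE, 1.96 critical value). The function names are illustrative.

```python
import math
import random


def mean_and_variance(values):
    """Sample mean and (n - 1) sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def simulate_naive_type1(n_clusters_per_arm=16, patients_per_cluster=129,
                         icc=0.02, n_reps=200, seed=1):
    """Share of null trials in which a naive patient-level test rejects at 5%.

    Total outcome variance is 1: a shared clinician effect with variance icc
    plus a patient-level term with variance 1 - icc. No true treatment effect.
    """
    rng = random.Random(seed)
    sd_between = math.sqrt(icc)
    sd_within = math.sqrt(1.0 - icc)
    rejections = 0
    for _ in range(n_reps):
        arms = []
        for _arm in range(2):
            values = []
            for _clinician in range(n_clusters_per_arm):
                # Same clinician effect for every patient of this clinician
                shared = rng.gauss(0.0, sd_between)
                values.extend(shared + rng.gauss(0.0, sd_within)
                              for _patient in range(patients_per_cluster))
            arms.append(values)
        (m1, v1), (m2, v2) = (mean_and_variance(a) for a in arms)
        # Naive SE: every patient treated as an independent observation
        se = math.sqrt(v1 / len(arms[0]) + v2 / len(arms[1]))
        if abs((m1 - m2) / se) > 1.96:  # large-sample 5% critical value
            rejections += 1
    return rejections / n_reps


print(f"ICC = 0.02: naive rejection rate = {simulate_naive_type1(icc=0.02):.2f}")
print(f"ICC = 0.00: naive rejection rate = {simulate_naive_type1(icc=0.0):.2f}")
```

With ICC = 0.02 the naive test rejects far more often than the nominal 5%, even though no effect exists; with ICC = 0 the rate falls back to roughly 5%.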

Traditional Response to Nesting
- Ignore nesting or groups
- Conduct the analysis with aggregated data
  – Use the clinician as the unit of analysis
- Spread group data across lower units
  – Patients of a given clinician get the same value for clinician-level variables

Analysis of Aggregated Data
- Analyses of aggregated data at higher levels of a hierarchy can produce results that differ from analyses at the individual level.
- Sample size becomes very small, and statistical power is substantially reduced.
- Aggregation bias: the meaning of a variable can change after aggregation.

Miscalculation of Standard Errors
- Nested data violate the assumption that observations are independent.
- Degrees of freedom for group data (e.g., clinicians) are exaggerated when those data are spread across lower units (patients).
- Type I errors become more likely because confidence intervals are unrealistically narrow.

Reduction in Standard Error
The basic formula for the standard error of a mean is $SE = SD/\sqrt{n}$. If data for 100 clinicians are spread across 1000 patients, the standard error for clinician variables will be too small: $SE_{1000}/SE_{100} = \sqrt{100/1000} \approx 0.32$, roughly 1/3 of its actual size in this example.
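
The 1/3 figure follows from the formula alone: the clinician-level SD is unchanged, but n is inflated tenfold, so the SE shrinks by a factor of √10. A minimal Python check; the SD value is an arbitrary placeholder because it cancels in the ratio.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


sd = 1.0  # arbitrary clinician-level SD; it cancels in the ratio
se_clinicians = standard_error(sd, 100)  # 100 independent clinicians
se_spread = standard_error(sd, 1000)     # same values copied onto 1000 patient rows
ratio = se_spread / se_clinicians
print(f"SE ratio = {ratio:.2f}")  # SE ratio = 0.32
```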

Example of Two-Group Analysis
The primary aim of many trials is to compare two groups of patients with respect to their mean values on a quantitative outcome variable.

Example of Two-Group Analysis
Testing mean differences for statistical significance in group-randomized trials requires standard errors that take randomization by groups into account.

Analysis Example
Assume we have 32 clinicians, 16 randomized to the Intervention condition and 16 to the Control condition. The intervention is a weight loss program, and the outcome is BMI at 2 years.
Mean (I) = 25.62; Mean (C) = …
Sample (I) = 1929 patients; Sample (C) = 2205 patients (total 4134)

Standard t-test
$t = \dfrac{M_1 - M_2}{\sqrt{\mathrm{Var}\,(1/N_1 + 1/N_2)}}$, where Var is the pooled variance.
With a mean difference of 0.36, t = 2.37 (p = 0.02; df = 4132). This p-value is too small when ICC > 0.

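Adjusted Two-Sample t-test
$t = \dfrac{M_1 - M_2}{\sqrt{\mathrm{Var}\,(C_1/N_1 + C_2/N_2)}}$, where $C_g = 1 + (m_g - 1)\rho$ is the VIF for group g, $m_g$ is the average number of patients per clinician in that group, and $\rho$ is the ICC.
With ICC = 0.02 and the same mean difference of 0.36, t = 1.27 (p = 0.21; df = 30, i.e., 32 clinicians minus 2).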
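
Because the standard and adjusted statistics share the numerator and the pooled variance, the adjusted t can be obtained by rescaling the standard one by $\sqrt{(1/N_1 + 1/N_2)/(C_1/N_1 + C_2/N_2)}$. A minimal Python sketch, assuming the average cluster size in each arm is N/16 (the slides do not list per-clinician counts) and starting from the unadjusted t = 2.37 used in the post hoc correction example; the function names are hypothetical.

```python
import math


def design_effect(avg_cluster_size, icc):
    """VIF (design effect) for one arm: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def clustered_t_from_naive(t_naive, n1, n2, k1, k2, icc):
    """Rescale a naive pooled-variance t so its SE uses C1/N1 + C2/N2.

    Assumes the same pooled variance appears in both statistics, so only
    the (1/N1 + 1/N2) versus (C1/N1 + C2/N2) factor differs.
    """
    c1 = design_effect(n1 / k1, icc)  # m1 = average patients per clinician, arm 1
    c2 = design_effect(n2 / k2, icc)  # m2 = average patients per clinician, arm 2
    ratio = math.sqrt((1 / n1 + 1 / n2) / (c1 / n1 + c2 / n2))
    return t_naive * ratio, c1, c2


t_adj, c1, c2 = clustered_t_from_naive(2.37, n1=1929, n2=2205, k1=16, k2=16, icc=0.02)
df = 16 + 16 - 2  # clinicians, not patients, set the degrees of freedom
print(f"C1 = {c1:.2f}, C2 = {c2:.2f}, adjusted t = {t_adj:.2f}, df = {df}")
```

This gives C1 ≈ 3.39, C2 ≈ 3.74 and an adjusted t of about 1.26, close to the 1.27 reported in the slides.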

Post Hoc Correction for Analyses that Ignore the Group Effect
The VIF can be used to correct the inflation in a test statistic produced by an observation-level analysis. F and chi-square statistics are corrected by dividing them by the VIF; t and z statistics are corrected by dividing them by the square root of the VIF.

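Post Hoc Correction
Corrected t = $t/\sqrt{\mathrm{VIF}}$, where t = 2.37 and $\mathrm{VIF} = 1 + (m - 1)\rho = 1 + (129 - 1)(0.02) = 3.56$, with m = 129 the average number of patients per clinician (4134/32).
$\sqrt{3.56} = 1.89$
Corrected t = 2.37/1.89 = 1.25 (the adjusted t-test computed 1.27)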
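
The post hoc rule is easy to script. A minimal Python sketch with illustrative function names; it uses a single VIF based on the overall average cluster size (m = 129), as in the slide.

```python
import math


def vif(avg_cluster_size, icc):
    """Variance inflation factor: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def correct_t(t, avg_cluster_size, icc):
    """t or z statistics: divide by the square root of the VIF."""
    return t / math.sqrt(vif(avg_cluster_size, icc))


def correct_f(f, avg_cluster_size, icc):
    """F or chi-square statistics: divide by the VIF itself."""
    return f / vif(avg_cluster_size, icc)


print(round(vif(129, 0.02), 2))  # 3.56
print(round(correct_t(2.37, 129, 0.02), 2))
```

This returns a corrected t of about 1.26; the slide's 1.25 comes from dividing by the rounded 1.89. Dividing F by the VIF is consistent with dividing t by √VIF, since t² = F for a one-degree-of-freedom test.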

Example of Adjusting for Clustering from the DOPC Study
Outcome: % of time physicians spent chatting with adult patients
Hypothesis: no patient gender difference in time spent chatting
Mean percent of time spent chatting with:
- Male patients: 8.2% (N = 1203)
- Female patients: 7.2% (N = 2181)
Unadjusted: t = 3.30, p = …
Intraclass correlation for chatting: 0.15
VIF: 2.75 for males, 3.70 for females
After adjusting for clustering: t = 1.89, p = 0.08

Multilevel Models
This example illustrates a method for adjusting individual-level analyses for clustering based on a simple extension of the standard two-sample t-test. We now move to a more comprehensive, but computationally more intensive, approach called multilevel modeling.

What Is Multilevel Modeling?
- A general framework for investigating nested data with complex error structures
- Multilevel models incorporate higher-level (clinician) predictors into the analysis
- Multilevel models provide a methodology for connecting the levels, i.e., analyzing variables from different levels simultaneously while adjusting for the various dependencies

Multilevel Models
Combining variables from different levels in a single statistical model is a more complicated problem than estimating and correcting for design effects.

Multilevel Models
Multilevel models are also known as random-effects models, mixed-effects models, variance-components models, contextual models, or hierarchical linear models.

Multilevel Models
- Use information across multiple units of analysis to improve estimation of effects
- Statistically partition variance and covariance components across levels
- Test for cross-level (moderator) effects
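
The ICC is the share of total outcome variance that lies between clinicians. Multilevel software estimates such variance components by (restricted) maximum likelihood; as a simpler stand-in, the sketch below uses the classical one-way ANOVA estimator for equal-sized groups, $(MSB - MSW)/(MSB + (m - 1)MSW)$, and checks it on simulated data. The simulation settings (400 clinicians, 20 patients each, true ICC 0.2) and function names are hypothetical.

```python
import random


def icc_anova(groups):
    """One-way ANOVA estimator of the ICC for equal-sized groups."""
    k = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand = sum(sum(g) for g in groups) / (k * m)
    means = [sum(g) / m for g in groups]
    ssb = m * sum((gm - grand) ** 2 for gm in means)  # between-clinician SS
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, means) for x in g)  # within SS
    msb = ssb / (k - 1)
    msw = ssw / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def simulate_groups(k, m, icc, seed=0):
    """k clinicians with m patients each; total variance 1, between part = icc."""
    rng = random.Random(seed)
    sd_b, sd_w = icc ** 0.5, (1 - icc) ** 0.5
    groups = []
    for _ in range(k):
        u = rng.gauss(0.0, sd_b)  # clinician effect shared by the group
        groups.append([u + rng.gauss(0.0, sd_w) for _ in range(m)])
    return groups


est = icc_anova(simulate_groups(k=400, m=20, icc=0.2))
print(f"estimated ICC = {est:.3f}")
```

With this many clinicians the estimate lands close to the true 0.2; with few clinicians it is far noisier, which is one reason multilevel analyses need an adequate number of Level 2 units.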

A Multilevel Approach
- Specifies a patient-level model within clinicians (the Level 1 model)
- Treats the regression coefficients as random variables at the clinician level
- Models the mean effect and the variance in effects as a function of a clinician-level (Level 2) model
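
In the common two-level notation (a convention from the hierarchical linear modeling literature, not taken from these slides), with patient i nested in clinician j, $X_{ij}$ a patient-level predictor and $W_j$ a clinician-level predictor, the approach can be written as:

```latex
\begin{aligned}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j} \\
\text{ICC (no predictors):}\quad \rho &= \frac{\tau_{00}}{\tau_{00} + \sigma^2}, \qquad \tau_{00} = \operatorname{Var}(u_{0j})
\end{aligned}
```

Here $\gamma_{11}$ is the cross-level (moderator) effect: it describes how the patient-level slope varies with a clinician characteristic.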

Correlates of Alcohol Consumption (Scribner, 2000)
Columns: coefficient, S.E., P value
Intercept: P < .001
Individual coefficients:
- Distance to outlet
- Age: P < .001
- Female: P < .001
- Education
- Black: P < .001
Census tract coefficients:
- Mean distance to outlets
- Mean age
- Percent female
- Mean education
- Percent Black
Percent variance explained:
- Within census tracts: 8.9
- Between census tracts: 80.3
ICC = 11.5%

Gender Differences in CV Risk Factor Management Using Multiple Levels With Interaction Analysis
Columns: patient gender, physician gender, patient × MD interaction
Weight management
1. Obesity documented
2. Physical activity advice
Results: F>M, p = 0.001, OR = 1.8; F>M, p = 0.032, OR = 2.21
Hypertension management
1. Advice for diet/weight loss
2. DM medication
3. Aspirin therapy
4. ACEI/ARB therapy
5. BP < 130/85
6. Physical activity advice
Results: F>M, p = 0.07, OR = 2.5; F<M, p = 0.03, OR = 0.49; F<M, p = …, OR = 0.3; F>M, p = …, OR = 6.55; p = …; p = 0.05

Software Packages
- BMDP-5V
- VARCL
- SAS PROC MIXED
- MLwiN
- HLM

Take Home Messages
- Clustered data, if ignored, yield standard errors that are too small and p-values that overstate significance
- Standard statistical analyses are invalid for clustered data
- Post hoc corrections for clustering are available
- Multilevel data require multilevel analyses
- Multilevel models (MM) are designed to analyze variables from different levels simultaneously, including cross-level interactions
- They are computationally intensive and require expertise
- The number of parameters to be estimated increases rapidly
- Missing data at Level 2 are more problematic