

Multilevel Data in Outcomes Research
- Types of multilevel data common in outcomes research
- Random versus fixed effects
- Statistical model choices
- “Shrinkage estimates” versus fixed effects
- Example of CA State CABG data

What are multilevel data?
- Gathering individual observations into larger groups does not by itself create clustered data
  – Individual observations from a simple random sample are never multilevel
- Multiple levels are a result of the sampling scheme or study design
  – Usually from the stages or levels used to obtain the individual units of observation
  – Repeated measures are a type of multilevel data

Other Names for Multilevel Data
- Hierarchical models
- Clustered data (but different from cluster analysis)
- Components of variance models
- Contextual models
- Micro and macro level data

Multilevel Data in Outcomes Research
- Two levels:
  – Hospitals and patients
  – Physicians and patients
- Three levels:
  – Hospitals, physicians, and patients
  – Physicians, patients, and repeated measures
- Four levels:
  – National Health Interview Survey

National Health Interview Survey
- Highest level: Select Primary Sampling Units (PSUs: MSAs, counties, groups of counties)
- Next level: Stratify PSUs by Census blocks and select Secondary Sampling Units (SSUs: clusters of households)
- Next level: Select households within SSUs
- Lowest level: Interview individuals in the households (all members in some households, a sample in others)

Characteristics of Multilevel Data
- Measurements within a level are correlated (e.g., measurements on the same person are more alike than measurements across persons)
- Variables can be measured at each level
- Standard statistical models and tests, which assume independent observations, are incorrect
- The variance of the outcome can be attributed to each level

Two Parts of Multilevel Data Variance
- Outcome = Patient Satisfaction Score
- Variance in the patient score divides into two parts:
  (1) the variance between physicians = $\sigma^2_B$
  (2) the variance within physicians = $\sigma^2_W$
- So the total variance = $\sigma^2_B + \sigma^2_W$
[Figure: Level 2: Physicians (MD1: mean = 81, MD2: mean = 58, MD3: mean = 74); Level 1: Patients]

Intraclass Correlation Coefficient (ICC)
- The intraclass correlation coefficient (ICC) is a measure of the correlation among the individual observations within the clusters
- It is calculated as the ratio of the between-cluster variance to the total variance: $\mathrm{ICC} = \sigma^2_B / (\sigma^2_B + \sigma^2_W)$
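
As a rough illustration of the ratio above, the sketch below estimates the two variance components from grouped scores with the one-way ANOVA method of moments and forms the ICC from them. The function name `icc_oneway`, the equal-cluster-size restriction, and the example scores are assumptions made for this sketch and do not come from the slides.

```python
from statistics import mean

def icc_oneway(groups):
    """Estimate the ICC from a balanced one-way layout.

    groups: list of lists, one inner list of patient scores per physician.
    Assumption of this sketch: every physician has the same number of patients.
    """
    k = len(groups)       # number of clusters (physicians)
    n = len(groups[0])    # patients per cluster
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Mean squares between and within clusters
    msb = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g) / (k * (n - 1))
    var_between = max((msb - msw) / n, 0.0)  # estimate of sigma^2_B, floored at 0
    var_within = msw                         # estimate of sigma^2_W
    total = var_between + var_within
    if total == 0:
        raise ValueError("no variation in the data")
    return var_between / total

# Hypothetical scores: every patient of a physician gives the same score
identical_within = [[81, 81, 81], [58, 58, 58], [74, 74, 74]]
# Hypothetical scores: physicians have equal means, patients differ
identical_between = [[70, 80, 90], [70, 80, 90], [70, 80, 90]]
```

With no within-physician variation the estimate is 1, and with identical physician means it is 0, which are the two boundary cases of the ratio.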

Intraclass Correlation Coefficient (ICC)
- Take the extreme case where each MD’s patients all have the same score = no variance within the physicians.
- So, $\mathrm{ICC} = \sigma^2_B / (\sigma^2_B + \sigma^2_W) = \sigma^2_B / (\sigma^2_B + 0) = 1$ = perfect correlation within the clusters.
[Figure: patient scores by physician (MD3: mean = 74, MD2: mean = 58, MD1)]

Intraclass Correlation Coefficient (ICC)
- A different case where each MD’s patients have very different scores = most of the variance is within the physicians (i.e., between patients, not between physicians).
- ICC is close to 0.
[Figure: patient scores by physician (MD3: mean = 74, MD2: mean = 68, MD1)]

Implications of ICC for Analysis
- When the ICC is close to 0, most of the variation is at the patient level and is explained by patient-level measures
- There is less difference between results from ordinary regression and multilevel models
- It may be less important to use a statistical model that allows variables for physician characteristics

Implications of ICC for Analysis
- When the ICC is close to 1, most of the variation is at the physician level and is explained by physician-level measures
- Using a statistical model that removes physician effects leaves little variation to explain
- It is important to use a statistical model that allows variables for physician characteristics

Methods of Analyzing Multilevel Data
1. Regression model ignoring higher level variables
2. Regression model with an indicator variable for each level 2 unit (minus one)
3. Conditional regression model
4. Regression model with generalized estimating equations (GEE model)
5. Random or mixed effects regression model

Choice of Analysis Model: Three Main Considerations
- What is the research question?
- How many observations are there at each level of the data?
- How important is controlling unmeasured confounding at the higher level?

Fixed versus Random Effects
- Effects are random when the units are a sample from a larger population
  – They vary because they were sampled; another sample would give different data
- Effects are fixed if they represent all possible members of a population
  – e.g., male/female; treatment groups; all the regions of the U.S.

Fixed versus Random Effects
- Effects are treated as fixed or random depending on the research question
- Random effects: generalize from the sample to a larger population
- Random effects: reduce variation due to small sample size by fitting a distribution
- Fixed effects: control for unmeasured confounding at the higher level

Methods of Analyzing Multilevel Data
- Fixed Effects Models:
  - Regression model with an indicator variable for each level 2 unit (minus one)
  - Conditional regression model
- Random Effects Models:
  - Regression model with generalized estimating equations (GEE)
  - Random or mixed effects regression model
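
To make the fixed effects idea concrete, here is a minimal sketch for a linear outcome with one patient-level covariate. It compares an ordinary regression slope that ignores physicians with a "within" slope computed after subtracting each physician's mean from both variables. In a linear model this within slope is the same as the slope from a regression with an indicator variable per physician. The data, the true slope of 2, and the helper names `ols_slope` and `within_slope` are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of an ordinary least squares line that ignores clustering."""
    xb, yb = mean(x), mean(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    return sxy / sxx

def within_slope(x, y, cluster):
    """Slope after removing each cluster's mean from x and y (fixed effects)."""
    ids = set(cluster)
    x_mean = {c: mean(a for a, g in zip(x, cluster) if g == c) for c in ids}
    y_mean = {c: mean(b for b, g in zip(y, cluster) if g == c) for c in ids}
    xd = [a - x_mean[g] for a, g in zip(x, cluster)]
    yd = [b - y_mean[g] for b, g in zip(y, cluster)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Hypothetical data: score = 2 * x + unmeasured physician effect, no patient noise.
# The physician effect is correlated with x, so it confounds the pooled slope.
cluster = ["MD1"] * 3 + ["MD2"] * 3
x = [1, 2, 3, 4, 5, 6]
effect = {"MD1": 10.0, "MD2": -10.0}
y = [2 * a + effect[g] for a, g in zip(x, cluster)]

pooled = ols_slope(x, y)              # distorted by the physician effect
within = within_slope(x, y, cluster)  # recovers the patient-level slope
print(f"pooled slope = {pooled:.3f}, within slope = {within:.3f}")
```

In this constructed example the pooled slope even has the wrong sign, while the within slope recovers the value used to generate the data. The cost is that anything constant within a physician, including measured physician characteristics, is removed along with the confounding.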

What are “shrinkage estimates”?
- Also called Bayesian or empirical Bayes estimates (Iezzoni text) or best linear unbiased predictions, BLUPs (SAS)
- Can only be obtained from a random effects (not GEE) regression model
- The effects of the higher-level units are modeled as if drawn from a specified distribution (usually normal, but others are possible)

A Simple Random Effects Model
- A simple random effects model is: $y_{ij} = \mu + \alpha_j + e_{ij}$, where $\mu$ = overall mean, $\alpha_j$ = difference from the overall mean for MD $j$, and $e_{ij}$ = individual error for patient $i$ of MD $j$
- The model says there is random variation from the mean score at the level of MDs plus variation at the level of patients
- Bayesian estimates are the individual $\alpha_j$’s obtained from the overall distribution
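
The link between this model and the ICC can be written out in a few lines. The derivation below is a sketch under the usual assumptions, which are not stated on the slide: the physician effects and the patient errors are independent and normally distributed with mean zero. The symbols $\mu$ and $\alpha_j$ are notation chosen here for the overall mean and the physician effect.

```latex
\begin{align*}
y_{ij} &= \mu + \alpha_j + e_{ij}, \qquad
  \alpha_j \sim N(0, \sigma^2_B), \quad e_{ij} \sim N(0, \sigma^2_W) \\
\operatorname{Var}(y_{ij}) &= \sigma^2_B + \sigma^2_W \\
\operatorname{Cov}(y_{ij}, y_{i'j}) &= \sigma^2_B \quad (i \neq i') \\
\operatorname{Corr}(y_{ij}, y_{i'j}) &= \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W} = \mathrm{ICC}
\end{align*}
```

Two patients of the same physician share $\alpha_j$, which is the only source of their covariance, so the correlation between them is the between-physician share of the total variance.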

Example of Shrinkage Estimates
- In a Patient Outcomes Research Team study of patient satisfaction with MD treatment for diabetes, raw mean patient scores by MD ranged from 53.4 to 87.1
- The random effects shrinkage estimates of the mean patient scores by MD ranged from 60.4 to 78.6
  – Random effects shrinkage estimates are closer to the overall mean
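
The pull toward the overall mean can be sketched for the simplest case: a random intercept model with known variance components, where each physician's estimate is a weighted average of the raw mean and the overall mean. The function name `shrink` and all numbers below are hypothetical and are not the values from the study above. Real software estimates the variances from the data.

```python
def shrink(raw_mean, n, overall_mean, var_between, var_within):
    """Shrinkage estimate of one physician's mean score.

    The weight on the raw mean is var_between / (var_between + var_within / n),
    so physicians with few patients are pulled more strongly toward the overall mean.
    """
    weight = var_between / (var_between + var_within / n)
    return overall_mean + weight * (raw_mean - overall_mean)

# Hypothetical inputs: overall mean 70, between-MD variance 25, within-MD variance 225
few_patients = shrink(raw_mean=90, n=9, overall_mean=70,
                      var_between=25, var_within=225)   # weight 0.5
many_patients = shrink(raw_mean=90, n=81, overall_mean=70,
                       var_between=25, var_within=225)  # weight 0.9
print(round(few_patients, 2), round(many_patients, 2))
```

Both hypothetical physicians have the same raw mean, but the one with 9 patients is moved halfway to the overall mean while the one with 81 patients barely moves.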

Controversy in Outcomes Research
- Report cards rank hospitals or physicians
- The data used have at least two levels (hospitals or physicians and their patients)
- The controversy is over the choice of statistical model for evaluating variation at the hospital or physician level

Methods of Analyzing Hospital (or MD) Mortality Variance
- Ignore hospital: run ordinary regression, then predict the average for each hospital
- Remove the hospital effect with indicator variables for hospitals (fixed effects model), then predict the average for each hospital
- Run random effects regression and obtain the Bayesian/shrinkage estimates for each hospital

Shrinkage estimates and CA State CABG Data
- The unadjusted estimate for each hospital is treated as a draw from a normal distribution
- More weight is given to hospitals with more CABG patients
  – Hospitals with smaller numbers move closer to the mean when a normal distribution is fitted
- Estimates are somewhat software dependent
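
The effect of patient volume on the weight can be illustrated with a worked example. This is a sketch for a linear random intercept model with assumed variance components ($\sigma^2_B = 4$, $\sigma^2_W = 100$) and hypothetical hospital sizes. A mortality outcome would normally be modeled on a logistic scale, where the same logic applies but the formula is not this simple.

```latex
\begin{align*}
\hat{\theta}_j &= \bar{y} + B_j\,(\bar{y}_j - \bar{y}), \qquad
  B_j = \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W / n_j} \\
n_j = 10:&\quad B_j = \frac{4}{4 + 100/10} = \frac{4}{14} \approx 0.29 \\
n_j = 400:&\quad B_j = \frac{4}{4 + 100/400} = \frac{4}{4.25} \approx 0.94
\end{align*}
```

Here $\bar{y}_j$ is the hospital's own mean, $\bar{y}$ the overall mean, and $B_j$ the weight on the hospital's own data. The small hospital keeps less than a third of its deviation from the overall mean, while the large hospital keeps nearly all of it.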

Shrinkage Estimates: Software
- Obtaining shrinkage estimates involves some software choices
  – Not all software provides them
  – Stata by itself doesn’t provide them
  – Different packages use different likelihood methods for fitting the models
- Stata add-on GLLAMM (free download)
- SAS
  – For linear outcomes, PROC MIXED
  – For non-linear outcomes, PROC NLMIXED and PROC GLIMMIX
- Some other software for multilevel data