Analysis of Mismeasured Data David Yanez Department of Biostatistics University of Washington July 5, 2005 Biost/Stat 579.



Outline
• Background
• Examples
• Assessing the extent of the bias
• Approaches to data analysis

Background
• Loose definition: measurement error is the difference between a measured value and the true value. It typically results from shortcomings in the measurement process (e.g., equipment, short-term variation, recall, human error).
• Misconceptions, corrected:
  - Bias due to measurement error does not diminish as the sample size increases.
  - Bias due to measurement error does not always lead to attenuation toward the null.

Examples
• Nurses' Health Study
  - Investigate the association between breast cancer and alcohol and nutritional intake.
• Cardiovascular Health Study
  - Investigate the association between carotid IMT (wall thickness) and CVD risk factors (smoking status, systolic bp, diabetes).
  - Predict MI risk using cholesterol, systolic bp, carotid IMT, age, gender, race, smoking status, alcohol and fat intake.
  - Investigate 3-year change in carotid IMT and CVD risk factors.

Example
• Illustration of an additive measurement error model. Filled circles are the true (Y, X) data, and the steeper line is the OLS fit to them. Empty circles are the observed (Y, W) data, and the flatter, attenuated line is the OLS fit to them.
• Model: $Y = X + e$, $e \sim (0, 0.25)$, $X \sim (0, 1)$, $W = X + U$, $U \sim (0, 1)$.
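The figure itself is not reproduced in this transcript, but the model on the slide is easy to simulate. The sketch below is one way to do so, assuming normal distributions (the slide gives only means and variances) and reading the second entry of each pair as a variance; the sample size, seed, and function names are illustrative choices, not part of the original slides.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of the simple OLS fit of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


def simulate(n=100_000, seed=0):
    """Simulate Y = X + e, W = X + U and return the two OLS slopes."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)            # true predictor, variance 1
    e = rng.normal(0.0, np.sqrt(0.25), n)  # outcome error, variance 0.25
    u = rng.normal(0.0, 1.0, n)            # measurement error, variance 1
    y = x + e
    w = x + u                              # observed, error-prone predictor
    return ols_slope(x, y), ols_slope(w, y)


slope_true, slope_observed = simulate()
print(f"slope of Y on X: {slope_true:.3f}")
print(f"slope of Y on W: {slope_observed:.3f}")
```

With equal variances for X and U, the slope of Y on W should come out close to half the slope of Y on X, which is the flattening the slide describes.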

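Example – Simple Regression
• The illustration above is an example of attenuation bias. With $W = X + U$, $U \sim (0, \sigma_u^2)$, $\mathrm{cov}(X, U) = 0$, the OLS slope of Y on W estimates
  $\beta_1^* = \beta_1 \sigma_x^2 / (\sigma_x^2 + \sigma_u^2) = \lambda \beta_1$ ($\lambda$ = reliability ratio).
• What if the observed variable W is not unbiased for X (e.g., dietary intake of saturated fat)? With $W = \gamma_0 + \gamma_1 X + U$, $U \sim (0, \sigma_u^2)$, $\mathrm{cov}(X, U) = 0$, and U correlated with the outcome error e, $\mathrm{cov}(e, U) = \sigma_{eu}$,
  $\beta_1^* = (\beta_1 \gamma_1 \sigma_x^2 + \sigma_{eu}) / (\gamma_1^2 \sigma_x^2 + \sigma_u^2)$.
• Residual variances are also adversely affected: $\mathrm{Var}(Y \mid W) > \mathrm{Var}(Y \mid X)$.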
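The attenuation factor and the inflated residual variance in the classical additive case both follow from a short moment calculation. The sketch below assumes the outcome model $Y = \beta_0 + \beta_1 X + e$ with e uncorrelated with X and U and $\mathrm{var}(e) = \sigma_e^2$ (notation introduced here for illustration), and reads $\mathrm{Var}(Y \mid W)$ as the residual variance of the best linear predictor of Y from W.

```latex
\begin{align*}
\beta_1^* &= \frac{\mathrm{cov}(Y, W)}{\mathrm{var}(W)}
           = \frac{\mathrm{cov}(\beta_0 + \beta_1 X + e,\; X + U)}{\mathrm{var}(X + U)}
           = \frac{\beta_1 \sigma_x^2}{\sigma_x^2 + \sigma_u^2}
           = \lambda \beta_1, \\[4pt]
\mathrm{Var}(Y \mid W) &= \mathrm{var}(Y) - \frac{\mathrm{cov}(Y, W)^2}{\mathrm{var}(W)}
           = \beta_1^2 \sigma_x^2 + \sigma_e^2 - \lambda \beta_1^2 \sigma_x^2 \\
          &= \sigma_e^2 + (1 - \lambda)\,\beta_1^2 \sigma_x^2
           \;\ge\; \sigma_e^2 = \mathrm{Var}(Y \mid X).
\end{align*}
```

The inequality is strict whenever $\beta_1 \neq 0$ and $\sigma_u^2 > 0$. For the illustrated model ($\beta_1 = 1$, $\sigma_x^2 = \sigma_u^2 = 1$, $\sigma_e^2 = 0.25$) this gives $\lambda = 0.5$, $\beta_1^* = 0.5$, and a residual variance of 0.75 instead of 0.25.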

Example – Multiple Regression
• Suppose a single predictor in a multiple regression is measured with error (e.g., carotid IMT).
• $Y = \beta_0 + \beta_1 X + \beta_2 A + \beta_3 G + e$, $W = X + U$, $U \sim (0, \sigma_u^2)$, $\mathrm{cov}(X, U) = 0$.
• One can show that the OLS estimates for $\beta_1$, $\beta_2$ and $\beta_3$ will be biased, i.e.,
  - $\beta_1^* = \lambda_1 \beta_1$, where $\lambda_1 = \sigma_{x|A,G}^2 / (\sigma_{x|A,G}^2 + \sigma_u^2)$,
  - $\beta_j^* = \beta_j + \beta_1 (1 - \lambda_1) \gamma_j$, $j = 2, 3$, where $E[X \mid A, G] = \gamma_0 + \gamma_2 A + \gamma_3 G$.
• The coefficients for age (A) and gender (G) will be biased unless these covariates are uncorrelated with the true carotid IMT.
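The bias formulas on this slide can be checked numerically. The following is a minimal simulation sketch, not the analysis from the lecture: all parameter values are hypothetical, A is drawn as a standardized continuous covariate, G as a binary indicator, and all error terms are taken to be normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical parameter values, chosen only for illustration.
beta = np.array([1.0, 2.0, 0.5, -1.0])  # beta0, beta1, beta2, beta3
gamma = np.array([0.0, 0.8, 0.6])       # gamma0, gamma2, gamma3
sigma2_x_given_ag = 1.0                 # Var(X | A, G)
sigma2_u = 0.5                          # measurement error variance

a = rng.normal(size=n)                           # age (standardized)
g = rng.binomial(1, 0.5, size=n).astype(float)   # gender indicator
x = (gamma[0] + gamma[1] * a + gamma[2] * g
     + rng.normal(scale=np.sqrt(sigma2_x_given_ag), size=n))
y = beta[0] + beta[1] * x + beta[2] * a + beta[3] * g + rng.normal(size=n)
w = x + rng.normal(scale=np.sqrt(sigma2_u), size=n)  # error-prone X

# Naive OLS of Y on (1, W, A, G).
design = np.column_stack([np.ones(n), w, a, g])
naive, *_ = np.linalg.lstsq(design, y, rcond=None)

# Limits predicted by the slide's formulas.
lam1 = sigma2_x_given_ag / (sigma2_x_given_ag + sigma2_u)
predicted = np.array([
    lam1 * beta[1],
    beta[2] + beta[1] * (1.0 - lam1) * gamma[1],
    beta[3] + beta[1] * (1.0 - lam1) * gamma[2],
])

print("naive OLS (W, A, G):", np.round(naive[1:], 3))
print("predicted limits   :", np.round(predicted, 3))
```

The naive coefficients should agree with the predicted limits up to simulation noise. Setting both `gamma[1]` and `gamma[2]` to zero removes the bias in the coefficients of A and G, while the coefficient of W stays attenuated.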

Assessing the extent of the bias
Data sources:
• Internal subsets of the primary data.
• External or independent studies.
• Validation data: a subset in which X is observed directly.
• Replication data: replicates of W are available.
• Instrumental data: a variable T is observed in addition to W.
• Internal data are preferred to external data.
  - Assumptions about data transportability need to be made when comparing data from different studies.
• Validation data are preferred to replication or instrumental data.

Assessing the extent of the bias
• Is the error model known?
  - Typically not, but plausible models can be used to assess the amount of error and the direction of the bias.
• Example: study of the association between breast cancer and alcohol and fat intake.
  - Under-reporting fat or alcohol intake may reduce the amount of measurement error bias (assuming the true association is positive).
• Example: study of the association between STDs and number of sexual partners.
  - Over-reporting the number of partners may increase the amount of measurement error bias (assuming the true association is positive).
• In both of the above examples, the observed association will be attenuated toward the null.
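One plausible error model for systematic misreporting is the linear model $W = \gamma_0 + \gamma_1 X + U$ with U uncorrelated with X and with the outcome error, under which the naive slope equals $\beta_1$ times $\gamma_1 \sigma_x^2 / (\gamma_1^2 \sigma_x^2 + \sigma_u^2)$. The sketch below evaluates that factor. Treating under-reporting as $\gamma_1 < 1$ and over-reporting as $\gamma_1 > 1$ is an assumption made here for illustration, as are all numeric values.

```python
def attenuation_factor(gamma1, sigma2_x, sigma2_u):
    """Ratio beta1*/beta1 under W = gamma0 + gamma1*X + U,
    with U uncorrelated with X and with the outcome error."""
    return gamma1 * sigma2_x / (gamma1 ** 2 * sigma2_x + sigma2_u)


# Hypothetical variances; gamma1 < 1 mimics proportional under-reporting,
# gamma1 > 1 mimics proportional over-reporting.
sigma2_x, sigma2_u = 1.0, 0.25
factors = {
    gamma1: attenuation_factor(gamma1, sigma2_x, sigma2_u)
    for gamma1 in (0.8, 1.0, 1.2)
}

for gamma1, factor in factors.items():
    print(f"gamma1 = {gamma1:.1f}: beta1*/beta1 = {factor:.3f}")
```

With these values every factor is below 1, so the association is attenuated in all three cases; the factor is closest to 1 under under-reporting and furthest from 1 under over-reporting. The ordering is not general: with a larger error variance (for example `sigma2_u = 1.0`) the factor at `gamma1 = 0.8` falls below the factor at `gamma1 = 1.0`.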

Approaches to data analysis
Bias correction methods
• Method of moments:
  - Components of bias known (e.g., $\sigma_u^2$, $\lambda$): simple.
  - Components of bias unknown: estimate the bias terms, then compute SE estimates (bootstrap, sandwich).
• Corrected estimating equations (Huang and Wang, 2000).
• Related methods: regression calibration (Carroll et al. 1995, Ch. 3).
• The choice of method depends in part on the type of auxiliary data available and the assumptions one is willing to make.
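As an illustration of the method-of-moments route when the bias components are unknown, the sketch below assumes replication data: two replicates W1 and W2 of the same X with independent classical errors. The data are simulated, and the function names, sample size, and number of bootstrap draws are hypothetical choices rather than anything prescribed in the slides.

```python
import numpy as np


def mom_corrected_slope(y, w1, w2):
    """Method-of-moments slope from two replicates of an error-prone X."""
    w_bar = (w1 + w2) / 2.0
    # var(W1 - W2) = 2 * sigma_u^2 under independent classical errors.
    sigma2_u_hat = np.var(w1 - w2, ddof=1) / 2.0
    # var(w_bar) = sigma_x^2 + sigma_u^2 / 2.
    var_w_bar = np.var(w_bar, ddof=1)
    lam_hat = (var_w_bar - sigma2_u_hat / 2.0) / var_w_bar
    naive = np.cov(w_bar, y, ddof=1)[0, 1] / var_w_bar
    return naive / lam_hat


def bootstrap_se(y, w1, w2, n_boot=200, seed=0):
    """Nonparametric bootstrap SE, resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = mom_corrected_slope(y[idx], w1[idx], w2[idx])
    return draws.std(ddof=1)


# Simulated data with true slope 1 (illustrative values only).
rng = np.random.default_rng(42)
n = 10_000
x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)
w1 = x + rng.normal(size=n)
w2 = x + rng.normal(size=n)

beta_hat = mom_corrected_slope(y, w1, w2)
se_hat = bootstrap_se(y, w1, w2)
print(f"corrected slope: {beta_hat:.3f} (bootstrap SE {se_hat:.3f})")
```

Resampling whole subjects keeps each (Y, W1, W2) triple together, so the bootstrap SE reflects the uncertainty in the estimated reliability ratio as well as in the naive slope.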

Approaches to data analysis
Sensitivity analyses:
• In the absence of auxiliary data, one can specify a range of values for the components of bias and check whether the significance of the association changes.
• Example: association between change in carotid IMT and age, gender, diabetes, smoking, and baseline IMT.
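A sensitivity analysis of this kind can be sketched by inverting the multiple-regression bias formulas, $\beta_1 = \beta_1^* / \lambda_1$ and $\beta_j = \beta_j^* - \beta_1 (1 - \lambda_1) \gamma_j$, over a grid of assumed reliability ratios. The naive estimates and $\gamma_j$ below are hypothetical numbers, not results from the IMT analysis, and the sketch tracks point estimates only; judging significance would also require standard errors at each assumed value.

```python
def corrected_coefficients(beta1_star, betaj_star, gamma_j, lam1):
    """Invert beta1* = lam1 * beta1 and
    betaj* = betaj + beta1 * (1 - lam1) * gamma_j."""
    beta1 = beta1_star / lam1
    betaj = betaj_star - beta1 * (1.0 - lam1) * gamma_j
    return beta1, betaj


# Hypothetical naive estimates: beta1_star for the error-prone predictor,
# betaj_star for an accurately measured covariate, and gamma_j for the
# slope of that covariate in the regression of X on the covariates.
beta1_star, betaj_star, gamma_j = 0.30, 0.10, 0.20

results = {
    lam1: corrected_coefficients(beta1_star, betaj_star, gamma_j, lam1)
    for lam1 in (1.0, 0.75, 0.5, 0.25)
}

for lam1, (beta1, betaj) in results.items():
    print(f"lambda1 = {lam1:.2f}: beta1 = {beta1:.3f}, betaj = {betaj:.3f}")
```

In this made-up example the corrected covariate coefficient shrinks as the assumed reliability falls and changes sign at the lowest value, which is the kind of instability a sensitivity analysis is meant to reveal.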

Approaches to data analysis
• Analysis of data conditional on the observed variables (similar to analysis of incomplete data).
  - This may be the analysis of interest (e.g., prediction of carotid IMT).
• Exercise caution in interpreting results.
  - Observed associations may differ greatly from associations of the unobserved variables.
  - Sensitivity analysis may be useful in guessing bounds on the degree of association (IMT analysis).
• Study designs (e.g., randomized trials) can, to some extent, remedy some of the ills caused by measurement error.