Introduction to Regression with Measurement Error STA302: Fall/Winter 2013.

Measurement Error
- Snack food consumption
- Exercise
- Income
- Cause of death
- Even the amount of drug that reaches an animal's blood stream in an experimental study

Is there anything that is not measured with error?

For categorical variables, classification error is common.

Simple additive model for measurement error (continuous case): W = X + e, where X is the latent quantity of interest and e is random error, independent of X. How much of the variation in the observed variable W comes from variation in the quantity of interest, and how much comes from random noise?

Reliability is the squared correlation between the observed variable and the latent variable (true score).

Reliability

Reliability is the proportion of the variance in the observed variable that comes from the latent variable of interest, and not from random error.
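Under the additive model, this proportion equals the squared correlation between the observed and latent variables. A quick simulation can confirm the equivalence (a minimal sketch; the variance values 4 and 1 are illustrative and not from the slides):

```python
import numpy as np

rng = np.random.default_rng(302)

# Additive measurement error model: observed W = X + e,
# with latent X and error e independent.
n = 100_000
var_x, var_e = 4.0, 1.0                  # illustrative values
X = rng.normal(0, np.sqrt(var_x), n)
W = X + rng.normal(0, np.sqrt(var_e), n)

rho_sq = np.corrcoef(X, W)[0, 1] ** 2    # squared correlation of W with X
ratio = var_x / (var_x + var_e)          # Var(X) / Var(W): proportion of
                                         # variance from the latent variable
print(round(rho_sq, 3), round(ratio, 3)) # both close to 0.8
```

Both numbers estimate the same reliability, here 4 / (4 + 1) = 0.8.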

The consequences of ignoring measurement error in the explanatory (x) variables

Measurement error in the response variable is a less serious problem: re-parameterize. We can't know everything, but all we care about is β1 anyway.
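To see why re-parameterization works: if we observe V = Y + e instead of Y, substituting the true model gives V = β0 + β1 X + (ε + e), which is the same regression with the same slope β1, just with a larger error variance. A quick numerical check (a minimal sketch; the variable names and parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2013)

# True model: y = beta0 + beta1 * x + eps.  We observe v = y + e.
n, beta0, beta1 = 200_000, 1.0, 0.5
x = rng.normal(0, 1, n)
y = beta0 + beta1 * x + rng.normal(0, 1, n)
v = y + rng.normal(0, 2, n)              # heavy measurement error in y

# Least-squares slope = Cov(x, response) / Var(x)
slope_y = np.cov(x, y)[0, 1] / np.var(x)
slope_v = np.cov(x, v)[0, 1] / np.var(x)
print(round(slope_y, 2), round(slope_v, 2))  # both close to beta1 = 0.5
```

The slope estimate stays consistent; only its standard error grows, because the error variance in the re-parameterized model is Var(ε) + Var(e).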

Measurement error in the explanatory variables. True model: Y = β0 + β1 X1 + β2 X2 + ε, with observed surrogates W1 = X1 + e1 and W2 = X2 + e2. Naïve model: regress Y on W1 and W2 as if they were measured without error.

True Model (More detail)

Reliabilities: the reliability of W1 is Var(X1) / (Var(X1) + Var(e1)), and the reliability of W2 is Var(X2) / (Var(X2) + Var(e2)).

Test X2 controlling for (holding constant) X1. That's the usual conditional model.

Unconditional: test X2 controlling for X1, holding X1 constant at a fixed x1.

Controlling the Type I Error Probability
A Type I error is to reject H0 when it is true, so that there is actually no effect or no relationship. A Type I error is very bad. That's why Fisher called it an "error of the first kind." False knowledge is worse than ignorance.

Simulation study: use pseudo-random number generation to create data sets.
- Simulate data from the true model with β2 = 0
- Fit the naïve model
- Test H0: β2 = 0 at α = 0.05 using the naïve model
- Is H0 rejected five percent of the time?
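The recipe above can be sketched as follows. This is a minimal illustration, not the code used for the study: the settings (n = 300, correlation 0.9, reliability 0.5) are chosen to make the inflation obvious, and the test uses the large-sample 1.96 cutoff rather than the exact t critical value.

```python
import numpy as np

rng = np.random.default_rng(431)

def rejection_rate(n=300, n_sim=2000, phi=0.9, beta1=1.0, rel=0.5):
    """Monte Carlo estimate of the Type I error rate for the naive test
    of H0: beta2 = 0, when beta2 really is zero but both explanatory
    variables are observed with error.  rel is the reliability of each
    surrogate W_j = X_j + e_j, i.e. Var(X) / Var(W)."""
    err_var = (1.0 - rel) / rel          # Var(e) giving reliability rel
                                         # when Var(X) = 1
    cov = np.array([[1.0, phi], [phi, 1.0]])
    reject = 0
    for _ in range(n_sim):
        # True model: y depends only on the latent X1 (beta2 = 0)
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 + beta1 * X[:, 0] + rng.normal(0, 1, n)
        W = X + rng.normal(0, np.sqrt(err_var), size=(n, 2))
        # Naive OLS of y on W1, W2 with an intercept
        Z = np.column_stack([np.ones(n), W])
        ZtZ_inv = np.linalg.inv(Z.T @ Z)
        b = ZtZ_inv @ Z.T @ y
        resid = y - Z @ b
        s2 = resid @ resid / (n - 3)
        se_b2 = np.sqrt(s2 * ZtZ_inv[2, 2])
        # Test H0: beta2 = 0 at alpha = 0.05 (large-sample cutoff)
        if abs(b[2] / se_b2) > 1.96:
            reject += 1
    return reject / n_sim

print(rejection_rate())           # far above the nominal 0.05
print(rejection_rate(rel=1.0))    # no measurement error: near 0.05
```

With correlated latent variables and unreliable surrogates, the naive test rejects a true H0 far more than five percent of the time; setting the reliability to 1 removes the inflation.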

A Big Simulation Study (6 Factors)
- Sample size: n = 50, 100, 250, 500, 1000
- Corr(X1, X2): ϕ12 = 0.00, 0.25, 0.75, 0.80, 0.90
- Variance in Y explained by X1: 0.25, 0.50, 0.75
- Reliability of W1: 0.50, 0.75, 0.80, 0.90, 0.95
- Reliability of W2: 0.50, 0.75, 0.80, 0.90, 0.95
- Distribution of latent variables and error terms: normal, uniform, t, Pareto

5 × 5 × 3 × 5 × 5 × 4 = 7,500 treatment combinations

Within each of the 5 × 5 × 3 × 5 × 5 × 4 = 7,500 treatment combinations, 10,000 random data sets were generated, for a total of 75 million data sets, all generated according to the true model with β2 = 0. The naïve model was fit to each, testing H0: β2 = 0 at α = 0.05. The proportion of times H0 is rejected is a Monte Carlo estimate of the Type I error probability.

Look at a small part of the results: both reliabilities = 0.90, everything normally distributed, β0 = 1, β1 = 1, β2 = 0 (H0 is true).

Marginal Mean Type I Error Rates

Summary
Ignoring measurement error in the independent variables can seriously inflate the Type I error probability. The poison combination is measurement error in the variable for which you are "controlling," together with correlation between the latent independent variables; if either is zero, there is no problem. Factors affecting the severity of the problem are listed on the next slide.

Factors affecting severity of the problem
- As the correlation between X1 and X2 increases, the problem gets worse.
- As the correlation between X1 and Y increases, the problem gets worse.
- As the amount of measurement error in X1 increases, the problem gets worse.
- As the amount of measurement error in X2 increases, the problem gets less severe.
- As the sample size increases, the problem gets worse.
- The distribution of the variables does not matter much.

As the sample size increases, the problem gets worse. For a large enough sample size, no amount of measurement error in the independent variables is safe, assuming that the latent independent variables are correlated.

The problem applies to other kinds of regression, and various kinds of measurement error:
- Logistic regression
- Proportional hazards regression in survival analysis
- Log-linear models: test of conditional independence in the presence of classification error
- Median splits
- Even converting X1 to ranks inflates the Type I error rate

If X1 is randomly assigned, then it is independent of X2: zero correlation. So even if an experimentally manipulated variable is measured (implemented) with error, there will be no inflation of the Type I error rate. If X2 is randomly assigned and X1 is a covariate observed with error (very common), then again there is no correlation between X1 and X2, and so no inflation of the Type I error rate. Measurement error may decrease the precision of experimental studies, but in terms of Type I error it creates no problems. This is good news!

Need a statistical model that includes measurement error

Copyright Information This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These PowerPoint slides will be available from the course website: