
Evaluating Theoretical Models

R² represents the proportion of the variance in Y that is accounted for by the model. When the model doesn't do any better than guessing the mean (i.e., if we assume X does not cause Y), R² will equal zero. When the model is perfect (i.e., it accounts for the data perfectly), R² will equal 1.00.
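This definition can be sketched directly in code. The observed scores below are hypothetical, chosen only to illustrate the two end-points:

```python
import numpy as np

def r_squared(y, y_hat):
    """Proportion of the variance in y accounted for by the predictions y_hat."""
    unexplained = np.sum((y - y_hat) ** 2)      # error the model leaves behind
    total = np.sum((y - np.mean(y)) ** 2)       # error from guessing the mean
    return 1.0 - unexplained / total

y = np.array([2.0, -8.0, -8.0, -4.0, -2.0])     # hypothetical observed scores

print(r_squared(y, np.full_like(y, y.mean())))  # mean-only guess -> 0.0
print(r_squared(y, y.copy()))                   # perfect predictions -> 1.0
```

Guessing the mean makes the unexplained error identical to the total error, so R² is exactly 0; perfect predictions leave no error, so R² is exactly 1.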

Why is R² useful?

R² is useful because it is a standard metric for interpreting model fit.
– It doesn't matter how large the variance of Y is, because everything is evaluated relative to the variance of Y.
– Set end-points: 1 is perfect and 0 is as bad as a model can be.

Why is R² useful?

Finally, and importantly, we can begin to compare the relative fit of alternative models. Why is this useful? When we began our discussion of modeling, we noted that there are ways to estimate parameter values, assuming the basic model is correct. Now we can begin to address the question of whether the basic model is correct (or, more specifically, how good it is) by studying the model's R² and comparing it to the R² of competing models.

Example Data

(Table of Person, x, and y values; the numeric entries did not survive transcription.)

Model with no X

The most basic model we can study is one in which Y-hat = M_Y (the mean of Y). Recall that the predicted values yield a horizontal line centered at the mean of Y (-4 in this example).

Model with no X

The variance of Y is 18 (rounded). The dotted lines here represent the error in prediction. If we square these errors, we find the average squared error to be approximately 18. Thus, R² for this model is 1 - (18/18), or 0.
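The claim on this slide, that the average squared error of the mean-only model equals the variance of Y, can be checked numerically. The scores here are hypothetical, since the slide's own data did not survive transcription:

```python
import numpy as np

y = np.array([2.0, -8.0, -8.0, -4.0, -2.0])  # hypothetical scores, mean = -4

errors = y - y.mean()        # prediction errors of the no-X model
mse = np.mean(errors ** 2)   # average squared error

print(mse, np.var(y))        # the two quantities are identical
print(1.0 - mse / np.var(y)) # so R² for the no-X model is 0.0
```

This is why the mean serves as the baseline: its average squared error *is* the variance of Y, so R² = 1 - Var(Y)/Var(Y) = 0.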

Model with a linear term

Next, let's see what happens if we study a linear model of the form Y-hat = a + bX. The average squared error in this example is approximately 10. R² is .44 (1 - (10/18)). The linear model accounts for 44% of the variance in Y.

Model with a quadratic term

Next, let's see what happens if we study a model of the form Y-hat = a + bX². The average squared error in this example is approximately 8. R² is .55 (1 - (8/18)). The quadratic model accounts for 55% of the variance in Y (11% more than the linear model).

Model with linear and quadratic terms

Next, let's see what happens if we study a linear + quadratic model of the form Y-hat = a + bX + cX². The average squared error in this case is about .10. R² is .99 (1 - (.10/18)). The linear + quadratic model accounts for 99% of the variance in Y (44% more than the quadratic model alone).

Summary of model comparisons

Summary of the fit statistics for the various models:

Model           R²
No X            .00
Linear          .44
Quadratic       .55
Linear + Quad   .99
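The four fits can be reproduced end to end. Since the slide's data values did not survive transcription, the x and y below are hypothetical, constructed to have a similar mix of linear and quadratic trend:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-4.1, -7.8, -8.2, -3.9, 4.0])  # hypothetical data, mean = -4

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# No X:            Y-hat = mean of Y
no_x = np.full_like(y, y.mean())
# Linear:          Y-hat = a + bX
linear = np.polyval(np.polyfit(x, y, 1), x)
# Quadratic only:  Y-hat = a + bX²  (least squares on an intercept and X² column)
design = np.column_stack([np.ones_like(x), x ** 2])
quadratic = design @ np.linalg.lstsq(design, y, rcond=None)[0]
# Linear + quad:   Y-hat = a + bX + cX²
lin_quad = np.polyval(np.polyfit(x, y, 2), x)

for name, pred in [("No X", no_x), ("Linear", linear),
                   ("Quadratic", quadratic), ("Linear + Quad", lin_quad)]:
    print(f"{name:14s} R² = {r2(y, pred):.2f}")
```

As on the slides, each richer model raises R², and the full linear + quadratic model fits almost perfectly; the exact values differ from the table because the data are made up.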

Summary

So, it looks like the model that combines the linear and the quadratic terms is the best model of the four that we studied. It accounts for the data almost perfectly (99% of the variance in Y was explained by the model). Note: even if the model does a decent job of explaining the variation in Y, it isn't proper to conclude that it is correct. It might be the best model of those that were articulated, even if it is not literally correct.

Residual term

The part of Y that is unexplained by the model is called residual or error variance, and is often represented as an explicit variable in the model. This variable is often called the residual or error term, and is typically denoted by the Greek symbol epsilon or the Roman letter E. The variance of the residual scores, relative to the variance of Y, equals the proportion of variance in Y that is unexplained by the model. If the model is good, the residual variance will be very small.

Residual Term

DATA = MODEL + RESIDUAL
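The identity DATA = MODEL + RESIDUAL can be verified numerically for any fitted model. Here a least-squares line is used, with hypothetical data:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-4.1, -7.8, -8.2, -3.9, 4.0])  # hypothetical data

model = np.polyval(np.polyfit(x, y, 1), x)   # MODEL: the fitted values
residual = y - model                         # RESIDUAL: what the model misses

# DATA = MODEL + RESIDUAL holds exactly, by construction
print(np.allclose(y, model + residual))      # → True

# Residual variance relative to Var(Y) is the unexplained proportion, 1 - R²
print(np.var(residual) / np.var(y))
```

For a least-squares fit with an intercept, the residuals have mean zero, so their variance divided by Var(Y) is exactly 1 - R².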

Y = a + bX + cX² + E

(The slide's table of x values and residuals E for persons A through E is garbled in the transcript; the E values are approximate.)

In the next class we will discuss three reasons why the error variance is greater than zero:
– errors of measurement
– sampling error
– an incorrect model