A Short Introduction to Curve Fitting and Regression by Brad Morantz



Overview
- What can regression do for me? An example
- Model building
- Error metrics
- OLS regression
- Robust regression
- Correlation
- Reading a regression table
- Limitations
- Another method: maximum likelihood

Example
- You have the total mass, payload mass, and maximum distance of a number of missiles.
- You need an equation for maximum distance based upon the mass of the missile and its payload.
- Regression lets you build a model, based upon these observations, of the form:
  Max distance = B1 * total mass + B2 * payload mass + constant
- With this equation, you can answer questions.
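The missile model above can be sketched with ordinary least squares. The masses and distances below are invented for illustration (generated exactly from Max distance = 0.4 * total mass + 0.5 * payload mass + 70, so the fit recovers those coefficients):

```python
import numpy as np

# Hypothetical missile data (made up for illustration):
# total mass (kg), payload mass (kg), observed max distance (km)
total_mass   = np.array([1000.0, 1200.0, 1500.0, 1800.0, 2000.0])
payload_mass = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
max_distance = np.array([520.0, 625.0, 770.0, 915.0, 1020.0])

# Design matrix: one column per variate plus a column of ones for the constant
X = np.column_stack([total_mass, payload_mass, np.ones_like(total_mass)])

# Ordinary least squares: coefficients minimizing the sum of squared errors
coeffs, residuals, rank, sv = np.linalg.lstsq(X, max_distance, rcond=None)
b1, b2, const = coeffs

# Answer a question: predicted max distance for a new missile
predicted = b1 * 1600 + b2 * 220 + const
```

With the made-up data above, the fit recovers B1 = 0.4, B2 = 0.5, constant = 70, and predicts 820 km for a 1600 kg missile with a 220 kg payload.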

Model Building
- Why build a model?
  - It can help us to forecast
  - It can assist in design
- Two ways to build one:
  - Causal factors
  - Data driven
- Regression is the latter: a model based upon observations

MSE
- Residual: the difference between model and actual, e = Y - Ŷ
- If we simply summed the residuals, the negatives and positives would cancel out
- If we took absolute values, the function would not be differentiable at the origin
- So we take the sum of squares; MSE is the mean of the squared errors
- In a plot of the values, the line is the model and the dots are the actual observations
- Wait a second: mean of squared errors sounds a lot like mean of squared differences. MSE is an estimator of the variance, if we take the model line as the estimate of the central value
- MSE is a good way to see how well a line fits a set of points: a good metric for quality of fit
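A minimal sketch of the residual and MSE arithmetic, using made-up actual and fitted values:

```python
# Actual observations and the model's fitted values (illustrative numbers)
actual = [2.0, 4.1, 5.9, 8.2, 9.8]
fitted = [2.1, 4.0, 6.0, 8.0, 10.0]

# Residuals: e = Y - Yhat
residuals = [y - yhat for y, yhat in zip(actual, fitted)]

# A plain sum lets positives and negatives cancel each other out
plain_sum = sum(residuals)

# Mean squared error: the mean of the squared residuals
mse = sum(e * e for e in residuals) / len(residuals)
```

Here the residuals are roughly -0.1, 0.1, -0.1, 0.2, -0.2: their plain sum nearly cancels to zero, while the MSE (about 0.022) still registers all five errors.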

RMSE
- RMSE = sqrt(MSE)
- If we accept that MSE is an estimator of the variance, then RMSE is an estimator of the standard deviation
- It is also a metric of how much error there is, i.e. how well a model fits a set of data
- MSE and RMSE are often used as cost functions in mathematical operations
- A cost function is the function a program tries to optimize in solving the problem: in this case it minimizes error, building the model with the smallest MSE
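A toy illustration of MSE as a cost function: try candidate slopes for a no-intercept line y = b*x and keep the one with the smallest MSE. The data and the crude grid search are illustrative only; real regression software solves for the minimum directly.

```python
# Data generated from roughly y = 2x with a little noise (values made up)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def mse(slope):
    """Cost function: mean squared error of the model y = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Crude optimizer: try candidate slopes 1.00 .. 3.00 and keep the best
candidates = [i / 100 for i in range(100, 301)]
best_slope = min(candidates, key=mse)

rmse_at_best = mse(best_slope) ** 0.5
```

The slope that minimizes the cost function lands at 1.99, essentially the true slope of 2 that generated the data.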

OLS Regression
- Ordinary Least Squares
- Y = B0 + B1X1 + ... + BnXn + e
  - B0 is the Y intercept
  - Bn is the coefficient for each variate
  - Xn is the variate
  - e is the error
- The program calculates the coefficients that produce the line/model with the least squared error (minimum MSE)
- Depending on the results, some experts transform a particular variate, e.g. taking its square, log, or square root, and using that as the variate instead
- A transformed model is harder to interpret, because the reverse transformation has to be done
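In the one-variable case the OLS coefficients have a well-known closed form, which can be sketched in plain Python (the data are made up):

```python
# Illustrative data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.8, 5.1, 7.0, 9.2, 10.9]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Slope B1 = Sxy / Sxx, intercept B0 = ybar - B1 * xbar
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Fitted values from the model Yhat = B0 + B1 * X
fitted = [b0 + b1 * x for x in xs]
```

This choice of B0 and B1 is exactly the one that minimizes the MSE over all possible lines.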

Robust Regression
- Based on work by Kaufman & Rousseeuw
- Uses the median instead of the mean
- Not standard practice; no automatic routines to do it
- The big advantage is that it is much less affected by outliers/extreme values
- May sometimes create a better model, especially with smaller data sets where an outlier tends to 'pull' the mean
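A rough sketch of the median idea, not Kaufman and Rousseeuw's actual algorithm: fit y = b*x by minimizing the median of squared residuals instead of the mean, on data containing one gross outlier (the data and the grid search are illustrative):

```python
import statistics

# Data following y = 2x, except the last point is a gross outlier
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 40.0]

def mean_sq(slope):
    """Mean of squared residuals: the ordinary least-squares cost."""
    return statistics.mean([(y - slope * x) ** 2 for x, y in zip(xs, ys)])

def median_sq(slope):
    """Median of squared residuals: a robust cost, insensitive to outliers."""
    return statistics.median([(y - slope * x) ** 2 for x, y in zip(xs, ys)])

candidates = [i / 100 for i in range(100, 801)]   # slopes 1.00 .. 8.00

ols_like_slope = min(candidates, key=mean_sq)     # pulled toward the outlier
robust_slope = min(candidates, key=median_sq)     # stays at the true slope 2
```

The mean-based fit is dragged up to a slope near 4.7 by the single outlier, while the median-based fit stays at 2, the slope of the other four points.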

Simple Test of Model
- Plot out the residuals and look at the plot
- Are the residuals approximately constant? Or do you see a trend of them growing or attenuating (heteroscedasticity)?
- In the latter situation, the causal factors are changing and the model will need to be modified
- If they are approximately constant over the range you will be working with, the model is OK
- Also check that there is about as much error on one side of the line as on the other; otherwise the model is biased
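The two visual checks above can be roughed out numerically: count residual signs for balance, and compare the residual spread in the lower and upper halves of the range for growth or attenuation. The residuals and the threshold below are illustrative only.

```python
# Residuals from a hypothetical fit, ordered by the predictor value
residuals = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.5]

# Bias check: roughly as many residuals above zero as below
above = sum(1 for e in residuals if e > 0)
below = sum(1 for e in residuals if e < 0)
balanced = abs(above - below) <= 1

# Heteroscedasticity check: does the spread grow across the range?
half = len(residuals) // 2
spread_lo = sum(abs(e) for e in residuals[:half]) / half
spread_hi = sum(abs(e) for e in residuals[half:]) / (len(residuals) - half)
growing = spread_hi > 2 * spread_lo   # crude, illustrative threshold
```

These residuals alternate in sign with constant magnitude, so both checks pass; if `growing` were true, the model would need modification as the slide warns.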

Linear Correlation Coefficient
- Shows the relationship between two variables
- Positive means they travel in the same direction; negative means they go in opposite directions
- Like covariance, but it has no scale, so it can be used for comparisons
- Range is from -1 to +1; zero means no linear relationship
- For standardized data, the correlation matrix is proportional to XᵀX (which becomes easy to implement in a matrix language like Fortran or Matlab)
- Because it is unit-free, correlation can be used to compare different things; covariance carries the units of the data, so it cannot be used to compare different models
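A sketch of the correlation coefficient for two made-up variables, showing how dividing the covariance by both standard deviations cancels the units and maps the result into [-1, +1]:

```python
# Two illustrative variables that move together (ys is exactly 2 * xs)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Covariance-style sum, scaled by both standard-deviation-style sums
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sx = sum((x - xbar) ** 2 for x in xs) ** 0.5
sy = sum((y - ybar) ** 2 for y in ys) ** 0.5
r = cov / (sx * sy)
```

Since ys is an exact positive multiple of xs, r comes out at +1, the top of the range.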

Correlation and Dependence
- If two items are correlated, then r ≠ 0: they are not independent
- If they are independent, then r = 0 (though r = 0 alone does not prove independence)
- Independence: P(a) = P(a|b), i.e. b has no effect on a
- You need some theory to support a claimed relationship
- The old standard example in regression class is the seeming correlation between skirt length and the stock market, yet there is NO theory to connect the two
- The correlation matrix is (should be) symmetric about the diagonal, since the correlation between X1 and Y1 is the same as between Y1 and X1

R²
- The square of r, the Pearson correlation coefficient
- The fraction of the variability in the system that is explained by the model
- Real-world values are usually 10% to 70%
- Adjusted R²: use only for model building; it penalizes for additional causal variables
- R² tells us how well the model explains the data and how good the model is
- Typically, the more factors put into the model, the better the apparent fit, which can cause problems; the adjusted R² penalizes additional variables and can be used while adding variables, as in step-wise regression
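A sketch of R² and adjusted R² from made-up actual and fitted values (k, the number of explanatory variables, is assumed to be 1 here):

```python
# Actual values and fitted values from a hypothetical one-variable model
actual = [3.0, 5.0, 7.0, 9.0, 11.0]
fitted = [3.2, 4.8, 7.1, 8.9, 11.0]

n = len(actual)
k = 1                      # number of explanatory variables (assumed)
ybar = sum(actual) / n

# Unexplained and total variability
ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
ss_tot = sum((y - ybar) ** 2 for y in actual)

# Fraction of the variability explained by the model
r_squared = 1 - ss_res / ss_tot

# Adjusted R^2: penalizes each extra explanatory variable
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```

The adjusted value is always at or below the plain R², and the gap widens as more variables are added for a fixed sample size.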

Reading a Regression Table
- The residual plot looks acceptable: about as much error on top as on the bottom
- The R² is very good, and so is the adjusted R²; this model is very explanatory
- This regression had only one independent variable, but could have had more
- The table also shows the confidence interval for each coefficient
- X is a good coefficient, as it is statistically significant
- The intercept is not: it has a low t statistic (t < 3, P of 0.68) and its confidence interval INCLUDES zero
- The model is good overall, as the F value is >> 3 and the significance is << 0.05
- This regression therefore needs to be rerun with the intercept turned off, forcing the line through the origin
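For simple regression, the intercept test described above can be sketched with the standard OLS formulas; the data and the t critical value (about 2.78 for 4 degrees of freedom at 95%) are illustrative assumptions:

```python
# Illustrative data where the true line passes near the origin
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.1, 7.8, 10.2, 11.9]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

# Residual standard error with n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = (sse / (n - 2)) ** 0.5

# Standard error and t statistic of the intercept
se_b0 = s * (1 / n + xbar ** 2 / sxx) ** 0.5
t_b0 = b0 / se_b0

# Rough 95% interval (t critical value ~2.78 for 4 df, assumed)
ci_lo, ci_hi = b0 - 2.78 * se_b0, b0 + 2.78 * se_b0
intercept_significant = not (ci_lo <= 0 <= ci_hi)
```

With this data the intercept's interval includes zero and its t statistic is small, which is exactly the situation where the slide recommends rerunning with the line forced through the origin.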

Limitations of OLS
- Built on the assumption of a linear relationship; error grows when the relationship is non-linear
- Variable transformations can help, but make the model harder to interpret
- Limited to one dependent variable
- Limited to the range of the data: it cannot accurately extrapolate outside it

Maximum Likelihood
- Can also calculate regression coefficients, if the probability distribution of the error terms is available
- Under the right conditions, it yields the minimum variance unbiased estimators
- Calculates the best way to fit a mathematical model to the data
- Chooses the estimate of the unknown population parameter that maximizes the probability of getting the data we actually have
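When the error terms are normally distributed, maximizing the likelihood of the slope gives the same answer as minimizing squared error. A toy sketch, with made-up data and a crude grid search standing in for a real optimizer:

```python
import math

# Data from roughly y = 3x (values made up)
xs = [1.0, 2.0, 3.0]
ys = [3.1, 5.9, 9.0]

def log_likelihood(slope, sigma=1.0):
    """Log-likelihood of y = slope * x with Normal(0, sigma) errors."""
    ll = 0.0
    for x, y in zip(xs, ys):
        e = y - slope * x
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - e ** 2 / (2 * sigma ** 2)
    return ll

# Pick the candidate slope that maximizes the probability of the data
candidates = [i / 100 for i in range(200, 401)]   # slopes 2.00 .. 4.00
mle_slope = max(candidates, key=log_likelihood)

# Least-squares slope for comparison (closed form for y = b * x)
ols_slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

The slope that maximizes the normal log-likelihood agrees with the least-squares slope to within the grid resolution, illustrating why OLS is the maximum-likelihood fit under normal errors.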