Linear Regression
Michael Sokolov / Numerical Methods for Chemical Engineers
ETH Zurich, Institut für Chemie- und Bioingenieurwissenschaften



Linear Regression
Michael Sokolov, ETH Zurich, Institut für Chemie- und Bioingenieurwissenschaften, ETH Hönggerberg / HCI F128 – Zürich

Linear regression model
- As inputs for our model we use two vectors x and Y, where
  - x_i is the i-th observation
  - Y_i is the i-th response
- The model reads: Y_i = β0 + β1*x_i + ε_i
- At this point, we make a fundamental assumption: the errors ε_i are mutually independent and normally distributed with mean zero and variance σ², i.e. ε_i ~ N(0, σ²)
- As outputs from our regression we get estimated values for the regression parameters β0 and β1
- A regression is called linear if it is linear in the parameters!

The errors ε
- Since the errors are assumed to be independent and normally distributed with mean zero and variance σ², the following holds for the expectation value and variance of the model responses: E[Y_i] = β0 + β1*x_i and Var[Y_i] = σ²

Example: Boiling Temperature and Pressure
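A minimal sketch of such a fit in MATLAB, using the regress function mentioned later in these slides; the vectors T (boiling temperature) and P (pressure) are hypothetical placeholders for the measured data:

```matlab
% Simple linear regression of boiling temperature on pressure.
% T and P are hypothetical column vectors of measurements.
X = [ones(length(P),1), P];   % design matrix with an intercept column
[b, bint] = regress(T, X);    % least-squares estimates and 95% confidence intervals
b0 = b(1);                    % estimated intercept
b1 = b(2);                    % estimated slope
Tfit = X*b;                   % fitted responses
res  = T - Tfit;              % residuals
```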

Parameter estimation
- (figure: measured data with fitted regression line; shaded band = confidence interval)

Residuals
- (figure: residual plot; one point is marked as an outlier)

Removing the Outlier
- (figure: regression repeated after removing the outlier)

Goodness of fit measures
- Coefficient of determination R²
- Total sum of squares (SST)
- Sum of squares due to regression (SSR)
- Sum of squares due to error (SSE)
- R² = 1 ⇒ all residuals ε_i = 0 (the fit passes through every data point)
- R² = 0 ⇒ the regression does not explain the variation of Y
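Written out with the standard definitions (Ŷ_i is the fitted value, Ȳ the mean response), the quantities above are:

```latex
\mathrm{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
```

```latex
\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}, \qquad
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
```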

The LinearModel and dataset classes
- Matlab 2012 features two classes that are designed specifically for statistical analysis and linear regression
- dataset
  - creates an object that holds data and meta-data like variable names, options for inclusion / exclusion of data points, etc.
- LinearModel
  - is constructed from datasets or X, Y pairs (as with the regress function) and a model description
  - automatically does linear regression and holds all important regression outputs like parameter estimates, residuals, confidence intervals, etc.
  - includes several useful functions like plots, residual analysis, exclusion of parameters, etc.

Classes in Matlab
- Classes define a set of properties (variables) and methods (functions) which operate on those properties
- This is useful for bundling information together with ways of treating and modifying this information
- When a class is instantiated, an object of this class is created which can be used with the methods of the class, e.g. mdl = LinearModel.fit(X,Y);
- Properties can be accessed with the dot operator, like with structs (e.g. mdl.Coefficients)
- Methods can be called either with the dot operator, or by having an object of the class as first input argument (e.g. plot(mdl) or mdl.plot())

Working with LinearModel and dataset
- First, we define our observed and measured variables, giving them appropriate names, since these names will be used by the dataset and the LinearModel as meta-data

Working with LinearModel and dataset
- Next, we construct the dataset from our variables

Working with LinearModel and dataset
- After defining the relationship between our data (a model), we can use the dataset and the model to construct a LinearModel object
- This will automatically fit the data, perform residual analysis and much more
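The three steps above might look as follows for the boiling-point example; the variable names Temperature and Pressure, and the data vectors T and P, are assumptions for illustration:

```matlab
% 1) Name the variables; these names become meta-data in the dataset
Temperature = T(:);    % response (hypothetical data vector T)
Pressure    = P(:);    % observed variable (hypothetical data vector P)

% 2) Construct the dataset from the named variables
ds = dataset(Temperature, Pressure);

% 3) Define the model and fit it; 'Temperature ~ Pressure' means
%    Temperature = b0 + b1*Pressure + error
mdl = LinearModel.fit(ds, 'Temperature ~ Pressure');
disp(mdl)              % parameter estimates, confidence intervals, R^2, ...
```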

LinearModel: Plot
- Now that we have the model, we have many analysis and plotting tools at our disposal

LinearModel: Tukey-Anscombe Plot
- Plot residuals vs. fitted values; these should be randomly distributed around 0
- (figure annotation: Outlier?)
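Assuming a fitted model object mdl as above, the Tukey-Anscombe plot can be produced with the built-in residual-plotting method; a minimal sketch:

```matlab
% Residuals vs. fitted values (Tukey-Anscombe plot); a structureless
% horizontal band around 0 supports the model assumptions
plotResiduals(mdl, 'fitted');
```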

LinearModel: Cook's Distance
- Cook's distance measures the effect of removing one measurement from the data; observations with a large Cook's distance are influential for the fit

LinearModel: Removing the Outlier
- After identifying an outlier, it can easily be removed
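A sketch of the two steps, checking Cook's distance and then refitting without the flagged measurement; the dataset ds, the modelspec and the observation index 7 are hypothetical:

```matlab
% Cook's distance for every observation; spikes mark influential points
plotDiagnostics(mdl, 'cookd');

% Refit, excluding the suspected outlier (index 7 is hypothetical)
mdl2 = LinearModel.fit(ds, 'Temperature ~ Pressure', 'Exclude', 7);
disp(mdl2)
```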

Multiple linear regression
- Approximate model
- Residuals
- Least squares
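The three items above, written out in the standard least-squares notation, where X is the n×(m+1) design matrix with a leading column of ones and b is the estimate of β:

```latex
% Approximate model
Y = X\beta + \varepsilon
% Residuals
e = Y - Xb
% Least squares: minimize e^\top e, which yields the normal equations
X^\top X\, b = X^\top Y \quad\Rightarrow\quad b = (X^\top X)^{-1} X^\top Y
```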

Assignment 1
- The data file asphalt.dat (online) contains data from a degradation experiment for different concrete mixtures [1]
- The rutting (erosion) in inches per million cars (RUT) is measured as a function of
  - viscosity (VISC)
  - percentage of asphalt in the surface course (ASPH)
  - percentage of asphalt in the base course (BASE)
  - an operating mode, 0 or 1 (RUN)
  - percentage (*10) of fines in the surface course (FINES)
  - percentage of voids in the surface course (VOIDS)

[1] R.V. Hogg and J. Ledolter, Applied Statistics for Engineers and Physical Scientists, Maxwell Macmillan International Editions, 1992, p. 393.

Assignment 1 (Continued)
1. Find online the file readVars.m that will read the data file and assign the variables RUT, VISC, ASPH, BASE, RUN, FINES and VOIDS; you can copy and paste this script into your own file.
2. Create a dataset using the variables from 1.
3. Set the RUN variable to be a discrete variable; assuming your dataset is called ds, use ds.RUN = nominal(ds.RUN);
4. Create a modelspec string
  - To include multiple variables in the modelspec, use the plus sign
  - How many dependent and independent variables does your problem contain?
5. Fit your model (mdl1) using LinearModel.fit, display the model output and plot the model.
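Steps 2 to 5 might be sketched as follows; the modelspec shown is one reasonable choice for a model that is linear in all regressors, not the unique answer:

```matlab
% Step 2: build the dataset from the variables created by readVars.m
ds = dataset(RUT, VISC, ASPH, BASE, RUN, FINES, VOIDS);

% Step 3: treat the operating mode as a discrete (categorical) variable
ds.RUN = nominal(ds.RUN);

% Step 4: modelspec with all regressors entering linearly
modelspec = 'RUT ~ VISC + ASPH + BASE + RUN + FINES + VOIDS';

% Step 5: fit, display and plot
mdl1 = LinearModel.fit(ds, modelspec);
disp(mdl1)
plot(mdl1)
```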

Assignment 1 (Continued) 6.Which variables most likely have the largest influence? 7.Generate the Tukey-Anscombe plot. Is there any indication of nonlinearity, non-constant variance or of a skewed distribution of residuals? plotAllResponses 8.Plot the adjusted responses for each variable, using the plotAllResponses function you can find online. What do you observe? 9.Try and transform the system by defining  logRUT = log10(RUT); logVISC = log10(VISC); 10.Define a new dataset and modelspec using the transformed variables. 21Michael Sokolov / Numerical Methods for Chemical Engineers / Linear Regression

Assignment 1 (Continued) 11.Fit a new model with the transformed variables and repeat the analysis from before (steps 6.-8.). step mdl3 = step(mdl2, 'nsteps', 20); 12.With the new model, try to remove variables that have a small influence. To do this systematically, use the function step, which will remove and/or add variables one at a time: mdl3 = step(mdl2, 'nsteps', 20);  Which variables have been removed and which of the remaining ones most likely have the largest influence?  Do you think variable removal is helpful to improve general conclusions (in other words avoid overfitting)?  How could you compare the quality of the three models? Is the root mean squared error of help?  How could you determine SST, SSR and SSE of your models (at least 2 options)?  How could you improve the models? Think about synergic effects. 22Michael Sokolov / Numerical Methods for Chemical Engineers / Linear Regression