2DS00 Statistics 1 for Chemical Engineering Lecture 3
Week schedule
Week 1: Measurement and statistics
Week 2: Error propagation
Week 3: Simple linear regression analysis
Week 4: Multiple linear regression analysis
Week 5: Nonlinear regression analysis
Detailed contents of week 3
Least Squares Method
simple linear regression
– parameter estimates
– residuals
– confidence intervals
– significance test
– influential points
– lack-of-fit
Least Squares
measurements of time and distance
estimate the speed (assuming constant speed)
Table of measurements and squares
Columns: Time (s) | Measured distance | Computed distance | Measured − computed distance | Square | Sum of squares
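The table columns can be reproduced with a few lines of code. The sketch below is only an illustration: the measurements are hypothetical (the slide's values are not available here), and it assumes the model distance = speed × time without an intercept, which is one reading of "constant speed". The names `fit_speed` and `sum_of_squares` are invented for this example.

```python
# Hypothetical measurements (NOT the lecture's data): time in s, distance in m.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_speed(t, d):
    """Least squares estimate of v in the assumed model d = v * t."""
    return sum(ti * di for ti, di in zip(t, d)) / sum(ti * ti for ti in t)


def sum_of_squares(t, d, v):
    """Sum of squared differences between measured and computed distance."""
    return sum((di - v * ti) ** 2 for ti, di in zip(t, d))


v_hat = fit_speed(times, measured)

# One row per measurement: time, measured, computed, difference, square.
for ti, di in zip(times, measured):
    computed = v_hat * ti
    diff = di - computed
    print(f"{ti:5.1f} {di:7.2f} {computed:7.2f} {diff:7.2f} {diff ** 2:8.4f}")

print("estimated speed:", v_hat)
print("sum of squares :", sum_of_squares(times, measured, v_hat))
```

Any other value of the speed gives a larger sum of squares than `v_hat`, which is what "least squares" means.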
Visualisation of sums of squares
Types of regression analysis
Linear means linear in the coefficients; the fitted function itself need not be a straight line!
Simple linear regression
Multiple linear regression
Non-linear regression
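As a small illustration of "linear in the coefficients": the curve y = b0 + b1·x² is not a straight line in x, yet it is a linear regression model, because substituting z = x² gives a straight line in z. The sketch below uses made-up, noise-free data and a hypothetical helper `simple_fit`. A model such as y = b0·exp(b1·x) would not qualify, since b1 enters non-linearly.

```python
def simple_fit(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical noise-free data generated from y = 1 + 0.5 * x^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 0.5 * xi ** 2 for xi in x]

# The model is curved in x but linear in (b0, b1): regress y on z = x^2.
z = [xi ** 2 for xi in x]
b0, b1 = simple_fit(z, y)
print("b0 =", b0, "b1 =", b1)
```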
Surface tension nitrobenzene
measurements of temperature and surface tension
temperature ranges from 40 to 200 °C
scatter plot indicates a linear relation
Regression analysis of nitrobenzene example
Confidence intervals
parameter estimates: estimate ± $t_{14-2;\,0.025}$ × standard error
predicted values (extrapolation is dangerous; predictions are most accurate at the mean of the independent variable)
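The interval formula and the remark on extrapolation can be sketched as follows. Everything numerical here is an assumption for illustration: the 14 data points are hypothetical (not the nitrobenzene measurements), the helper names are invented, and the critical value 2.179 is the standard t table entry for 12 degrees of freedom and upper tail probability 0.025.

```python
import math

# Hypothetical data: 14 temperatures (deg C) and surface tensions.
temps = [40 + 12 * i for i in range(14)]
noise = [0.10, -0.15, 0.05, 0.20, -0.10, -0.05, 0.15,
         -0.20, 0.00, 0.10, -0.10, 0.05, -0.15, 0.10]
tension = [46.0 - 0.12 * t + e for t, e in zip(temps, noise)]

T_CRIT = 2.179  # t table value, 14 - 2 = 12 degrees of freedom, tail 0.025


def fit(x, y):
    """Simple linear regression with the quantities needed for intervals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # residual standard deviation
    se_b1 = s / math.sqrt(sxx)            # standard error of the slope
    se_b0 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
    return {"n": n, "xbar": xbar, "sxx": sxx, "b0": b0, "b1": b1,
            "s": s, "se_b0": se_b0, "se_b1": se_b1}


def mean_halfwidth(x0, res):
    """Half-width of the interval for the mean response at x0."""
    return T_CRIT * res["s"] * math.sqrt(
        1 / res["n"] + (x0 - res["xbar"]) ** 2 / res["sxx"])


res = fit(temps, tension)
slope_ci = (res["b1"] - T_CRIT * res["se_b1"],
            res["b1"] + T_CRIT * res["se_b1"])
print("slope interval:", slope_ci)
for x0 in (res["xbar"], 200, 300):
    print("half-width at", x0, "=", mean_halfwidth(x0, res))
```

The half-width is smallest at the mean of the temperatures and grows with the distance from it, which is why extrapolated predictions are the least reliable.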
Extrapolation
Significance testing
Simple linear regression: model assumptions
Model: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Assumptions:
the model is linear (and contains enough terms)
the $\varepsilon_i$'s are normally distributed with $\mu = 0$ and constant variance $\sigma^2$
the $\varepsilon_i$'s are independent
Normality checking + independence
check normality by considering the residuals
apply both graphical checks and the Shapiro-Wilk test
check independence by using the Durbin-Watson test
also check the residuals by plotting them against time
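The Durbin-Watson statistic is simple enough to compute by hand. The sketch below implements the usual definition, the sum of squared successive differences divided by the sum of squared residuals, on two made-up residual sequences. Values near 2 are consistent with independent errors, values near 0 point to positive and values near 4 to negative first-order autocorrelation. The formal test compares the statistic with tabulated bounds, which are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residual sequences, in the order the data were measured.
alternating = [1.0, -1.0, 1.0, -1.0]                 # sign flips every step
drifting = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]    # slow trend over time

print("alternating:", durbin_watson(alternating))
print("drifting   :", durbin_watson(drifting))
```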
Residuals
use studentized residuals in order to obtain a universal scale
e versus fitted values: homogeneity of variance, linearity
e versus time: independence of errors
e versus $x_i$: homogeneity of variance
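To show what "universal scale" means, here is a minimal sketch of internally studentized residuals for simple linear regression: each residual is divided by its own estimated standard deviation, s·sqrt(1 − h_i), with leverage h_i = 1/n + (x_i − x̄)²/S_xx. This is one common definition; statistical software may instead report the externally studentized (deleted) variant. The data and the function name are hypothetical.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals of a simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    result = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx   # leverage of this point
        result.append(e / (s * math.sqrt(1 - h)))
    return result


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

r = studentized_residuals(x, y)
# Changing the units of y (here a factor 10) leaves the values unchanged.
r_scaled = studentized_residuals(x, [10 * yi for yi in y])
print(r)
print(r_scaled)
```

Because the result does not depend on the units of the response, the same rule of thumb for "large" residuals can be used across data sets.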
Lack-of-fit test
if multiple measurements at the same values of x are available, then we may test whether the model can be improved significantly
the test is based on two different ways of computing the standard deviation
note the difference with testing whether the model is significant
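The "two different ways of computing the standard deviation" can be made concrete with the standard decomposition of the residual sum of squares into pure error (spread of replicates around their own mean) and lack of fit (distance of the replicate means from the line). The sketch below assumes a straight-line model and uses hypothetical, deliberately curved data; `lack_of_fit` is an illustrative name. The resulting F value is to be compared with an F table value for the given degrees of freedom.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """F statistic and degrees of freedom for a straight-line lack-of-fit test."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    # Group replicates: measurements taken at exactly the same x value.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    # Pure error: variation of replicates around their group mean.
    ss_pe = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_pe += sum((yi - mean) ** 2 for yi in ys)

    m = len(groups)              # number of distinct x values
    ss_lof = sse - ss_pe         # what the line fails to explain
    df_lof = m - 2               # two parameters in the line
    df_pe = n - m
    f = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f, df_lof, df_pe


# Hypothetical data: duplicates at four x values, clearly curved response.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.1, 3.9, 9.1, 8.9, 16.1, 15.9]

f, df_lof, df_pe = lack_of_fit(x, y)
print("F =", f, "with", df_lof, "and", df_pe, "degrees of freedom")
```

A large F indicates that the straight line misses structure that the replicates show to be real, even when the regression itself is highly significant.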
Influential points
regression lines tend to be pulled towards remote points (points far from the mean of x)
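A small numerical experiment illustrates the effect. The data below are hypothetical: five points that lie close to a line with slope 1, plus one remote point at x = 20 that does not follow the trend. The sketch refits the line with and without that point and computes its leverage, h = 1/n + (x − x̄)²/S_xx, which is close to 1 for a point that largely determines the fit on its own.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def leverage(x, x0):
    """Leverage of a point at x0 within the design x (x0 must be in x)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return 1 / n + (x0 - xbar) ** 2 / sxx


# Hypothetical data: five well-behaved points and one remote point.
x_base = [1.0, 2.0, 3.0, 4.0, 5.0]
y_base = [1.0, 2.1, 2.9, 4.2, 5.0]
x_all = x_base + [20.0]
y_all = y_base + [5.0]

_, slope_without = fit_line(x_base, y_base)
_, slope_with = fit_line(x_all, y_all)
h_remote = leverage(x_all, 20.0)

print("slope without remote point:", slope_without)
print("slope with remote point   :", slope_with)
print("leverage of remote point  :", h_remote)
```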
Check-list
1. apply regression analysis
2. check whether the regression is significant; if applicable, apply the lack-of-fit test
3. study residual plots for constant variance
4. check for outliers
5. check normality of residuals (graphical checks, Shapiro-Wilk)
6. check independence of residuals (residual plots, Durbin-Watson)
7. check for influential points
Causality and regression
Significant regression results do not imply a causal relation!
Statistical results must be explained (afterwards) by chemical theory.