8 Assumptions on the residuals
- The xi's are not random variables: they are known with high precision.
- The ei's have a constant variance (homoscedasticity).
- The ei's are independent.
- The ei's are normally distributed (normality).
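19 Residual variance
By construction, the residuals sum to zero, $\sum_i e_i = 0$, but their sum of squares does not.
The residual variance is defined by
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$$
and its square root $s$ is the standard error of estimate.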
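As a minimal sketch of the quantity named on this slide, the following Python fits a straight line by ordinary least squares and computes the standard error of estimate with n - 2 degrees of freedom. The function names and the data are illustrative assumptions; the data are made up and are not the HPLC data of the example.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def standard_error_of_estimate(x, y):
    """Return (residuals, s) where s^2 = sum(e_i^2) / (n - 2)."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (len(x) - 2)
    return residuals, math.sqrt(s2)

# Made-up illustrative data (assumption, not taken from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
residuals, s = standard_error_of_estimate(x, y)
print(f"sum of residuals = {sum(residuals):.2e}, s = {s:.3f}")
```

The sum of the residuals is zero up to rounding error, whereas s is strictly positive as soon as the points do not lie exactly on the line.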
20 Example
Dep Var: HPLC   N: 18   Multiple R:    Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.282

Effect      Coefficient   Std Error   t   P(2 Tail)
CONSTANT
CONCENT
21 Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?
22 Is this model the best one to use?
Tools to check the mean model:
- scatterplot of residuals vs fitted values
- test(s)
Tools to check the variance model:
- scatterplot of residuals vs fitted values
- probability plot (P-plot)
23 Checking the mean model
Scatterplot of residuals vs fitted values:
- Structure in the residuals: change the mean model.
- No structure in the residuals: OK.
24 Checking the mean model: tests
Two cases:
- No replication: try a polynomial model (quadratic first).
- Replications: test of lack of fit.
25 Without replication
Try another mean model and test the improvement.
Example: the quadratic model $Y_i = a + b x_i + c x_i^2 + e_i$.
If the test on c is significant (c ≠ 0), then keep this model.

Dep Var: HPLC   N: 18   Multiple R:    Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.539

Effect            Coefficient   Std Error   t   P(2 Tail)
CONSTANT
CONCENT
CONCENT*CONCENT
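One way to test the improvement brought by the quadratic term is the extra-sum-of-squares F test, which for a single added term is equivalent to the two-tailed t test on c. The sketch below uses only the Python standard library and made-up data. The helper names are hypothetical, and the critical value 6.61 is the tabulated 5% point of F with 1 and 5 degrees of freedom, which matches this toy sample of 8 points only.

```python
def solve(A, b):
    """Solve the small linear system A z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def poly_sse(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns (coefficients, SSE)."""
    p = degree + 1
    A = [[sum(xi ** (j + k) for xi in x) for k in range(p)] for j in range(p)]
    rhs = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(p)]
    coef = solve(A, rhs)
    sse = sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
              for xi, yi in zip(x, y))
    return coef, sse

def quadratic_term_f(x, y):
    """F statistic (1 and n-3 df) for H0: c = 0 in Y = a + b x + c x^2 + e."""
    n = len(x)
    _, sse_line = poly_sse(x, y, 1)
    _, sse_quad = poly_sse(x, y, 2)
    return (sse_line - sse_quad) / (sse_quad / (n - 3))

# Made-up curved data (assumption): 1 + 2x + 0.5x^2 plus small deviations
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.10, -0.20, 0.15, -0.05, 0.10, -0.15, 0.05, 0.00]
y = [1 + 2 * xi + 0.5 * xi ** 2 + ei for xi, ei in zip(x, noise)]
f_stat = quadratic_term_f(x, y)
print(f"F = {f_stat:.1f}; keep the quadratic model: {f_stat > 6.61}")
```

The normal equations are adequate for a sketch with a handful of small x values; a numerically safer fit (QR or orthogonal polynomials) would be preferable for wider ranges of x.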
26 With replications
Perform a test of lack of fit.
Principle: compare the departure from linearity to the pure error.
If the departure from linearity is significantly larger than the pure error, then change the model.
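27 Test of lack of fit: how to do it?
Three steps:
1) Linear regression
2) One-way ANOVA
3) Compare the lack of fit (residual sum of squares of step 1 minus the error sum of squares of step 2) to the pure error (error sum of squares of step 2); if the F-ratio is significant, then change the model.
28 Test of lack of fit: example
Three steps:
1) Linear regression
2) One-way ANOVA

Dep Var: HPLC   N: 18
Analysis of Variance
Source    Sum-of-Squares   df   Mean-Square   F-ratio   P
CONCENT
Error

3) The lack of fit is not significant: we keep the straight line.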
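The three steps can be sketched as follows, assuming the usual decomposition of the regression residual sum of squares into a lack-of-fit part and a pure-error part. The function name and the data are illustrative assumptions. The critical value 10.13 is the tabulated 5% point of F with 1 and 3 degrees of freedom and applies to this toy design only (3 levels, 2 replicates each).

```python
from collections import defaultdict

def lack_of_fit_f(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for the straight-line model."""
    n = len(x)
    # Step 1: linear regression, residual sum of squares
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ss_residual = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    # Step 2: one-way ANOVA with each distinct x as a group, pure-error sum of squares
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    k = len(groups)
    ss_pure = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    # Step 3: lack of fit = residual SS minus pure-error SS, then the F-ratio
    ss_lof = ss_residual - ss_pure
    f = (ss_lof / (k - 2)) / (ss_pure / (n - k))
    return f, k - 2, n - k

# Made-up replicated data (assumption): group means follow a curve, not a line
x = [1, 1, 2, 2, 3, 3]
y = [1.0, 1.2, 4.0, 4.2, 9.0, 9.2]
f_stat, df1, df2 = lack_of_fit_f(x, y)
print(f"F = {f_stat:.1f} on ({df1}, {df2}) df; change the model: {f_stat > 10.13}")
```

Grouping on exact equality of x assumes that replicates are recorded with identical concentration values.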
29 Checking the variance model: homoscedasticity
Scatterplot of residuals vs fitted values:
- No structure in the residuals but heteroscedasticity: change the model (criterion).
- Homoscedasticity: OK.
30 What to do with heteroscedasticity?
Scatterplot of residuals vs fitted values: model the dispersion.
The standard deviation of the residuals increases with the fitted value $\hat{Y}$: it increases with x.
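31 What to do with heteroscedasticity?
Estimate the slope and the intercept again, but with weights inversely proportional to the variance: minimise
$$\sum_{i=1}^{n} w_i \,(Y_i - a - b x_i)^2 \quad \text{with} \quad w_i \propto \frac{1}{\operatorname{Var}(e_i)}$$
and check that the weighted residuals $\sqrt{w_i}\,(Y_i - a - b x_i)$ are homoscedastic.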
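A minimal sketch of the weighted fit, assuming for illustration that the standard deviation of the residuals is proportional to x, so that the weights are w_i = 1/x_i². The variance model, the function names and the data are assumptions, not values taken from the slides.

```python
import math

def weighted_fit(x, y, w):
    """Minimise sum w_i (y_i - a - b x_i)^2; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def weighted_residuals(x, y, w, a, b):
    """Residuals rescaled by sqrt(w_i); these should look homoscedastic."""
    return [math.sqrt(wi) * (yi - a - b * xi) for wi, xi, yi in zip(w, x, y)]

# Made-up data (assumption) whose scatter grows with x
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.1, 4.8, 9.4, 16.2, 34.6]
w = [1 / xi ** 2 for xi in x]   # assumed variance model: sd proportional to x
a, b = weighted_fit(x, y, w)
print(f"a = {a:.3f}, b = {b:.3f}")
print([round(r, 3) for r in weighted_residuals(x, y, w, a, b)])
```

With equal weights the function reduces to the ordinary least-squares fit, which gives a quick consistency check.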
32 Checking the variance model: normality
Probability plot of the residuals against the expected value for a normal distribution:
- No curvature: normality.
- Curvature: non-normality. Is it so important?
33 What to do with non-normality?
- Try to model the distribution of the residuals. In general, this is difficult with few observations.
- If enough observations are available, non-normality does not affect the result too much.
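The points of a normal probability plot can be obtained by pairing the sorted residuals with approximate expected normal order statistics. The sketch below uses Blom's plotting positions (i - 0.375)/(n + 0.25), which is an assumption: the software behind the slides may use a different formula. The residuals are made up.

```python
from statistics import NormalDist

def normal_scores(n):
    """Approximate expected values of n ordered standard normal draws (Blom)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pplot_points(residuals):
    """Pairs (expected normal value, ordered residual) to plot against each other."""
    ordered = sorted(residuals)
    return list(zip(normal_scores(len(ordered)), ordered))

# Made-up residuals (assumption)
residuals = [0.4, -1.2, 0.1, 2.3, -0.6, -0.2, 0.9]
for expected, observed in pplot_points(residuals):
    print(f"{expected:7.3f}  {observed:6.2f}")
```

Points falling close to a straight line suggest normality; a systematic curvature suggests non-normality.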
34 An interesting index: R²
R² = squared correlation coefficient = % of the dispersion of the Yi's explained by the straight line (the model).
0 ≤ R² ≤ 1
- If R² = 1, all the ei = 0: the straight line explains all the variation of the Yi's.
- If R² = 0, the slope is 0: the straight line does not explain any variation of the Yi's.
35 An interesting index: R²
R² and R (correlation coefficient) are not designed to measure linearity!
Example:
Multiple R: 0.990   Squared multiple R: 0.980   Adjusted squared multiple R: 0.980
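A small illustration of this point, using made-up data that are exactly quadratic (y = x² for x = 1 to 10, an assumption chosen for clarity): R² for the straight-line fit is high although the relationship is clearly not linear, and the residuals show an obvious structure.

```python
def r_squared_and_residuals(x, y):
    """Fit y = a + b*x by least squares; return (R^2, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return 1 - sse / syy, residuals

x = list(range(1, 11))
y = [xi ** 2 for xi in x]          # purely quadratic, no noise
r2, residuals = r_squared_and_residuals(x, y)
print(f"R^2 = {r2:.3f}")
print([round(e, 1) for e in residuals])
```

R² is about 0.95 here, while the residuals are positive at both ends and negative in the middle, which is exactly the kind of structure the scatterplot of residuals vs fitted values is meant to reveal.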
36 Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?
37 How to use this straight line?
Direct use: for a given x,
- predict the mean Y
- construct a confidence interval of the mean Y
- construct a prediction interval of Y
Reverse use, calibration (approximate results): for a given Y,
- predict the mean x
- construct a confidence interval of the mean x
- construct a prediction interval of x
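Both uses can be sketched with the textbook formulas for simple linear regression. The reverse (calibration) interval below is the common first-order approximation for a single new observation Y, which is an assumption about what "approximate results" refers to. The Student t critical value is passed in by the caller; 2.776 is the tabulated two-sided 95% value for 4 degrees of freedom and matches the made-up sample of 6 points only.

```python
import math

def fit_summary(x, y):
    """Least-squares line plus the quantities needed for the intervals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))            # standard error of estimate
    return {"a": a, "b": b, "s": s, "n": n, "x_bar": x_bar, "sxx": sxx}

def direct_use(fit, x0, t_crit):
    """Predicted mean Y at x0, its confidence interval and the prediction interval."""
    y0 = fit["a"] + fit["b"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    half_ci = t_crit * fit["s"] * math.sqrt(leverage)
    half_pi = t_crit * fit["s"] * math.sqrt(1 + leverage)
    return y0, (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)

def reverse_use(fit, y0, t_crit):
    """Estimated x for one observed Y = y0, with an approximate interval."""
    x0 = (y0 - fit["a"]) / fit["b"]
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    half = t_crit * fit["s"] / abs(fit["b"]) * math.sqrt(1 + leverage)
    return x0, (x0 - half, x0 + half)

# Made-up calibration data (assumption)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
fit = fit_summary(x, y)
y_hat, ci, pi = direct_use(fit, 3.5, t_crit=2.776)
x_hat, x_interval = reverse_use(fit, 7.0, t_crit=2.776)
print(f"Y at x=3.5: {y_hat:.2f}, CI {ci}, PI {pi}")
print(f"x for Y=7.0: {x_hat:.2f}, approximate interval {x_interval}")
```

The prediction interval is always wider than the confidence interval of the mean, because it adds the variability of a single new observation to the uncertainty of the fitted line.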
49 What you should no longer believe
- One can fit the straight line by inverting x and Y.
- If the correlation coefficient is high, the straight line is the best model.
- Normality of the xi's is required to perform a regression.
- Normality of the ei's is essential to perform a good regression.