8 Assumptions on the residuals
- The xi's are not random variables: they are known with high precision.
- The ei's have a constant variance (homoscedasticity).
- The ei's are independent.
- The ei's are normally distributed (normality).
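19 Residual variance
By construction, the residuals sum to zero, $\sum_i e_i = 0$, but the sum of their squares, $\sum_i e_i^2$, is generally not zero.
The residual variance is defined by $s^2 = \frac{1}{n-2} \sum_i e_i^2$; its square root $s$ is the standard error of estimate.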
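The quantities on this slide can be sketched as follows, assuming an ordinary least-squares straight line with an intercept and the usual n − 2 divisor. The function names are illustrative, and the data are made up (they are not the HPLC data of the example).

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residual_summary(x, y):
    """Return (residuals, residual variance, standard error of estimate)."""
    a, b = fit_line(x, y)
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in res) / (len(x) - 2)  # n - 2: two parameters estimated
    return res, s2, math.sqrt(s2)

# Made-up illustrative data (assumption, not taken from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
res, s2, s = residual_summary(x, y)
```

The residuals add up to zero apart from rounding error, whereas the sum of their squares does not, which is why the residual variance is built on the squares.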
20 Example
Dep Var: HPLC   N: 18   Multiple R:    Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.282

Effect     Coefficient   Std Error   t   P(2 Tail)
CONSTANT
CONCENT
21 Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?
22 Is this model the best one to use?
Tools to check the mean model:
- scatterplot of residuals vs fitted values
- test(s)
Tools to check the variance model:
- scatterplot of residuals vs fitted values
- probability plot (Pplot)
23 Checking the mean model
Scatterplot of residuals vs fitted values:
- Structure in the residuals: change the mean model.
- No structure in the residuals: OK.
24 Checking the mean model: tests
Two cases:
- No replication: try a polynomial model (quadratic first).
- Replications: test of lack of fit.
25 Without replication
Try another mean model and test the improvement.
Example: add a quadratic term with coefficient c (the CONCENT*CONCENT effect). If the test on c is significant (c ≠ 0), then keep this model.

Dep Var: HPLC   N: 18   Multiple R:    Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.539

Effect            Coefficient   Std Error   t   P(2 Tail)
CONSTANT
CONCENT
CONCENT*CONCENT
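One way to test the quadratic coefficient c without a matrix library is sketched below. It assumes ordinary least squares and obtains c by first removing the straight line from both Y and x² (a standard partial-regression identity). The data are made up, and the critical value has to be supplied by the reader from a t table with n − 3 degrees of freedom.

```python
import math

def line_residuals(x, y):
    """Residuals of y after fitting the straight line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def quadratic_term_test(x, y):
    """Return (c, t) for the x^2 coefficient in y = a + b*x + c*x^2."""
    n = len(x)
    r = line_residuals(x, y)                      # Y with the straight line removed
    z = line_residuals(x, [xi * xi for xi in x])  # x^2 with the straight line removed
    szz = sum(zi * zi for zi in z)
    c = sum(zi * ri for zi, ri in zip(z, r)) / szz
    sse_quad = sum(ri * ri for ri in r) - c * c * szz  # residual SS of the quadratic fit
    s2 = sse_quad / (n - 3)                            # three parameters estimated
    t = c / math.sqrt(s2 / szz)
    return c, t

# Made-up data with a real curvature of 0.5 plus small noise (assumption)
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.2, 0.15, -0.05, 0.1, -0.15, 0.05, 0.0]
y = [1 + 2 * xi + 0.5 * xi * xi + e for xi, e in zip(x, noise)]
c, t = quadratic_term_test(x, y)
```

With these data |t| is far above 2.571, the two-sided 5% critical value for 5 degrees of freedom, so the quadratic model would be kept.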
26 With replications
Perform a test of lack of fit.
Principle: compare the departure from linearity to the pure error.
If the departure from linearity is significantly larger than the pure error, then change the model.
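27 Test of lack of fit: how to do it?
Three steps:
1) Linear regression
2) One-way ANOVA
3) Compare the residual sums of squares of the two fits with an F test; if the test is significant, then change the model.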
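The three steps can be sketched as below. This assumes the usual construction: the lack-of-fit sum of squares is the regression residual sum of squares minus the pure-error sum of squares from the one-way ANOVA, with k − 2 and n − k degrees of freedom for k distinct x levels. The data are made up and the function name is illustrative.

```python
from collections import defaultdict

def lack_of_fit(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit."""
    n = len(x)
    # Step 1: linear regression, residual sum of squares (n - 2 df)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse_line = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    # Step 2: one-way ANOVA with x as the factor, pure-error sum of squares (n - k df)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    k = len(groups)
    ss_pure = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    # Step 3: departure from linearity compared to pure error
    ss_lof = sse_line - ss_pure
    f_ratio = (ss_lof / (k - 2)) / (ss_pure / (n - k))
    return f_ratio, k - 2, n - k

# Made-up data: 4 concentration levels, 2 replicates each (assumption)
x = [1, 1, 2, 2, 3, 3, 4, 4]
y_straight = [1.1, 0.9, 2.1, 1.9, 3.1, 2.9, 4.1, 3.9]
y_curved = [1.1, 0.9, 4.1, 3.9, 9.1, 8.9, 16.1, 15.9]
f_straight, df1, df2 = lack_of_fit(x, y_straight)
f_curved, _, _ = lack_of_fit(x, y_curved)
```

The F ratio is compared with a tabulated critical value for (k − 2, n − k) degrees of freedom, about 6.94 at the 5% level for (2, 4). The straight data give an F ratio near zero, the curved data give a very large one.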
28 Test of lack of fit: example
Three steps:
1) Linear regression
2) One-way ANOVA
Dep Var: HPLC   N: 18
Analysis of Variance
Source    Sum-of-Squares   df   Mean-Square   F-ratio   P
CONCENT
Error
3) Conclusion of the test: we keep the straight line.
29 Checking the variance model: homoscedasticity
Scatterplot of residuals vs fitted values:
- No structure in the residuals but heteroscedasticity: change the model (criterion).
- Homoscedasticity: OK.
30 What to do with heteroscedasticity?
Scatterplot of residuals vs fitted values: model the dispersion.
The standard deviation of the residuals increases with the fitted value: it increases with x.
31 What to do with heteroscedasticity?
Estimate the slope and the intercept again, but by weighted least squares, with weights inversely proportional to the variance of the residuals, and check that the weighted residuals (each residual multiplied by the square root of its weight) are homoscedastic.
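A minimal sketch of such a weighted fit follows. It assumes, for illustration only, that the standard deviation of the residuals is proportional to x, so that the weights are 1/x²; another dispersion model would lead to other weights. The data are made up.

```python
import math

def weighted_line(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, weighted residuals)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    # Weighted residuals: these are the ones that should look homoscedastic
    weighted_res = [math.sqrt(wi) * (yi - a - b * xi) for wi, xi, yi in zip(w, x, y)]
    return a, b, weighted_res

# Made-up data whose scatter grows with x (assumption)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.1, 4.8, 9.4, 16.5, 36.0]
w = [1.0 / xi ** 2 for xi in x]   # assumed dispersion model: sd proportional to x
a, b, wres = weighted_line(x, y, w)
```

Plotting `wres` against the fitted values is then the same check as before: no funnel shape should remain if the weights are adequate.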
32 Checking the variance model: normality
Probability plot of the residuals (axis: expected value for normal distribution):
- No curvature: normality.
- Curvature: non-normality. Is it so important?
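The coordinates of such a probability plot can be computed as sketched below. The plotting positions (i − 0.5)/n are an assumption: statistical packages use several slightly different conventions. The residuals are made up.

```python
from statistics import NormalDist

def pplot_points(residuals):
    """Pair each ordered residual with its expected value under normality."""
    n = len(residuals)
    standard_normal = NormalDist()
    ordered = sorted(residuals)
    # Assumed plotting positions (i - 0.5) / n, mapped to normal quantiles
    scores = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, ordered))

# Made-up residuals (assumption)
points = pplot_points([0.3, -1.2, 0.0, 2.1, -0.4])
```

If the points fall roughly on a straight line, the residuals are compatible with normality; a clear curvature suggests non-normality.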
33 What to do with non-normality?
- Try to model the distribution of the residuals. In general, this is difficult with few observations.
- If enough observations are available, non-normality does not affect the result much.
34 An interesting index: R²
R² = squared correlation coefficient = % of the dispersion of the Yi's explained by the straight line (the model)
0 ≤ R² ≤ 1
- If R² = 1, all the ei = 0: the straight line explains all the variation of the Yi's.
- If R² = 0, the slope is 0: the straight line does not explain any variation of the Yi's.
35 An interesting index: R²
R² and R (the correlation coefficient) are not designed to measure linearity!
Example:
Multiple R: 0.990   Squared multiple R: 0.980   Adjusted squared multiple R: 0.980
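A small made-up illustration of this point: a straight line fitted to data that are exactly quadratic (Y = x² for x = 1 to 10, which is not the example of the slide) still gives a high R², while the residuals show an obvious structure.

```python
def r_squared_and_residuals(x, y):
    """Fit y = a + b*x by least squares; return (R^2, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    r2 = 1 - sum(e * e for e in res) / syy   # share of the dispersion explained
    return r2, res

# Made-up, deliberately curved data (assumption)
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
r2, res = r_squared_and_residuals(x, y)
```

Here R² is about 0.95, yet the residuals are positive at both ends and negative in the middle, which is the kind of structure the residual plot is meant to reveal.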
36 Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?
37 How to use this straight line?
Direct use: for a given x
- predict the mean Y
- construct a confidence interval of the mean Y
- construct a prediction interval of Y
Reverse use, calibration (approximate results): for a given Y
- predict the mean x
- construct a confidence interval of the mean x
- construct a prediction interval of x
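Both uses can be sketched with the textbook formulas for a least-squares straight line. Several points are assumptions: the critical value `t_crit` (Student t, n − 2 degrees of freedom) is supplied by the reader, the reverse interval is the usual first-order approximation for a single new observation, and the data are made up.

```python
import math

def fit(x, y):
    """Least-squares line; returns (a, b, s, mean of x, Sxx, n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    return a, b, s, mx, sxx, n

def direct_use(x0, model, t_crit):
    """Predicted mean Y at x0, its confidence interval, and a prediction interval."""
    a, b, s, mx, sxx, n = model
    y_hat = a + b * x0
    h = 1 / n + (x0 - mx) ** 2 / sxx
    ci = t_crit * s * math.sqrt(h)       # uncertainty on the mean Y
    pi = t_crit * s * math.sqrt(1 + h)   # uncertainty on one new Y
    return y_hat, (y_hat - ci, y_hat + ci), (y_hat - pi, y_hat + pi)

def reverse_use(y0, model, t_crit):
    """Calibration: estimated x for an observed y0, with an approximate interval."""
    a, b, s, mx, sxx, n = model
    x_hat = (y0 - a) / b
    half = t_crit * (s / abs(b)) * math.sqrt(1 + 1 / n + (x_hat - mx) ** 2 / sxx)
    return x_hat, (x_hat - half, x_hat + half)

# Made-up data (assumption); 3.182 is the two-sided 5% t value for 3 df
model = fit([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
y_hat, ci, pi = direct_use(3.0, model, 3.182)
x_hat, x_interval = reverse_use(6.02, model, 3.182)
```

The prediction interval is always wider than the confidence interval of the mean, because it also carries the variability of a single new observation.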
49 What you should no longer believe
- One can fit the straight line by inverting x and Y.
- If the correlation coefficient is high, the straight line is the best model.
- Normality of the xi's is required to perform a regression.
- Normality of the ei's is essential to perform a good regression.