1 Linear Regression. Didier Concordet (d.concordet@envt.fr), ECVPT Workshop, April 2011, Ecole Nationale Vétérinaire de Toulouse. Can be downloaded at http://www.biostat.envt.fr/

2 An example

3 About the straight line: Y = a + b x, where a = intercept and b = slope. [Figures: lines with b > 0, b < 0, b = 0, and with a = 0, b > 0.]

4 Questions
How to obtain the best straight line?
Is this straight line the best curve to use?
How to use this straight line?

5 How to obtain the best straight line? Proceed in three main steps:
graphical inspection of the data
write a (statistical) model
estimate the parameters

6 Write a model. A statistical model:
Mean model: functional relationship
Variance model: assumptions on the residuals

7 Write a model. Mean model: Yi = a + b xi + εi, where εi = residual (error term).

8 Assumptions on the residuals:
the xi's are not random variables: they are known with high precision
the εi's have a constant variance (homoscedasticity)
the εi's are independent
the εi's are normally distributed (normality)
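A minimal sketch of what these assumptions mean in practice, written in Python with illustrative values (a = 20, b = 3, σ = 8 are hypothetical, loosely inspired by the HPLC example later in the slides): the xi are fixed design points, and the εi are independent, homoscedastic and normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (non-random) design points, known without error, with replications.
x = np.repeat([5.0, 10.0, 20.0, 40.0, 80.0, 160.0], 3)

# Hypothetical true parameters (not taken from the slides' data).
a, b, sigma = 20.0, 3.0, 8.0

# Independent, homoscedastic, normally distributed residuals.
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)

# Mean model plus residuals.
y = a + b * x + eps
```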

9 Homoscedasticity. [Figures: residual scatter with constant spread (homoscedasticity) vs. increasing spread (heteroscedasticity).]

10 Normality. [Figure: Y vs. x with a normal error distribution around the line.]

11 Estimate the parameters. A criterion is needed to estimate the parameters: a statistical model plus a criterion.

12 How to estimate the "best" a and b?
Intuitive criterion: minimize Σi (Yi − a − b xi), but positive and negative residuals compensate each other
Reasonable criterion: minimize Σi (Yi − a − b xi)², the least squares criterion (L.S.)
Under the linear model, homoscedasticity and normality, least squares is the natural choice

13 The least squares criterion: choose a and b that minimize SS(a, b) = Σi (Yi − a − b xi)².
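A minimal sketch of the least squares computation in Python, assuming x and y are numeric arrays (the function name is illustrative, not from the slides); it implements the closed-form estimates b̂ = Σ(xi − x̄)(Yi − Ȳ) / Σ(xi − x̄)² and â = Ȳ − b̂ x̄.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (a_hat, b_hat) minimising sum((y - a - b*x)**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a_hat = y.mean() - b_hat * x.mean()
    return a_hat, b_hat
```

np.polyfit(x, y, 1) returns the same two numbers (slope first, then intercept).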

14 Result of optimisation: the estimates â and b̂ change from sample to sample, so â and b̂ are random variables.

15 Balance sheet.
True mean straight line: E(Yi) = a + b xi
Estimated straight line: Ŷi = â + b̂ xi = mean predicted value for the i-th observation
i-th residual: ei = Yi − Ŷi

16 Example. Dep Var: HPLC, N: 18. Estimated straight line: HPLC = 20.046 + 2.916 × CONCENT.
Effect     Coefficient  Std Error  t       P(2 Tail)
CONSTANT   20.046       3.682      5.444   0.000      (intercept)
CONCENT    2.916        0.069      42.030  0.000      (slope)

17 Example

18 Example

19 Residual variance. By construction Σi ei = 0, but the ei's are not all zero. The residual variance is defined by s² = Σi ei² / (n − 2); its square root s is the standard error of estimate.
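A minimal sketch of this computation, assuming arrays x and y (function name is illustrative):

```python
import numpy as np

def standard_error_of_estimate(x, y):
    """s = sqrt(RSS / (n - 2)) for a straight-line fit."""
    b_hat, a_hat = np.polyfit(x, y, 1)                 # slope, intercept
    residuals = np.asarray(y) - (a_hat + b_hat * np.asarray(x))
    rss = np.sum(residuals ** 2)                       # residual sum of squares
    return np.sqrt(rss / (len(y) - 2))                 # 2 estimated parameters: a and b
```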

20 Example. Dep Var: HPLC, N: 18. Multiple R: 0.996. Squared multiple R: 0.991. Adjusted squared multiple R: 0.991. Standard error of estimate: 8.282.
Effect     Coefficient  Std Error  t       P(2 Tail)
CONSTANT   20.046       3.682      5.444   0.000
CONCENT    2.916        0.069      42.030  0.000

21 Questions
How to obtain the best straight line?
Is this straight line the best curve to use?
How to use this straight line?

22 Is this model the best one to use?
Tools to check the mean model: scatterplot of residuals vs fitted values, test(s)
Tools to check the variance model: scatterplot of residuals vs fitted values, probability plot (Pplot)

23 Checking the mean model: scatterplot of residuals vs fitted values. No structure in the residuals (scatter around 0): OK. Structure in the residuals: change the mean model. [Figures: unstructured vs. structured residual plots.]

24 Checking the mean model: tests. Two cases:
Replications: test of lack of fit
No replication: try a polynomial model (quadratic first)

25 Without replication. Example: try another mean model, Yi = a + b xi + c xi², and test the improvement. If the test on c is significant (c ≠ 0) then keep this model.
Dep Var: HPLC, N: 18. Multiple R: 0.996. Squared multiple R: 0.991. Adjusted squared multiple R: 0.991. Standard error of estimate: 8.539.
Effect            Coefficient  Std Error  t      P(2 Tail)
CONSTANT          21.284       6.649      3.201  0.006
CONCENT           2.842        0.335      8.486  0.000
CONCENT*CONCENT   0.001        0.003      0.227  0.824
Here the quadratic term is not significant (P = 0.824), so the straight line is kept.
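A minimal sketch of this check, assuming arrays x and y: fit Yi = a + b xi + c xi² and test whether c differs from 0 (statsmodels would print the same kind of table; the helper below is illustrative).

```python
import numpy as np
from scipy import stats

def quadratic_term_test(x, y):
    """Fit y = a + b*x + c*x**2 and return (c_hat, two-sided p-value for c = 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])    # design matrix
    beta, rss, _, _ = np.linalg.lstsq(X, y, rcond=None)
    n, p = X.shape
    s2 = rss[0] / (n - p)                                # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                    # covariance of the estimates
    t_stat = beta[2] / np.sqrt(cov[2, 2])
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)
    return beta[2], p_value
```

If the returned p-value is large (as the 0.824 above), the quadratic term adds nothing and the straight line is kept.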

26 With replications: perform a test of lack of fit. Principle: compare the residual variance of the straight-line fit (which contains a possible departure from linearity) to the pure-error variance obtained from the replications. If the former is significantly larger than the latter, then change the model.

27 Test of lack of fit: how to do it? Three steps:
1) Linear regression: residual sum of squares RSSreg with n − 2 degrees of freedom
2) One-way ANOVA on the distinct x levels: pure-error sum of squares SSpure with n − k degrees of freedom (k = number of distinct x values)
3) If F = [(RSSreg − SSpure)/(k − 2)] / [SSpure/(n − k)] is significantly large, then change the model

28 Test of lack of fit: example. Three steps:
1) Linear regression (slide 20): standard error of estimate 8.282 with 16 df, so RSSreg ≈ 1097
2) One-way ANOVA: Dep Var: HPLC, N: 18
   Source    Sum-of-Squares  df  Mean-Square  F-ratio  P
   CONCENT   121251.776      5   24250.355    289.434  0.000
   Error     1005.427        12  83.786
3) F ≈ [(1097 − 1005.4)/4] / 83.786 ≈ 0.27, not significant: we keep the straight line
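A minimal sketch of the three steps in one function, assuming replicated x values in arrays x and y (the function name is illustrative); the pure-error sum of squares is the within-group variation from the one-way ANOVA.

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(x, y):
    """F test comparing a straight-line fit to the pure error from replications."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Step 1: linear regression -> residual sum of squares with n - 2 df.
    b_hat, a_hat = np.polyfit(x, y, 1)
    rss_reg = np.sum((y - (a_hat + b_hat * x)) ** 2)

    # Step 2: one-way ANOVA on the distinct x levels -> pure-error sum of squares.
    levels = np.unique(x)
    ss_pure = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
    df_pure = n - len(levels)

    # Step 3: lack-of-fit F ratio and its p-value.
    df_lof = len(levels) - 2
    f_stat = ((rss_reg - ss_pure) / df_lof) / (ss_pure / df_pure)
    p_value = stats.f.sf(f_stat, df_lof, df_pure)
    return f_stat, p_value
```

Applied to the HPLC data behind these slides, this reproduces the conclusion of slide 28: the lack-of-fit F is small, so the straight line is kept.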

29 Checking the variance model: homoscedasticity. Scatterplot of residuals vs fitted values: constant spread around 0 means homoscedasticity, OK. No structure in the residuals but increasing spread means heteroscedasticity: change the model (criterion). [Figures: constant vs. increasing residual spread.]

30 What to do with heteroscedasticity? Scatterplot of residuals vs fitted values: model the dispersion. The standard deviation of the residuals increases with the fitted value, i.e. it increases with x.

31 What to do with heteroscedasticity? Estimate the slope and the intercept again, but with weights that take the variance of each observation into account (weighted least squares: minimize Σi wi (Yi − a − b xi)², with wi inversely proportional to the residual variance at xi), and check that the weighted residuals √wi · ei (as defined above) are homoscedastic.
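A minimal sketch of the weighted fit, assuming (as on the previous slide) that the residual standard deviation grows roughly in proportion to x, so that weights wi = 1/xi² are a reasonable variance model; the function name and the weight choice are illustrative, and other weight choices follow the same pattern.

```python
import numpy as np

def weighted_least_squares_line(x, y, w):
    """Minimise sum(w * (y - a - b*x)**2); returns (a_hat, b_hat)."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    xw = np.sum(w * x) / np.sum(w)                 # weighted mean of x
    yw = np.sum(w * y) / np.sum(w)                 # weighted mean of y
    b_hat = np.sum(w * (x - xw) * (y - yw)) / np.sum(w * (x - xw) ** 2)
    a_hat = yw - b_hat * xw
    return a_hat, b_hat

# Example weight choice when sd(residual) is roughly proportional to x:
# w = 1.0 / x**2
# a_hat, b_hat = weighted_least_squares_line(x, y, w)
# Weighted residuals np.sqrt(w) * (y - a_hat - b_hat * x) should look homoscedastic.
```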

32 Checking the variance model: normality. Probability plot: residuals against their expected values for a normal distribution. No curvature: normality. Curvature: non-normality (is it so important?). [Figures: straight vs. curved probability plots.]
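A minimal sketch of the probability-plot check, assuming the residuals are in a NumPy array named resid (a hypothetical name); scipy's probplot draws the ordered residuals against their expected normal values, so a roughly straight line supports normality.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# resid = y - (a_hat + b_hat * x)                  # residuals from the straight-line fit
resid = np.random.default_rng(0).normal(size=18)   # placeholder residuals for the sketch

stats.probplot(resid, dist="norm", plot=plt)       # ordered residuals vs normal quantiles
plt.title("Probability plot of the residuals")
plt.show()
```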

33 What to do with non-normality? Try to model the distribution of the residuals. In general, this is difficult with few observations. If enough observations are available, non-normality does not affect the result too much.

34 An interesting index: R². R² = squared correlation coefficient = % of the dispersion of the Yi's explained by the straight line (the model); 0 ≤ R² ≤ 1. If R² = 1, all the residuals ei = 0: the straight line explains all the variation of the Yi's. If R² = 0, the estimated slope is 0: the straight line does not explain any variation of the Yi's.
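A minimal sketch of the R² computation from its definition, assuming arrays x and y: R² = 1 − RSS/TSS, the share of the dispersion of the Yi's explained by the straight line (the function name is illustrative).

```python
import numpy as np

def r_squared(x, y):
    """Proportion of the variation of y explained by the straight-line fit."""
    b_hat, a_hat = np.polyfit(x, y, 1)
    residuals = np.asarray(y) - (a_hat + b_hat * np.asarray(x))
    rss = np.sum(residuals ** 2)                       # unexplained variation
    tss = np.sum((np.asarray(y) - np.mean(y)) ** 2)    # total variation
    return 1.0 - rss / tss
```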

35 An interesting index: R². R² and R (correlation coefficient) are not designed to measure linearity! A clearly curved relationship can still give a high R². Example: Multiple R: 0.990, Squared multiple R: 0.980, Adjusted squared multiple R: 0.980.

36 Questions
How to obtain the best straight line?
Is this straight line the best curve to use?
How to use this straight line?

37 How to use this straight line?
Direct use: for a given x
–predict the mean Y
–construct a confidence interval of the mean Y
–construct a prediction interval of Y
Reverse use, calibration (approximate results): for a given Y
–predict the mean X
–construct a confidence interval of the mean X
–construct a prediction interval of X

38 For a given x, predict the mean Y: Ŷ = â + b̂ x. Example: with the HPLC fit, Ŷ = 20.046 + 2.916 x.

39 Confidence interval of the mean Y. There is a probability 1 − α that a + b x belongs to this interval: â + b̂ x ± t(n−2, 1−α/2) · s · √(1/n + (x − x̄)² / Σi (xi − x̄)²).

40 Confidence interval of the mean Y. [Figure: confidence band around the fitted line, with lower (L) and upper (U) limits shown at x = 30.]

41 Example

42 Prediction interval of Y: 100(1 − α)% of the measurements carried out for this x belong to this interval: â + b̂ x ± t(n−2, 1−α/2) · s · √(1 + 1/n + (x − x̄)² / Σi (xi − x̄)²).
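A minimal sketch computing both intervals, assuming arrays x and y and a new value x0 (the function name is illustrative): the confidence interval for the mean Y at x0 uses s·√(1/n + (x0 − x̄)²/Σ(xi − x̄)²), and the prediction interval for a single new measurement adds 1 under the square root.

```python
import numpy as np
from scipy import stats

def mean_and_prediction_intervals(x, y, x0, alpha=0.05):
    """Return (fit at x0, CI for the mean Y at x0, prediction interval for a new Y at x0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    b_hat, a_hat = np.polyfit(x, y, 1)
    y0 = a_hat + b_hat * x0

    resid = y - (a_hat + b_hat * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))          # standard error of estimate
    sxx = np.sum((x - x.mean()) ** 2)
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)

    se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
    se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    return y0, (y0 - t * se_mean, y0 + t * se_mean), (y0 - t * se_pred, y0 + t * se_pred)
```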

43 Prediction interval of Y. [Figure: prediction band around the fitted line, with lower (L) and upper (U) limits shown at x = 30.]

44 Example

45 Reverse use: for a given Y = y0, predict the mean X: x̂0 = (y0 − â) / b̂. Example: with the HPLC fit, x̂0 = (y0 − 20.046) / 2.916.

46 For a given Y = y0, a confidence interval of the mean X. [Figure: horizontal line at Y0 cutting the confidence band of the mean Y; the crossing points give the limits L and U on the X axis.]

47 Confidence interval of the mean X. There is a probability 1 − α that the mean X belongs to [L, U]. L and U are such that y0 equals the upper confidence limit of the mean Y at x = L and the lower confidence limit of the mean Y at x = U (the confidence band of the mean Y is inverted).
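A minimal sketch of this inversion, assuming the mean_and_prediction_intervals helper from the earlier sketch (a hypothetical name): L and U are found numerically as the x values at which the upper, respectively lower, confidence limit of the mean Y equals y0; the caller supplies a bracketing range [x_lo, x_hi] for the root finder.

```python
from scipy.optimize import brentq

def calibration_interval(x, y, y0, x_lo, x_hi, alpha=0.05):
    """Approximate confidence interval [L, U] for the mean X giving mean Y = y0."""
    # Uses mean_and_prediction_intervals() defined in the earlier sketch.

    def upper_limit(xv):   # upper confidence limit of the mean Y at xv, minus y0
        _, ci, _ = mean_and_prediction_intervals(x, y, xv, alpha)
        return ci[1] - y0

    def lower_limit(xv):   # lower confidence limit of the mean Y at xv, minus y0
        _, ci, _ = mean_and_prediction_intervals(x, y, xv, alpha)
        return ci[0] - y0

    L = brentq(upper_limit, x_lo, x_hi)   # upper band crosses y0 at the left limit
    U = brentq(lower_limit, x_lo, x_hi)   # lower band crosses y0 at the right limit
    return L, U
```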

48 Example

49 What you should no longer believe:
One can fit the straight line by inverting x and Y
If the correlation coefficient is high, the straight line is the best model
Normality of the εi's is essential to perform a good regression
Normality of the xi's is required to perform a regression

