Published by Landon Fraser. Modified over 3 years ago.

1
Linear Regression
Didier Concordet, d.concordet@envt.fr
ECVPT Workshop, April 2011
Ecole Nationale Vétérinaire de Toulouse
Can be downloaded at http://www.biostat.envt.fr/

2
An example

3
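About the straight line
$Y = a + b x$
a = intercept, b = slope
[Figure: three plots of Y against x: lines with b > 0 and b < 0 starting from the same intercept a; a horizontal line with b = 0; a line through the origin with a = 0 and b > 0]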

4
Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?

5
How to obtain the best straight line?
Proceed in three main steps:
- write a (statistical) model
- estimate the parameters
- graphical inspection of data

6
Write a model
A statistical model:
- Mean model: functional relationship
- Variance model: assumptions on the residuals

7
Write a model
$Y_i = a + b x_i + \varepsilon_i$
- Mean model: $a + b x_i$
- $\varepsilon_i$ = residual (error term)

8
Assumptions on the residuals
- the $x_i$'s are not random variables: they are known with a high precision
- the $\varepsilon_i$'s have a constant variance: homoscedasticity
- the $\varepsilon_i$'s are independent
- the $\varepsilon_i$'s are normally distributed: normality

9
Homoscedasticity
[Figure: two scatterplots, one illustrating homoscedasticity, the other heteroscedasticity]

10
Normality
[Figure: plot of Y against x]

11
Estimate the parameters
A criterion is needed to estimate the parameters.
Ingredients: a statistical model and a criterion.

12
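How to estimate the "best" a and b?
- Intuitive criterion: $\sum_i (Y_i - (a + b x_i))$ minimum. Problem: compensation (positive and negative deviations cancel out).
- Reasonable criterion: $\sum_i (Y_i - (a + b x_i))^2$ minimum. This is the least squares criterion (L.S.).
Setting for the least squares criterion: linear model, homoscedasticity, normality.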
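The compensation problem can be checked numerically. The sketch below uses a small hypothetical data set (invented for illustration, not the HPLC data of the example slides) and shows that any line through the centroid of the data has a zero sum of signed deviations, whatever its slope, whereas the sum of squares tells the candidate lines apart.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def sum_residuals(a, b):
    """Signed criterion: positive and negative deviations cancel out."""
    return sum(yi - (a + b * xi) for xi, yi in zip(x, y))


def sum_squared_residuals(a, b):
    """Least squares criterion: every deviation counts, whatever its sign."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Every line through the centroid (x_bar, y_bar) has a zero sum of residuals,
# even when its slope is clearly wrong.
for b in (-1.0, 0.0, 2.0):
    a = y_bar - b * x_bar
    print(f"b={b:5.1f}  sum of residuals={sum_residuals(a, b):9.2e}  "
          f"sum of squares={sum_squared_residuals(a, b):8.3f}")
```

Only the squared criterion separates the slope 2 from the slopes 0 and -1 on these data.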

13
The least squares criterion

14
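Result of optimisation
$\hat{b} = \frac{\sum_i (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{a} = \bar{Y} - \hat{b}\,\bar{x}$
$\hat{a}$ and $\hat{b}$ change with samples: $\hat{a}$ and $\hat{b}$ are random variables.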
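A minimal sketch of the closed-form least squares solution for a straight line: the slope is the sum of cross-products of the centred x and Y divided by the sum of squares of the centred x, and the intercept makes the line pass through the centroid. The function name `fit_line` and both samples are hypothetical; the second sample only illustrates that the estimates change from one sample to the next.

```python
def fit_line(x, y):
    """Return (a_hat, b_hat) minimising sum((y_i - (a + b*x_i))**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Two hypothetical samples measured at the same x values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_sample_1 = [2.1, 3.9, 6.2, 7.8, 10.0]
y_sample_2 = [1.8, 4.3, 5.9, 8.4, 9.7]

a_1, b_1 = fit_line(x, y_sample_1)
a_2, b_2 = fit_line(x, y_sample_2)
print(f"sample 1: a_hat={a_1:.3f}, b_hat={b_1:.3f}")
print(f"sample 2: a_hat={a_2:.3f}, b_hat={b_2:.3f}")
```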

15
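Summary
- True mean straight line: $E(Y) = a + b x$
- Estimated straight line: $\hat{Y} = \hat{a} + \hat{b} x$
- Mean predicted value for the i-th observation: $\hat{Y}_i = \hat{a} + \hat{b} x_i$
- i-th residual: $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$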

16
Example
Dep Var: HPLC   N: 18
Effect     Coefficient   Std Error   t        P(2 Tail)
CONSTANT   20.046        3.682       5.444    0.000       (intercept)
CONCENT    2.916         0.069       42.030   0.000       (slope)
Estimated straight line: HPLC = 20.046 + 2.916 × CONCENT

17
Example

18
Example

19
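Residual variance
By construction $\sum_i \hat{\varepsilon}_i = 0$, but $\sum_i \hat{\varepsilon}_i^2 > 0$.
The residual variance is defined by $\hat{\sigma}^2 = \frac{1}{n-2} \sum_i \hat{\varepsilon}_i^2$
$\hat{\sigma}$ = standard error of estimate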
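As an illustration, the sketch below fits a line to a small hypothetical data set (not the HPLC data), checks that the residuals sum to zero, and computes the residual variance with the usual n - 2 divisor for a two-parameter line. The function name `residual_summary` is an assumption made for this example.

```python
import math


def residual_summary(x, y):
    """Fit a least squares line and return (sum of residuals, residual variance)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    # Two parameters (a and b) were estimated, hence the n - 2 divisor.
    variance = sum(r ** 2 for r in residuals) / (n - 2)
    return sum(residuals), variance


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

total, s2 = residual_summary(x, y)
standard_error_of_estimate = math.sqrt(s2)
print(f"sum of residuals = {total:.2e}")
print(f"residual variance = {s2:.5f}")
print(f"standard error of estimate = {standard_error_of_estimate:.5f}")
```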

20
Example
Dep Var: HPLC   N: 18   Multiple R: 0.996   Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.282
Effect     Coefficient   Std Error   t        P(2 Tail)
CONSTANT   20.046        3.682       5.444    0.000
CONCENT    2.916         0.069       42.030   0.000

21
Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?

22
Is this model the best one to use?
Tools to check the mean model:
- scatterplot of residuals vs fitted values
- test(s)
Tools to check the variance model:
- scatterplot of residuals vs fitted values
- probability plot (Pplot)

23
Checking the mean model
Scatterplot of residuals vs fitted values:
- no structure in the residuals: OK
- structure in the residuals: change the mean model
[Figure: two plots of residuals against fitted values, centred on 0]

24
Checking the mean model: tests
Two cases:
- Replications: test of lack of fit
- No replication: try a polynomial model (quadratic first)

25
Without replication
Example: try another mean model, $Y_i = a + b x_i + c x_i^2 + \varepsilon_i$, and test the improvement.
If the test on c is significant (c ≠ 0), then keep this model.
Dep Var: HPLC   N: 18   Multiple R: 0.996   Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.539
Effect            Coefficient   Std Error   t       P(2 Tail)
CONSTANT          21.284        6.649       3.201   0.006
CONCENT           2.842         0.335       8.486   0.000
CONCENT*CONCENT   0.001         0.003       0.227   0.824
Here the test on c is not significant (P = 0.824): the quadratic term brings no improvement.

26
With replications
Perform a test of lack of fit.
Principle: compare the departure from linearity to the pure error. If the departure from linearity is significantly larger than the pure error, then change the model.

27
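Test of lack of fit: how to do it?
Three steps:
1) Linear regression (gives the residual sum of squares)
2) One-way ANOVA (gives the pure error sum of squares)
3) If the lack-of-fit F ratio (departure from linearity over pure error) exceeds its critical value, then change the model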

28
Test of lack of fit: example
Three steps:
1) Linear regression
2) One-way ANOVA
Dep Var: HPLC   N: 18
Analysis of Variance
Source    Sum-of-Squares   df   Mean-Square   F-ratio   P
CONCENT   121251.776       5    24250.355     289.434   0.000
Error     1005.427         12   83.786
3) The lack-of-fit F ratio does not exceed its critical value: we keep the straight line.
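The three steps can be retraced from the printed outputs. The sketch below is an approximate reconstruction, not the original computation: the residual sum of squares of the linear regression is rebuilt from the rounded standard error of estimate (8.282 with 18 - 2 degrees of freedom), the pure error comes from the Error line of the ANOVA table, and the critical value 3.26 is the tabulated 95th percentile of an F distribution with 4 and 12 degrees of freedom (the 5% level is an assumption).

```python
# Step 1: linear regression output (slides "Example": N = 18, s = 8.282).
n = 18
standard_error_of_estimate = 8.282
df_residual = n - 2
ss_residual = standard_error_of_estimate ** 2 * df_residual  # approximate (rounded s)

# Step 2: one-way ANOVA output (Error line = pure error).
ss_pure_error = 1005.427
df_pure_error = 12

# Step 3: departure from linearity compared with pure error.
ss_lack_of_fit = ss_residual - ss_pure_error
df_lack_of_fit = df_residual - df_pure_error
f_ratio = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)

F_CRITICAL = 3.26  # tabulated F(4, 12) at the 5% level (assumed risk level)
keep_straight_line = f_ratio < F_CRITICAL

print(f"lack-of-fit SS = {ss_lack_of_fit:.2f} on {df_lack_of_fit} df")
print(f"F ratio = {f_ratio:.3f}, keep the straight line: {keep_straight_line}")
```

The F ratio is well below 1, which agrees with the conclusion of the slide.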

29
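Checking the variance model: homoscedasticity
Scatterplot of residuals vs fitted values:
- no structure in the residuals: homoscedasticity, OK
- no structure in the mean of the residuals but heteroscedasticity: change the model (criterion)
[Figure: two plots of residuals against fitted values, centred on 0]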

30
What to do with heteroscedasticity?
Scatterplot of residuals vs fitted values: model the dispersion.
The standard deviation of the residuals increases with $\hat{Y}$: it increases with x.
[Figure: residuals against fitted values, centred on 0, with a spread that widens]

31
What to do with heteroscedasticity?
Estimate the slope and the intercept again, but with weights $w_i$ inversely proportional to the variance of the residuals, and check that the weighted residuals $\sqrt{w_i}\,\hat{\varepsilon}_i$ are homoscedastic.
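A minimal weighted least squares sketch, assuming the standard deviation of the residuals is proportional to x, so that the variance is proportional to x² and the weights are taken as 1/x². Both the variance model and the data are hypothetical; `fit_weighted_line` is a name chosen for this example.

```python
import math


def fit_weighted_line(x, y, w):
    """Return (a_hat, b_hat) minimising sum(w_i * (y_i - (a + b*x_i))**2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical data whose scatter grows with x.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.2, 3.8, 8.5, 15.2, 34.0]

# Assumed variance model: variance proportional to x**2, hence weights 1/x**2.
w = [1.0 / xi ** 2 for xi in x]

a_hat, b_hat = fit_weighted_line(x, y, w)
# Weighted residuals: these are the ones to inspect for homoscedasticity.
weighted_residuals = [math.sqrt(wi) * (yi - (a_hat + b_hat * xi))
                      for wi, xi, yi in zip(w, x, y)]
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
print([round(r, 3) for r in weighted_residuals])
```

If the weighted residuals still fan out when plotted against the fitted values, another variance model should be tried.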

32
Checking the variance model: normality
Probability plot of the residuals against the expected value for a normal distribution:
- no curvature: normality
- curvature: non-normality. Is it so important?

33
What to do with non-normality?
- Try to model the distribution of the residuals. In general, this is difficult with few observations.
- If enough observations are available, non-normality does not affect the results too much.

34
An interesting index: R²
R² = squared correlation coefficient = % of dispersion of the $Y_i$'s explained by the straight line (the model)
0 ≤ R² ≤ 1
- If R² = 1, all the $\hat{\varepsilon}_i$ = 0: the straight line explains all the variation of the $Y_i$'s
- If R² = 0, the slope is 0: the straight line does not explain any variation of the $Y_i$'s
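The definition can be checked against the printed outputs of the HPLC example. In this rough sketch the total sum of squares is taken as the sum of the two lines of the one-way ANOVA table, and the residual sum of squares of the straight line is rebuilt from the rounded standard error of estimate, so the result is only approximate.

```python
# Total dispersion of the Y_i's: CONCENT + Error lines of the one-way ANOVA table.
ss_total = 121251.776 + 1005.427

# Dispersion left unexplained by the straight line: s**2 * (n - 2),
# with s = 8.282 and n = 18 taken from the linear regression output.
ss_residual = 8.282 ** 2 * (18 - 2)

r_squared = 1.0 - ss_residual / ss_total
print(f"R squared = {r_squared:.3f}")
```

This agrees with the "Squared multiple R" of 0.991 reported by the software.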

35
An interesting index: R²
R² and R (correlation coefficient) are not designed to measure linearity!
Example:
Multiple R: 0.990   Squared multiple R: 0.980   Adjusted squared multiple R: 0.980
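A small illustration with invented data: the points below lie exactly on the curve Y = x², yet a straight line fitted to them yields a high R². The structure left in the residuals (positive at both ends, negative in the middle) is what reveals the wrong mean model, not R².

```python
def fit_and_r_squared(x, y):
    """Fit a least squares line; return (a_hat, b_hat, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat, sxy ** 2 / (sxx * syy)


# Hypothetical, purely quadratic data: no straight line is the right mean model.
x = [float(v) for v in range(1, 11)]
y = [xi ** 2 for xi in x]

a_hat, b_hat, r2 = fit_and_r_squared(x, y)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

print(f"R squared = {r2:.3f}")
print([round(r, 1) for r in residuals])
```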

36
Questions
- How to obtain the best straight line?
- Is this straight line the best curve to use?
- How to use this straight line?

37
How to use this straight line?
Direct use: for a given x
- predict the mean Y
- construct a confidence interval of the mean Y
- construct a prediction interval of Y
Reverse use, calibration (approximate results): for a given Y
- predict the mean x
- construct a confidence interval of the mean x
- construct a prediction interval of x

38
For a given x, predict the mean Y
$\hat{Y} = \hat{a} + \hat{b} x$
Example:

39
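Confidence interval of the mean Y
$\hat{a} + \hat{b} x \pm t_{1-\alpha/2,\,n-2}\;\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where $\hat{\sigma}$ is the standard error of estimate
There is a probability 1-α that a + b x belongs to this interval.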

40
Confidence interval of the mean Y
[Figure: confidence band around the fitted line, with the interval limits L and U marked at the value 30]

41
Example

42
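Prediction interval of Y
$\hat{a} + \hat{b} x \pm t_{1-\alpha/2,\,n-2}\;\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where $\hat{\sigma}$ is the standard error of estimate
100(1-α)% of the measurements carried out for this x belong to this interval.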
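The two intervals differ only by the extra "1 +" under the square root, which accounts for the variability of a single new measurement. The sketch below computes both with the textbook formulas on a hypothetical data set (not the HPLC data). The Student quantile is passed in by hand: 3.182 is the tabulated 97.5th percentile for 3 degrees of freedom, which corresponds to α = 5% and n = 5.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """Return (y0_hat, confidence interval of the mean Y, prediction interval of Y) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    # Standard error of estimate (residual variance uses n - 2 df).
    s = math.sqrt(sum((yi - (a_hat + b_hat * xi)) ** 2
                      for xi, yi in zip(x, y)) / (n - 2))
    y0_hat = a_hat + b_hat * x0
    spread = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(spread)        # uncertainty on the mean Y
    pi_half = t_crit * s * math.sqrt(1.0 + spread)  # plus one new measurement
    return (y0_hat,
            (y0_hat - ci_half, y0_hat + ci_half),
            (y0_hat - pi_half, y0_hat + pi_half))


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
T_CRIT = 3.182  # tabulated t quantile, 97.5%, n - 2 = 3 df

y_mid, ci_mid, pi_mid = intervals_at(x, y, 3.0, T_CRIT)
y_end, ci_end, pi_end = intervals_at(x, y, 5.0, T_CRIT)
print(f"x=3: Y_hat={y_mid:.3f}, CI=({ci_mid[0]:.3f}, {ci_mid[1]:.3f}), "
      f"PI=({pi_mid[0]:.3f}, {pi_mid[1]:.3f})")
print(f"x=5: Y_hat={y_end:.3f}, CI=({ci_end[0]:.3f}, {ci_end[1]:.3f}), "
      f"PI=({pi_end[0]:.3f}, {pi_end[1]:.3f})")
```

Both intervals are narrowest at the mean of the x values and widen towards the ends of the range.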

43
Prediction interval of Y
[Figure: prediction band around the fitted line, with the interval limits L and U marked at the value 30]

44
Example

45
Reverse use: for a given Y = $y_0$, predict the mean X
$\hat{X} = \frac{y_0 - \hat{a}}{\hat{b}}$
Example:
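A short sketch of the reverse use with the coefficients estimated in the HPLC example (intercept 20.046, slope 2.916). The response value 100 is a hypothetical reading chosen for illustration, not a value taken from the slides.

```python
A_HAT = 20.046  # estimated intercept (CONSTANT)
B_HAT = 2.916   # estimated slope (CONCENT)


def predict_y(x):
    """Direct use: mean HPLC response predicted for a concentration x."""
    return A_HAT + B_HAT * x


def predict_x(y0):
    """Reverse use (calibration): concentration predicted for a response y0."""
    return (y0 - A_HAT) / B_HAT


y0 = 100.0  # hypothetical HPLC reading
x0_hat = predict_x(y0)
print(f"predicted concentration for y0 = {y0}: {x0_hat:.2f}")
print(f"check, direct prediction at that x: {predict_y(x0_hat):.2f}")
```

The point estimate simply inverts the fitted line; the interval around it requires the confidence band and is not symmetric in general.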

46
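For a given Y = $y_0$, a confidence interval of the mean X
[Figure: the horizontal line Y = $y_0$ crosses the confidence band of the fitted line; the crossing points give L and U on the X axis]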

47
Confidence interval of the mean X
There is a probability 1-α that the mean X belongs to [L, U].
L and U are such that the confidence limits of the mean Y, evaluated at x = L and at x = U, are equal to $y_0$.

48
Example

49
What you should no longer believe
- One can fit the straight line by inverting x and Y
- If the correlation coefficient is high, the straight line is the best model
- Normality of the $\varepsilon_i$'s is essential to perform a good regression
- Normality of the $x_i$'s is required to perform a regression
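The first point can be illustrated numerically. In the sketch below (hypothetical data), regressing Y on x and regressing x on Y give two different straight lines: the product of the two slopes equals R², so the lines coincide only when R² = 1. This is consistent with the model used throughout, where x is known precisely and only Y carries an error term.

```python
def fit_line(u, v):
    """Least squares regression of v on u; return (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suu = sum((ui - u_bar) ** 2 for ui in u)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = suv / suu
    return v_bar - slope * u_bar, slope


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

a_yx, b_yx = fit_line(x, y)  # Y regressed on x (the model of these slides)
a_xy, b_xy = fit_line(y, x)  # x regressed on Y (roles inverted)

# Rewrite x = a_xy + b_xy * Y as Y = -a_xy / b_xy + (1 / b_xy) * x.
slope_from_inverted_fit = 1.0 / b_xy
r_squared = b_yx * b_xy

print(f"slope of Y on x:           {b_yx:.4f}")
print(f"slope implied by x on Y:   {slope_from_inverted_fit:.4f}")
print(f"product of the two slopes: {r_squared:.4f} (= R squared)")
```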
