Ecole Nationale Vétérinaire de Toulouse Linear Regression


Didier Concordet (d.concordet@envt.fr), ECVPT Workshop, April 2011. Can be downloaded at http://www.biostat.envt.fr/

An example

About the straight line: Y = a + b x, where a = intercept and b = slope. [Figure: Y plotted against x for lines with b > 0, b < 0, b = 0 and a = 0.]

Questions: 1) How to obtain the best straight line? 2) Is this straight line the best curve to use? 3) How to use this straight line?

How to obtain the best straight line? Proceed in three main steps: 1) write a (statistical) model; 2) estimate the parameters; 3) inspect the data graphically.

A statistical model. Writing a model means specifying two parts. Mean model: the functional relationship between x and Y. Variance model: the assumptions on the residuals.

Write a model: $Y_i = a + b x_i + e_i$, where $a + b x_i$ is the mean model and $e_i$ is the residual (error term).

Assumptions on the residuals: 1) the $x_i$'s are not random variables (they are known with high precision); 2) the $e_i$'s have a constant variance (homoscedasticity); 3) the $e_i$'s are independent; 4) the $e_i$'s are normally distributed (normality).

Homoscedasticity. [Figure: two panels, one showing homoscedasticity and one showing heteroscedasticity.]

Normality. [Figure: Y plotted against x.]

Estimate the parameters. A criterion is needed to estimate the parameters: the statistical model determines the criterion.

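How to estimate the "best" a and b? Intuitive criterion: make the sum of the residuals minimum, but positive and negative residuals compensate each other. Reasonable criterion: make the sum of the squared residuals minimum. With a linear model, homoscedasticity and normality, this is the least squares criterion (L.S.).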

The least squares criterion
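The criterion can be made concrete with a short sketch. It assumes the usual form of the least squares criterion, the sum over i of $(Y_i - a - b x_i)^2$, and uses its closed-form minimiser. The function name `fit_line` and the concentration/response values are hypothetical illustrations, not data from the slides.

```python
def fit_line(x, y):
    """Return (a_hat, b_hat) minimising sum((y_i - a - b * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx               # estimated slope
    a_hat = y_bar - b_hat * x_bar   # estimated intercept
    return a_hat, b_hat


# Hypothetical calibration data (not the data behind the slides)
conc = [10, 20, 30, 40, 50, 60]
hplc = [48.0, 80.5, 106.0, 139.5, 164.0, 197.0]
a_hat, b_hat = fit_line(conc, hplc)
print(a_hat, b_hat)
```

The closed form avoids any numerical optimiser, which is enough for a single explanatory variable.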

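Result of optimisation: $\hat{a}$ and $\hat{b}$, the least squares estimates of a and b. $\hat{a}$ and $\hat{b}$ change with samples: $\hat{a}$ and $\hat{b}$ are random variables.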

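Summary. True mean straight line: $E(Y) = a + b x$. Estimated straight line: $\hat{Y} = \hat{a} + \hat{b} x$. Mean predicted value for the ith observation: $\hat{Y}_i = \hat{a} + \hat{b} x_i$. ith residual: $\hat{e}_i = Y_i - \hat{Y}_i$.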

Example: estimated straight line.
Dep Var: HPLC   N: 18
Effect     Coefficient   Std Error   t        P(2 Tail)
CONSTANT   20.046        3.682       5.444    0.000
CONCENT    2.916         0.069       42.030   0.000
Intercept: 20.046 (CONSTANT). Slope: 2.916 (CONCENT). Estimated straight line: HPLC = 20.046 + 2.916 × CONCENT.

Example

Example

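Residual variance. By construction $\sum_i \hat{e}_i = 0$, but $\sum_i \hat{e}_i^2 \neq 0$. The residual variance is defined by $s^2 = \frac{1}{n-2} \sum_i \hat{e}_i^2$; s is the standard error of estimate.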
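A minimal sketch of this computation, assuming the usual definition of the residual variance (sum of squared residuals divided by n − 2). The function name `residual_summary` and the data are hypothetical.

```python
import math


def residual_summary(x, y):
    """Fit the least squares line and return (residuals, s2, s)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    # Observed value minus mean predicted value
    residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    return residuals, s2, math.sqrt(s2)           # s = standard error of estimate


# Hypothetical data
residuals, s2, s = residual_summary([0, 1, 2, 3], [0, 1, 1, 2])
print(sum(residuals), s2, s)
```

The residuals sum to zero up to floating-point rounding, which is why their sum cannot serve as a measure of dispersion.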

Example.
Dep Var: HPLC   N: 18   Multiple R: 0.996   Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.282
Effect     Coefficient   Std Error   t        P(2 Tail)
CONSTANT   20.046        3.682       5.444    0.000
CONCENT    2.916         0.069       42.030   0.000

Questions: 1) How to obtain the best straight line? 2) Is this straight line the best curve to use? 3) How to use this straight line?

Is this model the best one to use? Tools to check the mean model: scatterplot of residuals vs fitted values; test(s). Tools to check the variance model: scatterplot of residuals vs fitted values; probability plot (Pplot).

Checking the mean model: scatterplot of residuals vs fitted values. Structure in the residuals: change the mean model. No structure in the residuals: OK.

Checking the mean model: tests. Two cases. No replication: try a polynomial model (quadratic first). Replications: test of lack of fit.

Without replication: try another mean model and test the improvement. Example: $Y_i = a + b x_i + c x_i^2 + e_i$. If the test on c is significant (c ≠ 0), then keep this model.
Dep Var: HPLC   N: 18   Multiple R: 0.996   Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.539
Effect            Coefficient   Std Error   t       P(2 Tail)
CONSTANT          21.284        6.649       3.201   0.006
CONCENT           2.842         0.335       8.486   0.000
CONCENT*CONCENT   0.001         0.003       0.227   0.824
Here P = 0.824 for the quadratic term, so the test on c is not significant and the quadratic model brings no improvement.

With replications: perform a test of lack of fit. Principle: compare the departure from linearity to the pure error. If the departure from linearity is significantly larger than the pure error, then change the model.

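Test of lack of fit: how to do it? Three steps: 1) linear regression, which gives the residual sum of squares; 2) one-way ANOVA with x as the factor, which gives the pure error sum of squares; 3) if the departure from linearity is significantly larger than the pure error, then change the model.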

Test of lack of fit: example. Three steps: 1) linear regression; 2) one-way ANOVA:
Dep Var: HPLC   N: 18
Analysis of Variance
Source     Sum-of-Squares   df   Mean-Square   F-ratio   P
CONCENT    121251.776       5    24250.355     289.434   0.000
Error      1005.427         12   83.786
3) The departure from linearity is not significantly larger than the pure error: we keep the straight line.
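The comparison in step 3 can be sketched from the figures reported in the two outputs, assuming the standard lack-of-fit F ratio: (residual SS − pure error SS) divided by its degrees of freedom, over the pure error mean square. The residual sum of squares is recovered here from the rounded standard error of estimate (8.282), so the result is approximate; the variable names are illustrative.

```python
n = 18
s_regression = 8.282      # standard error of estimate from the linear regression
ss_pure_error = 1005.427  # "Error" sum of squares from the one-way ANOVA
df_pure_error = 12        # "Error" degrees of freedom from the one-way ANOVA

# Residual sum of squares of the straight line (n - 2 degrees of freedom)
df_residual = n - 2
ss_residual = s_regression ** 2 * df_residual

# Departure from linearity = residual SS minus pure error SS
ss_lack_of_fit = ss_residual - ss_pure_error
df_lack_of_fit = df_residual - df_pure_error

f_ratio = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)
print(df_lack_of_fit, round(f_ratio, 3))
```

The ratio comes out well below 1, far under the tabulated 5% critical value of an F distribution with 4 and 12 degrees of freedom (about 3.26), which agrees with the decision to keep the straight line.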

Checking the variance model: homoscedasticity. Scatterplot of residuals vs fitted values. No structure in the residuals but heteroscedasticity: change the model (criterion). Homoscedasticity: OK.

What to do with heteroscedasticity? Use the scatterplot of residuals vs fitted values to model the dispersion. Here the standard deviation of the residuals increases with the fitted value $\hat{Y}$, so it increases with x.

What to do with heteroscedasticity? Estimate the slope and the intercept again, but with weights inversely proportional to the variance: minimise $\sum_i w_i (Y_i - a - b x_i)^2$ with $w_i \propto 1 / \mathrm{Var}(e_i)$, and check that the weighted residuals $\sqrt{w_i}\,\hat{e}_i$ are homoscedastic.
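A minimal sketch of weighted least squares, using weights inversely proportional to the variance as is standard for this method. It assumes, purely for illustration, that the standard deviation of the residuals is proportional to x, so that the weights are 1/x². The function name `fit_line_weighted` and the data are hypothetical.

```python
import math


def fit_line_weighted(x, y, w):
    """Return (a_hat, b_hat) minimising sum(w_i * (y_i - a - b * x_i) ** 2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical data whose scatter grows with x
x = [10, 20, 40, 80, 160]
y = [49.0, 79.5, 141.0, 248.0, 495.0]
w = [1 / xi ** 2 for xi in x]  # assumption: sd of the residuals proportional to x

a_w, b_w = fit_line_weighted(x, y, w)
# Weighted residuals, to be plotted against the fitted values as a check
weighted_residuals = [math.sqrt(wi) * (yi - a_w - b_w * xi)
                      for wi, xi, yi in zip(w, x, y)]
print(a_w, b_w, weighted_residuals)
```

If the weights describe the dispersion correctly, the weighted residuals should show no trend in spread when plotted against the fitted values.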

Checking the variance model: normality. Probability plot of the residuals against the expected values for a normal distribution. No curvature: normality. Curvature: non-normality. Is it so important?
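The coordinates of such a probability plot can be sketched as follows. The plotting positions (i − 0.375)/(n + 0.25) are one common convention (Blom) and are an assumption here; statistical packages may use a slightly different rule. The function name `normal_scores` and the residuals are hypothetical.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """Pair each ordered residual with its expected standard normal value."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    expected = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                for i in range(1, n + 1)]
    return list(zip(expected, ordered))


# Hypothetical residuals
pairs = normal_scores([0.3, -0.1, 0.1, -0.3, 0.0])
for expected_value, residual in pairs:
    print(round(expected_value, 3), residual)
```

Plotting the pairs and looking for curvature gives the check described on the slide: points close to a straight line support normality.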

What to do with non-normality? Try to model the distribution of the residuals. In general, this is difficult with few observations. If enough observations are available, non-normality does not affect the result much.

An interesting index: R². R² = squared correlation coefficient = % of the dispersion of the $Y_i$'s explained by the straight line (the model). 0 ≤ R² ≤ 1. If R² = 1, all the $\hat{e}_i$ = 0: the straight line explains all the variation of the $Y_i$'s. If R² = 0, the estimated slope is 0: the straight line does not explain any variation of the $Y_i$'s.

An interesting index: R². R² and R (correlation coefficient) are not designed to measure linearity! Example: Multiple R: 0.990; Squared multiple R: 0.980; Adjusted squared multiple R: 0.980.
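A short sketch makes the warning concrete. It computes R² as the squared correlation coefficient for data that lie exactly on a parabola, so the straight line is clearly the wrong mean model. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y):
    """Squared correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Perfectly quadratic data: no straight line describes them correctly
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
r2 = r_squared(x, y)
print(round(r2, 3))
```

R² is about 0.95 here although the relationship is not linear at all, which is why the residual plot, not R², is the tool for checking the mean model.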

Questions: 1) How to obtain the best straight line? 2) Is this straight line the best curve to use? 3) How to use this straight line?

How to use this straight line? Direct use, for a given x: 1) predict the mean Y; 2) construct a confidence interval of the mean Y; 3) construct a prediction interval of Y. Reverse use (calibration, approximate results), for a given Y: 1) predict the mean x; 2) construct a confidence interval of the mean x; 3) construct a prediction interval of x.

For a given x, predict the mean Y by $\hat{Y} = \hat{a} + \hat{b} x$. Example:

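Confidence interval of the mean Y: $\hat{a} + \hat{b} x \pm t_{1-\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where s is the standard error of estimate. There is a probability 1 − α that a + b x belongs to this interval.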

Confidence interval of the mean Y. [Figure: lower limit L and upper limit U of the interval, marked at 30.]

Example

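Prediction interval of Y: $\hat{a} + \hat{b} x \pm t_{1-\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where s is the standard error of estimate. 100(1 − α)% of the measurements carried out for this x belong to this interval.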

Prediction interval of Y. [Figure: lower limit L and upper limit U of the interval, marked at 30.]
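Both intervals can be sketched together, assuming the standard formulas for simple linear regression: the half-width is t × s × sqrt(1/n + (x − x̄)²/Sxx) for the mean Y, with an extra 1 under the square root for a single new measurement. The Student quantile is passed in by the caller (2.776 is the tabulated 97.5% quantile for 4 degrees of freedom); the function name `intervals_at` and the data are hypothetical.

```python
import math


def intervals_at(x, y, x0, t_quantile):
    """Return (y_hat, confidence interval of the mean Y, prediction interval of Y) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_res = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))  # standard error of estimate
    y_hat = a_hat + b_hat * x0       # predicted mean Y at x0
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = t_quantile * s * math.sqrt(spread)      # mean Y
    half_pi = t_quantile * s * math.sqrt(1 + spread)  # one new measurement
    return (y_hat,
            (y_hat - half_ci, y_hat + half_ci),
            (y_hat - half_pi, y_hat + half_pi))


# Hypothetical data: n = 6, so n - 2 = 4 degrees of freedom
conc = [10, 20, 30, 40, 50, 60]
hplc = [48.0, 80.5, 106.0, 139.5, 164.0, 197.0]
y_hat, conf_int, pred_int = intervals_at(conc, hplc, 30, 2.776)
print(y_hat, conf_int, pred_int)
```

The prediction interval is always wider than the confidence interval of the mean, because it adds the variability of an individual measurement to the uncertainty on the line.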

Example

Reverse use: for a given Y = y0, predict the mean X by $\hat{x}_0 = (y_0 - \hat{a}) / \hat{b}$. Example:

For a given Y = y0, a confidence interval of the mean X. [Figure: upper limit U of the interval.]

Confidence interval of the mean X. There is a probability 1 − α that the mean X belongs to [L, U], where L and U are obtained from the confidence limits of the regression line at Y = y0.
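A minimal sketch of the reverse use. The point estimate inverts the fitted line; the interval uses a common first-order approximation (the half-width for the mean Y divided by the absolute slope), which is an assumption and not necessarily the construction shown on the slide. The function name `inverse_predict`, the data and the quantile 2.776 (97.5%, 4 degrees of freedom) are hypothetical choices.

```python
import math


def inverse_predict(x, y, y0, t_quantile):
    """Return (x0_hat, L, U): estimated mean X for Y = y0 with approximate limits."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_res = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))
    x0_hat = (y0 - a_hat) / b_hat  # invert the fitted straight line
    # Approximation: uncertainty on Y at x0_hat, translated to the x scale
    half = t_quantile * (s / abs(b_hat)) * math.sqrt(
        1 / n + (x0_hat - x_bar) ** 2 / sxx)
    return x0_hat, x0_hat - half, x0_hat + half


# Hypothetical calibration data
conc = [10, 20, 30, 40, 50, 60]
hplc = [48.0, 80.5, 106.0, 139.5, 164.0, 197.0]
x0_hat, lower, upper = inverse_predict(conc, hplc, 120.0, 2.776)
print(x0_hat, lower, upper)
```

The approximation is reasonable when the slope is estimated precisely; with a poorly determined slope the limits should be treated with caution.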

Example

What you should no longer believe: 1) one can fit the straight line by inverting x and Y; 2) if the correlation coefficient is high, the straight line is the best model; 3) normality of the $x_i$'s is required to perform a regression; 4) normality of the $e_i$'s is essential to perform a good regression.
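The first point can be checked numerically. The sketch below fits Y on x, then x on Y, and rewrites the second line in the form Y = a' + b' x; the two lines differ unless the points are perfectly aligned. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(u, v):
    """Least squares fit of v on u; return (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suu = sum((ui - u_bar) ** 2 for ui in u)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = suv / suu
    return v_bar - slope * u_bar, slope


# Hypothetical data
x = [0, 1, 2, 3]
y = [0, 1, 1, 2]

a_yx, b_yx = fit_line(x, y)  # regression of Y on x
a_xy, b_xy = fit_line(y, x)  # regression of x on Y (x and Y inverted)

# Rewrite x = a_xy + b_xy * Y as Y = a_inv + b_inv * x
b_inv = 1 / b_xy
a_inv = -a_xy / b_xy
print((a_yx, b_yx), (a_inv, b_inv))
```

The product of the two fitted slopes equals R², so the two lines coincide only when R² = 1.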