Published by Arthur Tiller. Modified over 2 years ago.

1
Regression: Motivation
One-dimensional data (summary by mean): 10, 20, 30, 40, 50

2
X       (X - a)^2
10      (10 - a)^2
20      (20 - a)^2
30      (30 - a)^2
40      (40 - a)^2
50      (50 - a)^2
Sum:
150     T
T is at its minimum when a = mean = 30
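The claim that the mean minimizes T can be checked numerically. The following is a minimal sketch using the five values from slide 1; the function name `sum_of_squares` and the search grid 0 to 60 are illustrative assumptions, not part of the slides.

```python
# Data from slide 1: five one-dimensional observations.
data = [10, 20, 30, 40, 50]


def sum_of_squares(a, xs):
    """T(a): total squared distance between each value and the summary a."""
    return sum((x - a) ** 2 for x in xs)


mean = sum(data) / len(data)

# Try every whole number from 0 to 60 as a candidate summary (assumed grid).
candidates = range(0, 61)
best = min(candidates, key=lambda a: sum_of_squares(a, data))

print("mean =", mean)
print("T(mean) =", sum_of_squares(mean, data))
print("candidate minimizing T =", best)
```

Any candidate other than 30 gives a larger T, which is the sense in which the mean is the value closest to the data.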

3
Regression
Estriol   Birth Wt      Estriol   Birth Wt
7         25            30        35.5
9         25            32        35.5
9         25            36        35.5
12        27            35        37.0
14        27            37        37.0
14        30            31        38.5
15        32            34        38.5
15        34            38        40.0
15        34            30        41.5
15        35            40        43.0
16        27            28        46.0
16        24            43        46.0
16        30            32        47.5
16        31            39        47.5
16        32            34        50.5

4
Regression
Concerns
– Data summarization (as in one-dimensional data)
– Prediction of low-birthweight babies (so that women at high risk can receive special prenatal care)

5
Scatter plot
[Figure: birth weight plotted against estriol]

6
Lines through scatter plot to represent the data

7
Regression line: the best line
The best representation of the data

8
What is this with a line and numbers anyway?
– They can be the same thing expressed in two different forms or languages.
– But lines take less space to record, are easier to remember, and are easy to comprehend.
– A line can be a pictorial or a mathematical representation of numerical data.

9
A line: Y = 2 + 3X
Intercept = 2, Slope = 3 (interpretation??)
Numbers generated by the line:
x       y
0       2
1       5
2       8
……      ……
50      152
……      ……
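To see how a line "generates" numbers, here is a minimal sketch that evaluates Y = 2 + 3X at the x values listed on the slide. The function name `line` is an illustrative assumption. Reading the equation directly, the intercept 2 is the value of Y at X = 0, and the slope 3 is the change in Y for each one-unit increase in X.

```python
def line(x, intercept=2, slope=3):
    """Value of Y on the line Y = intercept + slope * X."""
    return intercept + slope * x


# x values shown on the slide; the rows marked "……" are not filled in.
for x in [0, 1, 2, 50]:
    print(x, line(x))

# The slope is the change in Y when X goes up by one unit.
print("change per unit of X:", line(1) - line(0))
```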

10
Representation of bivariate measurements in different forms
Equation: Y = 2 + 3X
Data/Numbers:
x       y
0       2
1       5
2       8
……      ……
50      152
……      ……
Picture/Graph

11
Straight lines

13
Regression: what line will generate the data?
[Estriol and birth weight table repeated from slide 3]

14
Regression: what line will generate the data?
[Figure: birth weight plotted against estriol]

15
Which is the best line?

16
The best line
Birthweight = 21.52 + 0.608 Estriol

17
Computer output

18
Regression: The saga continues

19
Out of curiosity
How did this accomplish what we wanted (i.e., data summarization and identifying women who might need special prenatal care)?

20
1. We end up with the line Birthweight = 21.52 + 0.608 Estriol, hoping that this line will generate the original data.
2. In the univariate case, the mean is, in a sense, the value closest to the data. In a similar way, the regression line is the line closest to the data. In that sense it summarizes the data.

21
Recall
One-dimensional data (summary by mean): 10, 20, 30, 40, 50

22
Recall
X       (X - a)^2       Bweight   (Bweight - L)^2
10      (10 - a)^2      25        (25 - L)^2
20      (20 - a)^2      25        (25 - L)^2
30      (30 - a)^2      25        (25 - L)^2
40      (40 - a)^2      27        (27 - L)^2
50      (50 - a)^2      ……        ……
a = mean = 30 minimizes the sum on the left.
L = 21.52 + 0.608 Estriol minimizes the sum on the right. This is the regression line.
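The same "closest in the sum-of-squares sense" idea can be sketched for a line. The example below uses the standard closed-form least-squares formulas for one predictor. The four (x, y) points are hypothetical values chosen only for illustration, not the estriol data, and the function names `fit_line` and `sse` are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(intercept, slope, xs, ys):
    """Sum of squared distances between the observations and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from the slides).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

b0, b1 = fit_line(xs, ys)
best = sse(b0, b1, xs, ys)
print("intercept =", round(b0, 3), "slope =", round(b1, 3), "SSE =", round(best, 3))

# Nudging the intercept or the slope in either direction makes the sum larger.
for d0, d1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    print("shifted line SSE =", round(sse(b0 + d0, b1 + d1, xs, ys), 3))
```

Just as any value other than the mean increases the one-dimensional sum, any line other than the fitted one increases the sum of squared distances.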

23
Prediction
Women who need special care
If low birthweight is defined as < 2500 g (25 in the units of the regression line, which records birthweight in hundreds of grams), then women with an estriol level < 5.72 are at high risk of having low-birthweight babies.
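The cutoff of 5.72 follows from solving the regression line for the estriol level at which the predicted birthweight equals the low-birthweight limit. A minimal sketch, assuming birthweight is recorded in units of 100 g (so 2500 g corresponds to 25, consistent with the cutoff quoted on the slide); the function names are illustrative.

```python
INTERCEPT = 21.52
SLOPE = 0.608
LOW_BIRTHWEIGHT = 25  # 2500 g, assuming birthweight is in units of 100 g


def predicted_birthweight(estriol):
    """Birthweight generated by the regression line for a given estriol level."""
    return INTERCEPT + SLOPE * estriol


def estriol_cutoff(birthweight):
    """Estriol level at which the line predicts exactly this birthweight."""
    return (birthweight - INTERCEPT) / SLOPE


cutoff = estriol_cutoff(LOW_BIRTHWEIGHT)
print("estriol cutoff =", round(cutoff, 2))
print("predicted birthweight at estriol 5 =", round(predicted_birthweight(5), 3))
print("predicted birthweight at estriol 6 =", round(predicted_birthweight(6), 3))
```

Because the slope is positive, every estriol level below the cutoff gives a predicted birthweight below the limit.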

24
So is everything fine and dandy?
Not necessarily:
– How closely does the regression line generate the data?
– How much is estriol responsible for birthweight?
– Was there something that would have better predicted women at risk?

27
How good is the regression?

28
R^2 = 0.372
– Estriol explains about 37.2% of the variation in birthweight. The remaining 62.8% is not explained by estriol and is due to other factors.
– At estriol 16, we have several birthweights (24, 30, 31, 32 and 35). If estriol were the only factor in birthweight, we would not see this variation.
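The variation at estriol 16 can be made concrete: the line gives a single predicted birthweight there, while the observed values spread around it. The sketch below uses the five birthweights quoted on the slide. It also includes a generic R^2 function (1 minus residual sum of squares over total sum of squares), applied to hypothetical toy numbers because the full data set is not reproduced here; the function name `r_squared` and the toy values are assumptions.

```python
INTERCEPT = 21.52
SLOPE = 0.608

# Birthweights observed at estriol 16, as listed on the slide.
observed_at_16 = [24, 30, 31, 32, 35]
predicted_at_16 = INTERCEPT + SLOPE * 16

for bw in observed_at_16:
    print("observed", bw, "minus predicted", round(bw - predicted_at_16, 3))


def r_squared(ys, fitted):
    """Share of the variation in ys that the fitted values account for."""
    y_bar = sum(ys) / len(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1 - ss_resid / ss_total


# Hypothetical toy observations and fitted values (not from the slides).
toy_y = [2, 4, 5, 8]
toy_fitted = [1.9, 3.8, 5.7, 7.6]
print("toy R^2 =", round(r_squared(toy_y, toy_fitted), 4))
```

An R^2 of 0.372 means the residual sum of squares is still 62.8% of the total, which is why one estriol level can go with quite different birthweights.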

29
How good is the regression?

30
Other factors: Multiple regression

31
Regression Diagnostics: Residual analysis

32
Diagnostics
Residual for a patient (observation)
– The difference between the observed birthweight and the birthweight the regression line would generate (predict)
Example (for the first patient):
– Observed birthweight = 25
– Generated = 21.52 + 0.608 Estriol = 21.52 + 0.608(7) = 25.776
– Residual = 25 - 25.776 = -0.776
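The residual arithmetic for the first patient can be reproduced in a few lines. This is a minimal sketch using the fitted line and the first patient's values (estriol 7, birthweight 25) from the slides; the function name `residual` is an illustrative assumption.

```python
INTERCEPT = 21.52
SLOPE = 0.608


def residual(estriol, observed_birthweight):
    """Observed birthweight minus the birthweight generated by the line."""
    generated = INTERCEPT + SLOPE * estriol
    return observed_birthweight - generated


# First patient: estriol 7, observed birthweight 25.
first = residual(7, 25)
print("residual for first patient =", round(first, 3))
```

A negative residual means the observed birthweight is below what the line predicts for that estriol level.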

33
Diagnostics
Residual plots
– Plot of residuals against predicted values
– Used to check the assumptions: normality, linearity and homoscedasticity

34
[Figure: residual plot panels showing non-normality, heteroscedasticity and nonlinearity]

35
Diagnostics
Residuals for influential patients (observations)
– Influence: the change in the estimated parameters (slope and intercept) when the analysis is redone without the patient in question
– Patients with high leverage and a large residual will have greater influence.
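The "redo the analysis without the patient" idea can be sketched as a leave-one-out loop. The data below are hypothetical values chosen so that the last point has both high leverage (x far from the others) and a large residual; they are not the estriol data, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - slope * x_bar, slope


def leave_one_out_changes(xs, ys):
    """Change in (intercept, slope) when each observation is dropped in turn."""
    b0, b1 = fit_line(xs, ys)
    changes = []
    for i in range(len(xs)):
        b0_i, b1_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append((b0_i - b0, b1_i - b1))
    return changes


# Hypothetical data: the first five points lie near y = 2x, the last does not.
xs = [1, 2, 3, 4, 5, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

changes = leave_one_out_changes(xs, ys)
for i, (d0, d1) in enumerate(changes):
    print("without point", i, "intercept change", round(d0, 3), "slope change", round(d1, 3))

most_influential = max(range(len(xs)), key=lambda i: abs(changes[i][1]))
print("largest slope change comes from dropping point", most_influential)
```

In this toy example, dropping the last point moves the slope far more than dropping any other point, which is what a high-leverage observation with a large residual does.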

36
Diagnostics
Standardized and studentized (or jackknife) residuals
– A patient with large values of these residuals is a possible outlier.
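A minimal sketch of these residuals for a one-predictor regression follows. Terminology varies between textbooks; the convention assumed here is that the standardized residual divides each residual by its estimated standard deviation s * sqrt(1 - h), where h is the leverage, and that the jackknife residual is the same quantity with s re-estimated without the observation in question. The data are hypothetical and the function name is an assumption.

```python
import math


def studentized_residuals(xs, ys):
    """Leverage, standardized residuals and jackknife residuals for a line fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    leverage = [1 / n + (x - x_bar) ** 2 / s_xx for x in xs]
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    # Jackknife form obtained from the standardized residual without refitting.
    jackknife = [r * math.sqrt((n - 3) / (n - 2 - r ** 2)) for r in standardized]
    return leverage, standardized, jackknife


# Hypothetical data: the last point sits far from the pattern of the others.
xs = [1, 2, 3, 4, 5, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

leverage, standardized, jackknife = studentized_residuals(xs, ys)
for i in range(len(xs)):
    print(i, round(leverage[i], 3), round(standardized[i], 3), round(jackknife[i], 3))
```

In this toy data set the jackknife residual of the last point is far larger in absolute value than the others, which is the pattern the slide describes for an outlier.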
