
Correlation and Regression SCATTER DIAGRAM The simplest method to assess relationship between two quantitative variables is to draw a scatter diagram.




1

2 Correlation and Regression

3 SCATTER DIAGRAM The simplest method to assess the relationship between two quantitative variables is to draw a scatter diagram. [Figure: scatter diagram of systolic BP against age] From this diagram we notice that as age increases, there is a general tendency for BP to increase. However, the diagram does not give a quantitative estimate of the degree of the relationship.

4 CORRELATION COEFFICIENT The correlation coefficient is an index of the degree of association between two variables. It can also be used to compare the degree of association in different groups. For example, we may want to know whether the degree of association between age and systolic BP is the same (or different) in males and females. The correlation coefficient is denoted by the symbol ‘r’. ‘r’ ranges from -1 to +1.

5 If high values of one variable tend to occur with high values of the other (and low with low), we say that there is a positive correlation. If high values of one variable tend to occur with low values of the other (and vice versa), we say that there is a negative correlation.

6 A NOTE OF CAUTION The correlation coefficient is purely a measure of the degree of association; it does not provide any evidence of a cause-effect relationship. It is valid only within the range of values studied, and extrapolating the association beyond that range may not be valid. E.g., age and grip strength: an association observed in one age range need not hold in another.

7 r measures the degree of linear relationship. r = 0 does not necessarily mean that there is no relationship between the two characteristics under study; the relationship could be curvilinear. Spurious correlation: the production of steel in the UK and the population of India over the last 25 years may be highly correlated, although neither influences the other.
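To see why r = 0 does not rule out a relationship, here is a minimal Python sketch using made-up data (not from the slides) in which Y = X² exactly, with X placed symmetrically around zero. The helper `pearson_r` is written here purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Y is completely determined by X (Y = X**2),
# but the relationship is curvilinear, not linear.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 6))  # 0.0
```

The rising and falling halves of the parabola cancel out, so r measures no linear association even though the dependence is exact.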

8 r does not give the rate of change in one variable for changes in the other variable. E.g., age and systolic BP: males, r = 0.7; females, r = 0.5. From this, one should not conclude that systolic BP increases at a higher rate among males than among females.
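The distinction between strength of association (r) and rate of change (the regression slope b) can be checked numerically. This is a minimal Python sketch with two hypothetical groups (not the slide's data): group A has the higher r but the smaller slope. The helpers `pearson_r` and `slope` are illustrative names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of linear association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least squares slope b: average change in Y per unit change in X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
group_a = [0.5 * x for x in xs]  # perfectly linear, shallow rise
group_b = [2, 6, 4, 10, 8]       # scattered, steeper rise

print(pearson_r(xs, group_a), slope(xs, group_a))  # 1.0 0.5
print(pearson_r(xs, group_b), slope(xs, group_b))  # 0.8 1.6
```

The slope and r are linked by b = r × (s_Y / s_X), so the slope also depends on how spread out Y is relative to X, which r alone does not tell you.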

9 PROPERTY OF THE CORRELATION COEFFICIENT The correlation coefficient is unchanged if a constant is added to or subtracted from all the values of X (or Y), or if all the values are multiplied or divided by a positive constant. E.g., if the correlation coefficient between X and Y is 0.7, the correlation coefficient between X + 10 and Y - 6 is 0.7, and between 5X and 2Y it is also 0.7. (Multiplying only one of the variables by a negative constant reverses the sign of r.) If the correlation coefficient between height in inches and weight in pounds is, say, 0.6, the correlation coefficient between height in cm and weight in kg will also be 0.6.
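The invariance property can be verified directly. This minimal Python sketch uses hypothetical data (not from the slides) and an illustrative helper `pearson_r`; it also shows the sign reversal when one variable is multiplied by a negative constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 6, 4, 10, 8]

r_xy = pearson_r(x, y)
r_shifted = pearson_r([v + 10 for v in x], [v - 6 for v in y])  # X + 10, Y - 6
r_scaled = pearson_r([5 * v for v in x], [2 * v for v in y])    # 5X, 2Y
r_negated = pearson_r([-v for v in x], y)                       # -X: sign flips

print(round(r_xy, 6), round(r_shifted, 6), round(r_scaled, 6), round(r_negated, 6))
# 0.8 0.8 0.8 -0.8
```

Shifting changes only the means, which r subtracts out; positive rescaling multiplies numerator and denominator of r by the same factor, so it cancels.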

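10 COMPUTATION OF THE CORRELATION COEFFICIENT r is the covariance of X and Y divided by the product of their standard deviations: $r = \dfrac{\operatorname{Cov}(X,Y)}{s_X s_Y} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. The slide's worked example used n = 7 observations.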
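Since the slide's n = 7 worked table did not survive, this minimal Python sketch applies the same formula to the 15-patient age and systolic BP data set on slide 12. The helper names `mean`, `covariance` and `sd` are illustrative; the n - 1 divisors cancel in the ratio, so r is the same whichever divisor is used.

```python
import math

# Slide 12 sample data: age (years) and systolic BP (mm Hg) for 15 patients
age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sbp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]

def mean(v):
    return sum(v) / len(v)

def covariance(xs, ys):
    """Sample covariance (divisor n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sd(v):
    """Sample standard deviation: square root of the covariance of v with itself."""
    return math.sqrt(covariance(v, v))

# r = Cov(X, Y) / (s_X * s_Y)
r = covariance(age, sbp) / (sd(age) * sd(sbp))
print(f"r = {r:.3f}")  # r = 0.969
```

Squaring this r gives about 0.94, in line with the R² of roughly 94% reported for the regression on slide 18.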

11 UNIVARIATE REGRESSION Regression: a method of describing the relationship between two variables. Use: to predict the value of one variable given the other.

12 SAMPLE DATA SET
Patient No.   Age in years (X)   Sys BP in mm Hg (Y)
1             45                 150
2             48                 153
3             46                 148
4             45                 150
5             46                 147
6             48                 153
7             46                 149
8             55                 159
9             51                 157
10            56                 160
11            53                 158
12            60                 165
13            53                 157
14            54                 158
15            49                 154
BP = response (dependent) variable; Age = predictor (independent) variable

13 REGRESSION MODEL We can perform a “regression of BP on age” to derive a straight line that gives an estimated value of BP for any given age. The general equation of a linear regression line is $Y = a + bX + e$, where a = intercept, b = regression coefficient, and e = statistical error.

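14 CALCULATIONS a and b are estimated from the observed values of age (X) and BP (Y) by the least squares method: $b = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$. b gives the change in Y for a unit change in X; a is the value of Y when X = 0, which may not always be meaningful.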
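The least squares formulas can be applied directly to the slide 12 data. This is a minimal Python sketch; variable names such as `s_xy` and `s_xx` are illustrative.

```python
# Slide 12 sample data: age (years) and systolic BP (mm Hg)
age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sbp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(sbp) / n

# Least squares estimates: b = Sxy / Sxx, a = y_bar - b * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, sbp))
s_xx = sum((x - x_bar) ** 2 for x in age)
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"b = {b:.2f}, a = {a:.2f}")  # b = 1.08, a = 100.36
```

The slope matches the 1.08 reported on slide 18, and the intercept is close to the reported constant of 100.34.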

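15 TEST OF SIGNIFICANCE FOR b Null hypothesis: the population regression coefficient β = 0 (no linear relationship). Test statistic: $t = \dfrac{b}{\mathrm{SE}(b)}$ (1), where $\mathrm{SE}(b) = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$ and $s^2 = \dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}$. Under the null hypothesis, the value given under (1) follows a t-distribution with (n - 2) df.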
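Statistic (1) can be computed for the slide 12 data with a few lines of Python. This is a minimal sketch that refits the line and derives SE(b) from the residual variance; the variable names are illustrative.

```python
import math

# Slide 12 sample data: age (years) and systolic BP (mm Hg)
age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sbp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]

n = len(age)
x_bar, y_bar = sum(age) / n, sum(sbp) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, sbp))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual variance: s^2 = sum of squared residuals / (n - 2)
residuals = [y - (a + b * x) for x, y in zip(age, sbp)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)
se_b = math.sqrt(s2 / s_xx)   # SE(b) = s / sqrt(Sxx)

t = b / se_b                  # statistic (1) for H0: beta = 0
df = n - 2
print(f"SE(b) = {se_b:.3f}, t = {t:.2f}, df = {df}")  # SE(b) = 0.076, t = 14.16, df = 13
```

The result reproduces the t = 14.16 on slide 18. It far exceeds 2.16, the two-sided 5% critical value of t with 13 df, which is consistent with the reported P < 0.0001.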

16 ASSUMPTIONS 1. The relation between the two variables should be linear. 2. The residuals should follow a Normal distribution with zero mean and constant variance.

17 PRECAUTIONS 1. An adequate sample size should be ensured. 2. Predictions should be made only within the range of the observed values; no extrapolation should be attempted. 3. The equation Y = a + bX should not be used to predict X for a given Y (that requires a separate regression of X on Y). 4. Model adequacy should be verified.

18 RESULTS OF REGRESSION ANALYSIS
------------------------------------------------------------
Ind. variable   Reg. coeff.   SE     t       P-value
------------------------------------------------------------
Age             1.08          0.08   14.16   < 0.0001
Constant        100.34
------------------------------------------------------------
R² = 93.99% ≈ 94%
Systolic BP = 100.34 + 1.08 × Age
95% CI for b = b ± 1.96 SE(b) = 1.08 ± 1.96 × 0.08 = (0.92, 1.24). Here 1.96 is the large-sample Normal value; using the t critical value for n - 2 = 13 df (2.16) gives a slightly wider interval, (0.91, 1.25).
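The confidence interval arithmetic can be reproduced from the reported b and SE(b). This minimal Python sketch compares the large-sample multiplier 1.96 used on the slide with the t critical value 2.160 for 13 df; the helper `conf_interval` is an illustrative name.

```python
# Values reported on slide 18
b, se_b = 1.08, 0.08

def conf_interval(estimate, se, crit):
    """estimate ± crit × SE, rounded to two decimals."""
    return round(estimate - crit * se, 2), round(estimate + crit * se, 2)

print(conf_interval(b, se_b, 1.96))   # (0.92, 1.24)  large-sample Normal multiplier
print(conf_interval(b, se_b, 2.160))  # (0.91, 1.25)  t multiplier for n - 2 = 13 df
```

With only 15 patients, the t multiplier is the more appropriate choice, although here it changes the interval only slightly.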

19 INTERPRETATIONS 1. An increase in age of one year is associated with an average increase of 1.08 mm Hg in systolic BP. 2. When age = 0, BP = 100.34, which is absurd (age 0 lies far outside the observed range of 45 to 60 years). 3. The estimated BP of a 50-year-old individual is 100.34 + 1.08 × 50 = 154.34 ≈ 154 mm Hg. 4. 94% of the variation in BP is explained by age alone.
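Interpretation 3, together with the no-extrapolation precaution on slide 17, can be captured in a small prediction function. This is a minimal Python sketch: `predict_sbp`, `AGE_MIN` and `AGE_MAX` are hypothetical names, and the range limits are taken from the youngest and oldest patients in the slide 12 data.

```python
# Observed age range in the slide 12 sample data (years)
AGE_MIN, AGE_MAX = 45, 60

def predict_sbp(age, a=100.34, b=1.08):
    """Estimated systolic BP (mm Hg) from the fitted line on slide 18.

    Raises ValueError rather than extrapolating outside the observed age range."""
    if not AGE_MIN <= age <= AGE_MAX:
        raise ValueError(f"age {age} is outside the observed range {AGE_MIN}-{AGE_MAX}")
    return a + b * age

print(round(predict_sbp(50), 2))  # 154.34
```

Calling `predict_sbp(0)` raises an error instead of returning the meaningless intercept of 100.34 mm Hg.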

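20 MULTIPLE LINEAR REGRESSION The response variable is expressed as a linear combination of several predictor variables. E.g., in a model with height and weight as predictors, 0.147 and 1.024 are the regression coefficients for height and weight. They indicate the increase in the response variable for an increase of 1 cm in height and 1 kg in weight, respectively, with the other predictor held constant.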

21 LOGISTIC REGRESSION Response variable: presence or absence of some condition. Instead of the actual value of the response variable, we predict a transformation of it, the logit of its probability p, where logit(p) = ln[p/(1 - p)]. Data: hypertension, smoking (X1), obesity (X2) and snoring (X3). Which of these factors are predictors of hypertension? $\operatorname{logit}(p) = -2.378 - 0.068X_1 + 0.695X_2 + 0.872X_3$. The probability can be estimated for any combination of the three variables. We can also compare the predicted probability for different groups, e.g., smokers and non-smokers.
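To turn the fitted logit into a probability, invert the logit: p = 1 / (1 + e^(-logit)). This minimal Python sketch uses the slide 21 coefficients and assumes each predictor is coded 1 = present and 0 = absent (the slide does not state the coding). The function name `prob_hypertension` is hypothetical.

```python
import math

# Coefficients from slide 21
INTERCEPT = -2.378
COEF = {"smoking": -0.068, "obesity": 0.695, "snoring": 0.872}

def prob_hypertension(smoking, obesity, snoring):
    """Predicted probability of hypertension.

    Assumes each predictor is coded 1 = present, 0 = absent."""
    logit = (INTERCEPT
             + COEF["smoking"] * smoking
             + COEF["obesity"] * obesity
             + COEF["snoring"] * snoring)
    return 1 / (1 + math.exp(-logit))  # inverse of logit(p) = ln(p / (1 - p))

# Compare smokers and non-smokers, with obesity and snoring both absent
for smoking in (0, 1):
    p = prob_hypertension(smoking, obesity=0, snoring=0)
    print(f"smoking={smoking}: p = {p:.3f}")
```

Under this coding, the small negative smoking coefficient gives smokers a slightly lower predicted probability than non-smokers. Obesity and snoring raise the predicted probability.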

