CORRELATION: Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables.


2 CORRELATION: Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables. The analysis is concerned only with the strength of the relationship; no causal effect is implied. A scatter plot (or scatter diagram) is used to show the relationship between two variables.

3 Scatter plot examples: linear relationships and curvilinear relationships (figure panels with axes x and y; images not included in the transcript).

4 Scatter plot examples: strong relationships and weak relationships (figure panels with axes x and y; images not included in the transcript).

5 Scatter plot example: no relationship.

6 Correlation coefficient: The population correlation coefficient ρ (rho) measures the strength of the association between the variables. The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations. The value of r varies from sample to sample; inference about ρ is based on a statistic computed from r that follows Student's t distribution when ρ = 0.

7 Both ρ and r:
• Are unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
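
The properties listed above can be checked numerically. The following is a minimal sketch, assuming the usual product-moment definition of r; the function name `pearson_r` and the height and weight values are hypothetical and are not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
height_cm = [150.0, 160.0, 165.0, 172.0, 180.0]
weight_kg = [52.0, 58.0, 63.0, 70.0, 77.0]
r_metric = pearson_r(height_cm, weight_kg)

# Changing the units (cm -> inches, kg -> pounds) leaves r unchanged.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_imperial = pearson_r(height_in, weight_lb)

# Reversing the direction of one variable flips the sign of r.
r_negated = pearson_r(height_cm, [-w for w in weight_kg])

print(round(r_metric, 4), round(r_imperial, 4), round(r_negated, 4))
```

Because r is built from standardized deviations, any positive linear rescaling of either variable cancels out, which is what "unit free" means in practice.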

8 A general guideline on interpretation of correlation (table not included in the transcript).

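9 Significance test for correlation: The hypotheses tested are $H_0: \rho = 0$ (no correlation) and $H_1: \rho \neq 0$ (correlation exists). The test statistic is $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which follows a Student's t distribution with $n - 2$ degrees of freedom under $H_0$. If the p-value is less than the level of significance (α), there is evidence of a linear relationship between the two variables.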

10 Example: Pincherle and Robinson (1974) note a marked inter-observer variation in blood pressure readings. They found that doctors who read high on systolic tended to read high on diastolic. The table (not included in the transcript) shows the mean systolic and diastolic blood pressure readings by 14 doctors. Research question: Is the association between the two variables significant?

11 Scatter plot of blood pressure data (image not included in the transcript).

12 r = 0.418: low positive correlation between systolic and diastolic blood pressure. p-value = 0.136: there is insufficient evidence of an association between systolic and diastolic blood pressure.
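
The reported p-value can be approximately reproduced from r = 0.418 and n = 14 alone. The sketch below assumes the standard t test for a correlation coefficient with n - 2 degrees of freedom; the helper names `t_pdf`, `two_sided_p` and `correlation_t` are hypothetical, and the p-value is obtained by numerical integration so that only the Python standard library is needed.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the density on [-|t|, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t

def correlation_t(r, n):
    """t statistic for testing rho = 0 from a sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.418, 14  # values reported for the blood pressure example
t_stat = correlation_t(r, n)
p_value = two_sided_p(t_stat, n - 2)
print(round(t_stat, 3), round(p_value, 3))
```

Small differences from the reported p-value are expected, since r is given to only three decimal places.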

13 REGRESSION ANALYSIS: Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one independent variable
• Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain. Independent variable: the variable used to explain the dependent variable.

14 SIMPLE LINEAR REGRESSION:
• Only one independent variable is used to explain the dependent variable
• The relationship between the dependent and independent variables is described by a linear function
• Changes in the dependent variable are assumed to be caused by changes in the independent variable

15 The model is of the form $y = \beta_0 + \beta_1 x + \varepsilon$. The parameters $\beta_0$ and $\beta_1$ are called the regression coefficients; $\beta_0$ is the intercept and $\beta_1$ is the slope of the regression fit. $y$ is the dependent variable and $x$ is the independent variable. $\varepsilon$ is the error term; it introduces randomness into the model.

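16 Using sample information, the regression coefficients are estimated by $\hat{\beta}_0$ and $\hat{\beta}_1$, and the estimated regression model fit is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.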
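
The estimation formulas did not survive in the transcript. A common choice, assumed here, is ordinary least squares, where the slope is the ratio of the cross-product sum to the sum of squares of x and the intercept makes the line pass through the point of means. The function name `fit_simple_regression` and the data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0_hat, b1_hat = fit_simple_regression(x, y)
fitted = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(round(b0_hat, 3), round(b1_hat, 3))
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any implementation.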

17 Interpretation of regression coefficient: $\hat{\beta}_1$ is the estimated change in the average value of y as a result of a one-unit change in x. Example: Research question: Is there a linear relationship between BP and age? Answer this question using information on 30 individuals.

18 BP is the dependent variable; age is the independent variable.

19 The estimated regression fit is $\widehat{\text{BP}} = 98.715 + 0.971 \times \text{age}$. For every additional year in age, BP increases on average by 0.97 units.

20 Test for overall significance of fit: The hypotheses for determining whether the fitted model is statistically significant are $H_0$: the regression fit is not significant, and $H_1$: the regression fit is significant. Use the ANOVA table to make a decision about the test.

21 ANOVA table
• SSR is the explained variation, attributable to the relationship between the dependent and independent variables
• SSE is the unexplained variation; it occurs due to chance
• SST is the total variation
Source of variation   d.f.   S.S.   M.S.              F-ratio        p-value
Regression            1      SSR    MSR = SSR/1       Fc = MSR/MSE   Pr(F > Fc)
Residual              n-2    SSE    MSE = SSE/(n-2)
Total                 n-1    SST
If the p-value is less than the level of significance, the fitted model is a significant fit.

22 ANOVA
Source of variation   S.S.        df   M.S.       F        Sig.
Regression            6394.023    1    6394.023   21.330   .000
Residual              8393.444    28   299.766
Total                 14787.467   29
The p-value is < 0.001, so the fitted model is significant.
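
The entries of the ANOVA table for the BP example follow from the sums of squares and degrees of freedom by simple arithmetic. The sketch below recomputes the mean squares and the F-ratio from the reported values; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom reported in the ANOVA table.
ssr, df_regression = 6394.023, 1
sse, df_residual = 8393.444, 28

sst = ssr + sse                 # total variation
df_total = df_regression + df_residual

msr = ssr / df_regression       # mean square for regression
mse = sse / df_residual         # mean square error
f_ratio = msr / mse             # compared with an F(1, 28) distribution

print(round(sst, 3), df_total, round(mse, 3), round(f_ratio, 2))
```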

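23 Test for significance of predictor: The hypotheses of the test are $H_0: \beta_1 = 0$ and $H_1: \beta_1 \neq 0$. The test statistic is $t = \hat{\beta}_1 / \text{s.e.}(\hat{\beta}_1)$, with $n - 2$ degrees of freedom. If the p-value is less than the level of significance, the predictor is linearly associated with the response.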

24 Coefficients (dependent variable: bp)
            B        Std. Error   t       Sig.
Intercept   98.715   10.000       9.871   .000
age         .971     .210         4.618   .000
The value of the test statistic is 4.618 and the p-value is < 0.001; age is linearly associated with BP.
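
As a rough check, the t statistic for age is the estimated slope divided by its standard error, and the fitted line can be used to predict average BP at a given age. In the sketch below the coefficients come from the table above, while the function name `predict_bp` and the age of 50 are hypothetical; the slides do not give the observed age range, so any prediction should be read with that caveat.

```python
# Estimates reported in the coefficients table (rounded to three decimals).
intercept, slope = 98.715, 0.971
se_slope = 0.210

# Ratio of estimate to standard error; differs slightly from the reported
# 4.618 because the table values are rounded.
t_slope = slope / se_slope

def predict_bp(age):
    """Predicted average BP at a given age from the fitted line."""
    return intercept + slope * age

bp_at_50 = predict_bp(50)                          # hypothetical age
one_year_change = predict_bp(51) - predict_bp(50)  # equals the slope

print(round(t_slope, 2), round(bp_at_50, 3), round(one_year_change, 3))
```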

25 Coefficient of determination: The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted R². Graded interpretation: R² = 0.1-0.3, weak relationship; 0.4-0.7, moderate relationship; 0.8-1, strong relationship.
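
For the BP and age example, R² can be computed from the sums of squares in the ANOVA table as SSR/SST. The sketch below assumes that definition; in simple linear regression the square root of R², with the sign of the slope, equals the sample correlation r.

```python
import math

# Sums of squares reported in the ANOVA table of the BP example.
ssr = 6394.023
sst = 14787.467

r_squared = ssr / sst  # proportion of variation in BP explained by age

# The slope for age is positive, so r is the positive square root.
r = math.sqrt(r_squared)

print(round(r_squared, 4), round(r, 4))
```

On this reading, age explains roughly 43% of the variation in BP, which falls in the moderate band of the graded interpretation above.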

26 MULTIPLE LINEAR REGRESSION:
• Uses two or more independent variables to explain the dependent variable
• Allows us to investigate the joint effect of several independent variables on the dependent variable
• Relates a single outcome (dependent) variable to two or more independent variables simultaneously

27 The aims of fitting the regression model are to:
• Identify independent variables that are associated with the dependent variable, in order to promote understanding of the underlying process
• Determine the extent to which each independent variable is linearly related to the dependent variable after adjusting for other variables that may be related to it
• Predict the value of the dependent variable as accurately as possible from the predictor values

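28 The regression model is of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, where $x_1, \dots, x_k$ are the independent variables and $\beta_0, \beta_1, \dots, \beta_k$ are the regression coefficients.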

29 Interpretation of regression coefficients: Each regression coefficient is the estimated change in the average value of the dependent variable for every unit increase in the corresponding predictor, holding the other predictors in the model constant. Each estimate is adjusted for the effects of all other predictors.
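
The slides do not say how the coefficients are computed. One standard approach, assumed here, is least squares via the normal equations. The sketch below uses only the standard library; the function names and the noise-free data (generated from y = 1 + 2*x1 - 3*x2) are hypothetical, chosen so that the recovered coefficients are easy to verify.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
response = [1 + 2 * x1 - 3 * x2 for x1, x2 in predictors]

coefficients = fit_multiple_regression(predictors, response)
print([round(c, 6) for c in coefficients])
```

Here the coefficient on x1 is the change in y per unit increase in x1 with x2 held fixed, which mirrors the "holding other predictors constant" interpretation above.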

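30 Inference on regression coefficients: We can make inference on each regression coefficient $\beta_j$ by carrying out a statistical hypothesis test of $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$. The test statistic is $\hat{\beta}_j / \text{s.e.}(\hat{\beta}_j)$.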

31 If the p-value is less than the level of significance, the independent variable is linearly associated with the dependent variable after adjusting for all other independent variables. Test for significance of model fit: The analysis is the same as for the simple model; the focus is on the p-value in the ANOVA table. The inference is also the same, i.e., if the p-value < α, the fitted model is statistically significant.

32 Adjusted R² statistic: related to the coefficient of determination. It also measures the proportion of variation in the dependent variable that is accounted for by the independent variables, adjusted for the number of independent variables in the model.

33 Example: A regression model is fitted to determine whether a linear relationship exists between patient satisfaction level and the patient's age (in years), severity of illness (an index) and anxiety level (an index). The data are for 30 patients selected at random. Larger values of patient satisfaction, severity of illness and anxiety level are associated, respectively, with more satisfaction, greater severity of illness and more anxiety.

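34 The estimated regression fit is $\hat{y} = 168.6078 - 1.2742\,\text{age} - 0.8473\,\text{severity} - 6.0072\,\text{anxiety}$, where $\hat{y}$ is the predicted satisfaction level. Interpretation of regression coefficients:
• Adjusting for severity of illness and anxiety level, the satisfaction level decreases on average by 1.27 units for every additional year in age.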

35 • Adjusting for age and anxiety level, the satisfaction level decreases on average by 0.85 units for every unit increase in severity of illness.
• Adjusting for age and severity of illness, the satisfaction level decreases on average by 6 units for every unit increase in anxiety level.

36 ANOVA table
Source of variation   d.f.   S.S.     M.S.       F-ratio   P-value
Regression            3      7256.3   2418.767   30.46     < 0.001
Residual              26     2063.2   79.4
Total                 29     9319.5
Overall, the fit is significant since the p-value is < 0.001.
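
The R² and adjusted R² discussed earlier can be computed from this ANOVA table. The slides do not give these values or the adjusted R² formula, so the sketch below assumes the usual definition, in which each sum of squares is divided by its degrees of freedom.

```python
# Values reported in the ANOVA table for the patient satisfaction example.
ssr, sse, sst = 7256.3, 2063.2, 9319.5
n, k = 30, 3  # 30 patients, 3 independent variables

r_squared = ssr / sst

# Usual definition (assumed): compare residual and total variation per degree of freedom.
adjusted_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(round(r_squared, 4), round(adjusted_r_squared, 4))
```

Adjusted R² is never larger than R², and the gap grows as predictors are added that explain little additional variation.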

37 Parameter estimates:
            Estimate   s.e.     z        p-value
age         -1.2742    0.2406   -5.295   < 0.001
severity    -0.8473    0.4599   -1.842   0.077
anxiety     -6.0072    6.2042   -0.968   0.3418
intercept   168.6078
From the results above, age is a significant variable; i.e., controlling for anxiety level and severity of illness, age is significantly associated with satisfaction level.
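
Each test statistic in the parameter estimates is the estimate divided by its standard error, and the fitted equation gives a predicted satisfaction level for any combination of predictor values. In the sketch below the estimates and standard errors are those reported above, while the function name `predict_satisfaction` and the example patient (age 40, severity 50, anxiety 2.0) are hypothetical.

```python
# Estimates and standard errors reported in the parameter table.
estimates = {"age": -1.2742, "severity": -0.8473, "anxiety": -6.0072}
std_errors = {"age": 0.2406, "severity": 0.4599, "anxiety": 6.2042}
intercept = 168.6078

# Test statistic for each predictor: estimate divided by its standard error.
ratios = {name: estimates[name] / std_errors[name] for name in estimates}

def predict_satisfaction(age, severity, anxiety):
    """Predicted satisfaction level from the fitted equation."""
    return (intercept
            + estimates["age"] * age
            + estimates["severity"] * severity
            + estimates["anxiety"] * anxiety)

# Hypothetical patient, for illustration only.
example = predict_satisfaction(age=40, severity=50, anxiety=2.0)

print({name: round(value, 3) for name, value in ratios.items()}, round(example, 4))
```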

