Exploring Relationships: Correlations & Multiple Linear Regression
Dr James Betts
Developing Study Skills and Research Methods (HL20107)

2 Lecture Outline:
– Correlation Coefficients
– Coefficients of Determination
– Prediction & Regression
– Multiple Linear Regression
– Assessment Details

3 Statistics
– Descriptive: organising, summarising & describing data
– Correlational: relationships
– Inferential: generalising; significance

4 Correlation
– A measure of the relationship (correlation) between interval/ratio level of measurement (LOM) variables taken from the same set of subjects
– A ratio which indicates the amount of concomitant variation between two sets of scores
– This ratio is expressed as a correlation coefficient (r), which runs from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship)
– In either direction (+ or -), the scale is marked weak at 0.1, moderate at 0.3 and strong at 0.7
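
The scale on this slide can be turned into a small lookup. The sketch below is only an illustration: the function name `describe_r` is hypothetical, the band boundaries are the scale markers from the slide (0.1, 0.3, 0.7), and the wording for values below 0.1 is an assumption rather than something the slide states.

```python
def describe_r(r: float) -> str:
    """Return a rough verbal label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        # Assumption: anything below the lowest scale marker is treated as no relationship
        return "no meaningful relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_r(0.8))    # strong positive
print(describe_r(-0.25))  # weak negative
```

Labels like these are only a convention; the same r can matter more or less depending on the variables and the sample size.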

5 Correlation Coefficient & Scatterplots: Direction
[Scatterplots: Variable X (e.g. VO2 max) against Variable Y (e.g. 10 km run time); Variable X (e.g. VO2 max) against Variable Y (e.g. exercise capacity)]

6 Correlation Coefficient & Scatterplots: Form
[Scatterplots: Variable X (e.g. VO2 max) against Variable Y (e.g. exercise capacity); Variable X (e.g. age) against Variable Y (e.g. strength)]

7 Correlation Coefficient & Scatterplots: Significance
[Scatterplots: Variable X (e.g. VO2 max) against Variable Y (e.g. exercise capacity); Variable X (e.g. VO2 max) against Variable Y (e.g. 100 m sprint time)]

8 Correlation Coefficient & Scatterplots: Significance
[Scatterplots: Variable X (e.g. VO2 max) against Variable Y (e.g. exercise capacity); Variable X (e.g. VO2 max) against Variable Y (e.g. 100 m sprint time)]

9 Methods of Calculating r
Any method of calculating r requires:
– Homoscedasticity (i.e. equal scattering)
– Linear data (curvilinear data requires eta, η)
Parametric data (i.e. raw data above ordinal LOM and either a normal distribution or a large sample) permit the use of Pearson's Product-Moment Correlation.
If the raw data violate these assumptions, use Spearman's Rank-Order Correlation instead.

10 Pearson's Product-Moment Correlation

X = Alcohol Units   Y = Skill Score   X²     Y²     XY
15                  4                 225    16     60
14                  6                 196    36     84
10                  4                 100    16     40
9                   8                 81     64     72
8                   7                 64     49     56
8                   8                 64     64     64
7                   10                49     100    70
6                   9                 36     81     54
4                   14                16     196    56
2                   12                4      144    24
Totals =

11 Pearson's Product-Moment Correlation

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
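
As a worked check of the raw-score formula, the sketch below applies it to the alcohol units (X) and skill score (Y) data from the previous slide (n = 10). The function name `pearson_r` is just a label chosen for this illustration; the column totals it computes are the ones the data slide leaves to be filled in.

```python
from math import sqrt

# Alcohol units (X) and skill score (Y) for the 10 subjects on the data slide
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]


def pearson_r(x, y):
    """Pearson's r from the raw-score formula (sums of X, Y, X², Y² and XY)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(sum(x), sum(y))  # 83 82
print(round(r, 2))     # -0.86
```

The negative sign indicates that skill score tends to fall as alcohol units rise.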

12 Spearman's Rank-Order Correlation

X = Alcohol Units   Y = Skill Score   Rank X   Rank Y   D      D²
15                  4                 10       1.5      8.5    72
14                  6                 9        3        6      36
10                  4                 8        1.5      6.5    42
9                   8                 7        5.5      1.5    2.3
8                   7                 5.5      4        1.5    2.3
8                   8                 5.5      5.5      0      0
7                   10                4        8        4      16
6                   9                 3        7        4      16
4                   14                2        10       8      64
2                   12                1        9        8      64
Total =

13
$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
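
The rank-order formula can be checked against the same alcohol units and skill score data. This is a minimal sketch: the helper names `average_ranks` and `spearman_r` are hypothetical, ranks run from 1 (smallest) upwards, and tied scores share the mean of the ranks they occupy, as on the ranking slide. It uses unrounded D² values, whereas the slide rounds some of them.

```python
# Alcohol units (X) and skill score (Y) for the 10 subjects on the data slide
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]


def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # rank of the first tied position
        count = ordered.count(v)               # how many scores share this value
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks


def spearman_r(x, y):
    """Spearman's rank-order correlation: 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


print(average_ranks(x))            # [10.0, 9.0, 8.0, 7.0, 5.5, 5.5, 4.0, 3.0, 2.0, 1.0]
print(round(spearman_r(x, y), 2))  # -0.91
```

When there are tied ranks, as here, this shortcut formula is only an approximation to the exact rank correlation, so statistical software may report a slightly different value.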


15 SPSS Correlation Outputs

16 Coefficient of Determination (r² × 100)
AKA variance explained, this figure denotes how much of the variance in Y can be explained/predicted by X.
e.g. to predict long jump distance (Y) from maximum sprint speed (X): r = 0.8, so r² = 0.64, i.e. 64% of the variance in Y is explained by X.
[Venn diagram: overlap between Y and X]
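
The arithmetic behind the long jump example is short enough to sketch directly. The function name `variance_explained` is a hypothetical label for this illustration.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination expressed as a percentage (r² × 100)."""
    return (r ** 2) * 100


print(round(variance_explained(0.8)))   # 64
print(round(variance_explained(-0.8)))  # 64
```

Because r is squared, the sign is lost: a correlation of -0.8 explains the same share of variance as +0.8.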

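17 Correlation versus Regression
By attempting to predict one variable using another, we are now moving away from simple correlation and into the concept of regression.
Correlation = the strength and direction of the relationship between two variables
Regression = an equation for predicting one variable from another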

18 Linear Regression
The equation for a linear relationship can be expressed as: Y = a + bX, where a = the y-intercept and b = the gradient.
[Scatterplot: Variable X (e.g. VO2 max) against Variable Y (e.g. exercise capacity)]
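
One common way to obtain a and b is an ordinary least-squares fit; the slide does not say which method is intended, so treat that choice as an assumption. The sketch below reuses the alcohol units (X) and skill score (Y) data from the correlation slides purely for illustration, and the names `fit_line` and `predict` are hypothetical.

```python
# Alcohol units (X) and skill score (Y) for the 10 subjects on the data slide
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]


def fit_line(x, y):
    """Least-squares estimates of the intercept a and gradient b in Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # the fitted line passes through (mean X, mean Y)
    return a, b


def predict(a, b, x_new):
    """Predicted Y for a new value of X."""
    return a + b * x_new


a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))  # 13.92 -0.69
```

With these data the fitted equation is roughly Y = 13.92 - 0.69X, so each additional alcohol unit is associated with a drop of about 0.69 in predicted skill score.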



21 SPSS Regression Output

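22 Extrapolation versus Interpolation
Remember that the accuracy of your equation depends upon the linear relationship you observed.
Interpolation = predicting Y for a value of X inside the range of the observed data
Extrapolation = predicting Y for a value of X outside the range of the observed data
[Scatterplot: Variable X (e.g. VO2 max) against Variable Y (e.g. exercise capacity)]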
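
A prediction made for an X value inside the range of the observed data is an interpolation; one made outside that range is an extrapolation, where the observed linear relationship may no longer hold. A minimal sketch of that check follows; the function name `prediction_type` and the example X values (the alcohol units from the correlation slides) are used only for illustration.

```python
def prediction_type(x_new, observed_x):
    """Say whether predicting at x_new interpolates or extrapolates the observed data."""
    lowest, highest = min(observed_x), max(observed_x)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


observed_x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
print(prediction_type(5, observed_x))   # interpolation
print(prediction_type(20, observed_x))  # extrapolation
```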

23 Multiple Linear Regression
We saw earlier how maximum sprint speed (X) can predict/explain 64% of the variance in long jump distance (Y), i.e. r² = 64%
…but can Y be predicted any more effectively using more than one independent variable (i.e. X1, X2, X3, etc.)?

24 Multiple Linear Regression
However, we can often predict Y effectively using just a specific subset of X variables (i.e. a reduced model).
[Diagram labels: Y, X1, X2, Event Experience]

25 Multiple Linear Regression
Best subset selection methods involve calculating r for every possible combination of IVs.
Stepwise regression methods involve gradually adding or removing variables and monitoring the impact of each action on r:
– Standard methods add and remove variables
– Forward selection methods begin with 1 IV and add more
– Backwards elimination methods begin with all IVs and remove them
The order in which IVs are added/removed is critical, as the variance explained solely by any one IV depends entirely upon the presence of the others.
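
To make the best subset idea concrete, the sketch below fits a least-squares model for every combination of three IVs and reports the variance explained (R², the squared multiple correlation) for each. Everything here is an assumption made for illustration: the data are invented (hypothetical athletes, not from the lecture), the variable names are hypothetical, and the fit solves the normal equations in plain Python rather than reproducing what SPSS does.

```python
from itertools import combinations


def solve(A, b):
    """Solve A·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def r_squared(columns, y):
    """R² for a least-squares regression of y on the given IV columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (XᵀX)·beta = Xᵀy
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data for 8 athletes (invented for this sketch)
predictors = {
    "sprint_speed": [8.1, 8.6, 9.0, 9.4, 9.7, 10.0, 10.2, 10.5],
    "leg_power": [42, 50, 45, 55, 48, 60, 52, 58],
    "experience": [1, 4, 2, 6, 3, 5, 8, 7],
}
long_jump = [5.2, 5.9, 5.8, 6.6, 6.3, 7.0, 7.1, 7.3]

# Fit every non-empty combination of IVs and record its R²
results = {}
for size in range(1, len(predictors) + 1):
    for names in combinations(predictors, size):
        results[names] = r_squared([predictors[n] for n in names], long_jump)

for names, r2 in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(names, round(r2, 3))
```

The full model can never explain less variance than any of its subsets, so the question is whether a reduced model comes close enough to be preferred. With k IVs there are 2^k - 1 candidate models, which is why stepwise methods are used when k is large.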


27 SPSS Multiple Linear Regression Output

28 Summary: Exploring Relationships
– The relationship between two variables can be expressed as a correlation coefficient (r)
– The coefficient of determination (r²) denotes the % of the variance in one variable that is explained by another
– Linear regression can provide an equation with which to predict one variable from another
– Multiple linear regression can potentially improve this prediction using multiple predictor variables

34 Start Here
Looking for differences between categories/frequencies? (i.e. nominal LOM)
– 1 observed frequency: Goodness of Fit χ²
– >1 observed frequency: Contingency χ²
Looking for differences within the same group of subjects? (i.e. paired data)
– 2 observations: Paired t-test (non-parametric: Wilcoxon test)
– >2 observations: 1-way paired ANOVA (non-parametric: Friedman's test)
Looking for differences between separate groups of subjects? (i.e. unpaired data)
– 2 groups: Independent t-test (non-parametric: Mann-Whitney test)
– >2 groups: 1-way unpaired ANOVA (non-parametric: Kruskal-Wallis test)
Looking for relationships?
– 2 variables: Pearson's r (non-parametric: Spearman's r)
– >2 variables: Multiple Linear Regression
Looking for differences with >1 independent variable?
– Both IVs paired: 2-way paired ANOVA
– Both IVs unpaired: 2-way unpaired ANOVA
– 1 IV paired, 1 IV unpaired: 2-way mixed model ANOVA
Post-Hoc Tests
If multiple DVs are involved then use MANOVA

