Statistics for the Behavioral Sciences (5th ed.) Gravetter & Wallnau

1 Statistics for the Behavioral Sciences (5th ed.) Gravetter & Wallnau
Chapter 16: Correlations and Regression. University of Guelph, Psychology 3320, Dr. K. Hennig, Winter 2003 Term

2 Overview of chapter
Correlations: Pearson r. For non-linear (non-scalar) data: Spearman r (for non-linear data), point-biserial (where one variable is dichotomous), phi-coefficient (where both variables are dichotomous). Regressions.

3 CORRELATIONS: Figure 16-1 (p. 522)
The relationship between exam grade and time needed to complete the exam. Notice the general trend in these data: students who finish the exam early tend to have better grades.

4 Figure (p. 523) The same set of n = 6 pairs of scores (X and Y values) is shown in a table and in a scatterplot. Notice that the scatterplot allows you to see the relationship between X and Y.

5 Three characteristics
1. Direction: examples of positive and negative relationships. (a) Beer sales are positively related to temperature. (b) Coffee sales are negatively related to temperature.

6 2. Form: Examples of relationships that are not linear: (a) relationship between reaction time and age; (b) relationship between mood and drug dose.

7 3. Degree: Examples of different values for linear correlations: (a) shows a strong positive relationship, approximately +0.90; (b) shows a relatively weak negative correlation, approximately –0.40; (c) shows a perfect negative correlation, –1.00; (d) shows no linear trend, 0.00.

8 Pearson (product-moment) correlation
Sum of products of deviations: \( SP = \sum (X - M_X)(Y - M_Y) \), where \( M_X \) is the mean of the X scores and \( M_Y \) is the mean of the Y scores. Recall: \( SS = \sum (X - M)^2 = \sum (X - M)(X - M) \).

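9 Pearson (product-moment) correlation
\( r = \dfrac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} \)
Computational formula: \( SP = \sum XY - \dfrac{\sum X \sum Y}{n} \)
Expressed with z-scores: \( r = \dfrac{\sum z_X z_Y}{n} \). Note: the z-scores must be computed with the population standard deviation.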
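
The formulas on the two Pearson slides can be checked against each other with a short sketch. The function names and the five score pairs below are illustrative assumptions, not taken from the slides; the point is only that the definitional SP, the computational SP, and the z-score version of r agree.

```python
from math import sqrt

def sum_of_products(xs, ys):
    """Definitional formula: SP = sum of (X - M_X)(Y - M_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def sum_of_products_computational(xs, ys):
    """Computational formula: SP = sum(XY) - sum(X) * sum(Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    sp = sum_of_products(xs, ys)
    ss_x = sum_of_products(xs, xs)  # SS is the SP of a variable with itself
    ss_y = sum_of_products(ys, ys)
    return sp / sqrt(ss_x * ss_y)

def pearson_r_from_z(xs, ys):
    """r = sum(z_X * z_Y) / n, with z-scores based on the population SD."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mx) ** 2 for x in xs) / n)  # divide by n, not n - 1
    sd_y = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sd_x) * ((y - my) / sd_y)
               for x, y in zip(xs, ys)) / n

# Illustrative scores (an assumption, not data from the slides).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

print(sum_of_products(X, Y), sum_of_products_computational(X, Y))
print(round(pearson_r(X, Y), 3), round(pearson_r_from_z(X, Y), 3))
```

For these scores SP = 28, SS_X = 64, and SS_Y = 16, so r = 28 / 32 = 0.875 by either route.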

10 Understanding and interpreting r
Correlations do not prove causation, but they can disprove causation. The value of a correlation can be affected greatly by the range of scores in the data. Outliers can have a dramatic effect. Do not interpret a correlation as a proportion (e.g., r = 0.50 does not mean 50%); rather, \( r^2 = 0.25 \), so 25% of the total variability is accounted for. \( r^2 \) is called the coefficient of determination.

11 The effect of range (a) In this example, the full range of X and Y values shows a strong, positive correlation, but the restricted range of scores produces a correlation near zero. (b) An example in which the full range of X and Y values shows a correlation near zero, but the scores in the restricted range produce a strong, positive correlation.
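
Panel (a) can be imitated with a small sketch. The data are hypothetical, chosen so that two clusters of scores produce a strong correlation over the full range while the lower cluster alone shows none.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: a low cluster and a high cluster of scores.
X = [1, 2, 3, 4, 9, 10, 11, 12]
Y = [1, 2, 2, 1, 8, 9, 9, 8]

r_full = pearson_r(X, Y)

# Restrict the range: keep only the pairs with X <= 4.
restricted = [(x, y) for x, y in zip(X, Y) if x <= 4]
r_restricted = pearson_r([x for x, _ in restricted],
                         [y for _, y in restricted])

print(f"full range: r = {r_full:.2f}")
print(f"restricted: r = {r_restricted:.2f}")
```

With these numbers the full-range correlation is about 0.95, and the restricted-range correlation is zero.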

12 Outliers A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
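
A minimal sketch of the same demonstration, using invented scores: four points with no linear trend, then the same four points plus one extreme pair.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical scores with no linear trend.
X = [2, 3, 4, 5]
Y = [3, 4, 4, 3]
r_without = pearson_r(X, Y)

# Add a single extreme point (the outlier).
r_with = pearson_r(X + [12], Y + [12])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

Here one added point moves the correlation from zero to about 0.95, which is why a scatterplot should be inspected before a correlation is interpreted.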

13 Hypothesis testing
\( H_0: \rho = 0 \) (there is no population correlation)
\( H_1: \rho \neq 0 \) (there is a real correlation)
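
The slide states only the hypotheses. One common way to test them, which is an assumption here and not something shown in the transcript, converts r to a t statistic with df = n - 2, using t = r * sqrt((n - 2) / (1 - r^2)). The function name and the sample values are hypothetical.

```python
from math import sqrt

def t_for_correlation(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    df = n - 2
    t = r * sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical sample: r = 0.875 from n = 5 pairs.
t, df = t_for_correlation(0.875, 5)
print(f"t({df}) = {t:.3f}")

# The two-tailed critical value for df = 3 at alpha = .05 is 3.182
# (standard t table), so this sample would not reject H0.
print("reject H0" if abs(t) > 3.182 else "fail to reject H0")
```

Even a large sample correlation can fail to reach significance when n is very small, as it does here.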

14 CORRELATIONS: For non-linear relations Relationship between practice and performance. Although this relationship is not linear, there is a consistent positive relationship. An increase in performance tends to accompany an increase in practice.

15 Spearman r: Scatterplots showing (a) the scores and (b) the ranks for the data in the example. Notice that there is a consistent, positive relationship between the X and Y scores, although it is not a linear relationship. Also notice that the scatterplot of the ranks shows a perfect linear relationship. Steps: 1. Rank-order the X scores and the Y scores separately. 2. Apply the Pearson r formula to the ranks, or use the special Spearman formula.
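
The two steps can be sketched as follows. The data are hypothetical, and the "special formula" is assumed to be the usual shortcut r_s = 1 - 6 * sum(D^2) / (n(n^2 - 1)), which applies when there are no tied ranks.

```python
from math import sqrt

def ranks(values):
    """Rank scores from 1 (smallest); tied scores share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def spearman_r(xs, ys):
    """Step 1: rank each variable. Step 2: Pearson r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

def spearman_shortcut(xs, ys):
    """Special formula; valid only when there are no tied ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical consistent but non-linear relationship.
practice = [1, 2, 3, 4, 5]
performance = [1, 4, 9, 16, 25]
print(round(pearson_r(practice, performance), 3))   # below 1: not linear
print(round(spearman_r(practice, performance), 3))  # ranks line up exactly

# Hypothetical data with some disagreement between the ranks.
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
print(round(spearman_r(a, b), 3), round(spearman_shortcut(a, b), 3))
```

For the second data set both routes give 0.8, since sum(D^2) = 4 and n = 5.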

16 Other measures of relationship
Point-biserial - where one variable is dichotomous (has two values: male vs. female, first-born vs. later-born, etc.). Phi-coefficient - where both variables are dichotomous (e.g., the variable above paired with birth order, coded first-born vs. later-born).
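
Both measures can be obtained by coding each dichotomous variable as 0 and 1 and then applying the ordinary Pearson formula. The sketch below assumes that approach; the variable names and all of the scores are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Point-biserial: one dichotomous variable, one numeric variable.
# Hypothetical coding: 0 = first-born, 1 = later-born.
birth_order = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 5, 6, 7]
r_pb = pearson_r(birth_order, score)

# Phi-coefficient: both variables dichotomous.
# Hypothetical coding: 0 = male, 1 = female; 0 = first-born, 1 = later-born.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
later_born = [0, 0, 0, 1, 0, 1, 1, 1]
phi = pearson_r(sex, later_born)

print(round(r_pb, 3), round(phi, 3))
```

Which category receives the code 1 only changes the sign of the result, not its size.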

17 Introduction to regression SAT scores and GPA - regression line drawn through the data points. The regression line defines a precise, one-to-one relationship between each X value (SAT score) and its corresponding Y value (GPA).

18 Relationship between total cost and number of hours playing tennis
The tennis club charges a $25 membership fee plus $5 per hour. The relationship is described by a linear equation: Total cost = $5 (number of hours) + $25, which has the form \( Y = bX + a \) with b = 5 and a = 25. The statistical technique for finding a best-fit line is called regression.

19 The distance between the actual data point (Y) and the predicted point on the line (Ŷ) is defined as Y – Ŷ. The goal of regression is to find the equation for the line that minimizes these distances.

20 Best-fit straight line
The predicted Y values (Ŷ) are on the regression line. Unless the correlation is perfect (+1.00 or –1.00), there will be some error between the actual Y values and the predicted Y values. The larger the magnitude of the correlation, the smaller the error will be.

21 (a) Scatterplot showing data points that perfectly fit the regression equation \( \hat{Y} = 1.6X - 2 \). Because every point lies on the line, the correlation is perfect. (b) Scatterplot for the data from the example. Notice that there is error between the actual data points and the predicted Y values of the regression line. Total squared error = \( \sum (Y - \hat{Y})^2 \); the regression line is the least-squares solution, the line that minimizes this total.

22 Regression (contd.)
The regression equation for Y is the linear equation \( \hat{Y} = bX + a \). The goal is to find the values of a and b that give the best-fit line:
\( b = \dfrac{SP}{SS_X} \) and \( a = M_Y - b M_X \), where \( SP = \sum (X - M_X)(Y - M_Y) \) and \( SS_X = \sum (X - M_X)^2 \).
Example: X = 1, 3, Y = 4, 9, 8 (from text p. 559). What are the predicted values for 5, 7, 9? SPSS
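
A minimal sketch of these formulas follows. Instead of the textbook example, it uses points generated from the tennis-club fees on the earlier slide ($25 plus $5 per hour), so the fitted line should recover b = 5 and a = 25. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y-hat = b*X + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # SP
    ss_x = sum((x - mx) ** 2 for x in xs)                  # SS_X
    b = sp / ss_x        # slope: b = SP / SS_X
    a = my - b * mx      # intercept: a = M_Y - b * M_X
    return b, a

# Points that follow the tennis-club equation exactly.
hours = [1, 2, 3, 4]
cost = [5 * h + 25 for h in hours]

b, a = fit_line(hours, cost)

# Predicted total cost for 5, 7, and 9 hours of play.
predictions = [b * x + a for x in (5, 7, 9)]
print(b, a, predictions)
```

Because these points lie exactly on a line, the predictions of 50, 60, and 70 carry no error; with real data the same two formulas give the line with the smallest total squared error.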

23 A set of 9 data points (X and Y values). The colored lines in part (a) show deviations from the mean for Y. For these data, \( SS_Y = 240 \) (total variability). In part (b) the colored lines show deviations from the regression line. For these data, \( SS_{error} = 86.4 \). The regression line reduces the SS value by \( r^2 = 0.64 \), or 64%. The proportion of error that remains is \( 1 - r^2 = 0.36 \), and \( 0.36 \times 240 = 86.4 \).
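
The relationship SS_error = (1 - r^2) * SS_Y can be checked numerically. The nine data points behind the slide are not in the transcript, so the sketch below uses five illustrative pairs (an assumption) and then repeats the slide's own arithmetic.

```python
def regression_summary(xs, ys):
    """Return (SS_Y, SS_error, r_squared) for the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)       # deviations from the mean of Y
    b = sp / ss_x
    a = my - b * mx
    ss_error = sum((y - (b * x + a)) ** 2       # deviations from the line
                   for x, y in zip(xs, ys))
    r_squared = sp ** 2 / (ss_x * ss_y)
    return ss_y, ss_error, r_squared

# Illustrative scores (not the slide's data).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

ss_y, ss_error, r_squared = regression_summary(X, Y)
print(ss_y, ss_error, r_squared)
print(abs(ss_error - (1 - r_squared) * ss_y) < 1e-9)

# The slide's own numbers: SS_Y = 240 and r^2 = 0.64.
slide_ss_error = (1 - 0.64) * 240
print(round(slide_ss_error, 1))
```

For the illustrative scores SS_Y = 16 and r^2 = 0.765625, so SS_error = 3.75; the slide's values give 86.4 in the same way.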

