Correlation is a statistical technique that describes the degree of relationship between two variables when you have bivariate data. A bivariate distribution.

1 Correlation is a statistical technique that describes the degree of relationship between two variables when you have bivariate data. A bivariate distribution is required for correlation and regression techniques. A bivariate distribution is a joint distribution of two variables, the individual scores of which are paired in some logical way. A bivariate distribution may show positive correlation, negative correlation, or zero correlation. EDUCATIONAL STATISTICS EDU5950 WEEK12

2 In the case of a positive correlation between two variables, high measurements on one variable tend to be associated with high measurements on the other variable, and low measurements on one variable with low measurements on the other. Table 1 illustrates such a relationship: tall fathers tend to have sons who grow up to be tall men, and short fathers tend to have sons who grow up to be short men. If the relationship were perfect (which, of course, is ridiculous), then it would be possible to predict without error the adult height of an unborn son simply by measuring his father.



5 In Figure 1, each point represents a pair of scores, the height of a father and the height of his son. Such an array of points is called a scatterplot. The line that runs through the points is called a regression line. It is a line of best fit. When there is perfect correlation (r = 1.00), all points fall exactly on the regression line.

6 When a correlation is negative, increases in one variable are accompanied by decreases in the other variable (an inverse relationship). With negative correlation, the regression line goes from the upper left corner of the graph to the lower right corner. As you may recall, such lines have a negative slope. Although some correlation coefficients are positive and some are negative, one is not more valuable than the other.

7 The algebraic sign simply tells you the direction of the relationship (which is important when you are describing how the variables are related). The absolute size of r, however, tells you the degree of the relationship. A strong relationship (either positive or negative) is usually more valuable than a weaker one. A zero correlation means there is no linear relationship between two variables.

8 High and low scores on the two variables are not associated in any predictable manner. Figure 2 shows a scatterplot that produces a zero correlation coefficient. When r = 0, the regression line is horizontal at a height of $\bar{Y}$. This makes sense; if r = 0, then your best estimate of Y for any value of X is $\bar{Y}$, the mean of Y.
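The zero-correlation case can be checked with a small sketch. The scores below are made up for illustration and chosen so that r = 0, and the slope formula used (sum of cross-products of deviations divided by the sum of squared X deviations) is the standard least squares slope, which is an assumption here rather than something taken from the slides.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-products of deviations / sum of squared X deviations.
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squared_x_devs = sum((x - mean_x) ** 2 for x in xs)
    slope = cross_products / squared_x_devs
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1, 2, 3, 4, 5]  # made-up X scores
ys = [2, 1, 4, 1, 2]  # made-up Y scores, chosen so that r = 0

intercept, slope = least_squares_line(xs, ys)
mean_y = sum(ys) / len(ys)

print(slope)              # slope of the fitted line
print(intercept, mean_y)  # intercept of the line and the mean of Y
```

With these scores the slope is zero and the intercept equals the mean of Y, so the fitted line predicts the mean of Y for every X.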


10 A correlation coefficient provides a quantitative way to express the degree of relationship that exists between two variables. The definition formula is

$$r = \frac{\sum z_X z_Y}{N}$$

where
- r = Pearson product-moment correlation coefficient
- $z_X$ = a z score for variable X
- $z_Y$ = the corresponding z score for variable Y
- N = number of pairs of X and Y values
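As a minimal sketch of the definition formula, the function below averages the products of paired z scores. The function name and the scores are hypothetical, and the standard deviations are assumed to use N (not N - 1) in the denominator, which is what makes the formula give exactly r.

```python
import math


def pearson_r_from_z(xs, ys):
    """Pearson r as the mean of the products of paired z scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations with N in the denominator (assumption, see lead-in).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Product of the two z scores for each pair, then the mean of the products.
    z_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / n


xs = [1, 2, 3, 4, 5]  # hypothetical X scores
ys = [2, 4, 5, 4, 5]  # hypothetical Y scores
print(round(pearson_r_from_z(xs, ys), 4))  # 0.7746
```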

11 Because researchers often use means and standard deviations when telling the story of the data, many use this formula (the blanched formula):

$$r = \frac{\dfrac{\sum XY}{N} - \bar{X}\,\bar{Y}}{S_X S_Y}$$

where
- X and Y = paired observations
- XY = product of each X value multiplied by its paired Y value
- $\bar{X}$ = mean of variable X
- $\bar{Y}$ = mean of variable Y
- $S_X$ = standard deviation of variable X
- $S_Y$ = standard deviation of variable Y
- N = number of pairs of observations
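A rough sketch of the means-and-standard-deviations form: the mean of the XY products, minus the product of the two means, divided by the product of the two standard deviations. The function name and scores are hypothetical, and `statistics.pstdev` (the N-denominator standard deviation) is assumed to be the intended S.

```python
import statistics


def pearson_r_blanched(xs, ys):
    """Pearson r from means and (N-denominator) standard deviations."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n  # sum(XY) / N
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sd_x = statistics.pstdev(xs)  # population SD: divides by N
    sd_y = statistics.pstdev(ys)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]  # hypothetical X scores
ys = [2, 4, 5, 4, 5]  # hypothetical Y scores
print(round(pearson_r_blanched(xs, ys), 4))  # 0.7746
```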

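12 With the raw score formula, you calculate r from the raw scores without computing means and standard deviations. The formula is

$$r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Remember that N is the number of pairs of values.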
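The raw score approach can be sketched as follows, using only sums of X, Y, XY, X squared, and Y squared. This assumes the usual textbook raw score formula for Pearson r; the function name and scores are hypothetical.

```python
import math


def pearson_r_raw(xs, ys):
    """Pearson r computed directly from raw-score sums."""
    n = len(xs)  # number of pairs
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


xs = [1, 2, 3, 4, 5]  # hypothetical X scores
ys = [2, 4, 5, 4, 5]  # hypothetical Y scores
print(round(pearson_r_raw(xs, ys), 4))  # 0.7746
```

For these scores the sums are 15, 20, 66, 55, and 86, so r = 30 / sqrt(50 × 30), or about .77.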




16 The basic interpretation of r is probably familiar to you at this point. A correlation coefficient measures the degree of linear relationship between two variables of a bivariate distribution. What qualifies as a large correlation coefficient? What is small? Jacob Cohen proposed that the question be answered by calculating an effect size index (d) and that d values of .20, .50, and .80 be designated as small, medium, and large, respectively.

17 In a similar way, Cohen addressed the question of calculating an effect size index for correlation coefficients:
- Small: r = .10
- Medium: r = .30
- Large: r = .50
The correlation coefficient is also the basis of the coefficient of determination, which tells the proportion of variance that two variables in a bivariate distribution have in common.

18 The coefficient of determination is calculated by squaring r; it is always a value between 0 and 1, whatever the sign of r. coefficient of determination = $r^2$
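To make the squaring concrete, this short sketch applies it to Cohen's small, medium, and large benchmark values of r. The function name is hypothetical; the arithmetic is just r squared.

```python
def coefficient_of_determination(r):
    """Proportion of variance the two variables have in common: r squared."""
    return r ** 2


# Cohen's benchmark correlations and the shared variance each implies.
for label, r in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    print(label, r, round(coefficient_of_determination(r), 2))

# The sign of r does not matter once it is squared.
print(coefficient_of_determination(-0.50) == coefficient_of_determination(0.50))
```

Even a "large" correlation of .50 corresponds to only 25% of the variance being shared.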

19 Linear regression is a technique that uses the data to write an equation for a straight line and then to make predictions. Often predictions are based on an assumption that the relationship between two variables is linear. The formula for a straight line is Y = mX + b, where
- Y and X are variables representing scores on the Y and X axes
- m = slope of the line
- b = intercept of the line with the Y axis

20 Least squares method. Consider this data: father's height, 64 in; first daughter's height, 66 in; second daughter's height, 64 in. Draw a regression line using the least squares method.

21 There is an error for each person on the scatterplot. The least squares method creates a straight line such that the sum of the squares of the errors is a minimum. The least squares method produces numerical values for the slope and the intercept, from which you can write the equation for a straight line; this line is the one that best fits the data. In statistics, the regression equation is $\hat{Y} = a + bX$

22 Where;
- $\hat{Y}$ = Y value predicted from a particular X value
- a = point at which the regression line intersects the Y axis
- b = slope of the regression line
- X = X value for which you wish to predict a Y value
Note: In a correlation problem, the symbol Y can be assigned to either variable, but in a regression equation, Y is assigned to the variable you wish to predict.
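The idea that the regression line minimizes the sum of squared errors can be illustrated by brute force. The sketch below is not how the coefficients are normally computed: it tries many candidate (a, b) pairs on a grid and keeps the one with the smallest sum of squared errors. The scores and grid ranges are made up for illustration.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared differences between each Y and its prediction a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical X scores
ys = [2, 4, 5, 4, 5]  # hypothetical Y scores

# Candidate lines: intercepts 0.0 to 5.0 and slopes 0.0 to 2.0, in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(0, 51) for j in range(0, 21)]
best_a, best_b = min(
    candidates, key=lambda ab: sum_squared_errors(ab[0], ab[1], xs, ys)
)

print(best_a, best_b)  # 2.2 0.6
print(round(sum_squared_errors(best_a, best_b, xs, ys), 2))  # 2.4
```

Any other line on the grid, for example a = 2.0 with b = 0.6, gives a larger sum of squared errors for these scores.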

23 You need the values for a and b, which are called regression coefficients; they can be calculated from any bivariate set of data. To calculate b,

$$b = r\,\frac{S_Y}{S_X}$$

To calculate a,

$$a = \bar{Y} - b\bar{X}$$
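A minimal sketch of computing the regression coefficients, assuming the common textbook forms b = r(S_Y / S_X) and a = mean of Y minus b times mean of X, with N-denominator standard deviations. The function name and the scores are hypothetical and are not the SPM data referred to in these slides.

```python
import statistics


def regression_coefficients(xs, ys):
    """Return (a, b) for the regression equation Y-hat = a + b*X."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sd_x = statistics.pstdev(xs)  # population SD: divides by N
    sd_y = statistics.pstdev(ys)
    # Pearson r from means and standard deviations.
    r = (sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y) / (sd_x * sd_y)
    b = r * sd_y / sd_x        # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


xs = [1, 2, 3, 4, 5]  # hypothetical X scores
ys = [2, 4, 5, 4, 5]  # hypothetical Y scores

a, b = regression_coefficients(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # predicted Y for X = 6: 5.8
```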

24 Let us use the data from our earlier correlation problem to predict math scores from oral test scores in SPM.

25 Entering these regression coefficient values into the regression equation produces a formula that predicts math scores from oral test scores:

