Chapter 7, Part 1: Correlation

Correlation Topics

- Correlational research – what is it and how do you do "co-relational" research?
- The three questions:
  - Is it a linear or curvilinear correlation?
  - Is it a positive or negative relationship?
  - How strong is the relationship?
- Solving these questions with t scores and r, the estimated correlation coefficient derived from the tX and tY scores of individuals in a random sample.

Correlational research – how to start

- To begin a correlational study, we select a population or, far more frequently, select a random sample from a population.
- (Since we use samples most of the time, we will generally use the formulae and symbols for computing a correlation from a sample.)
- We then obtain two scores from each individual, one score on each of two variables – usually variables that we think might be related to each other for interesting reasons. We call one variable X and the other Y.

Correlational research: comparing tX and tY scores

- We translate the raw scores on the X variable into t scores (called tX scores) and the raw scores on the Y variable into tY scores.
  - So each individual has a pair of scores, a tX score and a tY score.
- You determine how similar or different the tX and tY scores in each pair are, on average, by subtracting tY from tX, then squaring, summing, and averaging the differences.
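The conversion described above can be sketched in Python. The five pairs of scores here are hypothetical (made up for illustration), and `t_scores` is a helper name I'm introducing, not something from the chapter:

```python
import math

def t_scores(xs):
    # t score = deviation from the sample mean, divided by the
    # estimated standard deviation (n - 1 in the denominator).
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)      # sum of squared deviations
    s = math.sqrt(ss / (n - 1))                # estimated standard deviation
    return [(x - mean) / s for x in xs]

# Hypothetical scores for five individuals on two variables.
X = [2, 4, 5, 9, 10]
Y = [8, 10, 11, 12, 14]
tX, tY = t_scores(X), t_scores(Y)

# One pair of t scores per individual; their squared differences
# are what gets summed and averaged later in the chapter.
sq_diffs = [(a - b) ** 2 for a, b in zip(tX, tY)]
```

By construction the t scores of each variable average to zero, so a pair's signs tell you directly whether the two scores sit on the same side of their means.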

The estimated correlation coefficient, Pearson's r

- With a simple formula, you transform the average squared difference between the t scores into Pearson's correlation coefficient, r.
- Pearson's r indicates, with a single number, both the direction and strength of the relationship between the two variables in your sample.
- r also estimates the correlation in the population from which the sample was drawn.
  - In Ch. 8, you will learn when you can use r that way.

Going from pairs of raw scores to r: Linearity – a preliminary question

- Once you have scores on two variables, you ask, "Is this a linear or curvilinear relationship?"
- Psychology is a relatively new science, and this is an intro stat course.
  - For both reasons, you will only learn how to deal with linear relationships between two variables, and save correlation with three or more variables, and curvilinear relationships, for grad school. BUT YOU MUST KNOW WHAT A LINEAR RELATIONSHIP IS, AND HOW TO RECOGNIZE A NONLINEAR (CURVILINEAR) CORRELATION.

Linearity vs. curvilinearity

- In a linear relationship, as scores on one variable go from low to high, scores on the other variable either generally increase or generally decrease.
- In a curvilinear relationship, as scores on one variable go from low to high, scores on the other variable change direction. They can go (1) down and then up, (2) up and then down, (3) up and down and then up again, (4) up or down and then flat, etc.

Examples of linear relationships

- For example, think of the relationship between the size of a pleasure boat (X) and its cost (Y). As one variable (boat size) increases, scores on the other variable (cost) also increase.
- Another example of a linear relationship: the relationship between the size of a car and the number of miles per gallon it gets. In general, as cars get gradually larger (X), they tend to get fewer miles per gallon (Y).

A curvilinear relationship

- In a curvilinear relationship, as scores on the X variable go gradually from low to high, the Y variable changes direction.
- For example, think of the relationship between age (X) and height (Y).
- As age increases from 0 to 14 or so, height increases also.
- But then people stop growing. As age increases further, height stays the same.
- Thus the Y variable, height, changes direction: it goes from gradually rising to flat.
- If you graph age and height, the best fitting line is a curved line.
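The rise-then-flat pattern can be seen even without a graph. With hypothetical age/height numbers (invented for illustration, not from the chapter), sort the pairs by X and watch the direction of the successive changes in Y:

```python
# Hypothetical age (years) and height (cm) pairs, ordered by age.
ages    = [2, 6, 10, 14, 18, 30, 50]
heights = [85, 115, 140, 165, 175, 175, 175]

# In a linear relationship the successive changes in Y keep one
# direction throughout. Here they go from positive to zero:
# the curve rises and then flattens, i.e., it is curvilinear.
changes = [b - a for a, b in zip(heights, heights[1:])]
still_rising = [c > 0 for c in changes]
```

This is just a quick screen; the chapter's advice stands: plot the dots and look.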

Correlation characteristics: Linear vs. curvilinear

[Figure: Which line best shows the relationship between age (X) and height (Y)?]

Another nonlinear relationship: shortstops and linemen – great shortstops may be too small to be great football linemen.

[Table: seven players (Al, Ben, Chuck, David, Ed, Frank, George) rated on baseball skill and football potential, each on a scale from Terrible to Excellent.]

Is this a linear relationship?

Plot the dots!

- To check whether a relationship is linear, make a graph and place the scores on it.
- That's what I mean by "Plot the dots."
- If you really want to know what is going on with data, plot the dots!
- Here is a graph for the baseball skill and football potential data.

When you plot the dots, is this linear?

[Scatterplot: the seven players' baseball skill (X) plotted against football skill (Y), both scaled from Terrible to Excellent.]

NO! It is best described by a curved line. It is a curvilinear relationship!

After you know a correlation is linear, there are two other questions: the direction and strength of the correlation. But first, a definition of high and low scores.

- Definition of high and low scores:
  - High scores are scores above the mean. They are represented by positive t scores.
  - Low scores are scores below the mean of each variable. They are represented by negative t scores.

Positive relationships

- In a positive relationship, as X scores gradually increase, Y scores tend to increase as well. Example: the longer a sailboat is, the more it tends to cost. As length goes up, price tends to go up.
- In a positive correlation, X and Y scores tend to be on the same side of their respective means.
- As a result, the tX and tY scores tend to be similar, and the difference between them (tX – tY) tends to be small.
- Since (tX – tY) is small, the squared difference between them, (tX – tY)², also tends to be small.

Graphing a positive relationship

- In a positive correlation, high scores on X tend to go with high scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores also generally get higher. So the line will tend to rise as it runs.
- Remember from math: slope equals how far a line rises on the Y axis for each unit it moves from left to right ("runs") along the X axis.
- If a line rises from left to right, "rise" is positive. Run is always positive. So a positive rise divided by an (always) positive run results in a positive slope. (That's why we call it a "positive" correlation.)

Positive vs. negative scatterplots

[Figure: two scatterplots, one showing a negative relationship and one showing a positive relationship.]

Graphic display of a strong POSITIVE correlation

Negative relationships

- In a negative relationship, as X scores gradually increase, Y scores tend to decrease. Example: the more years a sailboat has been used, the less it tends to cost. As use goes up, price tends to go down.
- In a negative correlation, X and Y scores tend to be on opposite sides of their respective means.
- As a result, the tX and tY scores tend to be dissimilar, and the difference between them (tX – tY) tends to be large.
- Since (tX – tY) is large, the squared difference between them, (tX – tY)², also tends to be large.

Graphing a negative relationship

- In a negative correlation, high scores on X tend to go with low scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores get lower. So the line will tend to fall as it runs.
- Remember from math: slope equals how far a line rises on the Y axis for each unit it moves from left to right ("runs") along the X axis.
- If a line falls from left to right, "rise" is negative. Run is always positive. So a negative rise divided by an (always) positive run results in a negative slope. (That's why we call it a "negative" correlation.)

Positive vs. negative scatterplots

[Figure: two scatterplots, one showing a negative relationship and one showing a positive relationship.]

Summary

- When t scores are consistently more similar than different, we have a positive correlation. On a graph the dots will rise from your left to your right.
- When t scores are consistently more different than similar, we have a negative correlation. On a graph the dots will fall from your left to your right.

Positive vs. negative scatterplots

[Figure: two scatterplots, one showing a negative relationship and one showing a positive relationship.]

How strong is the relationship between the tX and tY scores?

- Here the question is about the consistency with which tX and tY scores are either similar or dissimilar.

t scores: sign and size

- There are two aspects to the consistency of the relationship between tX and tY scores.
  - First, are the t scores consistently of the same sign (positive correlation) or of opposite signs (negative correlation)?
  - If they are almost always one way or the other, you have at least a moderately strong relationship.
  - On the other hand, if you sometimes see t scores on the same side of the mean and sometimes on opposite sides, you have a relatively weak correlation.

t scores: sign and size

- If there is a consistent pattern of same-signed t scores (positive correlation) or a consistent pattern of opposite-signed t scores (negative correlation), then whether the tX and tY scores are about the same distance from the mean comes into play.
- The large majority of t scores (usually well over 95%) range from –2.50 to +2.50.
- Given a consistent positive or negative correlation, the more similar in size the t scores, the stronger the correlation.

Positive correlations

- Perfect: tX and tY scores are all the same sign and are identical in size.
- Strong: tX and tY scores are almost all the same sign and are fairly similar in size.
- Moderate: tX and tY scores are predominantly the same sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar.
- Weak: tX and tY scores are a little more often the same sign than opposite in sign. Nothing can be said about size.

Negative correlations

- Perfect: tX and tY scores are all of opposite sign and are identical in size.
- Strong: tX and tY scores are almost all of opposite sign and are fairly similar in size.
- Moderate: tX and tY scores are predominantly opposite in sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar.
- Weak: tX and tY scores are a little more often of opposite sign than the same sign. Nothing can be said about size.

Unrelated (independent) variables

- When the size and sign of the tX scores bear no relationship to the size and sign of the tY scores, the variables are unrelated.
- We can also call the variables "independent of" or "orthogonal to" each other. The three terms (unrelated, independent, orthogonal) are synonymous in this context.

Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line.

- Perfect – scores fall exactly on a straight line whose slope will be +1.00 or –1.00.
- Strong – most scores fall near the line, whose slope will be close to +.75 or –.75.
- Moderate – some are near the line, some not. The slope of the line will be close to +.50 or –.50.

Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line.

- Weak – some scores fall fairly close to the line, but others fall quite far from it. The slope of the line will be close to +.25 or –.25.
- Independent – the scores are not close to the line and form a circular or square pattern. The best fitting line will be the X axis, a line with a slope of 0.00.

Strength of a relationship: Perfect

Strength of a relationship: Very Strong

Strength of a relationship: Moderate

Strength of a relationship: Independent

What is this relationship?

What is this?

What is this?

What is this?

Computing the correlation coefficient.

Comparing apples to oranges? Use Z or t scores!

- You can use correlation to look for the relationship between ANY two values that you can measure on a single subject.
- However, there may not be any relationship (the variables may be independent).
- A correlation tells us whether scores are consistently similar on two measures, consistently different from each other, or have no real pattern.

Comparing apples to oranges? Use t scores!

- To compare scores on two different variables, you transform them into ZX and ZY scores if you are studying a population, or tX and tY scores if you have a sample.
- ZX and ZY scores (or tX and tY scores) can be directly compared to each other to see whether they are consistently similar, consistently quite different, or show no consistent pattern of similarity or difference.

Comparing variables

- Anxiety symptoms (e.g., heartbeat) with number of hours driving to class.
- Hat size with drawing ability.
- Math ability with verbal ability.
- Number of children with IQ.
- Turn them all into Z or t scores.

Pearson's Correlation Coefficient

- Coefficient – noun, a number that serves as a measure of some property.
- The correlation coefficient indexes BOTH the consistency and the direction of a correlation with a single number.

Pearson's rho

- Pearson's rho (ρ) is the parameter that characterizes the strength and direction of a linear relationship (and only a linear relationship) between two variables. To compute rho, you must have the entire population. Then you can compute sigma, mu, Z scores, and rho.
- The formula: rho = 1 – (1/2)(Σ(ZX – ZY)² / NP), where NP is the number of pairs of Z scores in the population.
- In English: the correlation coefficient equals 1 minus half the average squared distance between the Z scores.
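The rho formula is short enough to sketch directly. The helper names here (`z_scores`, `rho`) are introduced for illustration; what matters is that Z scores use the population denominator (N), since rho assumes you have the entire population:

```python
def z_scores(xs):
    # Population Z scores: deviations from mu divided by sigma
    # (N in the denominator, since we have the whole population).
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return [(x - mu) / sigma for x in xs]

def rho(X, Y):
    # rho = 1 - (1/2) * (sum of (Zx - Zy)^2 / N_P)
    zx, zy = z_scores(X), z_scores(Y)
    avg_sq_dist = sum((a - b) ** 2 for a, b in zip(zx, zy)) / len(X)
    return 1 - avg_sq_dist / 2
```

A variable correlated with itself gives rho = +1.000, and with its mirror image gives rho = –1.000, matching the benchmark cases on the next slide.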

Pearson's rho

- When you have a perfect positive correlation, the Z scores will be identical in size and sign. So the average squared distance will be zero, and rho = 1 – 1/2(0.000) = 1.000.
- When you have a perfect negative correlation, the Z scores will be identical in size and opposite in sign. It can be proven algebraically that the average squared distance in that case will be 4.000: rho = 1 – 1/2(4.000) = –1.000.
- When you have two totally independent variables, the average squared distance will be 2.000 (halfway between 0.000 and 4.000). Thus, rho = 1 – 1/2(2.000) = 0.000.
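The three benchmark cases above reduce to simple arithmetic, which a few lines confirm:

```python
# Average squared Z-score distances for the three benchmark cases:
# 0.000 (perfect positive), 4.000 (perfect negative), 2.000 (independent).
perfect_pos = 1 - (1 / 2) * 0.000   # rho = 1.000
perfect_neg = 1 - (1 / 2) * 4.000   # rho = -1.000
independent = 1 - (1 / 2) * 2.000   # rho = 0.000
```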

Pearson's Correlation Coefficient

- Thus, rho varies from –1.00 (perfect negative correlation) through 0.00 (independent variables) to +1.00 (perfect positive correlation).
- A negative value indicates a negative relationship; a positive value indicates a positive relationship.
- Values close to +1.00 or –1.00 indicate a strong (consistent) relationship; values close to 0.00 indicate a weak (inconsistent) or independent relationship.

Estimating rho with r

- Computing rho involves finding the actual average squared distance between the ZX and ZY scores in the whole population.
- In computing r, we are estimating rho.

The formula for r

- Pearson's r is a least squares, unbiased estimate of rho, based on the relationships found between the tX and tY scores in a random sample.
- r = 1 – (1/2)(Σ(tX – tY)² / (nP – 1)), where nP – 1 equals one less than the number of pairs of t scores in the sample.
  - In English: Pearson's r equals 1.000 minus half the estimated average squared difference between the Z scores in the population, based on the squared differences between the t scores in the sample.

Look at those formulae again.

- rho = 1 – (1/2)(Σ(ZX – ZY)² / NP), where NP is the number of pairs of Z scores in the population.
- Σ(ZX – ZY)² / NP is the average squared distance between the Z scores.
- The rest of the formula simply transforms the average squared distance between the Z scores into a variable that goes from +1.000 to –1.000.

Look at those formulae again.

- r = 1 – (1/2)(Σ(tX – tY)² / (nP – 1)), where nP – 1 equals one less than the number of pairs of t scores in the sample. REMEMBER, t scores are estimated Z scores.
- Σ(tX – tY)² / (nP – 1) is a least squares, unbiased estimate of the average squared difference between the Z scores in the population, based on the differences between the tX and tY scores in a random sample.
- The rest of the formula simply transforms the estimated average squared distance between the Z scores into a variable that goes from +1.000 to –1.000.

Thus, r, the least squares, unbiased estimate of rho, is basically an estimate of the average squared difference between the ZX and ZY scores in the population, transformed into a variable that goes from +1.000 to –1.000.

Similarities of r and rho

- r and rho vary from –1.00 to +1.00.
- For both r and rho, a negative value indicates a negative relationship; a positive value indicates a positive relationship.
- Values of r or rho close to +1.00 or –1.00 indicate a strong (consistent) relationship; values close to 0.00 indicate a weak (inconsistent) or independent relationship.

Since we are almost always studying random samples, not populations, we almost always compute Pearson's r, not Pearson's rho.

r, strength and direction

Perfect, positive     +1.00
Strong, positive       +.75
Moderate, positive     +.50
Weak, positive         +.25
Independent             .00
Weak, negative         –.25
Moderate, negative     –.50
Strong, negative       –.75
Perfect, negative     –1.00
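The benchmark table above can be turned into a small labeling helper. This is a hypothetical function for practice, not part of the chapter, and since real r values fall between the benchmarks, the cutoffs used here (.75, .50) are illustrative rounding choices:

```python
def describe_r(r):
    # Label an r value using the benchmark table's categories.
    if r == 0:
        return "independent"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 1.00:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.50:
        strength = "moderate"
    else:
        strength = "weak"
    return strength + ", " + direction
```

So `describe_r(-0.75)` yields "strong, negative", matching the table row.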

Calculating Pearson's r

- Select a random sample from a population; obtain scores on two variables, which we will call X and Y.
- Convert all the scores into t scores.

Calculating Pearson's r

- First, subtract the tY score from the tX score in each pair.
- Then square all of the differences and add them up; that is, compute Σ(tX – tY)².

Calculating Pearson's r

- Estimate the average squared distance between ZX and ZY by dividing the sum of squared differences between the t scores by (nP – 1): Σ(tX – tY)² / (nP – 1).
- To turn this estimate into Pearson's r, use the formula r = 1 – (1/2)(Σ(tX – tY)² / (nP – 1)).
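The steps above can be sketched end to end. The function names are introduced for illustration; the formula itself is the chapter's r = 1 – (1/2)(Σ(tX – tY)² / (nP – 1)):

```python
def t_scores(xs):
    # Sample t scores: deviations over the estimated standard
    # deviation (n - 1 in the denominator).
    n = len(xs)
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return [(x - mean) / s for x in xs]

def pearson_r(X, Y):
    tx, ty = t_scores(X), t_scores(Y)
    ssd = sum((a - b) ** 2 for a, b in zip(tx, ty))   # square and sum
    return 1 - ssd / (2 * (len(X) - 1))               # 1 - (1/2)(ssd / (nP - 1))
```

Identical lists give r = +1.000, reversed lists give r = –1.000, and algebraically this reduces to the familiar Σ(tX · tY) / (nP – 1) form of Pearson's r.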

Example: calculate t scores for X

- DATA: ΣX = 30, N = 5, mean of X = 6.00
- SSW = Σ(X – mean)² = 40.00
- MSW = 40.00 / (5 – 1) = 10.00
- sX = √10.00 = 3.16
- tX = (X – mean of X) / sX

[Table: the five X scores, their deviations from the mean, and the resulting tX scores.]

Calculate t scores for Y

- DATA: ΣY = 55, N = 5, mean of Y = 11.00
- SSW = Σ(Y – mean)² = 10.00
- MSW = 10.00 / (5 – 1) = 2.50
- sY = √2.50 = 1.58
- tY = (Y – mean of Y) / sY

[Table: the five Y scores, their deviations from the mean, and the resulting tY scores.]

Calculate r

[Table: the paired tX and tY scores, their differences (tX – tY), and the squared differences (tX – tY)².]

- Σ(tX – tY)² = 0.80
- Σ(tX – tY)² / (nP – 1) = 0.80 / 4 = 0.200
- r = 1 – (1/2 × 0.200) = 1 – 0.100 = 0.900
- This is a very strong, positive relationship.
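The arithmetic of this worked example, starting from the slide's sum of squared t-score differences, checks out:

```python
# Numbers from the worked example: five pairs of t scores whose
# squared differences sum to 0.80.
sum_sq_diffs = 0.80
n_pairs = 5

est_avg = sum_sq_diffs / (n_pairs - 1)   # 0.80 / 4 = 0.200
r = 1 - 0.5 * est_avg                    # 1 - 0.100 = 0.900
```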

By the way – true graphs

- Ch. 7 has true graphs: displays in which each dot stands for a score on two (in this case) or more (in more advanced cases) variables.
- In Ch. 1 through Ch. 6, most of the figures represented the frequency of scores on a single variable.
- Formally, displays of frequencies are figures, but they are not graphs.