Lecture 11 PY 427 Statistics 1 Fall 2006 Kin Ching Kong, Ph.D
1 Lecture 11, PY 427 Statistics 1, Fall 2006
Kin Ching Kong, Ph.D., Chicago School of Professional Psychology
2 Agenda
- Correlation: Introduction
- The Pearson Correlation
   - Definition
   - Sum of Products (SP)
   - Calculation
   - The Pearson Correlation and z-scores
   - Uses of the Pearson Correlation
   - Interpreting the Pearson Correlation
   - Hypothesis Tests with the Pearson Correlation
- The Point-Biserial Correlation
- The Spearman Correlation
- Introduction to Regression
3 Correlation, Introduction
- Correlation measures and describes the relationship (association) between two variables (X and Y).
- It requires two scores (X, Y) for each individual.
- Usually the two variables are simply observed, not manipulated.
- A scatterplot of the data displays the relationship between the two variables (Figure 15.1 of your book).
4 Correlation, Three Characteristics of a Relationship
A correlation measures three characteristics of a relationship:
1. The direction of the relationship
   - Positive: X and Y tend to move in the same direction.
   - Negative: X and Y tend to move in opposite directions.
   - Direction is identified by the + and – signs (Figure 15.2 of your book).
2. The form of the relationship
   - Linear (i.e., a straight line)
   - Nonlinear
3. The degree (strength) of the relationship
   - How well the data fit the specific form being considered, i.e., how closely the two variables associate.
   - Represented by the numerical value of the correlation (Figure 15.3 of your book).
5 The Pearson Correlation (r)
The Pearson correlation (or Pearson product-moment correlation) measures the degree and direction of the linear relationship between two variables.
r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
  = (covariability of X and Y) / (variability of X and Y separately)
$r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}}$
When r = +1, every change in X is accompanied by a perfectly predictable change in Y. X and Y always vary together, so the numerator and denominator are identical.
6 SP: the Sum of Products of Deviations
- SP measures the covariability of two variables.
- Definitional formula: $SP = \sum (X - M_X)(Y - M_Y)$
- Computational formula: $SP = \sum XY - \dfrac{\sum X \sum Y}{n}$, where n = number of pairs of scores
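As a quick check that the definitional and computational formulas for SP agree, here is a minimal Python sketch. The four pairs of scores are hypothetical, and the function names are illustrative rather than anything from the course materials.

```python
def sp_definitional(xs, ys):
    """SP = sum of (X - MX)(Y - MY)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = sum(XY) - sum(X) * sum(Y) / n, where n is the number of pairs."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Hypothetical scores, for illustration only
xs = [1, 2, 4, 5]
ys = [3, 6, 4, 7]
print(sp_definitional(xs, ys))   # 6.0
print(sp_computational(xs, ys))  # 6.0
```

The computational form avoids computing each deviation, which is why it is convenient for hand calculation; the definitional form shows what SP actually measures.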
7 The Pearson Correlation (r), an Example
Worked table columns: scores X and Y; deviations (X – MX) and (Y – MY); squared deviations (X – MX)² and (Y – MY)²; products (X – MX)(Y – MY).
MX = 6, MY = 2, SP = +14
$r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{+14}{\sqrt{SS_X \, SS_Y}} = \dfrac{+14}{16} = +0.875$
Scatterplot of the data (Figure 15.4)
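The slide's summary values can be reproduced with a small Python sketch. The five pairs of scores below are hypothetical, chosen so that MX = 6, MY = 2, SP = +14 and √(SSX·SSY) = 16 as on the slide; they should not be read as the textbook's actual data.

```python
import math

# Hypothetical scores (not necessarily the textbook's data), chosen to
# reproduce the slide's summary values MX = 6, MY = 2, SP = +14.
xs = [0, 10, 4, 8, 8]
ys = [1, 3, 1, 2, 3]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
ss_x = sum((x - mx) ** 2 for x in xs)                  # SSX
ss_y = sum((y - my) ** 2 for y in ys)                  # SSY
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # SP

r = sp / math.sqrt(ss_x * ss_y)
print(mx, my, ss_x, ss_y, sp)  # 6.0 2.0 64.0 4.0 14.0
print(r)                       # 0.875
```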
8 Pearson Correlation & z-Scores
Karl Pearson based his equation for r on the concept of z-scores. r is defined as the mean of the products of the z-scores for X and Y:
$r = \dfrac{\sum z_X z_Y}{n}$
- zX and zY are calculated using the population standard deviation. If the sample standard deviation is used instead, divide by n – 1 in the formula above.
- When zX and zY are both positive or both negative, the product is positive.
- When zX and zY have opposite signs, the product is negative.
- When most of the products are positive, r is positive (i.e., as X increases, Y increases; as X decreases, Y decreases).
- When most of the products are negative, r is negative (i.e., there is an inverse relationship between X and Y).
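A minimal sketch of the z-score definition, assuming Python's standard `statistics` module: with population standard deviations it divides by n, with sample standard deviations it divides by n – 1, and both versions should give the same r. The scores are hypothetical and `r_from_z` is an illustrative name.

```python
import statistics


def r_from_z(xs, ys, sample=False):
    """Pearson r as the mean of z-score products.

    Population SDs pair with dividing by n; sample SDs pair with n - 1.
    """
    n = len(xs)
    if sample:
        sx, sy, denom = statistics.stdev(xs), statistics.stdev(ys), n - 1
    else:
        sx, sy, denom = statistics.pstdev(xs), statistics.pstdev(ys), n
    mx, my = statistics.mean(xs), statistics.mean(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / denom


# Hypothetical scores, for illustration only
xs = [0, 10, 4, 8, 8]
ys = [1, 3, 1, 2, 3]
print(round(r_from_z(xs, ys), 3))               # 0.875
print(round(r_from_z(xs, ys, sample=True), 3))  # 0.875
```

The two versions agree because the n versus n – 1 choice appears in both the standard deviations and the final division, so it cancels.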
9 Uses of the Pearson Correlation, r
- Prediction: when two variables are correlated, it is possible to use one to make predictions about the other.
   e.g., using SAT scores to predict college grade point average.
- Validity: r can be used to demonstrate the validity of a new instrument/measure.
   e.g., the validity of a new IQ test can be demonstrated by high correlations with standardized IQ tests, performance on learning tests, problem-solving ability, etc.
- Reliability: r can be used to determine the reliability of a measurement procedure.
   e.g., if an IQ test is reliable, then your IQ measured this week will correlate highly with your IQ measured 3 weeks from now.
- Theory verification: many psychological theories make predictions about relationships between two variables, which can be tested by determining the correlation between the two variables.
   e.g., parents' IQ and child's IQ.
10 Interpreting Correlations
- Correlation does not equal causation: a correlation simply describes a relationship between two variables; it does not explain why the two are related.
- Correlation can be greatly affected by restricted range (Figure 15.6 of your book). To be safe, do not generalize a correlation beyond the range of data represented in the sample.
- Outliers (extreme data points) can greatly influence a correlation (Figure 15.7 of your book). You should always look at a scatterplot of your data.
- Strength of the relationship (r²): r², the coefficient of determination, measures the proportion of variability in one variable that can be determined from its relationship with the other variable.
   e.g., if r for IQ and GPA is +0.60, then r² = 0.36, so 36% of the variability in GPA can be explained by differences in IQ.
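To see how much a single extreme point can move r, the sketch below computes r for five hypothetical points with almost no linear trend, then again after adding one outlier at (20, 20). It illustrates the outlier warning above; it is not the data behind Figure 15.7.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSX * SSY)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data with almost no linear trend
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]

r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [20], ys + [20])   # add one extreme point
print(round(r_without, 2), round(r_with, 2))
print(round(r_with ** 2, 2))  # r squared, the coefficient of determination
```

A single point turns a near-zero correlation into a very strong one, which is why the scatterplot should always be checked before r is interpreted.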
11 Hypothesis Testing with the Pearson Correlation
Hypothesis testing with correlation uses sample correlations to draw inferences about population correlations. The goal of the hypothesis test is to decide between two alternatives:
- The nonzero sample correlation is due to sampling error.
- The nonzero sample correlation reflects a real, nonzero correlation in the population.
Basic question: does a correlation exist in the population?
H0: ρ = 0 (there is no population correlation)
H1: ρ ≠ 0 (there is a real correlation)
Degrees of freedom: df = n – 2 (critical values in Table B.6)
To be significant, the absolute value of the sample correlation has to be greater than the critical value (ignore the sign).
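Table B.6 is not reproduced here, but critical values of r are linked to critical values of t by r = t/√(t² + df), and a sample r converts to t = r√(n – 2)/√(1 – r²). The sketch below shows both conversions, assuming the critical t is looked up in a standard t table (2.048 for a two-tailed test at α = .05 with df = 28). The function names are illustrative.

```python
import math


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


def critical_r(t_crit, df):
    """Convert a critical t (looked up in a t table) into a critical r with the same df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed test, alpha = .05, n = 30 (df = 28): critical t = 2.048 from a t table
r_crit = critical_r(2.048, 28)
print(round(r_crit, 3))    # 0.361
print(r_to_t(r_crit, 30))  # recovers the critical t, about 2.048
```

Testing r against a critical r and testing the converted t against a critical t are the same decision, so either route can be used.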
12 Hypothesis Testing with r, an Example
A researcher obtains a correlation of r = 0.321 for a sample of 30 individuals. Does this sample provide sufficient evidence to conclude that there is a significant positive correlation in the population? Test with α = .05 (one-tailed).
Step 1: State the hypotheses:
H0: ρ ≤ 0 (there is not a positive correlation)
H1: ρ > 0 (there is a positive correlation)
Step 2: Find the critical value:
df = n – 2 = 30 – 2 = 28
Critical r = 0.306
Step 3: Calculate the sample statistic:
r = 0.321
Step 4: Make a decision:
Since the sample r is greater than the critical r, we reject the null hypothesis and conclude that there is a positive correlation in the population.
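The critical value in Step 2 can be cross-checked from a t table, assuming the test is one-tailed: with df = 28, the one-tailed critical t at α = .05 is 1.701, which converts to a critical r of about 0.306. A minimal sketch of Steps 2 to 4:

```python
import math


def critical_r(t_crit, df):
    """Convert a critical t into a critical r with the same df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 30
df = n - 2
r_sample = 0.321
t_crit_one_tailed = 1.701  # t table: one-tailed alpha = .05, df = 28
r_crit = critical_r(t_crit_one_tailed, df)

print(round(r_crit, 3))  # 0.306
# One-tailed test for a positive correlation: r must be positive and exceed r_crit
print("reject H0" if r_sample > r_crit else "fail to reject H0")  # reject H0
```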
13 Hypothesis Testing with r, Your Turn
A researcher obtained the following set of data. Is there a significant correlation between X and Y? Use α = .01.
X   Y
1   6
2   8
4   2
5   0
3   4
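To check a hand calculation for this exercise, the short Python sketch below computes r from SP, SSX and SSY and compares |r| with a critical value converted from a standard t table (5.841 for a two-tailed test at α = .01 with df = 3). It is one way to verify your answer, not a replacement for working through the table by hand.

```python
import math

xs = [1, 2, 4, 5, 3]
ys = [6, 8, 2, 0, 4]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ss_x = sum((x - mx) ** 2 for x in xs)
ss_y = sum((y - my) ** 2 for y in ys)
r = sp / math.sqrt(ss_x * ss_y)

df = n - 2
t_crit = 5.841  # t table: two-tailed alpha = .01, df = 3
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

print(r, round(r_crit, 3))
# Two-tailed test: compare the size of r, ignoring its sign
print("significant" if abs(r) > r_crit else "not significant")
```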
14 The Point-Biserial Correlation
- A special version of the Pearson correlation.
- Used to measure the relationship between a quantitative variable and a dichotomous variable.
- The dichotomous variable is coded 0 and 1.
- The Pearson formula is then used to calculate the point-biserial correlation.
The Point-Biserial Correlation and r²
The r² used to measure effect size for the independent-measures t test equals the square of the point-biserial r computed from the same data.
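The sketch below codes group membership as 0 and 1, runs the ordinary Pearson formula, and compares r² with the effect size from an independent-measures t test on the same scores, r² = t²/(t² + df). The two groups of scores are hypothetical, and the pooled-variance t is the standard textbook version.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSX * SSY)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def independent_t(a, b):
    """Independent-measures t (pooled variance) for mean(b) - mean(a); returns (t, df)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = len(a) + len(b) - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var / len(a) + pooled_var / len(b))
    return (mb - ma) / se, df


# Hypothetical scores for two treatment groups
group0 = [1, 2, 3]  # coded 0
group1 = [4, 5, 6]  # coded 1
codes = [0] * len(group0) + [1] * len(group1)
scores = group0 + group1

r_pb = pearson_r(codes, scores)
t, df = independent_t(group0, group1)
print(round(r_pb ** 2, 4), round(t ** 2 / (t ** 2 + df), 4))  # 0.7714 0.7714
```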
15 Compare Point-Biserial Correlation & t Test
Table 15.1: the same data, organized for an independent-measures t test and for a point-biserial correlation.
The t-test results: t(18) = 4.00, p < .05, r² = 0.47, i.e., 47% of the variance in memory scores is accounted for by the treatment (mental imagery).
The point-biserial correlation results:
$r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{40}{\sqrt{SS_X \, SS_Y}} = \dfrac{40}{58.31} = 0.686$, n = 20, p < .05
r² = (0.686)² = 0.47, i.e., 47% of the variance in memory scores can be predicted from whether or not mental imagery was used.
What do the two procedures evaluate? The relationship between mental imagery and memory scores.
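The slide's numbers can be checked with a few lines of arithmetic, using r² = t²/(t² + df) for the t test and SP/√(SSX·SSY) = 40/58.31 for the correlation:

```python
t, df = 4.00, 18
r2_from_t = t ** 2 / (t ** 2 + df)  # effect size for the t test
r_pb = 40 / 58.31                   # SP / sqrt(SSX * SSY), values from the slide
print(round(r2_from_t, 2), round(r_pb, 3), round(r_pb ** 2, 2))  # 0.47 0.686 0.47
```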
16 The Spearman Correlation
- For use with data measured on an ordinal scale.
- For use with interval or ratio data when there is a nonlinear relationship.
- Measures the consistency of a relationship, independent of its form.
e.g., consider the relationship between practice (X) and performance (Y). One would expect increased practice to lead to improved performance, but the relationship is not expected to be linear (see the figure in your book).
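To illustrate "consistency, independent of form", the sketch below uses a hypothetical practice/performance curve that always rises but levels off. Pearson's r on the raw scores falls short of 1, while Pearson's r on the ranks (the Spearman correlation) is exactly 1. The ranking helper is a simplification that assumes there are no tied scores.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSX * SSY)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def simple_ranks(values):
    """Rank 1 = smallest score; assumes no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


practice = [1, 2, 3, 4, 5]
performance = [10, 40, 55, 62, 65]  # hypothetical: always rising, but leveling off

pearson = pearson_r(practice, performance)
spearman = pearson_r(simple_ranks(practice), simple_ranks(performance))
print(round(pearson, 3))  # less than 1: the relationship is not linear
print(spearman)           # 1.0: the relationship is perfectly consistent
```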
17 The Spearman Correlation, Calculation
The data:
- Convert X and Y to ranks separately (if the raw data are interval or ratio).
- When two or more scores are identical, find the mean of their ranked positions and assign this mean as the final rank for each tied score.
The calculation:
- Use the Pearson formula with the rank data, or
- Use the simplified formula with the rank data when there are no ties among the ranks:
$r_S = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$, where D = rank Y – rank X and n = number of pairs of scores
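A sketch of the two calculation rules: tied scores receive the mean of the positions they occupy, and the simplified formula applies only when there are no ties. The function names are illustrative.

```python
def average_ranks(values):
    """Rank 1 = smallest score; tied scores share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first position this score occupies
        count = ordered.count(v)                # number of scores tied with it
        ranks.append(first + (count - 1) / 2)   # mean of positions first .. first + count - 1
    return ranks


def spearman_simplified(rank_x, rank_y):
    """rS = 1 - 6 * sum(D^2) / (n(n^2 - 1)), D = rank Y - rank X; valid only without ties."""
    n = len(rank_x)
    d2 = sum((ry - rx) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


print(average_ranks([3, 5, 5, 8]))                      # [1.0, 2.5, 2.5, 4.0]
print(spearman_simplified([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```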
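18 The Spearman Correlation, An Example
Converting raw scores to ranks: table of raw scores (X, Y) and their ranks (X, Y) (Figure 15.12)
Using the Pearson formula on the ranks:
$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 55 - \dfrac{(15)^2}{5} = 10$, and likewise $SS_Y = 10$
$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 36 - \dfrac{(15)(15)}{5} = -9$
$r_S = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{-9}{10} = -0.9$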
19 The Spearman Correlation, Examples
Using the simplified formula (table columns: raw scores X, Y; ranks X, Y; rank difference D; D²):
$r_S = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)} = 1 - \dfrac{6(38)}{5(24)} = 1 - 1.9 = -0.9$
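The raw scores behind Figure 15.12 are not listed here, so the sketch below uses hypothetical rank pairs chosen to reproduce the slides' totals (ΣX = ΣY = 15, ΣX² = ΣY² = 55, ΣXY = 36, ΣD² = 38). It shows that the Pearson formula on ranks and the simplified formula give the same rS = –0.9 when there are no ties.

```python
import math

# Hypothetical rank pairs (not the textbook's data), chosen to reproduce
# the slides' totals: sum(XY) = 36 and sum(D^2) = 38.
rank_x = [1, 2, 3, 4, 5]
rank_y = [5, 4, 3, 1, 2]
n = len(rank_x)

# Pearson formula applied to the ranks (computational formulas)
ss_x = sum(x * x for x in rank_x) - sum(rank_x) ** 2 / n   # 55 - 225/5 = 10
ss_y = sum(y * y for y in rank_y) - sum(rank_y) ** 2 / n   # 10
sp = sum(x * y for x, y in zip(rank_x, rank_y)) - sum(rank_x) * sum(rank_y) / n  # 36 - 45 = -9
rs_pearson = sp / math.sqrt(ss_x * ss_y)

# Simplified formula (valid because there are no ties)
d2 = sum((y - x) ** 2 for x, y in zip(rank_x, rank_y))     # 38
rs_simplified = 1 - 6 * d2 / (n * (n ** 2 - 1))

print(round(rs_pearson, 2), round(rs_simplified, 2))  # -0.9 -0.9
```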
20 Introduction to Regression
Figure 15.13: hypothetical data showing the relationship between SAT scores and college GPA.
A line drawn through the middle of the data serves several purposes:
- The line makes the relationship between SAT and GPA easier to see.
- The line identifies the center, or central tendency, of the relationship.
- The line can be used for prediction: it establishes a precise, one-to-one relationship between each X and Y score.
21 Introduction to Regression, Linear Equations
The formula for a straight line: Y = a + bX, where a and b are constants.
- b is called the slope: the amount of change in Y per unit change in X.
- a is called the Y-intercept: the value of Y when X is zero.
e.g., your local tennis club charges a fee of $5 per hour plus an annual membership fee of $25. The total cost of playing tennis at this club can be described by the linear equation Y = 5X + 25 (Figure 15.14).
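The tennis-club equation written as a tiny Python function; the function and parameter names are illustrative:

```python
def tennis_cost(hours, per_hour=5, membership=25):
    """Y = a + bX with intercept a = 25 (membership) and slope b = 5 (per hour)."""
    return membership + per_hour * hours


print(tennis_cost(0))   # 25: the Y-intercept, cost when X = 0 hours
print(tennis_cost(10))  # 75: each extra hour adds the slope, 5
```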
22 Introduction to Regression, Least Squares
The least-squares method is the statistical technique for finding the best-fitting straight line for a set of data. The resulting straight line is called the regression line.
For each X value, the distance between the data point and the line is
distance = Y – Ypred
where Y = the actual score and Ypred = the Y score predicted by the line (Figure 15.15).
This distance measures the error of using the line to predict the actual score. The least-squares method defines the best-fitting line as the line that minimizes the total squared error, Σ(Y – Ypred)².
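The least-squares criterion can be made concrete with a brute-force search: compute the total squared error Σ(Y – Ypred)² for many candidate lines and keep the smallest. The data are hypothetical and the grid (steps of 0.1) is an arbitrary choice for illustration; in practice the closed-form formulas for a and b are used instead.

```python
# Hypothetical data, for illustration only
xs = [3, 4, 5, 6, 7]
ys = [3, 4, 6, 8, 9]


def total_squared_error(a, b):
    """Sum of (Y - Ypred)^2 for the line Ypred = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Coarse grid search: intercepts -4.0..0.0 and slopes 1.0..2.0 in steps of 0.1
best = min(
    ((ai / 10, bi / 10) for ai in range(-40, 1) for bi in range(10, 21)),
    key=lambda ab: total_squared_error(*ab),
)
print(best, round(total_squared_error(*best), 2))  # (-2.0, 1.6) 0.4
```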
23 Introduction to Regression, the Equation
$Y_{pred} = a + bX$
$b = \dfrac{SP}{SS_X}$ or $b = r\dfrac{s_Y}{s_X}$
$a = M_Y - bM_X$
This linear equation is called the regression equation for Y. It produces the least total squared error between the data points and the line.
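A sketch of the regression formulas in Python. The hypothetical data were chosen to match the summary values in the regression example (MX = 5, MY = 6, SSX = 10, SP = 16), and the last lines check that b = SP/SSX agrees with b = r(sY/sX). `fit_regression` is an illustrative name.

```python
import math
import statistics


def fit_regression(xs, ys):
    """Least-squares line Ypred = a + bX, with b = SP/SSX and a = MY - b*MX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sp / ss_x
    a = my - b * mx
    return a, b


# Hypothetical data matching MX = 5, MY = 6, SSX = 10, SP = 16
xs = [3, 4, 5, 6, 7]
ys = [3, 4, 6, 8, 9]
a, b = fit_regression(xs, ys)
print(a, b)  # -2.0 1.6

# Alternative slope formula: b = r * (sY / sX), using sample standard deviations
mx, my = statistics.mean(xs), statistics.mean(ys)
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ss_x = sum((x - mx) ** 2 for x in xs)
ss_y = sum((y - my) ** 2 for y in ys)
r = sp / math.sqrt(ss_x * ss_y)
b_alt = r * statistics.stdev(ys) / statistics.stdev(xs)
print(round(b_alt, 6))  # 1.6
```

The ratio sY/sX is the same whether sample or population standard deviations are used, so either choice gives the same slope.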
24 Regression, an Example
Worked table columns: X, Y, (X – MX), (Y – MY), (X – MX)², (X – MX)(Y – MY)
MX = 5, MY = 6, SSX = 10, SP = 16
b = SP/SSX = 16/10 = 1.6
a = MY – bMX = 6 – 1.6(5) = 6 – 8 = –2
The regression equation: Ypred = 1.6X – 2 (Figure 15.16)
25 Introduction to Regression, Prediction
Using the regression equation for prediction: for a person with X = 5, what would be the predicted Y?
Ypred = 1.6X – 2 = 1.6(5) – 2 = 6
Cautions:
- The predicted value is not perfect (unless r = ±1). The amount of error depends on the magnitude of r.
- The regression line should not be used to make predictions for X values that fall outside the range of values covered by the original data.
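The second caution can be built into a prediction helper. This sketch hard-codes the example equation Ypred = 1.6X – 2 and, as an assumption, an X range of 3 to 7 taken from hypothetical data; a real helper would take the range from the data actually used to fit the line. `predict` and its parameters are illustrative names.

```python
def predict(x, a=-2.0, b=1.6, x_min=3, x_max=7):
    """Ypred = a + bX; refuses to extrapolate outside the observed X range.

    The range [3, 7] is an assumed, hypothetical data range.
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"X = {x} is outside the data range [{x_min}, {x_max}]")
    return a + b * x


print(predict(5))  # 6.0
```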