Lecture 11 PY 427 Statistics 1 Fall 2006 Kin Ching Kong, Ph.D Chicago School of Professional Psychology

Agenda
Correlation: Introduction
The Pearson Correlation: Definition, Sum of Products (SP), Calculation
The Pearson Correlation and z-scores
Uses of the Pearson Correlation
Interpreting the Pearson Correlation
Hypothesis Tests with the Pearson Correlation
The Point-Biserial Correlation
The Spearman Correlation
Introduction to Regression

Correlation, Introduction
A correlation measures and describes the relationship (association) between two variables (X and Y).
It requires two scores (X, Y) for each individual.
Usually the two variables are simply observed.
A scatterplot of the data displays the relationship between the two variables. (Figure 15.1 of your book)

Correlation, Three Characteristics of a Relationship
A correlation measures three characteristics of a relationship:
The direction of the relationship
Positive: X and Y tend to move in the same direction.
Negative: X and Y tend to go in opposite directions.
Direction is identified by + and – signs. (Figure 15.2 of your book)
The form of the relationship
Linear form (i.e. a straight line) or nonlinear.
The degree (strength) of the relationship
How well the data fit the specific form being considered, i.e. how closely the two variables associate.
Represented by the numerical value of the correlation. (Figure 15.3 of your book)

The Pearson Correlation (r)
The Pearson Correlation (or Pearson Product-Moment Correlation) measures the degree and direction of the linear relationship between two variables.
r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
  = (covariability of X and Y) / (variability of X and Y separately)
r = SP / √(SSX · SSY)
When r = ±1, every change in X is accompanied by a perfectly predictable change in Y: X and Y always vary together, so the numerator and denominator are identical.

Sum of Products (SP)
SP, the Sum of Products of Deviations, measures the covariability of two variables.
Definitional formula: SP = Σ(X – MX)(Y – MY)
Computational formula: SP = ΣXY – (ΣX)(ΣY)/n
n = number of pairs of scores

The Pearson Correlation (r), an Example
r = SP / √(SSX · SSY)

Scores        Deviations          Squared Deviations      Products
X     Y       (X–MX)   (Y–MY)     (X–MX)²   (Y–MY)²       (X–MX)(Y–MY)
0     1       –6       –1         36        1             +6
10    3       +4       +1         16        1             +4
4     1       –2       –1         4         1             +2
8     2       +2        0         4         0              0
8     3       +2       +1         4         1             +2
MX = 6  MY = 2                    SSX = 64  SSY = 4       SP = +14

r = SP / √(SSX · SSY) = +14 / √(64 · 4) = +14/16 = +0.875
Scatterplot of the data: Figure 15.4 of your book
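A minimal sketch of this computation in Python (assuming NumPy is available; not part of the original slides) reproduces the table's totals and r:

```python
# Sketch: Pearson r for the example data above, via SP, SSX and SSY.
import numpy as np

X = np.array([0, 10, 4, 8, 8], dtype=float)
Y = np.array([1, 3, 1, 2, 3], dtype=float)
n = len(X)

# Definitional formula: sum of products of deviations
SP_def = np.sum((X - X.mean()) * (Y - Y.mean()))
# Computational formula: SP = sum(XY) - (sum(X) * sum(Y)) / n
SP_comp = np.sum(X * Y) - X.sum() * Y.sum() / n

SS_X = np.sum((X - X.mean()) ** 2)
SS_Y = np.sum((Y - Y.mean()) ** 2)
r = SP_def / np.sqrt(SS_X * SS_Y)

print(SP_def, SP_comp, SS_X, SS_Y, r)   # 14.0 14.0 64.0 4.0 0.875
```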

Pearson Correlation & z-Scores
Karl Pearson based his equation for r on the concept of z-scores.
r is defined as the mean of the z-score products for X and Y:
r = Σ(zX · zY) / n
zX and zY are calculated using the population standard deviation. If using the sample standard deviation, use n – 1 in the formula above.
When zX and zY are both positive or both negative, the product is positive.
When zX and zY are of opposite sign, the product is negative.
When most of the products are positive, r is positive (i.e. as X increases, Y increases; as X decreases, Y decreases).
When most of the products are negative, r is negative (i.e. an inverse relationship between X and Y).
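As a quick check (again a sketch, not from the slides), the mean of z-score products for the previous example reproduces r = 0.875 when the z-scores use the population standard deviation:

```python
# Sketch: r as the mean of z-score products (population SDs, divide by n).
import numpy as np

X = np.array([0, 10, 4, 8, 8], dtype=float)
Y = np.array([1, 3, 1, 2, 3], dtype=float)
n = len(X)

zX = (X - X.mean()) / X.std(ddof=0)   # ddof=0 -> population standard deviation
zY = (Y - Y.mean()) / Y.std(ddof=0)

r = np.sum(zX * zY) / n
print(round(r, 3))                    # 0.875, matching the SP-based formula
```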

Uses of the Pearson Correlation, r
Prediction: when two variables are correlated, it is possible to use one to make predictions about the other, e.g. using SAT scores to predict college grade point average.
Validity: r can be used to demonstrate the validity of a new instrument/measure, e.g. the validity of a new IQ test can be demonstrated by high correlations with standardized IQ tests, performance on learning tests, problem-solving ability, etc.
Reliability: r can be used to determine the reliability of a measurement procedure, e.g. if an IQ test is reliable, then your IQ measured this week will correlate highly with your IQ measured 3 weeks from now.
Theory verification: many psychological theories make predictions about relationships between two variables, which can be tested by determining the correlation between the two variables, e.g. parents' IQ and child's IQ.

Interpreting Correlations
Correlation does not equal causation: a correlation simply describes a relationship between two variables; it does not explain why the two are related.
A correlation can be greatly affected by a restricted range (Figure 15.6 of your book). To be safe, do not generalize a correlation beyond the range of data represented in the sample.
Outliers (extreme data points) can greatly influence a correlation (Figure 15.7 of your book). You should always look at a scatterplot of your data.
Strength of the relationship (r²): r², the coefficient of determination, measures the proportion of variability in one variable that can be determined from its relationship with the other variable. E.g. if r for IQ and GPA is +0.60, then r² = 0.36, so 36% of the variability in GPA can be explained by differences in IQ.

Hypothesis Testing with the Pearson Correlation
Hypothesis testing with correlation: use a sample correlation to draw inferences about the population correlation.
The goal of the hypothesis test is to decide between two alternatives:
The nonzero sample correlation is due to sampling error.
The nonzero sample correlation reflects a real, nonzero correlation in the population.
Basic question: does a correlation exist in the population?
H0: ρ = 0 (there is no population correlation)
H1: ρ ≠ 0 (there is a real correlation)
Degrees of freedom: df = n – 2
Table B.6: to be significant, a sample correlation has to be greater than the critical value (ignoring the sign).

Hypothesis Testing with r, an Example
A researcher obtains a correlation of r = 0.321 for a sample of 30 individuals. Does this sample provide sufficient evidence to conclude that there is a significant positive correlation in the population? Test with α = .05 (one-tailed).
Step 1: State the hypotheses: H0: ρ ≤ 0 (there is not a positive correlation); H1: ρ > 0 (there is a positive correlation).
Step 2: Find the critical value: df = n – 2 = 30 – 2 = 28, critical r = 0.306.
Step 3: Compute the sample statistic: r = 0.321.
Step 4: Make a decision: since the sample r is greater than the critical r, we reject the null hypothesis and conclude that there is a positive correlation in the population.
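The slides use the critical values in Table B.6; an equivalent check (a sketch, assuming SciPy is available) converts r to a t statistic with df = n – 2 and compares the one-tailed p-value to α:

```python
# Sketch: one-tailed test of H0: rho <= 0 via the r-to-t conversion
# t = r * sqrt(n - 2) / sqrt(1 - r**2).
import numpy as np
from scipy import stats

r, n, alpha = 0.321, 30, 0.05
df = n - 2

t = r * np.sqrt(df) / np.sqrt(1 - r ** 2)
p_one_tailed = stats.t.sf(t, df)      # P(T > t) for df = 28

print(round(t, 2), round(p_one_tailed, 3))   # t is about 1.79, p about .04
print("reject H0" if p_one_tailed < alpha else "fail to reject H0")   # reject H0
```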

Hypothesis Testing with r, Your Turn
A researcher obtained the following set of data. Is there a significant correlation between X and Y? Use α = .01.
X   Y
1   6
2   8
4   2
5   0
3   4

The Point-Biserial Correlation
A special version of the Pearson correlation.
Used to measure the relationship between a quantitative variable and a dichotomous variable.
The dichotomous variable is coded 0 and 1, and the Pearson formula is then used to calculate the point-biserial correlation.
The point-biserial correlation and r²: the r² used to measure effect size is directly related to the r used to measure correlation.

Compare the Point-Biserial Correlation & the t Test
Table 15.1: the same data, organized for an independent-measures t test and for a point-biserial correlation.
The t test results: t(18) = 4.00, p < .05, r² = 0.47, i.e. 47% of the variance in memory scores is accounted for by the treatment (mental imagery).
The point-biserial correlation results: r = SP / √(SSX · SSY) = 40/58.31 = 0.686, n = 20, p < .05; r² = (0.686)² = 0.47, i.e. 47% of the variance in memory scores can be predicted from the variance in mental imagery.
What do the two procedures evaluate? The relationship between mental imagery and memory scores.
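Because Table 15.1 is not reproduced here, the sketch below uses made-up scores for two equal-sized groups; it only illustrates that a point-biserial r is a Pearson r computed on 0/1 codes, and that r² = t²/(t² + df) links it to the independent-measures t test (with t = 4.00 and df = 18 this gives 0.47, as on the slide).

```python
# Sketch with hypothetical data (NOT the data from Table 15.1).
import numpy as np
from scipy import stats

group = np.array([0] * 5 + [1] * 5, dtype=float)                # dichotomous variable coded 0/1
scores = np.array([3, 4, 2, 5, 4, 7, 8, 6, 9, 7], dtype=float)  # made-up scores

r_pb = stats.pearsonr(group, scores)[0]                         # point-biserial = Pearson on 0/1 codes
t = stats.ttest_ind(scores[group == 1], scores[group == 0])[0]  # independent-measures t
df = len(scores) - 2

print(round(r_pb ** 2, 3), round(t ** 2 / (t ** 2 + df), 3))    # the two values agree
print(round(4.00 ** 2 / (4.00 ** 2 + 18), 2))                   # 0.47, as on the slide
```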

The Spearman Correlation
For use with data measured on an ordinal scale.
For use with interval or ratio data when there is a nonlinear relationship.
Measures the consistency of the relationship, independent of its form.
E.g. consider the relationship between practice (X) and performance (Y): one would expect increased practice to lead to improved performance, but the relationship is not expected to be linear. (Figure 15.10 of your book)

The Spearman Correlation, Calculation
The data: convert X and Y to ranks separately (if the raw data are interval or ratio). When two or more scores are tied, find the mean of their ranked positions and assign this mean as the final rank for each tied score.
The calculation: use the Pearson formula with the rank data, or use the simplified formula with the rank data when there are no ties among the ranks:
rS = 1 – 6ΣD² / [n(n² – 1)]
D = rank of Y – rank of X
n = number of pairs of scores

The Spearman Correlation, an Example
Converting raw scores to ranks (Figure 15.12):
Raw Scores     Ranks
X     Y        X    Y    XY
3     12       1    5    5
4     10       2    3    6
8     11       3    4    12
10    9        4    2    8
13    3        5    1    5

Using the Pearson formula on the ranks:
SSX = ΣX² – (ΣX)²/n = 55 – (15)²/5 = 10;  SSY = 10
SP = ΣXY – (ΣX)(ΣY)/n = 36 – (15)(15)/5 = –9
rS = SP / √(SSX · SSY) = –9 / √(10 · 10) = –9/10 = –0.9

The Spearman Correlation, Examples (continued)
Using the simplified formula:
Raw Scores     Ranks        Rank Difference
X     Y        X    Y       D     D²
3     12       1    5       4     16
4     10       2    3       1     1
8     11       3    4       1     1
10    9        4    2       –2    4
13    3        5    1       –4    16

ΣD² = 38
rS = 1 – 6ΣD² / [n(n² – 1)] = 1 – 6(38) / [5(24)] = 1 – 1.9 = –0.9
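Both routes can be verified with a short sketch (assuming SciPy is available); scipy.stats.spearmanr gives the same answer directly:

```python
# Sketch: Spearman correlation for the example data, three equivalent ways.
import numpy as np
from scipy import stats

X = np.array([3, 4, 8, 10, 13], dtype=float)
Y = np.array([12, 10, 11, 9, 3], dtype=float)
n = len(X)

rank_X = stats.rankdata(X)            # 1, 2, 3, 4, 5
rank_Y = stats.rankdata(Y)            # 5, 3, 4, 2, 1

r_via_pearson = stats.pearsonr(rank_X, rank_Y)[0]   # Pearson formula on the ranks

D = rank_Y - rank_X                   # rank differences
r_via_D = 1 - 6 * np.sum(D ** 2) / (n * (n ** 2 - 1))

print(r_via_pearson, r_via_D, stats.spearmanr(X, Y)[0])   # all approximately -0.9
```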

Introduction to Regression
Figure 15.13: hypothetical data showing the relationship between SAT scores and college GPA.
A line drawn through the middle of the data serves several purposes:
The line makes the relationship between SAT and GPA easier to see.
The line identifies the center, or central tendency, of the relationship.
The line can be used for prediction.
The line establishes a precise, one-to-one relationship between each X value and a corresponding Y value.

Introduction to Regression, Linear Equations
The formula for a straight line: Y = a + bX
a and b are constants.
b is called the slope, and is the amount of change in Y per unit change in X.
a is called the Y-intercept, which is the value of Y when X is zero.
E.g. your local tennis club charges a fee of $5 per hour plus an annual membership fee of $25. The total cost of playing tennis at this club can be described by the linear equation Y = 5X + 25, where X is the number of hours played. (Figure 15.14)

Introduction to Regression, Least Squares
The least-squares method is the statistical technique for finding the best-fitting straight line for a set of data. The resulting straight line is called the regression line.
The least-squares criterion for best fit: for each X value, distance = Y – Ypred, where Y is the actual score and Ypred is the Y score predicted by the line (Figure 15.15). This distance measures the error of using the line to predict the actual score.
The least-squares method defines the best-fitting line to be the line that minimizes the total squared error.

Introduction to Regression, the Equation
Ypred = a + bX
b = SP/SSX, or equivalently b = r(sY/sX), where sY and sX are the standard deviations of Y and X
a = MY – bMX
This linear equation is called the regression equation for Y. It results in the least squared error between the data points and the line.

Regression, an Example
X     Y     (X–MX)   (Y–MY)   (X–MX)²   (X–MX)(Y–MY)
7     11     2        5        4         10
4     3     –1       –3        1          3
6     5      1       –1        1         –1
3     4     –2       –2        4          4
5     7      0        1        0          0
MX = 5   MY = 6                SSX = 10   SP = 16

b = SP/SSX = 16/10 = 1.6
a = MY – bMX = 6 – 1.6(5) = 6 – 8 = –2
The regression equation: Ypred = –2 + 1.6X (Figure 15.16)
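A minimal sketch of the same computation (assuming NumPy; not part of the slides):

```python
# Sketch: least-squares slope and intercept for the example data.
import numpy as np

X = np.array([7, 4, 6, 3, 5], dtype=float)
Y = np.array([11, 3, 5, 4, 7], dtype=float)

SP = np.sum((X - X.mean()) * (Y - Y.mean()))   # sum of products = 16
SS_X = np.sum((X - X.mean()) ** 2)             # SSX = 10

b = SP / SS_X                  # slope = 1.6
a = Y.mean() - b * X.mean()    # intercept = -2.0

print(b, a)                    # 1.6 -2.0, so Ypred = -2 + 1.6X
```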

Introduction to Regression, Prediction
Using the regression equation for prediction: for a person with X = 5, the predicted Y is Ypred = –2 + 1.6X = –2 + 1.6(5) = 6.
Cautions:
The predicted value is not perfect (unless r = ±1); the amount of error depends on the magnitude of r.
The regression line should not be used to make predictions for X values that fall outside the range of values covered by the original data.
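As a cross-check (a sketch assuming SciPy is available), scipy.stats.linregress recovers the same line and can be used for the prediction above:

```python
# Sketch: fit the example data with scipy.stats.linregress and predict Y at X = 5.
from scipy import stats

X = [7, 4, 6, 3, 5]
Y = [11, 3, 5, 4, 7]

fit = stats.linregress(X, Y)
print(fit.slope, fit.intercept)            # approximately 1.6 and -2.0

x_new = 5                                  # within the range of the original X values
y_pred = fit.intercept + fit.slope * x_new
print(y_pred)                              # 6.0, matching the hand calculation
```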