 # S519: Evaluation of Information Systems Social Statistics Ch5: Correlation.



This week

- What is correlation?
- How to compute?
- How to interpret?

Correlation Coefficients

- Describe the relation between two variables: how the value of one variable changes when the value of another variable changes
- A correlation coefficient is a numerical index that reflects the relationship between two variables
- Range: -1 to +1
- Bivariate correlation (for two variables)

Correlation Coefficients

- Parametric
  - Pearson product-moment correlation (named for its inventor, Karl Pearson)
- Non-parametric
  - Spearman's rank correlation
  - Kendall tau rank correlation coefficient

Pearson correlation coefficient

- For two variables that are continuous in nature
  - Height, age, test score, income
- But not for discrete or categorical variables
  - Race, political affiliation, social class, rank
- $r_{xy}$ is the correlation between variable X and variable Y

Types of correlation coefficients

- Direct correlation (positive correlation): both variables change in the same direction
- Indirect correlation (negative correlation): the variables change in opposite directions
- See Table 5.1 (S-p112)
- -0.70 and +0.5: which is stronger?

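Pearson product-moment correlation coefficient

$$
r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
$$

where

- $r_{xy}$ is the correlation coefficient between X and Y
- $n$ is the size of the sample
- $X$ is the individual's score on the X variable
- $Y$ is the individual's score on the Y variable
- $XY$ is the product of each X score times its corresponding Y score
- $X^2$ is the individual X score, squared
- $Y^2$ is the individual Y score, squared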
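The terms listed on this slide combine in the standard computational (raw-score) form of Pearson's r: the numerator is nΣXY − ΣXΣY and the denominator is the square root of [nΣX² − (ΣX)²][nΣY² − (ΣY)²]. The following is a minimal Python sketch of that calculation. The function name `pearson_r` is illustrative, and the sample data assume that the flattened exercise table on the next slide is read as ten (X, Y) pairs.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the computational (raw-score) formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products XY
    sum_x2 = sum(x * x for x in xs)              # sum of squared X scores
    sum_y2 = sum(y * y for y in ys)              # sum of squared Y scores

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator


# Exercise data, assumed to be ten (X, Y) pairs.
X = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
Y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]
print(round(pearson_r(X, Y), 3))  # 0.692
```

Under that reading of the data, the intermediate sums are ΣX = 54, ΣY = 43, ΣXY = 247, ΣX² = 320 and ΣY² = 201, which makes the result easy to check by hand.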

Exercise

Calculate the Pearson correlation coefficient.

| X | Y |
|---|---|
| 2 | 3 |
| 4 | 2 |
| 5 | 6 |
| 6 | 5 |
| 4 | 3 |
| 7 | 6 |
| 8 | 5 |
| 5 | 4 |
| 6 | 4 |
| 7 | 5 |

1. Are variable X and variable Y correlated?
2. What does this correlation mean?

Using Excel to calculate

- CORREL function
- Or PEARSON function

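Visualizing a correlation

- Scatterplot or scattergram

| X | Y |
|---|---|
| 2 | 3 |
| 4 | 2 |
| 5 | 6 |
| 6 | 5 |
| 4 | 3 |
| 7 | 6 |
| 8 | 5 |
| 5 | 4 |
| 6 | 4 |
| 7 | 5 |

[Figure: scatterplot of the data, with axes labeled X and Y]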

Visualizing a correlation

Direct (positive) correlation

- r = 1: a perfect direct (or positive) correlation
- In real-life cases, 0.7 or 0.8 could be the highest you will see

Indirect (or negative) correlation

- Strength and direction are important

Excel Scatterplot

- Four sets of data with the same correlation of 0.816

Linear correlation

- Linear correlation means that the (X, Y) points fall along a straight line

Curvilinear correlation

- Age and memory
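Pearson's r measures only the linear part of a relationship, so a strong curvilinear pattern can produce a coefficient near zero. The sketch below illustrates this with made-up numbers (not the age and memory data from the slide): one Y series lies exactly on a straight line, the other on a symmetric curve. The helper `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the raw-score formula (illustrative helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


# Hypothetical scores chosen only to make the contrast visible.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]  # points on a straight line
curved = [v * v for v in x]      # points on a U-shaped curve

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(r_linear, r_curved)
```

Here Y is completely determined by X in both series, yet only the straight-line series yields a coefficient of 1; the U-shaped series yields 0. This is one reason to look at a scatterplot before interpreting r.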

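More than 2 variables?

| Income | Education | Attitude | Vote |
|--------|-----------|----------|------|
| 74190 | 13 | 1 | 1 |
| 80931 | 12 | 3 | 2 |
| 81314 | 11 | 4 | 2 |
| 73089 | 11 | 5 | 2 |
| 62023 | 11 | 3 | 2 |
| 61217 | 10 | 4 | 2 |
| 84526 | 11 | 5 | 1 |
| 87251 | 11 | 4 | 1 |
| 62659 | 12 | 5 | 2 |
| 76450 | 10 | 6 | 2 |
| 70512 | 12 | 7 | 2 |
| 78858 | 9 | 6 | 1 |
| 78628 | 13 | 7 | 1 |
| 86212 | 14 | 8 | 2 |
| 74962 | 9 | 8 | 2 |
| 58828 | 11 | 9 | 4 |
| 61471 | 10 | 8 | 5 |
| 78621 | 12 | 7 | 5 |
| 60071 | 9 | 8 | 4 |

How to calculate the correlation coefficient?

1. CORREL()
2. Correlation in the Data Analysis toolset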

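More than 2 variables?

Correlation matrix

|           | Income | Education | Attitude | Vote |
|-----------|--------|-----------|----------|------|
| Income    | 1.00   | 0.35      | -0.19    | 0.51 |
| Education |        | 1.00      | -0.21    | 0.43 |
| Attitude  |        |           | 1.00     | 0.55 |
| Vote      |        |           |          | 1.00 |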
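A correlation matrix is simply the bivariate coefficient computed for every pair of variables, which is why its diagonal is 1.00 and only one triangle needs to be filled in. The sketch below shows one way to build such a matrix in plain Python. The names `pearson_r` and `correlation_matrix` are illustrative, and the three columns are small made-up series, not the income, education, attitude and vote data from the slide.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the raw-score formula (illustrative helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


def correlation_matrix(columns):
    """Map each pair of variable names to its Pearson coefficient.

    `columns` is a dict of variable name -> list of scores, all the same length.
    """
    names = list(columns)
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }


# Hypothetical toy data, used only to exercise the function.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 5, 4, 5],
    "errors": [5, 4, 3, 2, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in matrix])
```

The result is symmetric, because the correlation of A with B equals the correlation of B with A.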

Excel Data Analysis tool - correlation

Meaning of Correlation coefficient

- Correlation value: from a negative finite number to a positive finite number
- Correlation coefficient value: -1.00 to +1.00

| $r_{xy}$ value | Interpretation |
|----------------|----------------|
| 0.8 to 1.0 | Very strong relationship (share most things in common) |
| 0.6 to 0.8 | Strong relationship (share many things in common) |
| 0.4 to 0.6 | Moderate relationship (share something in common) |
| 0.2 to 0.4 | Weak relationship (share a little in common) |
| 0.0 to 0.2 | Weak or no relationship (share very little or nothing in common) |
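The interpretation bands can be expressed as a small lookup. The sketch below rests on two assumptions the slide does not state: the bands apply to the absolute value of r (so the sign gives direction and the size gives strength), and a value sitting exactly on a boundary such as 0.8 goes to the higher band. The function name `interpret_r` is illustrative.

```python
def interpret_r(r):
    """Return the slide's verbal label for the size of a correlation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # assumption: strength is judged on the absolute value
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "weak or none"


print(interpret_r(-0.70))  # strong
print(interpret_r(0.5))    # moderate
```

Under these assumptions, -0.70 is the stronger of the two coefficients compared earlier (-0.70 and +0.5), even though it is negative.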

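Coefficient of determination

- Coefficient of determination: the percentage of variance in one variable that is accounted for by the variance in the other variable
- It equals the square of the correlation coefficient, $r_{xy}^2$
- Example: a correlation of 0.70 between GPA and studying time gives $0.70^2 = 0.49$, so 49% of the variance in GPA can be explained by the variance in studying time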

Coefficient of nondetermination

- The amount of unexplained variance is called the coefficient of nondetermination (coefficient of alienation)

| Correlation | Determination | Interpretation |
|-------------|---------------|----------------|
| 0 | 0 | |
| 0.5 | 0.25 | |
| 0.9 | 0.81 | |
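The explained and unexplained shares of variance follow directly from r: the coefficient of determination is r squared, and, following this slide's usage, the unexplained remainder is 1 minus r squared. A minimal sketch, with the illustrative function name `variance_shares`:

```python
def variance_shares(r):
    """Return (explained, unexplained) shares of variance for a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    explained = r * r              # coefficient of determination
    unexplained = 1.0 - explained  # share of variance left unexplained
    return explained, unexplained


# The three correlations from the table, plus 0.70 as an extra example.
for r in (0.0, 0.5, 0.9, 0.7):
    explained, unexplained = variance_shares(r)
    print(f"r = {r:.2f}  r^2 = {explained:.2f}  1 - r^2 = {unexplained:.2f}")
```

Because the relationship is quadratic, a correlation of 0.5 explains only a quarter of the variance, and even 0.9 leaves 19% unexplained.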

Ice cream and crime

- In a small town in Greece, the local police found a direct correlation between ice cream and crime

Correlation vs. causality

- A correlation represents the association between two or more variables
- It does not establish causality: a correlation between two variables does not show that one causes the other
- Ice cream and crime are correlated, but ice cream does not cause crime