Published by Dale Anderson. Modified over 8 years ago.
1
EC339: Lecture 3 Chapter 2: Correlation and univariate analysis: A step back
2
Histograms
Way more important than you might think!
Look at the data's range (e.g., a sample with ages 18-64).
Divide the range into bins. There are formulas for this, but for all intents and purposes, make it look nice: bin widths should be round numbers if possible.
Easier in SPSS than Excel (first, create a new dataset).
Go through 'SATHist' and '3DHist'.
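A minimal Python sketch of the binning advice, assuming integer data and an integer bin width. The function names `round_bin_edges` and `histogram_counts` and the ages are hypothetical, not taken from the SATHist sheet or SPSS. Edges snap to multiples of the width, bins are half-open [a, b), and counts are also converted to percents.

```python
import math

def round_bin_edges(lo, hi, width):
    """Edges on multiples of an integer `width` covering [lo, hi]; bins are [a, b)."""
    start = math.floor(lo / width) * width
    stop = math.floor(hi / width) * width + width  # last bin must still contain hi
    return list(range(start, stop + 1, width))

def histogram_counts(values, edges):
    """Count values per bin [edges[i], edges[i+1]) and convert counts to percents."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    percents = [100 * c / len(values) for c in counts]
    return counts, percents

# Hypothetical ages spanning 18-64, binned with a round width of 10
ages = [18, 19, 22, 25, 31, 38, 40, 45, 52, 64]
edges = round_bin_edges(min(ages), max(ages), 10)
counts, percents = histogram_counts(ages, edges)
print(edges)     # [10, 20, 30, 40, 50, 60, 70]
print(counts)    # [2, 2, 2, 2, 1, 1]
print(percents)  # [20.0, 20.0, 20.0, 20.0, 10.0, 10.0]
```

Changing the width to 5 gives edges 15, 20, ..., 65: still round numbers, just finer bins.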
3
Correlation Data in SPSS
The Excel histograms are in the 'SATHist' sheet. Create an SPSS *.sav file with this data. Use Chart Builder to create a histogram. Double-click on the result and go to "Elements > Add distribution curve".
4
This histogram was made with "Histogram" in SPSS; the next two were made with "Histogram Percent", which relates much better to the normal curve, which we will be using repeatedly.
5
You only need to know the mean and standard deviation to plot ANY normal curve.
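For reference, the normal density with mean μ and standard deviation σ; these two numbers are its only parameters:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Changing μ slides the curve left or right; changing σ stretches or squeezes it.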
7
The points were created in Excel using the normal equation. See my version of 'SATScatter'.
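A hedged Python sketch of generating normal-curve points from just a mean and SD, as a column of Excel formulas would (in Excel, NORM.DIST with cumulative set to FALSE should return the same density values). The function names, the mean of 550, the SD of 80, and the 50-point bin width are all hypothetical. `density_to_percent` multiplies by 100 × bin width, which approximates the percent of the data in a bin; that is why a "percent" histogram lines up with the curve.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal density; the mean and SD are the only inputs it needs."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def normal_curve_points(mean, sd, num=41, span=3.0):
    """Evenly spaced (x, f(x)) pairs from mean - span*SD to mean + span*SD."""
    step = 2 * span * sd / (num - 1)
    xs = [mean - span * sd + i * step for i in range(num)]
    return [(x, normal_pdf(x, mean, sd)) for x in xs]

def density_to_percent(density, bin_width):
    """Approximate percent of the data in a bin of this width, for a percent histogram."""
    return 100 * density * bin_width

# Hypothetical inputs: mean 550, SD 80, 50-point bins
for x, f in normal_curve_points(550, 80, num=5, span=2.0):
    print(round(x), round(density_to_percent(f, 50), 1))
```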
8
Open Correlation.xls, SATHist tab
Comparing Verbal and Math SAT entrance scores for Wabash College: Do you think there should be a relationship? Positive or negative? Which has the higher average? What does the standard deviation tell you?
9
Correlation
Using this equation: open the "Wine_Exercise.xls" spreadsheet and complete this table. With this data, you should also create a scatter plot, show the trendline with its regression line equation, and calculate the SD line slope and intercept. We will calculate the regression line slope and intercept when we go over Chapter 4; you can do what the spreadsheet says now, and we will get back to it later. See the [Correlation.xls]corr sheet to walk through another example of calculating correlation coefficients.
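A minimal Python sketch of one standard form of the correlation coefficient: convert x and y to standard units (using SDs that divide by n) and average the products, r = (1/n) Σ z_x·z_y. This should agree with Excel's CORREL. The function names and the five data pairs are hypothetical, not the Wine_Exercise.xls data.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """SD that divides by n (used for standard units)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def correlation(xs, ys):
    """r = average of the products of x and y in standard units."""
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return mean(products)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # hypothetical data
print(round(correlation(xs, ys), 3))  # 0.775
```

Because the SDs cancel between numerator and denominator, dividing by n or n - 1 gives the same r, as long as it is done consistently.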
10
SATScatter sheet
What does the picture tell you about the relationship?
SD (standard deviation) line: along this line, if you increase x by one SD, y increases by one SD (or decreases by one SD when the correlation is negative).
Its slope has the sign of the correlation coefficient.
It passes through the point of averages, where the average-x and average-y lines meet.
Remember the "point-slope" method to find the equation of a line.
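The point-slope recipe for the SD line can be sketched in Python as follows. This is a hedged illustration with hypothetical data and a hypothetical `sd_line` helper: the slope is SD(y)/SD(x) with the sign of r (taken here from the covariance, which always has the same sign), and the line is forced through the point of averages.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """SD that divides by n."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sd_line(xs, ys):
    """Slope and intercept of the SD line via the point-slope method."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])  # same sign as r
    if cov == 0:
        raise ValueError("r = 0: this sketch has no sign to give the SD line")
    slope = (sd(ys) / sd(xs)) * (1 if cov > 0 else -1)
    # Point-slope form through the point of averages: y - my = slope * (x - mx)
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 5, 9]  # hypothetical data
slope, intercept = sd_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```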
11
Extreme sheet: -1 ≤ r ≤ 1
Correlation is BOUNDED between -1 and 1 (inclusive). r tells you the sign of the relationship (positive or negative) and how strong the relationship is. Look for a "cigar-shaped" cloud. Use the Patterns sheet to play with different correlation coefficients. Note that you don't get a pronounced "cigar-shaped" cloud until r is 0.9 or higher.
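To play with different correlation coefficients outside the spreadsheet, here is a hedged Python sketch that simulates pairs with a chosen population correlation using y = r·x + √(1 − r²)·e, where x and e are independent standard normals. This is a common construction, not necessarily how the Patterns sheet works; `correlated_pairs` is a hypothetical name. Sample r lands near the target, and r outside [-1, 1] is rejected because no data set can produce it.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Pearson r: average product of x and y in standard units (SDs divide by n)."""
    mx, my = mean(xs), mean(ys)
    sx = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sy = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def correlated_pairs(r, n, seed=0):
    """Hypothetical generator: y = r*x + sqrt(1 - r^2)*e, x and e independent N(0, 1)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1) for x in xs]
    return xs, ys

for target in (0.3, 0.6, 0.9):
    xs, ys = correlated_pairs(target, 5000)
    print(target, round(correlation(xs, ys), 2))
```

Plotting `xs` against `ys` for each target is a quick way to see how the cloud tightens into a cigar shape as r grows.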
12
3D Histogram: Multivariate Analysis
13
Correlation
r measures the degree of linear association between two variables, but it is not perfect. A high r should never be used to infer causation, and r may do a poor job of summarizing the relationship. r is the sample estimate of ρ ("rho"), the population value. Association is not causation.
14
Correlation dangers
Twice the r does not mean twice as much clustering. r doesn't tell you about the slope of a relationship. r can be a misleading summary, in exactly the same way that the average and SD are sometimes not enough to describe a list of numbers (e.g., an asymmetric histogram or outliers): see the Patterns sheet. View misleading correlations in the corr sheet.
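Two of these dangers can be checked directly with a small Python sketch on hypothetical data: a steep line and a shallow line both have r = 1, so r says nothing about the slope, and a perfect but curved relationship (y = x² on x values symmetric around 0) has r of essentially 0.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Pearson r: average product of x and y in standard units (SDs divide by n)."""
    mx, my = mean(xs), mean(ys)
    sx = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sy = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

xs = [-3, -2, -1, 0, 1, 2, 3]
steep = [2 * x for x in xs]      # slope 2
shallow = [0.5 * x for x in xs]  # slope 0.5
curve = [x * x for x in xs]      # perfect, but non-linear, relationship

print(correlation(xs, steep), correlation(xs, shallow))  # both 1 (up to rounding)
print(correlation(xs, curve))                            # essentially 0: r misses the curve
```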
15
Misleading r: Patterns worksheet
Show misleading r and cycle through values of r. You can change the parameters yourself to see how these values change.
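A hedged Python sketch of one misleading case, an outlier, using made-up points rather than the Patterns worksheet's data: eight points with no linear association at all, plus one extreme point, produce an r close to 1.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Pearson r: average product of x and y in standard units (SDs divide by n)."""
    mx, my = mean(xs), mean(ys)
    sx = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sy = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

xs = [1, 2, 3, 4, 1, 2, 3, 4]
ys = [1, 1, 1, 1, 2, 2, 2, 2]  # no linear association
print(correlation(xs, ys))     # close to 0

# Add one extreme point far from the rest
print(correlation(xs + [20], ys + [20]))  # close to 1: a single point drives r
```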
16
SD line: it has a positive slope if r is positive (negative if r is negative). In this case the slope is std(math)/std(verbal); in general, its magnitude is std(y)/std(x). The SD line goes THROUGH the intersection of the means.
SPSS version of SATScatter
17
Costa Rica example (CRExample)
In this example, what does the SDLine column signify? Does this SD line look like a good 'fit' of the data?
18
Aggregation problem, a.k.a. ecological correlation
If you average data by some grouping, you (almost always) obtain a different correlation than if you take the correlation of individuals. Correlation at the group level suppresses individual variation. See [EcolCorr.xls] and [EcolCorrCPS.xls]. Walk through the live sheet (pressing F9 to recalculate is the key here).
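A hedged Python simulation of the aggregation problem; it is not the EcolCorr.xls model, and the group structure, noise level, and `ecological_demo` name are all hypothetical. Each group shifts both x and y by the same amount, while individuals add large independent noise. Averaging within groups wipes out most of that noise, so the group-level r comes out far higher than the individual r. Changing `seed` plays roughly the role of pressing F9.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Pearson r: average product of x and y in standard units (SDs divide by n)."""
    mx, my = mean(xs), mean(ys)
    sx = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sy = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def ecological_demo(groups=10, per_group=200, noise_sd=3.0, seed=1):
    """Group g shifts both x and y by g; each individual adds independent noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for g in range(groups):
        for _ in range(per_group):
            xs.append(g + rng.gauss(0, noise_sd))
            ys.append(g + rng.gauss(0, noise_sd))
    # Average within each group (observations are stored group by group)
    gx = [mean(xs[g * per_group:(g + 1) * per_group]) for g in range(groups)]
    gy = [mean(ys[g * per_group:(g + 1) * per_group]) for g in range(groups)]
    return correlation(xs, ys), correlation(gx, gy)

r_individual, r_group = ecological_demo()
print(f"individual r = {r_individual:.2f}, group-average r = {r_group:.2f}")
```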
19
EcolCorrCPS: data averaged at the state level vs. individual data
Note the vastly different correlations between earnings and education, and especially between age and earnings. Grouping your data can be VERY misleading.
20
Correlation Lab
Open CorrelationLab.doc. Use the associated files StockReturns.xls and Hitters1999.xls.