
EC339: Lecture 3 Chapter 2: Correlation and univariate analysis: A step back.


1 EC339: Lecture 3 Chapter 2: Correlation and univariate analysis: A step back

2 Histograms. Way more important than you might think! Look at the data’s range (e.g., a sample has ages 18-64) and divide it into bins. There are formulas for choosing bins, but for all intents and purposes, make it look nice. Bin widths should be round numbers if possible. Histograms are easier in SPSS than in Excel (first, create a new dataset). Go through the ‘SATHist’ and ‘3DHist’ sheets.
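The binning advice above can be sketched in a few lines of Python. This is only an illustration: the `ages` list is made up, and `bin_counts` is a hypothetical helper, not something from the course spreadsheets.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count values in left-closed bins [start + k*width, start + (k+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    last = max(counts)  # index of the highest occupied bin
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0))
            for k in range(last + 1)]

# Hypothetical sample of ages in the 18-64 range (invented for illustration).
ages = [18, 19, 22, 25, 25, 31, 34, 38, 41, 45, 47, 52, 58, 61, 64]

# Start at 10 with width 10 so that the bin edges are round numbers.
bins = bin_counts(ages, start=10, width=10)
for lo, hi, n in bins:
    print(f"{lo}-{hi}: {'#' * n}")
```

Changing `width` to 5 or 20 shows how much the picture depends on the bin choice, which is why "make it look nice" is a reasonable working rule.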

3 Correlation Data in SPSS. The Excel histograms are in the ‘SATHist’ sheet. Create an SPSS *.sav file with this data. Use Chart Builder to create a histogram. Double-click on the result, and go to “Elements → Add distribution curve”.

4 This histogram is done using “Histogram” in SPSS; the next two are done using “Histogram Percent”, which relates much better to the normal curve, which we will be using repeatedly.

5 Only need to know the mean and standard deviation to plot ANY normal curve
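As a rough sketch of that point, the normal density can be computed from the mean and SD alone. The values `mean = 500` and `sd = 100` below are placeholders, not the actual SAT statistics from the course data.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Placeholder parameters (assumed for illustration only).
mean, sd = 500.0, 100.0

# Points from 3 SDs below the mean to 3 SDs above, in half-SD steps.
xs = [mean + k * sd / 2 for k in range(-6, 7)]
curve = [(x, normal_pdf(x, mean, sd)) for x in xs]

for x, height in curve:
    print(f"{x:6.0f}  {height:.6f}")
```

Plotting these points as a scatter with smoothed lines gives the familiar bell shape; a different mean shifts it and a different SD stretches it.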

6

7 Points created in Excel using Normal equation. See my version of ‘SATScatter’

8 Open Correlation.xls, SATHist tab. Comparing Verbal and Math SAT entrance scores for Wabash College: Do you think there should be a relationship? Positive or negative? Which has the higher average? What does the standard deviation tell you?

9 Correlation. Using the equation on this slide, open the “Wine_Exercise.xls” spreadsheet and complete the table. With this data, you should also create a scatter plot, show the trendline with the regression line equation, and calculate the SD line slope and intercept. We will calculate the regression line slope and intercept when we go over chapter 4. You can do what the spreadsheet says now, and we will get back to it later. See the [Correlation.xls]corr sheet to walk through another example of calculating correlation coefficients.
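The slide's equation did not survive in this transcript. One common definition, assumed here, is that r is the average of the products of x and y after each is converted to standard units. The sketch below uses invented numbers, not the wine data, and divides by n for the SD (an assumption about the convention in use).

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population SD (divides by n). Assumed convention, not confirmed by the slides."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))

def correlation(x, y):
    """Average product of x and y in standard units."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)]
    return mean(products)

# Invented data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))
```

Working column by column (deviation from the mean, standard units, product, then average) mirrors the kind of table a spreadsheet exercise would ask you to fill in.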

10 SATScatter Sheet. What does the picture tell you about the relationship? SD (Standard Deviations) Line: if you increase x by one SD, the line increases y by one SD (or decreases it by one SD when r is negative). The slope has the sign of the correlation coefficient. The line passes through the point of averages. Remember the “point-slope” method to find the equation of a line. The average x and y lines meet at the point of averages.
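A minimal sketch of the SD line using the point-slope method follows. The `verbal` and `math_scores` lists are hypothetical values, not the Wabash data, and treating r = 0 as a positive slope is an arbitrary choice made only so the function always returns something.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population SD (divides by n); an assumed convention."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))

def sd_line(x, y):
    """Return (slope, intercept) of the SD line through the point of averages."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sign = 1.0 if cov >= 0 else -1.0   # same sign as r; r = 0 treated as positive (assumption)
    slope = sign * sd(y) / sd(x)
    intercept = my - slope * mx        # point-slope form solved for the intercept
    return slope, intercept

# Hypothetical scores for illustration.
verbal = [480, 520, 560, 600, 640]
math_scores = [530, 540, 600, 590, 640]

slope, intercept = sd_line(verbal, math_scores)
print(f"SD line: math = {intercept:.2f} + {slope:.4f} * verbal")
```

Plugging the average of x into the resulting equation returns the average of y, which is a quick check that the line passes through the point of averages.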

11 Extreme Sheet. -1 ≤ r ≤ 1: correlation is BOUNDED between -1 and 1. r tells you the sign of the relationship (positive or negative) and how strong the relationship is. Look for a “cigar-shaped” cloud. Use the Patterns sheet to play with different correlation coefficients. Note that you don’t get a pronounced “cigar-shaped” cloud until r is 0.9 or higher.

12 3D Histogram: Multivariate Analysis

13 Correlation. r is used to measure the degree of linear association between two variables, but it is not perfect. A high r should never be used to infer causation, and r may do a poor job of summarizing the relationship. r is the sample estimate of the population value ρ (“rho”). Association is not causation.

14 Correlation Dangers. Twice the r does not mean twice as much clustering. r doesn’t tell you about the slope of a relationship. r can be a misleading summary, in exactly the same way that the average and SD are sometimes not enough to describe a list of numbers (e.g., an asymmetrical histogram or outliers): see the Patterns sheet. View misleading correlations in the corr sheet.
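Two of these dangers can be seen with a handful of invented points: a steep line and a shallow line both give r = 1, and a perfect but nonlinear relationship can give r = 0. This is a sketch with made-up data, not the contents of the Patterns or corr sheets.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    """Correlation as covariance divided by the product of the SDs."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx = math.sqrt(mean([(a - mx) ** 2 for a in x]))
    sy = math.sqrt(mean([(b - my) ** 2 for b in y]))
    return cov / (sx * sy)

x = [-2, -1, 0, 1, 2]
steep = [10 * a for a in x]      # slope 10
shallow = [0.1 * a for a in x]   # slope 0.1
parabola = [a * a for a in x]    # y is determined exactly by x, but not linearly

r_steep = correlation(x, steep)
r_shallow = correlation(x, shallow)
r_parabola = correlation(x, parabola)

print(r_steep, r_shallow, r_parabola)
```

The first two values match even though the slopes differ by a factor of 100, and the third is zero even though y is a deterministic function of x.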

15 Misleading r: Patterns Worksheet  Show misleading r  Cycle through r  You can change parameters yourself to see how these values might change

16 SD Line: Has a positive slope if r is positive (negative if r is negative). The magnitude of the slope is std(math)/std(verbal) in this case (std(y)/std(x) generally). The SD line goes THROUGH the intersection of the means. Figure: SPSS version of SATScatter.

17 Costa Rica Example (CRExample)  In this example, what does the SDLine column signify?  Does this SD Line look like a good ‘fit’ of the data?

18 Aggregation Problem, a.k.a. Ecological Correlation. Ex: If you average data by some grouping, you obtain a different correlation than if you take the correlation of individuals (almost always…). Correlation at the group level suppresses individual variation. See [EcolCorr.xls] and [EcolCorrCPS.xls]. Walk through the live sheet (F9 is the key here).
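The effect can be reproduced with a small constructed dataset. The three groups below are invented so that the group averages fall exactly on a line while individuals scatter around them; they are not taken from EcolCorr.xls or the CPS data.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx = math.sqrt(mean([(a - mx) ** 2 for a in x]))
    sy = math.sqrt(mean([(b - my) ** 2 for b in y]))
    return cov / (sx * sy)

# Hypothetical (x, y) observations for individuals in three groups.
groups = {
    "A": [(1, 3), (2, 1), (3, 2)],
    "B": [(3, 5), (4, 3), (5, 4)],
    "C": [(5, 7), (6, 5), (7, 6)],
}

# Correlation across all individuals.
individuals = [pt for pts in groups.values() for pt in pts]
r_individual = correlation([p[0] for p in individuals],
                           [p[1] for p in individuals])

# Correlation across group averages only.
group_means = [(mean([p[0] for p in pts]), mean([p[1] for p in pts]))
               for pts in groups.values()]
r_group = correlation([g[0] for g in group_means],
                      [g[1] for g in group_means])

print(f"individual r = {r_individual:.2f}, group-level r = {r_group:.2f}")
```

Averaging removes the within-group scatter, so the group-level correlation here is stronger than the individual-level one.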

19 EcolCorrCPS. Panels: data averaged at the state level; individual data. Note the vastly different correlations between earnings and education, and especially between age and earnings. Grouping your data can be VERY misleading.

20 Correlation Lab. Open CorrelationLab.doc. Use the associated files: StockReturns.xls, Hitters1999.xls.

