Review Regression and Pearson’s R SPSS Demo

Presentation transcript:


The Regression Line

Also called the "least squares" line or the "line of best fit." Properties:
- The sum of the positive and negative vertical distances from the line is zero
- The standard deviation of the points from the line is at a minimum
- The line passes through the point (mean of X, mean of Y)

Notes: Use the Bivariate Regression Applet (available on the course web page) to illustrate the first two properties. On the last point: for a regression line of age and income, the line would pass through the mean age and the mean income (e.g., mean age = 48, mean income = $38,000).

Regression Line Formula

Y = a + bX
- Y = score on the dependent variable
- X = score on the independent variable
- a = the Y intercept, the point where the regression line crosses the Y axis
- b = the slope of the regression line

Slope: the amount of change produced in Y by a unit change in X; that is, a measure of the effect of the X variable on the Y variable.

Notes: You can "eyeball" the scattergram and draw a line that seems to fit your observations best. However (surprise, surprise), given this is statistics, there is a formula for the line that best fits the pattern of dots. The regression line formula is one you should memorize. The weaker the effect of X on Y (the weaker the association between the variables), the lower the value of the slope; if two variables are unrelated, the regression line is parallel to the X axis and b is zero.

Regression Line Formula: Example

With y-intercept (a) = 102 and slope (b) = .9, the line is Y = 102 + (.9)X. This information can be used to predict weight from height (sample of males).

Example: what is the predicted weight of a male who is 70" tall (5'10")?

Y = 102 + (.9)(70) = 102 + 63 = 165 pounds

Notes: Interpretation of this slope: for each 1-inch increase in height, weight increases by 0.9 of a pound.
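The arithmetic on this slide can be sketched in a few lines of Python (a minimal illustration; the helper name `predict_y` is my own, not part of the course materials):

```python
def predict_y(a, b, x):
    """Apply the regression equation Y = a + bX."""
    return a + b * x

# Fitted line from the slide: intercept a = 102, slope b = 0.9
weight = predict_y(102, 0.9, 70)
print(weight)  # 165.0: predicted weight for a 70-inch-tall male
```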

The Slope (b): A Strength and a Weakness

We know that b indicates the change in Y for a unit change in X, but b is not really a good measure of strength. Weaknesses:
- It is unbounded (can be > 1 or < -1), making it hard to interpret
- Its size is influenced by the scale on which each variable is measured

Notes: For example, suppose you are predicting the number of self-reported crimes from yearly income. A very large increase in X (income) is likely to correspond to a very small increase in Y (number of crimes). If instead you measured income as dollars earned per day and predicted the number of self-reported crimes in the past 6 months, the b value would probably increase simply due to the change in scale. In this way, the slope (b) has weaknesses similar to phi.

Pearson's r Correlation Coefficient

To avoid these limitations, we calculate Pearson's r, the correlation coefficient. By contrast with b, Pearson's r is bounded: a value of 0.0 indicates no linear relationship, and a value of +/-1.00 indicates a perfect linear relationship.

Notes: Pearson's r is superior to b (the slope) for discussing the association between two interval-ratio variables in that it is bounded (scores from -1 to 1). The major advantage is that you can look at the association between two variables with very different scales. Next, an example going from the slope to Pearson's r, and then beyond.

Pearson's r

Given Y = 0.7 + .99X, with sx = 1.51 and sy = 2.24, converting the slope to a Pearson's r correlation coefficient:

Formula: r = b(sx/sy)
r = .99 (1.51/2.24)
r = .67

Notes: A simple formula transforms a b into an r. Here it indicates a strong positive relationship between X and Y.
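The slope-to-r conversion is simple enough to check in code. A quick Python sketch (the function name is my own):

```python
def slope_to_r(b, sx, sy):
    """Convert a regression slope b to Pearson's r: r = b * (sx / sy)."""
    return b * (sx / sy)

# Values from the slide: b = .99, sx = 1.51, sy = 2.24
r = slope_to_r(0.99, 1.51, 2.24)
print(round(r, 2))  # 0.67
```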

Coefficient of Determination

Conceptually, the formula for r² is:

r² = explained variation / total variation

"The proportion of the total variation in Y that is attributable to or explained by X." The variation not explained by X is called the unexplained variation, usually attributed to measurement error, random chance, or some combination of other variables.

Notes: Just like other PRE measures, r² indicates precisely the extent to which X helps us predict, understand, or explain Y (much less ambiguous than r). The unexplained variation (1 - r²) is the difference between our best prediction of Y with X and the actual scores. Rarely in the "real" social world can all variation in a factor be explained by just one other factor; the economy, crime, etc. are complex phenomena (something you'll get into a lot more in 3152).

Coefficient of Determination: Example

Interpreting the coefficient of determination in the example: squaring Pearson's r (.67) gives us an r² of .45.

Interpretation: the number of hours of daily TV watching (X) explains 45% of the total variation in soda consumed (Y).

Notes: This is another PRE-based measure. On the other hand, maybe there is another variable out there that does a better job of predicting Y; in that case the r² might be higher for that other factor. In multiple regression, we can consider both simultaneously: two X's predicting or explaining variation in the same Y (a prelude to multivariate regression).

Another Example: Relationship Between Mobility Rate (X) and Divorce Rate (Y)

The formula for this regression line is: Y = -2.5 + (.17)X

1) What is this slope telling you?
2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be?
3) The standard deviation (s) for X = 6.57 and the s for Y = 1.29. Use this information to calculate Pearson's r. How would you interpret this correlation?
4) Calculate and interpret the coefficient of determination (r²).

(Data: mobility rates of states.)

Another Example: Relationship Between Mobility Rate (X) and Divorce Rate (Y), Answers

The formula for this regression line is: Y = -2.5 + (.17)X

1) What is this slope telling you? For every one-unit increase in X (mobility rate), divorce rate (Y) goes up .17.
2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be? Y = -2.5 + (.17)(45) = 5.15
3) The standard deviation (s) for X = 6.57 and the s for Y = 1.29. Use this information to calculate Pearson's r. How would you interpret this correlation? r = .17 (6.57/1.29) = .17(5.093) = .866. There is a strong positive association between mobility rate and divorce rate.
4) Calculate and interpret the coefficient of determination (r²). r² = (.866)² = .75. A state's mobility rate explains 75% of the variation in its divorce rate.

(Data: mobility rates of states.)
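The four answers above can be reproduced in a few lines of Python, which makes it easy to verify the arithmetic:

```python
# Regression line and standard deviations from the slide
a, b = -2.5, 0.17
sx, sy = 6.57, 1.29

# 2) Predicted divorce rate for a state with mobility rate 45
y_hat = a + b * 45

# 3) Pearson's r from the slope: r = b * (sx / sy)
r = b * (sx / sy)

# 4) Coefficient of determination
r_squared = r ** 2

print(round(y_hat, 2), round(r, 3), round(r_squared, 2))  # 5.15 0.866 0.75
```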

PEARSON'S r IN SPSS

Steps for running Pearson's r in SPSS:
1. Click Analyze → Correlate → Bivariate
2. Highlight the 2(+) variables you wish to examine
3. Click OK

Notes: Also discussed at the end of Chapter 15 of Healey. Variables used: AGEKDBRN, CHILDS, CHLDIDEL.

Pearson's r Output

Note that the table reports each correlation coefficient twice (3 bivariate relationships, 6 correlation coefficients reported). This is another example of SPSS giving you more information than you need.

Example interpretation: there is a weak to moderate negative relationship (r = -.260) between the age at which one's first child is born (AGEKDBRN) and the number of children one has (CHILDS).

Measures of Association

Level of Measurement    Measure        "Bounded"?   PRE interpretation?
NOMINAL                 Phi            NO*          NO
                        Cramer's V     YES          NO
                        Lambda         YES          YES
ORDINAL                 Gamma          YES          YES
INTERVAL-RATIO          b (slope)      NO           NO
                        Pearson's r    YES          NO
                        r²             YES          YES

* But phi has an upper limit of 1 when dealing with a 2x2 table.

Significance Testing for Two I-R Variables

When both variables are measured at the interval-ratio level, strength and significance are tested together: the slope or r will have a "sig value."

- Sig = the specific odds of getting this slope, assuming the null is correct
- The null in this case is that there is no relationship between the two variables in the population; that is, that the slope (or r) in the population is zero
- In other words: what are the odds of getting this slope if, in the population, the slope is zero?
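The "sig value" for r is based on a t statistic with n - 2 degrees of freedom: t = r * sqrt(n - 2) / sqrt(1 - r²). A sketch in Python (the sample size n = 100 here is hypothetical, chosen only for illustration):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: the population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = .26 observed in a sample of n = 100
t = t_for_r(0.26, 100)
print(round(t, 2))  # compare against the critical t for df = 98
```

If the computed t exceeds the critical value (about 1.98 for df = 98 at alpha = .05, two-tailed), the correlation is significantly different from zero.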

SPSS DEMO

Research question: are individuals with stronger moral values more likely to engage in criminal activity? Sample = 484 UMD students.

- What is the null hypothesis?
- How do we test the null? Both are I-R variables (or close enough), so we can test the significance of the measure of strength. E.g., is the slope/correlation significantly different from zero? Or: what are the odds of finding this slope/correlation if, in the population, the slope is zero?

Is there a relationship here? If so, in what direction? What (ballpark) would the constant, or "y-intercept," be in the regression equation?

"Model" Regression Output

The same generic output is given for all regression models. The "model" statistics are relevant for models with more than one independent variable: how do all the independent variables together predict the dependent variable? For our purposes, the "Model R" will have the same value as Pearson's r. However, Model R cannot tell you direction (compute a separate correlation in SPSS).