Overview Correlation Regression -Definition

Presentation transcript:

Overview
Correlation
 - Definition
 - Deviation score formula, Z-score formula
 - Hypothesis test
Regression
 - Intercept and slope
 - Unstandardized regression line
 - Standardized regression line
 - Hypothesis tests

Associations among Continuous Variables
Early developments: Sir Francis Galton was very interested in these issues in the 1880s. Galton was the cousin of Darwin, and thus he became interested in evolution and heredity. Galton had an intuition that heredity could be understood in terms of deviations from means. He began to measure characteristics of plants and animals, including people.
Lots of normality: Galton found many characteristics that were normally distributed:
 - Height in humans (by gender)
 - Weight in humans (by gender)
 - Length of animal bones
 - Weights of seeds of plants
Sweet peas: The next step that Galton took was to look at distributions of measurements of parents and offspring together. He first looked at sweet pea plants because they can self-fertilize. This makes examining the data easier because only one parent influences the characteristics of the offspring. He plotted mother seed sizes against daughter seed sizes.

3 characteristics of a relationship
 - Direction: positive (+) or negative (-)
 - Degree of association: between -1 and 1; absolute values signify strength
 - Form: linear or non-linear

Direction
 - Positive: large values of X go with large values of Y, and small values of X go with small values of Y (e.g. IQ and SAT)
 - Negative: large values of X go with small values of Y, and small values of X go with large values of Y (e.g. SPEED and ACCURACY)

Degree of association Strong (tight cloud) Weak (diffuse cloud)

Form Linear Non-linear

Regression & Correlation

What is the best-fitting straight line? Regression equation: Y = a + bX
How closely are the points clustered around the line? Pearson's r

Correlation

Correlation - Definition
Correlation: a statistical technique that measures and describes the degree of linear relationship between two variables.

Dataset:
Obs  X  Y
A    1  1
B    1  3
C    3  2
D    4  5
E    6  4
F    7  5

[Scatterplot of Y against X]

Pearson's r
A value ranging from -1.00 to 1.00 indicating the strength and direction of the linear relationship.
 - Absolute value indicates strength
 - +/- indicates direction

The Logic of Correlation
[Figure: scatterplot split into four quadrants by the mean of X and the mean of Y (below/above average on X, below/above average on Y)]
Cross-product = $(X - \bar{X})(Y - \bar{Y})$
 - For a strong positive association, the cross-products will mostly be positive.
 - For a strong negative association, the cross-products will mostly be negative.
 - For a weak association, the cross-products will be mixed.
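The quadrant logic can be checked on the six-observation dataset (A to F) from the definition slide. This is only a minimal sketch; the function name `cross_products` is made up for illustration and is not from the slides.

```python
# Data from the scatterplot example (observations A-F).
xs = [1, 1, 3, 4, 6, 7]
ys = [1, 3, 2, 5, 4, 5]

def cross_products(xs, ys):
    """Return (X - mean of X) * (Y - mean of Y) for each observation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

cps = cross_products(xs, ys)
n_positive = sum(cp > 0 for cp in cps)  # points in the "same side of both means" quadrants
n_negative = sum(cp < 0 for cp in cps)  # points in the "opposite sides" quadrants
sp = sum(cps)                           # sum of products (SP)

print("positive:", n_positive, "negative:", n_negative, "SP:", round(sp, 3))
```

Every cross-product is positive for these data, which is the pattern described for a positive association.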

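Pearson's r: Deviation score formula
SP (sum of products) = $\sum (X - \bar{X})(Y - \bar{Y})$
$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$, where $SS_X = \sum (X - \bar{X})^2$ and $SS_Y = \sum (Y - \bar{Y})^2$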

Deviation Score Formula
Columns: Femur (X), Humerus (Y), X - mean, Y - mean, (X - mean)^2, (Y - mean)^2, (X - mean)(Y - mean)

      Femur  Humerus  X-mean  Y-mean  (X-mean)^2  (Y-mean)^2  Product
A     38     41       -20.2   -25     408.04      625         505
B     56     63       -2.2    -3      4.84        9           6.6
C     59     70       0.8     4       .64         16          3.2
D     64     72       5.8     6       33.64       36          34.8
E     74     84       15.8    18      249.64      324         284.4
mean  58.2   66.00
sum                                   696.8       1010        834
                                      (SSX)       (SSY)       (SP)

$r = \frac{834}{\sqrt{696.8 \times 1010}} = .99$
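The worked table can be reproduced in a few lines. The sketch below assumes the deviation score formula r = SP / sqrt(SSX * SSY), which matches the sums shown on the slide; the function name is hypothetical.

```python
import math

# Femur and humerus values from the slide (observations A-E).
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

def pearson_r_deviation(xs, ys):
    """Pearson's r from deviation scores; also returns SSX, SSY, and SP."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)                      # SSX
    ss_y = sum((y - mean_y) ** 2 for y in ys)                      # SSY
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # SP
    return sp / math.sqrt(ss_x * ss_y), ss_x, ss_y, sp

r, ss_x, ss_y, sp = pearson_r_deviation(femur, humerus)
print(round(ss_x, 2), round(ss_y, 2), round(sp, 2), round(r, 2))
```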

Pearson's r: Deviation score formula
SP (sum of products) = $\sum (X - \bar{X})(Y - \bar{Y})$
[Figure: scatterplot quadrants (below/above average on X, below/above average on Y)]
 - For a strong positive association, the SP will be a big positive number.
 - For a strong negative association, the SP will be a big negative number.
 - For a weak association, the SP will be a small number (+ and - will cancel each other out).

Pearson's r: Z-score formula
Standardized cross-products: $r = \frac{\sum Z_X Z_Y}{n - 1}$

Z-score formula

      Femur  Humerus  ZX      ZY      ZXZY
A     38     41       -1.530  -1.573  2.408
B     56     63       -0.167  -0.189  0.031
C     59     70       0.061   0.252   0.015
D     64     72       0.439   0.378   0.166
E     74     84       1.197   1.133   1.356
mean  58.2   66.00                    sum = 3.976
s     13.20  15.89

$r = \frac{3.976}{5 - 1} = .99$
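The same result can be obtained from z-scores. The sketch below assumes the sample standard deviation (dividing by n - 1), which is what the slide's s = 13.20 and s = 15.89 correspond to, so the sum of ZX * ZY is divided by n - 1 as well. The function name is hypothetical.

```python
import statistics

femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

def pearson_r_z(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs (n - 1)
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

r_z = pearson_r_z(femur, humerus)
print(round(r_z, 2))
```

Small differences from the slide's 3.976 total come from the slide rounding each z-score to three decimals before multiplying.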

Formulas for r
Z-score formula: $r = \frac{\sum Z_X Z_Y}{n - 1}$
Deviations formula: $r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$

Interpretation of r
 - A measure of strength of association: how closely do the points cluster around a line?
 - A measure of the direction of association: is it positive or negative?

Interpretation of r
 - r = .10: very small association, not usually reliable
 - r = .20: small association
 - r = .30: typical size for personality and social studies
 - r = .40: moderate association
 - r = .60: you are a research rock star
 - r = .80: hmm, are you for real?

Interpretation of r-squared
The amount of covariation compared to the amount of total variation: "the percent of total variance that is shared variance".
E.g. "If r = .80, then X explains 64% of the variability in Y" (and vice versa).

Hypothesis testing with r
Hypotheses: H0: ρ = 0; HA: ρ ≠ 0
Test statistic = r
Or just use table E.2 to find critical values of r
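The slide does not preserve a formula for the test. One common way to test H0: ρ = 0 converts r to a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes that approach; it is not taken from the slides, and the function name and the example values r = .60, n = 18 are purely illustrative.

```python
import math

def t_statistic_for_r(r, n):
    """Convert a sample correlation r (from n pairs) to a t statistic.

    Assumes the usual t test of H0: rho = 0, with df = n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df

# Illustrative values only (not from the slides).
t, df = t_statistic_for_r(0.60, 18)
print(round(t, 3), df)
```

The resulting t would then be compared with a critical value from a t table at the chosen alpha level, in the same way the slide compares r with the critical values in table E.2.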

Practice

      alcohol  tobacco
A     6.47     4.03
B     6.13     3.76
C     6.19     3.77
D     4.89     3.34
E     5.63     3.47
mean

Answers: SSX = 1.55, SSY = .30, SP = .64

Properties of r
 - A standardized statistic: will not change if you change the units of X or Y (because it is based on z-scores)
 - The same whether X is correlated with Y or vice versa
 - Fairly unstable with small n
 - Vulnerable to outliers
 - Has a skewed distribution
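The first two properties are easy to verify numerically. The sketch below reuses the femur and humerus values from the earlier slides; the rescaling factor is an arbitrary stand-in for a change of units, since the slides do not state the original units.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the deviation score formula."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

r_xy = pearson_r(femur, humerus)
r_yx = pearson_r(humerus, femur)            # swap the roles of X and Y

# Hypothetical unit change: divide every X value by 2.54.
femur_rescaled = [x / 2.54 for x in femur]
r_rescaled = pearson_r(femur_rescaled, humerus)

print(round(r_xy, 4), round(r_yx, 4), round(r_rescaled, 4))
```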

Linear Regression

Linear Regression
But how do we describe the line? If two variables are linearly related, it is possible to develop a simple equation to predict one variable from the other. The outcome variable is designated the Y variable, and the predictor variable is designated the X variable. E.g. centigrade to Fahrenheit: F = 32 + 1.8C. This formula gives a specific straight line.

The Linear Equation
F = 32 + 1.8(C)
General form: Y = a + bX
The prediction equation: Y' = a + bX
where a = intercept, b = slope, X = the predictor, Y = the criterion.
a and b are constants in a given line; X and Y change.
[Figures: lines with different b's; lines with different a's; lines with different a's and b's]

Slope and Intercept
Equation of the line: Y' = a + bX
 - The slope b: the amount of change in Y with a one-unit change in X
 - The intercept a: the value of Y when X is zero

Slope and Intercept
Equation of the line: $Y' = a + bX$
The slope: $b = r \frac{s_Y}{s_X}$
The intercept: $a = \bar{Y} - b\bar{X}$
The slope is influenced by r, but is not the same as r.
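To see that the slope depends on r without being equal to it, the sketch below assumes the standard least-squares relations b = r * (sY / sX) and a = mean(Y) - b * mean(X) and applies them to the femur and humerus data from the correlation slides. The function name is hypothetical.

```python
import math
import statistics

femur = [38, 56, 59, 64, 74]      # predictor X
humerus = [41, 63, 70, 72, 84]    # criterion Y

def slope_intercept(xs, ys):
    """Return (r, b, a) for the least-squares line predicting ys from xs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    r = sp / math.sqrt(ss_x * ss_y)
    b = r * s_y / s_x             # slope: r rescaled by the ratio of SDs
    a = mean_y - b * mean_x       # intercept: line passes through the means
    return r, b, a

r, b, a = slope_intercept(femur, humerus)
print(round(r, 3), round(b, 3), round(a, 3))
```

Here b works out to the same value as SP / SSX (834 / 696.8), which is an equivalent way of writing the slope.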

When there is no linear association (r = 0), the regression line is horizontal (b = 0), and our best estimate of age is the mean, 29.5, at all heights.

When the correlation is perfect (r = ±1.00), all the points fall along a straight line with a nonzero slope.

When there is some linear association (0 < |r| < 1), the regression line fits as close to the points as possible and has a nonzero slope.

Where did this line come from?
It is a straight line drawn through a scatterplot to summarize the relationship between X and Y.
It is the line that minimizes the sum of squared deviations, $\sum (Y' - Y)^2$.
We call these vertical deviations "residuals".

Regression lines Minimizing the squared vertical distances, or “residuals”
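The minimizing property can be checked directly: fit the line, then nudge the intercept and slope and confirm the sum of squared residuals only goes up. The sketch below uses the six (X, Y) pairs from the exercise later in the deck; the nudge sizes are arbitrary, and the function names are hypothetical.

```python
# Exercise data from the deck: (1,3), (1,4), (5,4), (5,6), (9,6), (9,7).
xs = [1, 1, 5, 5, 9, 9]
ys = [3, 4, 4, 6, 6, 7]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for predicting ys from xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    return mean_y - b * mean_x, b

def sse(a, b, xs, ys):
    """Sum of squared residuals (Y - Y')^2 for the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)

# Try nearby lines: every nudge away from (a, b) should increase the SSE.
nudged = [
    sse(a + da, b + db, xs, ys)
    for da in (-0.5, 0, 0.5)
    for db in (-0.1, 0, 0.1)
    if (da, db) != (0, 0)
]
print(a, b, best, min(nudged))
```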

Unstandardized Regression Line
Equation of the line: $Y' = a + bX$
The slope: $b = r \frac{s_Y}{s_X}$
The intercept: $a = \bar{Y} - b\bar{X}$

Properties of b (slope)
 - An unstandardized statistic: will change if you change the units of X or Y
 - Depends on whether Y is regressed on X or vice versa

Standardized Regression Line
Equation of the line: $Z_{Y'} = \beta Z_X$
The slope: $\beta = r$
The intercept: 0
A person 1 stdev above the mean on height would be how many stdevs above the mean on weight?

Properties of β (standardized slope)
 - A standardized statistic: will not change if you change the units of X or Y
 - Is equal to r in simple linear regression
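Both sets of properties can be illustrated by fitting the line twice: once on raw scores and once on z-scores. The sketch below uses the exercise data from the deck and population standard deviations (matching the "Stdevp" values shown there); the factor of 10 is an arbitrary, hypothetical unit change, and the function names are made up for illustration.

```python
import statistics

xs = [1, 1, 5, 5, 9, 9]
ys = [3, 4, 4, 6, 6, 7]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for predicting ys from xs."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    return mean_y - b * mean_x, b

def z_scores(values):
    """Standardize values using the population SD (as in Excel's STDEVP)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

a, b = fit_line(xs, ys)                              # unstandardized line
a_std, beta = fit_line(z_scores(xs), z_scores(ys))   # standardized line

# Hypothetical unit change: express X in units ten times smaller.
xs_scaled = [10 * x for x in xs]
_, b_scaled = fit_line(xs_scaled, ys)
_, beta_scaled = fit_line(z_scores(xs_scaled), z_scores(ys))

print(b, b_scaled)                          # b changes with the units of X
print(round(beta, 3), round(beta_scaled, 3))  # beta does not
```

The standardized intercept comes out as zero (up to rounding error) and β matches the r = .866 given for the exercise.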

Exercise

X  Y
1  3
1  4
5  4
5  6
9  6
9  7

Calculate b, a, and β. Write the regression equation and the standardized equation.

Answers
Mean: X = 5, Y = 5
Stdevp: X = 3.27, Y = 1.41
r = 0.866
b = .375
a = 3.125
β = .866
Regression equation: Y' = 3.125 + .375X
Standardized equation: Z_Y' = .866 Z_X

X  Y  Y'
1  3  3.5
1  4  3.5
5  4  5
5  6  5
9  6  6.5
9  7  6.5

Regression Coefficients Table

Predictor   Unstandardized Coefficient  Standard error  Standardized Coefficient  t  sig
Intercept   a                           SEa             -
Variable X  b                           SEb

Summary Correlation: Pearson’s r Unstandardized Regression Line

Exercise in Excel

X  Y
1  -1.5
2  -3
4  -4
7  -2
9  -6

Calculate: r =   b =   a =   β =
Write the regression equation:
Write the standardized equation:
Sketch the scatterplot and regression line