Regression & Correlation (1)

Presentation transcript:

1 Regression & Correlation (1)
1. A relationship between two variables X and Y
2. The relationship seen as a straight line
3. Two problems
4. How can we tell if our regression line is useful?
5. Test of hypothesis about the slope, $\beta_1$
6. Correlation
7. Useful features of r
8. Test of hypothesis about $\rho$
9. Examples

2 A relationship between two variables X & Y
We often have pairs of scores for a given set of cases. For example, we might have:
* number of years of education and annual income
* IQ and GPA
* income and number of books in the household
More generally, we have any X and Y, and our question is: does knowing something about X tell us anything about Y?

3 A relationship between two variables X & Y Does knowing something about X tell us anything about Y? For example, knowing how many years of education a person has, could you usefully estimate their annual income, or the number of cigarettes they smoke in a year?

4 A relationship between two variables X & Y Often, the answer to that question is, Yes – there is a relationship between the X and Y scores you have measured. * On average, as number of years of education goes up (across a set of people), number of cigarettes smoked per year goes down.

5 A relationship between two variables X & Y
In the graph on the next slide, we see two things:
1. Y goes down as X goes up.
2. At each value of X, there is some variability in Y, but substantially less than there is in Y overall.

6 [Figure: scatter plot with X = years of education and Y = cigarettes per year.]
Note that the range of the Y values for a single value of X is small compared to the whole range of Y in the data set.

7 The relationship seen as a straight line
The relationship between an X and a Y can be described using the equation for a straight line:
$Y = \beta_0 + \beta_1 X + \varepsilon$
where $\beta_0$ is the Y-intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error.
Note: this is the (theoretical) population equation relating Y to X.

8 Two problems
$Y = \beta_0 + \beta_1 X + \varepsilon$
In principle, this equation would let us predict the value of Y for a given X without error IF:
A. X were the only variable that influenced Y.
   * Usually, it isn't.
B. We knew the population values of $\beta_0$ and $\beta_1$.
   * Usually, we don't.

9 Two problems
Be sure to distinguish between:
A. Actual values of Y in the population.
B. Values of Y we would predict using $Y = \beta_0 + \beta_1 X + \varepsilon$ if we had the population values of $\beta_0$ and $\beta_1$.
C. Values of Y we predict on the basis of the X-Y relationship in our sample data: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
Why no $\varepsilon$ here?

10 Two problems
When we predict Y on the basis of X for a given case, two things can cause the predicted values to be different from the values we would find if we actually measured Y for that case:
1. We don't know the population values $\beta_0$ and $\beta_1$, only the sample values $\hat{\beta}_0$ and $\hat{\beta}_1$. Note that if we did know $\beta_0$ and $\beta_1$, this source of error would disappear.

11 Two problems
2. In the population, Y is not uniquely determined by X. As a result, for each value of X, there is a distribution of Y values.
* Relative to our predicted Y for a given value of X, the observed values of Y will sometimes be higher and sometimes be lower.
* These "errors" are random: over the long term, they will cancel each other out.
* But even if we knew $\beta_0$ and $\beta_1$, this source of error would still exist.

12 Two problems
In other words:
1. We don't have population values for the slope and the intercept of the line relating X to Y. That's one problem.
2. Even if we had population values for the slope and the intercept, the equation relating X to Y would still not perfectly predict Y. That's the other problem.

13 How can we tell if our regression line is useful?
The line is useful if the predicted values of Y are close to the observed values of Y (in the sample). We use our sample X and Y values to compute the regression line, $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$. We then use this line to predict the same Y values, and compare our predicted values with the observed values in the sample data. If the prediction is good, we can then use the regression line to predict Y for values of X not in our sample.

14 How can we tell if our regression line is useful?
$(Y_i - \hat{Y}_i) = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$ (since $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$)
Therefore, the sum of the squared deviations of predicted Y values from actual Y values is:
$SSE = \sum [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)]^2$
Now $\hat{\beta}_0$ and $\hat{\beta}_1$ are the "least squares estimators" of $\beta_0$ and $\beta_1$: they give a smaller SSE than any other values of $\hat{\beta}_0$ and $\hat{\beta}_1$ would.
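The least squares property can be checked numerically. The sketch below is illustrative only: the data are hypothetical, the names `fit_line` and `sse` are not from the slides, and the estimates are computed with the $SS_{XY}/SS_{XX}$ formulas that appear later in the presentation.

```python
def fit_line(xs, ys):
    """Return (b0_hat, b1_hat), the least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = ss_xy / ss_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


def sse(xs, ys, b0, b1):
    """Sum of squared deviations of observed Y from the line b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0_hat, b1_hat = fit_line(xs, ys)
best_sse = sse(xs, ys, b0_hat, b1_hat)

# Nudging either estimate away from its least squares value increases SSE.
nudged = [
    sse(xs, ys, b0_hat + d0, b1_hat + d1)
    for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
]
print(b0_hat, b1_hat, best_sse, min(nudged))
```

Trying a handful of nudged lines is only a spot check, not a proof, but it makes the "smaller SSE than any other values" claim concrete.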

15 [Figure: plot with axes X and Y.]
When there is no relation between X and Y, the best estimator of the Y value for any case is the mean, $\bar{Y}$. Notice that the slope of this line is zero!

16 How can we tell if our regression line is useful?
If X is completely unrelated to Y, the best estimate we could make of Y would be the mean, $\bar{Y}$, for any value of X. We find out whether our regression line is useful by asking whether its slope is different from 0.
$H_0: \beta_1 = 0$ [Why not $\hat{\beta}_1$?]

17 How can we tell if our regression line is useful?
To test that null hypothesis, we use the fact that $\hat{\beta}_1$ is one slope taken from the sampling distribution of $\hat{\beta}_1$.
$\hat{\beta}_1 = \dfrac{SS_{XY}}{SS_{XX}}$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
where $SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n}$

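18 How can we tell if our regression line is useful?
$SS_{XX} = \sum (X_i - \bar{X})^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}$ (n = sample size)
For the sampling distribution of $\hat{\beta}_1$:
The mean is $\beta_1$.
The standard deviation is $\sigma_{\hat{\beta}_1} = \dfrac{\sigma}{\sqrt{SS_{XX}}}$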
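The two forms of each sum of squares (deviations from the mean versus raw sums) give the same number. A minimal sketch, using hypothetical data and function names that are not from the slides:

```python
def ss_deviation(xs, ys):
    """Definitional form: sum of (X_i - X_bar) * (Y_i - Y_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


def ss_raw_sums(xs, ys):
    """Computational form: sum of X_i * Y_i minus (sum X)(sum Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ss_xy = (ss_deviation(xs, ys), ss_raw_sums(xs, ys))
# Passing X twice turns the cross-product formula into SS_XX.
ss_xx = (ss_deviation(xs, xs), ss_raw_sums(xs, xs))
print(ss_xy, ss_xx)
```

The raw-sum form is the one used in the worked examples, since it needs only the totals $\sum X$, $\sum Y$, $\sum X^2$ and $\sum XY$.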

19 How can we tell if our regression line is useful?
We estimate $\sigma_{\hat{\beta}_1}$ by $s_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{XX}}}$
where $s = \sqrt{\dfrac{SSE}{n-2}}$

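20 Test of hypothesis about the slope, $\beta_1$
Since $\sigma$ is unknown, we use t to test $H_0$:
One-tailed test: $H_0: \beta_1 = 0$, $H_A: \beta_1 < 0$ (or $\beta_1 > 0$)
Two-tailed test: $H_0: \beta_1 = 0$, $H_A: \beta_1 \neq 0$
Test statistic: $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}}$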

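21 Test of hypothesis about the slope, $\beta_1$
Rejection region:
One-tailed test: $t_{obt} < -t_{\alpha}$ (or $t_{obt} > t_{\alpha}$ when $H_A: \beta_1 > 0$)
Two-tailed test: $|t_{obt}| > t_{\alpha/2}$
$t_{crit}$ is based on n-2 degrees of freedom.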
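The steps of the slope test (estimate the slope, estimate its standard error from SSE, form t, compare with the critical value) can be sketched as follows. The data and the function name `slope_t_test` are hypothetical, and the critical value is typed in from a t table rather than computed.

```python
import math


def slope_t_test(xs, ys):
    """Return (b1_hat, s_b1, t_obt, df) for H0: beta_1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = ss_xy / ss_xx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))      # estimate of sigma
    s_b1 = s / math.sqrt(ss_xx)       # estimated standard error of the slope
    t_obt = (b1_hat - 0) / s_b1
    return b1_hat, s_b1, t_obt, n - 2


# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b1_hat, s_b1, t_obt, df = slope_t_test(xs, ys)

# Two-tailed critical value for alpha = .05 and df = 3, read from a t table.
t_crit = 3.182
reject_h0 = abs(t_obt) > t_crit
print(b1_hat, s_b1, t_obt, df, reject_h0)
```

With only five hypothetical cases the slope is positive but the test does not reject $H_0$, which shows why the standard error matters as much as the slope itself.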

22 Correlation
The Pearson correlation coefficient r is a numerical, descriptive measure of the strength and direction of the relationship between two variables X and Y.
$r = \dfrac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}}$
r gives much the same information as $\hat{\beta}_1$. However, r is "scale-less" and $-1 \le r \le 1$.
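To see what "scale-less" means in practice, the sketch below computes r and the slope for a hypothetical sample, then again after multiplying every X by 100 (as if the units had changed). The data and function names are assumptions made for illustration.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SS_XY, SS_XX, SS_YY) using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy, ss_xx, ss_yy


def pearson_r(xs, ys):
    ss_xy, ss_xx, ss_yy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def slope(xs, ys):
    ss_xy, ss_xx, _ = sums_of_squares(xs, ys)
    return ss_xy / ss_xx


# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
xs_rescaled = [100 * x for x in xs]   # same data, different units for X

r, r_rescaled = pearson_r(xs, ys), pearson_r(xs_rescaled, ys)
b1, b1_rescaled = slope(xs, ys), slope(xs_rescaled, ys)
print(r, r_rescaled, b1, b1_rescaled)
```

The slope shrinks by a factor of 100 when X is rescaled, while r is unchanged.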

23 Useful features of r
r indexes the X-Y relationship:
* r > 0 means Y increases as X increases
* r < 0 means Y decreases as X increases
* r = 0 means there is no linear relationship between X & Y
r is the sample correlation coefficient. We can use it to estimate rho ($\rho$), the population correlation coefficient, and use r to test $H_0: \rho = 0$.

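24 Test of hypothesis about $\rho$
One-tailed test: $H_0: \rho = 0$, $H_A: \rho < 0$ (or $\rho > 0$)
Two-tailed test: $H_0: \rho = 0$, $H_A: \rho \neq 0$
Test statistic: $t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$, with $\rho = 0$ under $H_0$
$t_{crit}$ has n-2 degrees of freedom.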
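A short sketch of the test statistic for $\rho$, under the assumption that $H_0: \rho = 0$. It also shows, on hypothetical data, that this t equals the t obtained from the slope test, which is one way to see that r and $\hat{\beta}_1$ carry much the same information. The function name `t_for_r` is not from the slides.

```python
import math


def t_for_r(r, n, rho_0=0.0):
    """t statistic for H0: rho = rho_0 (the slides use rho_0 = 0)."""
    return (r - rho_0) / math.sqrt((1 - r ** 2) / (n - 2))


# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)

# t from the correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_from_r = t_for_r(r, n)

# t from the slope, for comparison.
b1_hat = ss_xy / ss_xx
b0_hat = y_bar - b1_hat * x_bar
sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(xs, ys))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)
t_from_slope = b1_hat / s_b1

print(t_from_r, t_from_slope)
```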

25 Example 1
$H_0: \rho = 0$
$H_A: \rho \neq 0$
Test statistic: $t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
$t_{crit} = t(df = 5, \alpha/2 = .025) = 2.571$

26 Example 1 – Sum formulas
First, calculations involving X: $\sum X = 74$, $(\sum X)^2 = 5476$, $\sum X^2 = 922$
Then, analogous calculations involving Y: $\sum Y = 82$, $(\sum Y)^2 = 6724$, $\sum Y^2 = 1076$
Then, calculations involving X and Y: $\sum XY = 976$

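27 Example 1 – Sums of squares formulas
$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n}$
$SS_{XX} = \sum (X_i - \bar{X})^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}$
$SS_{YY} = \sum (Y_i - \bar{Y})^2 = \sum Y^2 - \dfrac{(\sum Y)^2}{n}$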

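28 Example 1 – calculate r
With n = 7 (so that n - 2 = 5):
$SS_{XY} = 976 - \dfrac{(74)(82)}{7} = 109.14$
$SS_{XX} = 922 - \dfrac{5476}{7} = 139.71$
$SS_{YY} = 1076 - \dfrac{6724}{7} = 115.43$
$r = \dfrac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}} = \dfrac{109.14}{\sqrt{(139.71)(115.43)}} = .859$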

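29 Example 1 – do t-test
$t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{.859 - 0}{\sqrt{\dfrac{1 - .859^2}{5}}} = \dfrac{.859}{.229} \approx 3.75$
Since 3.75 exceeds the two-tailed critical value $t(df = 5, \alpha/2 = .025) = 2.571$, reject $H_0$: a significant correlation exists.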
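Example 1 can be reproduced from the sums alone. The sketch below assumes n = 7, which is inferred from the 5 degrees of freedom stated for $t_{crit}$; the variable names are illustrative, and the critical value is typed in from a t table.

```python
import math

# Sums given in Example 1; n = 7 is inferred from df = n - 2 = 5.
n = 7
sum_x, sum_x_sq = 74, 922
sum_y, sum_y_sq = 82, 1076
sum_xy = 976

# Sums of squares from the raw-sum formulas.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x_sq - sum_x ** 2 / n
ss_yy = sum_y_sq - sum_y ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Test statistic for H0: rho = 0, computed with the unrounded r.
t_obt = r / math.sqrt((1 - r ** 2) / (n - 2))

# Two-tailed critical value for alpha = .05 and df = 5, read from a t table.
t_crit = 2.571
reject_h0 = abs(t_obt) > t_crit

print(round(r, 3), reject_h0)  # 0.859 True
```

Because the code keeps r unrounded, its t can differ in the second decimal place from a hand calculation that starts from r = .859.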

30 Example 2
$H_0: \rho = 0$
$H_A: \rho > 0$
Note: these are the Greek letter rho, NOT the English letter P.
Test statistic: $t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
$t_{crit} = t(df = 7 - 2 = 5, \alpha = .05) = 2.015$

31 Example 2 – Sum formulas
First, calculations involving X: $\sum X = 4.2$, $(\sum X)^2 = 17.64$, $\sum X^2 = 2.86$
Then, analogous calculations involving Y: $\sum Y = 32$, $(\sum Y)^2 = 1024$, $\sum Y^2 =$ [value missing]
Then, calculations involving X and Y: $\sum XY = 21.35$

32 Example 2 – calculate r
$SS_{XY} = 21.35 - \dfrac{(4.2)(32)}{7} = 2.15$
$SS_{XX} = 2.86 - \dfrac{17.64}{7} = .34$

33 Example 2 – calculate r
$SS_{YY} = \sum Y^2 - \dfrac{1024}{7} =$ [value missing]
$r = \dfrac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}}$
$r = .945$

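34 Example 2 – do t-test
$t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{.945 - 0}{\sqrt{\dfrac{1 - .945^2}{5}}} = \dfrac{.945}{.1463} \approx 6.46$
Since 6.46 exceeds the one-tailed critical value $t(df = 5, \alpha = .05) = 2.015$, reject $H_0$: a significant correlation exists.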
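The parts of Example 2 that can be recomputed from the transcript are sketched below. Because $\sum Y^2$ is not preserved, $SS_{YY}$ and r cannot be recalculated, so the code takes the reported r = .945 as given. The variable names are illustrative and the critical value is typed in from a t table.

```python
import math

# Sums given in Example 2 (n = 7, from df = 7 - 2 = 5).
n = 7
sum_x, sum_x_sq = 4.2, 2.86
sum_y = 32
sum_xy = 21.35

# These two sums of squares can be recomputed from the sums above.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x_sq - sum_x ** 2 / n

# r is taken from the slide, not recomputed (sum of Y squared is missing).
r_reported = 0.945

# Test statistic for H0: rho = 0 against the one-tailed HA: rho > 0.
t_obt = r_reported / math.sqrt((1 - r_reported ** 2) / (n - 2))

# One-tailed critical value for alpha = .05 and df = 5, read from a t table.
t_crit = 2.015
reject_h0 = t_obt > t_crit

print(round(ss_xy, 2), round(ss_xx, 2), round(t_obt, 2), reject_h0)
```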