Chapter 8 – 1 Regression & Correlation: Extended Treatment. Overview, The Scatter Diagram, Bivariate Linear Regression, Prediction Error, Coefficient of Determination.

Presentation transcript:

Chapter 8 – 1 Regression & Correlation: Extended Treatment. Overview, The Scatter Diagram, Bivariate Linear Regression, Prediction Error, Coefficient of Determination, Correlation Coefficient, Anova and the F Statistic, Multiple Regression.

Chapter 8 – 2 Overview: choosing a technique by the level of measurement of the independent and dependent variables.
Nominal independent, nominal dependent: considers the distribution of one variable across the categories of another variable.
Nominal independent, interval dependent: considers the difference between the mean of one group on a variable and that of another group.
Interval independent, nominal dependent: considers how a change in a variable affects a discrete outcome.
Interval independent, interval dependent: considers the degree to which a change in one or two variables results in a change in another.

Chapter 8 – 3 Overview: the technique for each combination.
Nominal independent, nominal dependent: Lambda (you already know how to deal with two nominal variables).
Nominal independent, interval dependent: Anova and the F-test (TODAY!).
Interval independent, nominal dependent: logistic regression (this cell is not covered in this course).
Interval independent, interval dependent: regression and correlation (TODAY!).

Chapter 8 – 4 General Examples. Does a change in one variable significantly affect another variable? Do two scores co-vary positively (high on one score, high on the other; low on one, low on the other)? Do two scores co-vary negatively (high on one score, low on the other; low on one, high on the other)? Does a change in two or more variables significantly affect another variable?

Chapter 8 – 5 Specific Examples Does getting older significantly influence a person’s political views? Does marital satisfaction increase with length of marriage? How does an additional year of education affect one’s earnings? How do education and seniority affect one’s earnings?

Chapter 8 – 6 Scatter Diagrams. Scatter diagram (scatterplot) — a visual method used to display a relationship between two interval-ratio variables. Typically, the independent variable is placed on the X-axis (horizontal axis), while the dependent variable is placed on the Y-axis (vertical axis).

Chapter 8 – 7 Scatter Diagram Example The data…

Chapter 8 – 8 Scatter Diagram Example

Chapter 8 – 9 A Scatter Diagram Example of a Negative Relationship

Chapter 8 – 10 Linear Relationships. Linear relationship – a relationship between two interval-ratio variables in which the observations displayed in a scatter diagram can be approximated with a straight line. Deterministic (perfect) linear relationship – a relationship between two interval-ratio variables in which all the observations (the dots) fall along a straight line. The line provides a predicted value of Y (the vertical axis) for any value of X (the horizontal axis).

Chapter 8 – 11 Graph the data below and examine the relationship:

Chapter 8 – 12 The Seniority-Salary Relationship

Chapter 8 – 13 Example: Education & Prestige Does education predict occupational prestige? If so, then the higher the respondent’s level of education, as measured by number of years of schooling, the greater the prestige of the respondent’s occupation. Take a careful look at the scatter diagram on the next slide and see if you think that there exists a relationship between these two variables…

Chapter 8 – 14 Scatterplot of Prestige by Education

Chapter 8 – 15 Example: Education & Prestige. The scatter diagram data can be represented by a straight line, so there does exist a relationship between these two variables. In addition, since occupational prestige increases as years of education increase, we can also say that the relationship is a positive one.

Chapter 8 – 16 Take your best guess? If you know nothing else about a person except that he or she lives in the United States, and I asked you to guess his or her age, what would you guess? The mean age for U.S. residents. Now if I tell you that this person owns a skateboard, would you change your guess? (Of course!) If someone owns a skateboard, that likely indicates to us that s/he is younger, and we may be able to guess closer to the actual value. With quantitative analyses we are generally trying to predict, or take our best guess at, the value of the dependent variable. One way to assess the relationship between two variables is to consider the degree to which the extra information of the second variable makes your guess better.

Chapter 8 – 17 Take your best guess? Similar to the example of age and the skateboard, we can take a much better guess at someone’s occupational prestige, if we have information about her/his years or level of education.

Chapter 8 – 18 Equation for a Straight Line: Y = a + bX, where a = intercept, b = slope, Y = dependent variable, X = independent variable. [Diagram: the line crosses the Y-axis at a, and the slope b equals rise/run.]

Chapter 8 – 19 Bivariate Linear Regression Equation: Ŷ = a + bX. Y-intercept (a) — the point where the regression line crosses the Y-axis, or the value of Y when X = 0. Slope (b) — the change in variable Y (the dependent variable) with a unit change in X (the independent variable). The estimates of a and b have the property that the sum of the squared differences between the observed and predicted values, Σ(Y − Ŷ)², is minimized; this is the method of ordinary least squares (OLS). Thus the regression line gives the Best Linear Unbiased Estimators (BLUE) of the intercept and slope.
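As a concrete sketch of the least-squares estimation just described, here is a short Python example; the education and prestige numbers are invented for illustration, not the GSS data used later in the chapter.

```python
# Hedged sketch of bivariate OLS: estimate the intercept a and slope b
# that minimize the sum of squared differences between observed and
# predicted Y. The data below are hypothetical.
from statistics import mean

def ols(x, y):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-products of deviations over the sum of squared
    # deviations of X (equivalently, covariance / variance of X).
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
    # Intercept: the line passes through the point (X-bar, Y-bar).
    a = y_bar - b * x_bar
    return a, b

education = [8, 10, 12, 14, 16]   # hypothetical years of schooling
prestige = [30, 35, 41, 47, 52]   # hypothetical prestige scores
a, b = ols(education, prestige)   # a = 7.4, b = 2.8 for these data
```

For these invented numbers the fitted line is Ŷ = 7.4 + 2.8X: each extra year of schooling predicts 2.8 more prestige points.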

Chapter 8 – 20 Now let’s interpret the SPSS output... SPSS Regression Output: 1996 GSS Education & Prestige

Chapter 8 – 21 The Regression Equation. Prediction Equation: Ŷ = a + b(X), with the numeric estimates of a and b taken from the SPSS output. The intercept a is the predicted value of Y when X is zero.

Chapter 8 – 22 The Regression Equation. Prediction Equation: Ŷ = a + b(X). The slope b is the predicted change in Y for each additional year of education.

Chapter 8 – 23 Interpreting the regression equation: Ŷ = a + b(X). If a respondent had zero years of schooling, this model predicts that his occupational prestige score would equal the intercept, a. For each additional year of education, our model predicts an increase of b points in occupational prestige.

Chapter 8 – 24 Ordinary Least Squares. Least-squares line (best-fitting line) – a line for which the error sum of squares, Σe², is at a minimum. Least-squares method – the technique that produces the least-squares line.

Chapter 8 – 25 Estimating the slope: b The bivariate regression coefficient or the slope of the regression line can be obtained from the observed X and Y scores.

Chapter 8 – 26 Covariance and Variance. Covariance of X and Y — a measure of how X and Y vary together: cov(X, Y) = Σ(X − X̄)(Y − Ȳ) / (N − 1). Covariance will be close to zero when X and Y are unrelated; it will be greater than zero when the relationship is positive and less than zero when the relationship is negative. Variance of X — we have talked a lot about variance in the dependent variable; this is simply the variance of the independent variable: var(X) = Σ(X − X̄)² / (N − 1).
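A minimal sketch of these two quantities on invented data, showing that the sign of the covariance tracks the direction of the relationship and that the slope b equals the covariance divided by the variance of X.

```python
# Sample covariance and variance from first principles (hypothetical data).
from statistics import mean

def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations / (N - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def variance(x):
    """Sample variance of X: the covariance of X with itself."""
    return covariance(x, x)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # drifts upward with x -> covariance > 0
y_down = [5, 4, 5, 4, 2]   # drifts downward with x -> covariance < 0

b = covariance(x, y_up) / variance(x)  # slope of the regression line
```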

Chapter 8 – 27 Estimating the Intercept. The regression line always goes through the point corresponding to the means of both X and Y, by definition. So we utilize this information to solve for a: a = Ȳ − bX̄.

Chapter 8 – 28 Back to the original scatterplot:

Chapter 8 – 29 A Representative Line

Chapter 8 – 30 Other Representative Lines

Chapter 8 – 31 Calculating the Regression Equation

Chapter 8 – 32 Calculating the Regression Equation

Chapter 8 – 33 The Least Squares Line!

Chapter 8 – 34 Summary: Properties of the Regression Line. Represents the predicted values of Y for any and all values of X. Always goes through the point corresponding to the means of both X and Y. It is the best-fitting line in that it minimizes the sum of the squared deviations. Has a slope that can be positive or negative.

Chapter 8 – 35 Prediction Errors. Back to our original data… Consider the prediction of Y for one country, Norway. Norway's predicted value is Ŷ = 73.

Chapter 8 – 36 Take your best guess? If you didn't know the percentage of citizens in Norway who agreed to pay higher prices for environmental protection (Y), what would you guess? The mean of Y, Ȳ = 56.45 (the horizontal line in Figure 8). With this prediction, the error for Norway is Y − Ȳ.

Chapter 8 – 37 IMPROVING THE PREDICTION. Let's see if we can reduce the error of prediction for Norway by using the linear regression equation; the new error of prediction is Y − Ŷ. Have we improved the prediction? Yes! By 5.72.
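The same comparison (guessing with the mean of Y versus guessing with the regression line) can be sketched with small hypothetical numbers; the Norway figures above come from the book's table, which is not reproduced here.

```python
# Prediction error with the mean vs. with the regression line,
# on hypothetical data.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar              # fitted line: y-hat = 2.2 + 0.6x

# Predict the last case (x = 5, actual y = 5) two ways.
error_with_mean = y[-1] - y_bar            # 5 - 4.0 = 1.0
error_with_line = y[-1] - (a + b * x[-1])  # 5 - 5.2 = -0.2
```

Using the line shrinks the absolute error from 1.0 to 0.2 for this case, the same logic as the 5.72 improvement reported for Norway.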

Chapter 8 – 38 SUM OF SQUARED DEVIATIONS. We have looked only at Norway. To calculate deviations from the mean for all the cases, we square the deviations and sum them; we call this the total sum of squares, or SST: SST = Σ(Y − Ȳ)². The sum of squared deviations from the regression line is called the error sum of squares, or SSE: SSE = Σ(Y − Ŷ)².

Chapter 8 – 39 MEASURING THE IMPROVEMENT IN PREDICTION. The improvement in the prediction error resulting from our use of the linear prediction equation is called the regression sum of squares, or SSR. It is calculated by subtracting SSE from SST: SSR = SST − SSE.
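A short numerical check of the decomposition SST = SSE + SSR, again with invented data:

```python
# Sums of squares: SST (errors around the mean), SSE (errors around the
# regression line), and SSR = SST - SSE (the improvement). Hypothetical data.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
ssr = sst - sse                                        # regression sum of squares
```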

Chapter 8 – 40 EXAMPLE: GNP AND WILLINGNESS TO PAY MORE. Calculating the error sum of squares (SSE).

Chapter 8 – 41 Example: GNP and Willingness to Pay More. We already have the total sum of squares (SST) from Table 4. The regression sum of squares, SSR, is thus: SSR = SST − SSE = 3,032.70 − 2,625.92 = 406.78.

Chapter 8 – 42 Coefficient of Determination. Coefficient of determination (r²) – a PRE measure reflecting the proportional reduction of error that results from using the linear regression model. The total sum of squares (SST) measures the prediction error when the independent variable is ignored (E1): E1 = SST. The error sum of squares (SSE) measures the prediction errors when using the independent variable and the linear regression equation (E2): E2 = SSE.

Chapter 8 – 43 Coefficient of Determination. Thus r² = (E1 − E2)/E1 = (SST − SSE)/SST. Here r² = 0.13, which means that by using GNP and the linear prediction rule to predict Y (the percentage willing to pay higher prices), the error of prediction is reduced by 13 percent (0.13 × 100). r² also reflects the proportion of the total variation in the dependent variable, Y, explained by the independent variable, X.
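The PRE logic can be verified numerically: on invented data, the reduction-in-error form (SST − SSE)/SST and the squared Pearson correlation agree.

```python
# r-squared two ways: as a PRE measure and as the squared correlation.
# Hypothetical data.
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST

b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r2_pre = (syy - sse) / syy                 # (E1 - E2) / E1
r2_corr = (sxy / sqrt(sxx * syy)) ** 2     # Pearson's r, squared
```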

Chapter 8 – 44 Coefficient of Determination. r² can also be calculated directly from the sums of squares: r² = SSR / SST.

Chapter 8 – 45 The Correlation Coefficient. Pearson's correlation coefficient (r) — the square root of r². It is a measure of association between two interval-ratio variables. Symmetrical measure — no specification of independent or dependent variables. Ranges from −1.0 to +1.0. The sign (±) indicates direction; the closer the number is to ±1.0, the stronger the association between X and Y.

Chapter 8 – 46 The Correlation Coefficient (scatterplot showing r = 0). r = 0 means that there is no association between the two variables.

Chapter 8 – 47 The Correlation Coefficient (scatterplot showing r = +1). r = 0 means that there is no association between the two variables. r = +1 means a perfect positive correlation.

Chapter 8 – 48 The Correlation Coefficient (scatterplot showing r = –1). r = 0 means that there is no association between the two variables. r = +1 means a perfect positive correlation. r = –1 means a perfect negative correlation.
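A small Pearson's r function makes the three benchmark cases concrete (data constructed for illustration):

```python
# Pearson's correlation coefficient r for the three benchmark cases:
# perfect positive (+1), perfect negative (-1), and no association (0).
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """r = covariation of X and Y over the product of their spreads."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4]
r_pos = pearson_r(x, [2, 4, 6, 8])              # perfectly positive
r_neg = pearson_r(x, [8, 6, 4, 2])              # perfectly negative
r_none = pearson_r([1, 2, 1, 2], [1, 1, 2, 2])  # unrelated
```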

Chapter 8 – 49 Testing the Significance of r² Using Anova. r² is an estimate based on sample data. We test it for statistical significance to assess the probability that the linear relationship it expresses is zero in the population. This technique, analysis of variance (Anova), is based on the regression sum of squares (SSR) and the error sum of squares (SSE).

Chapter 8 – 50 Determining df. There are degrees of freedom associated with both the regression sum of squares (SSR) and the error sum of squares (SSE). For SSR, df = K, where K is the number of independent variables in the regression equation; in the bivariate case, df = 1. For SSE, df = N − (K + 1); in the bivariate case, df = N − (1 + 1) = N − 2.

Chapter 8 – 51 Calculating Mean Squares. Mean squares are averages computed by dividing each sum of squares by its corresponding degrees of freedom. For our example: MSR = SSR/df = 406.78/1 = 406.78 and MSE = SSE/df = 2,625.92/9 = 291.77.

Chapter 8 – 52 The F Statistic. The mean squares regression (MSR) and mean squares error (MSE) compose the obtained F statistic: F = MSR/MSE. The larger MSR is relative to MSE, the larger the F ratio and the more likely r² is larger than zero in the population. The null hypothesis states that r² is zero in the population. Thus F = 406.78/291.77 = 1.39.
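Plugging in the SSR and SSE figures reported above reproduces the obtained F; note that N = 11 is an inference from the stated error df of 9, not a number given directly in the text.

```python
# F ratio from mean squares, using the SSR/SSE values in the text.
# N = 11 is inferred from the error df of 9 (df = N - 2), an assumption.
ssr = 406.78    # regression sum of squares (from the text)
sse = 2625.92   # error sum of squares (from the text)
n, k = 11, 1    # cases (inferred) and number of independent variables

msr = ssr / k               # mean squares regression, df = K = 1
mse = sse / (n - (k + 1))   # mean squares error, df = N - (K + 1) = 9
f = msr / mse               # obtained F, about 1.39 as in the text
```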

Chapter 8 – 53 Making a Decision. Use Appendix F to determine the probability of F = 1.39. Listed are F values for the numerator df associated with MSR (df = 1) and the denominator df associated with MSE (df = 9). Choose the table marked "p = .05." For df (1, 9) the critical F = 5.12. The obtained F is smaller than the critical F (1.39 < 5.12), so we cannot reject the null hypothesis. Conclusion: the linear relationship, as expressed in r², is probably zero in the population.

Chapter 8 – 54 The Anova Table The results of the ANOVA test are often summarized in a table such as Table 9.

Chapter 8 – 55 Multiple Regression. An extension of bivariate regression in which we examine the effect of two or more independent variables on the dependent variable. The calculations are easily accomplished using SPSS or other statistical software. The general form of the multiple regression equation (two independent variables): Ŷ = a + b1X1 + b2X2.

Chapter 8 – 56 Multiple Regression. Ŷ = predicted Y; X1 = the score on independent variable X1; X2 = the score on independent variable X2; a = the Y-intercept, or the value of Y when both X1 and X2 are equal to zero; b1 = the change in Y with a unit change in X1, when X2 is controlled; b2 = the change in Y with a unit change in X2, when X1 is controlled.
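With two predictors, the partial slopes and intercept can be computed by hand from deviation sums of squares and cross-products. This sketch uses tiny invented data, constructed so that Y = 1 + 2X1 + 3X2 exactly (not the teen pregnancy data that follows).

```python
# Two-predictor OLS via the normal equations, on hypothetical data.
from statistics import mean

x1 = [0, 1, 2, 1]
x2 = [0, 0, 1, 2]
y = [1, 3, 8, 9]   # constructed as y = 1 + 2*x1 + 3*x2

m1, m2, my = mean(x1), mean(x2), mean(y)
# Deviation sums of squares and cross-products.
s11 = sum((v - m1) ** 2 for v in x1)
s22 = sum((v - m2) ** 2 for v in x2)
s12 = sum((v - m1) * (w - m2) for v, w in zip(x1, x2))
s1y = sum((v - m1) * (w - my) for v, w in zip(x1, y))
s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det   # change in Y per unit X1, X2 controlled
b2 = (s2y * s11 - s1y * s12) / det   # change in Y per unit X2, X1 controlled
a = my - b1 * m1 - b2 * m2           # Y-intercept
```

Because the data were built from the line exactly, the fit recovers a = 1, b1 = 2, b2 = 3.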

Chapter 8 – 57 Multiple Regression. The hypotheses: the higher the state's expenditure per pupil, the lower the teen pregnancy rate; the higher the state's unemployment rate, the higher the teen pregnancy rate. In the multiple linear equation, Ŷ = teen pregnancy rate; X1 = unemployment rate; X2 = expenditure per pupil.

Chapter 8 – 58 A state’s teen pregnancy rate( ) goes up by for each 1% increase in the unemployment rate (X 1 ), holding expenditure per pupil (X 2 ) constant. A state’s pregnancy rate goes down by.007 with each $1 increase in the state’s expenditure per pupil (X 2 ), holding the unemployment rate(X 1 ) constant The value of a reflects the state’s teen pregnancy rate when both the unemployment rate and the state’s expenditure per pupil are equal to zero. Interpretation

Chapter 8 – 59 Multiple Regression and the Coefficient of Determination, R². The coefficient of determination for multiple regression is R². It measures the proportional reduction of error that results from using the linear regression model. We obtained an R² of .267. This means that by using states' unemployment rates and expenditures per pupil to predict pregnancy rates, the error of prediction is reduced by 26.7% (.267 × 100).

Chapter 8 – 60 The Multiple Correlation Coefficient, R. The square root of R², R, is the multiple correlation coefficient. It measures the linear relationship between the dependent variable and the combined effect of two or more independent variables. For our example, R = √.267 = .52, indicating a moderate relationship between the teen pregnancy rate and both the unemployment rate and expenditure per pupil.

Chapter 8 – 61 ANOVA and R². The statistical significance of R² is assessed by performing an Anova test: calculating an F ratio and determining its level of significance. With 2 and 43 df, we would need an F of 5.18 to reject the null hypothesis that R² = 0 at the .01 level. The obtained F ratio exceeds the critical F (7.85 > 5.18), so we can reject the null hypothesis with p < .01.