Bivariate linear regression ASW, Chapter 12 Economics 224 – Notes for November 12, 2008.


Regression line. For a bivariate or simple regression with an independent variable x and a dependent variable y, the regression equation is y = β0 + β1x + ε. The values of the error term, ε, average to 0, so E(ε) = 0 and E(y) = β0 + β1x. Using observed or sample data for values of x and y, estimates b0 and b1 of the parameters β0 and β1 are obtained, and the estimated regression line is ŷ = b0 + b1x, where ŷ is the value of y that is predicted from the estimated regression line.
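A quick simulation, with made-up parameter values (β0 = 5, β1 = 2 are illustrative, not from the lecture's data), shows the model at work: y values are generated as β0 + β1x + ε with a zero-mean error term, so the sample average of the errors comes out close to zero, as E(ε) = 0 requires.

```python
import random

# Hypothetical parameters, chosen only for illustration.
beta0, beta1, n = 5.0, 2.0, 100_000

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(n)]
# Error term drawn with mean zero, matching the assumption E(eps) = 0.
eps = [random.gauss(0, 3) for _ in range(n)]
ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]

mean_eps = sum(eps) / n
print(abs(mean_eps) < 0.1)  # True: the errors average out to about 0
```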

Bivariate regression line (diagram): scatter of points around the population line E(y) = β0 + β1x, showing a point (xi, yi), its expected value E(yi), and the error term ε as the vertical distance between them; the model is y = β0 + β1x + ε.

Observed scatter diagram and estimated least squares line (diagram): the observed y values, the estimated line ŷ = b0 + b1x, and the deviation of each actual y from its estimated ŷ.

Example from SLID 2005. According to human capital theory, increased education is associated with greater earnings. Random sample of 22 Saskatchewan males with positive wages and salaries in 2004, from the Survey of Labour and Income Dynamics. Let x be total number of years of school completed (YRSCHL18) and y be wages and salaries in dollars (WGSAL42). Source: Statistics Canada, Survey of Labour and Income Dynamics, 2005 [Canada]: External Cross-sectional Economic Person File [machine-readable data file]. From IDLS through UR Data Library.

Data table (22 cases; values not reproduced here) with columns ID#, YRSCHL18, WGSAL42. YRSCHL18 is the variable "number of years of schooling"; WGSAL42 is the variable "wages and salaries in dollars, 2004".

Scatter diagram of the sample (y against x). Mean of x is 14.2 years and sd is 2.64 years; mean of y is $45,954 and sd is $21,960; n = 22 cases.


Analysis and results. H0: β1 = 0, schooling has no effect on earnings. H1: β1 > 0, schooling has a positive effect on earnings. From the least squares estimates, using the data for the 22 cases, the regression equation and associated statistics are: ŷ = −13,493 + 4,181x, with R² = 0.253 and r = 0.503. The standard error of the slope b1 is 1,606, so t = 4,181 / 1,606 ≈ 2.60 (20 df), significant at the 0.05 level. At α = 0.05, reject H0, accept H1, and conclude that schooling has a positive effect on earnings. Each extra year of schooling adds $4,181 to annual wages and salaries for those in this sample. Expected wages and salaries for those with 20 years of schooling are −13,493 + (4,181 × 20) = $70,127.
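The prediction at the end of this slide can be checked directly from the estimated coefficients:

```python
# Estimated coefficients from the schooling/earnings regression on the slide.
b0, b1 = -13_493, 4_181

def predict(years_of_schooling):
    """Predicted wages and salaries (dollars) from the estimated line."""
    return b0 + b1 * years_of_schooling

print(predict(20))  # 70127, matching the slide's $70,127
```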

Equation of a line. y = β0 + β1x. x is the independent variable (horizontal axis) and y is the dependent variable (vertical axis). β0 and β1 are the two parameters that determine the equation of the line. β0 is the y-intercept: it determines the height of the line. β1 is the slope of the line: positive, negative, or zero. The size of β1 provides an estimate of the manner in which x is related to y.

Positive slope: β1 > 0 (diagram: line rising from intercept β0, with Δy/Δx > 0). Example: schooling (x) and earnings (y).

Negative slope: β1 < 0 (diagram: line falling from intercept β0, with Δy/Δx < 0). Example: higher income (x) associated with fewer trips by bus (y).

Zero slope: β1 = 0 (diagram: horizontal line at height β0). Example: amount of rainfall (x) and student grades (y).

Infinite slope: β1 = ∞ (diagram: vertical line).

An infinite number of possible lines can be drawn through the points. The task is to find the straight line that best fits the points in the scatter diagram.

Least squares method (ASW, 469). Find estimates of β0 and β1 that produce the line that fits the points best. The most commonly used criterion is least squares: the least squares line is the unique line for which the sum of the squares of the deviations of the y values from the line is as small as possible. Minimize the sum of the squares of the errors ε or, equivalently, the sum of the squares of the differences of the y values from the values of E(y). That is, find b0 and b1 that minimize Σ(yi − b0 − b1xi)².

Least squares line. Let the n observed values of x and y be termed xi and yi, where i = 1, 2, 3, ..., n. Σε² is minimized when b0 and b1 take on the following values: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², and b0 = ȳ − b1x̄.
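A minimal sketch of this computation, implementing b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄ on a small made-up data set:

```python
def least_squares(xs, ys):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x from its mean.
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on y = 1 + 2x recover those coefficients.
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```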

Data table (values not reproduced here) with columns Province, Income, Alcohol for the ten provinces: Newfoundland, Prince Edward Island, Nova Scotia, New Brunswick, Quebec, Ontario, Manitoba, Saskatchewan, Alberta, British Columbia. Income is family income in thousands of dollars per capita (independent variable); Alcohol is litres of alcohol consumed per person 15 years of age or over (dependent variable). Is alcohol a superior good? Sources: Saskatchewan Alcohol and Drug Abuse Commission, Fast Factsheet, Regina, 1988; Statistics Canada, Economic Families – 1986 [machine-readable data file], 1988.

Hypotheses. H0: β1 = 0. Income has no effect on alcohol consumption. H1: β1 > 0. Income has a positive effect on alcohol consumption.

Worksheet (values not reproduced here): for each province, columns x, y, x − x̄, y − ȳ, (x − x̄)(y − ȳ), and (x − x̄)², with their sums and means, from which b1 and b0 are computed.

Excel summary output (most values not reproduced here): regression statistics (multiple R, R square, adjusted R square, standard error; observations = 10), the ANOVA table, and the coefficients with standard errors, t statistics, and p-values for the intercept and the x variable. Analysis: b1 = 0.276 and its standard error is 0.076, for a t value of 3.63. At α = 0.01, the null hypothesis can be rejected (i.e., under H0, the probability of a t this large or larger is less than 0.01) and the alternative hypothesis accepted. At 0.01 significance, there is evidence that alcohol is a superior good, i.e., that income has a positive effect on alcohol consumption.
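The t test can be reproduced from the reported slope and standard error; here b1 = 0.276 is implied by the later slide's figure of a 2.76 litre change per 10 thousand dollars of income, and the one-tailed 0.01 critical value for 8 degrees of freedom (n = 10, so df = n − 2 = 8), 2.896, is taken from a standard t table.

```python
# t test for the slope in the income/alcohol regression.
b1, se_b1 = 0.276, 0.076
t = b1 / se_b1
t_crit = 2.896  # one-tailed critical value, alpha = 0.01, 8 df (t table)
print(round(t, 2), t > t_crit)  # 3.63 True: reject H0 at the 0.01 level
```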

Uses of the regression line. Drawing the line: select two x values (e.g., 26 and 36) and compute the predicted y values (8.1 and 10.8, respectively); plot these points and draw the line through them. Interpolation: if a city had a mean income of $32,000, the expected level of alcohol consumption would be 9.7 litres per capita.

Extrapolation. Suppose a city had a mean income of $50,000. From the equation, expected alcohol consumption would be 14.6 litres per capita. Cautions: The model was tested over the range of income values from 26 to 36 thousand dollars; while the relation appears close to a straight line over this range, there is no assurance that a linear relation exists outside it. The model does not fit all points: only 62% of the variation in alcohol consumption is explained by this linear model. Confidence intervals for prediction become larger the further the independent variable x is from its mean.
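As a check, the line through the two plotted points from the previous slide, (26, 8.1) and (36, 10.8), reproduces both the interpolated and the extrapolated figures (the recovered slope, 0.27, differs slightly from the estimated b1 because the plotted values are rounded):

```python
# Line through the two plotted points from the earlier slide.
x1, y1, x2, y2 = 26, 8.1, 36, 10.8
slope = (y2 - y1) / (x2 - x1)      # about 0.27 litres per thousand dollars
intercept = y1 - slope * x1

def predict(x):
    """Expected litres per capita at income x (thousands of dollars)."""
    return intercept + slope * x

print(round(predict(32), 1))  # 9.7  (interpolation, inside the 26-36 range)
print(round(predict(50), 1))  # 14.6 (extrapolation, outside the range)
```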

Change in y resulting from change in x. The estimate of the change in y resulting from a change in x is b1. For the alcohol consumption example, b1 = 0.276: a 10.0 thousand dollar increase in income is associated with a 2.76 litre increase in annual alcohol consumption per capita, at least over the range estimated. This can be used to calculate the income elasticity of alcohol consumption.
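The slide does not spell out the elasticity calculation; a standard approach is the point elasticity η = b1 · x / y, evaluated here at the interpolation point from the earlier slide (x = 32, y = 9.7):

```python
# Point income elasticity of alcohol consumption, eta = b1 * x / y.
b1 = 0.276          # litres per thousand dollars of income
x, y = 32, 9.7      # income (thousands) and consumption (litres) at one point
elasticity = b1 * x / y
print(round(elasticity, 2))  # 0.91: inelastic, but income-responsive
```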

Goodness of fit (ASW, 12.3) y is the dependent variable, or the variable to be explained. How much of y is explained statistically from the regression model, in this case the line? Total variation in y is termed the total sum of squares, or SST. The common measure of goodness of fit of the line is the coefficient of determination, the proportion of the variation or SST that is “explained” by the line.

SST or total variation of y. The difference of any observed value of y from the mean is the difference between the observed and predicted values (the "error" of prediction) plus the difference of the predicted value from the mean (the value of y "explained" by the line): yi − ȳ = (yi − ŷi) + (ŷi − ȳ). From this, it can be proved that Σ(yi − ȳ)² = Σ(yi − ŷi)² + Σ(ŷi − ȳ)², that is, SST = SSE + SSR, where SST is the total variation of y, SSE is the "unexplained" or "error" variation of y, and SSR is the "explained" variation of y.
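The decomposition SST = SSR + SSE holds for any least-squares fit; a sketch on a small made-up data set:

```python
# Fit a least-squares line and verify that SST = SSR + SSE.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)             # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)         # "explained" variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # "error" variation
print(round(sst, 2), round(ssr, 2), round(sse, 2))  # 14.8 10.0 4.8
```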

Variation in y (diagram): a point (xi, yi), its predicted value ŷi on the line ŷ = b0 + b1x, and the mean ȳ.

Variation in y "explained" by the line (diagram): the "explained" portion is the distance ŷi − ȳ.

Variation in y that is "unexplained" or error (diagram): the "unexplained" or error portion is the distance yi − ŷi.

Coefficient of determination. The coefficient of determination, r² or R² (the notation used in many texts), is defined as the ratio of the "explained" or regression sum of squares, SSR, to the total variation or sum of squares, SST: r² = SSR / SST. The coefficient of determination is the square of the correlation coefficient r. As noted by ASW (483), the correlation coefficient r is the square root of the coefficient of determination, with the same sign (positive or negative) as b1.
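For example, recovering r from the schooling/earnings R² of 0.253, with the sign taken from the positive slope:

```python
import math

# r = sqrt(R^2), carrying the sign of b1 (numbers from the schooling slide).
r_squared = 0.253
b1 = 4_181  # positive slope, so r is positive
r = math.copysign(math.sqrt(r_squared), b1)
print(round(r, 3))  # 0.503
```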

Calculations table (values not reproduced here): for each province, x, y, predicted ŷ, residuals, and the contributions to SSE, SSR, and SST, from which R squared is computed.

Excel summary output for the alcohol consumption regression (most values not reproduced here): regression statistics (observations = 10) and the ANOVA table with regression, residual, and total sums of squares (total: 9 df, SS = 11.08).

Interpretation of R². The proportion (or percentage, if multiplied by 100) of the variation in the dependent variable that is statistically explained by the regression line; 0 ≤ R² ≤ 1. A large R² means the line fits the observed points well and explains much of the variation in the dependent variable, at least in statistical terms. A small R² means the line does not fit the observed points very well and does not explain much of that variation: the random or error component may dominate, variables may be missing, or the relationship between x and y may not be linear.

How large is a large R²? Extent of relationship: a weak relationship is associated with a low value and a strong relationship with a large value. Type of data: micro or survey data are associated with small values of R² (in the schooling/earnings example, R² = 0.253; there is much individual variation); grouped data are associated with larger values of R² (in the income/alcohol example, R² = 0.62; grouping averages out individual variation); time series data often result in very high R² (in the consumption function example on the next slide, R² is very high; trends often move together).

Consumption (y) and GDP (x), Canada, 1995 to 2004, quarterly data (scatter diagram).

Beware of R². It is difficult to compare across equations, especially with different types of data and forms of relationships. Adding more variables to a model can increase R²; adjusted R² can correct for this (ASW, Chapter 13). Grouped or averaged observations can result in larger values of R². There is still a need to test for statistical significance. We want good estimates of β0 and β1 rather than a high R². At the same time, for similar types of data and issues, a model with a larger value of R² may be preferable to one with a smaller value.

Next day: assumptions of the regression model; testing for statistical significance.