Lecture 8 Regression: Relationships between continuous variables. Slides available from Statistics & SPSS page of www.gpryce.com


Lecture 8 Regression: Relationships between continuous variables. Slides available from Statistics & SPSS page of www.gpryce.com. Social Science Statistics Module I, Gwilym Pryce

Notices: Register

Plan:
1. Linear & Non-linear Relationships
2. Fitting a line using OLS
3. Inference in Regression
4. Omitted Variables & $R^2$
5. Summary

1. Linear & Non-linear relationships between variables
Investigating relationships between variables is often of greatest interest in social science:
– is social class related to political perspective?
– is income related to education?
– is worker alienation related to job monotony?
We are also interested in the direction of causation, but this is more difficult to prove empirically:
– our empirical models are usually structured assuming a particular theory of causation

Relationships between scale variables
The most straightforward way to look for evidence of a relationship is to examine scatter plots. It is traditional to:
– put the dependent variable (i.e. the “effect”) on the vertical axis, or “y axis”
– put the explanatory variable (i.e. the “cause”) on the horizontal axis, or “x axis”

Scatter plot of IQ and Income:

We would like to find the line of best fit. Predicted values (i.e. values of y lying on the line of best fit) are given by: $\hat{y} = a + bx$, where a is the y-intercept and b is the slope.

What does the output mean?

Sometimes the relationship appears non-linear:

… straight line of best fit is not always very satisfactory:

Could try a quadratic line of best fit:

We can simulate a non-linear relationship by first transforming one of the variables:

e.g. squaring IQ and taking the natural log of IQ:

… or a cubic line of best fit (over-fitted?):

Or could try two linear lines: “structural break”
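The transformation idea from this section (fit a straight line after transforming one of the variables) can be sketched as follows. This is only an illustration: the data are made up and exactly quadratic, the helper names `ols_fit` and `sse` are hypothetical, and the lecture itself works in SPSS rather than Python.

```python
from math import log
from statistics import mean

def ols_fit(x, y):
    """Return (a, b): intercept and slope of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def sse(x, y, a, b):
    """Sum of squared deviations of the observations from the fitted line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data with an exactly quadratic relationship (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3 + 2 * xi ** 2 for xi in x]

# Straight line on the raw variable: the fit is imperfect.
a_lin, b_lin = ols_fit(x, y)
sse_lin = sse(x, y, a_lin, b_lin)

# Straight line on the squared variable: linear in x**2, curved in x.
x_sq = [xi ** 2 for xi in x]
a_sq, b_sq = ols_fit(x_sq, y)
sse_sq = sse(x_sq, y, a_sq, b_sq)

# A natural-log transform is built the same way (requires x > 0).
x_log = [log(xi) for xi in x]
a_log, b_log = ols_fit(x_log, y)

print(f"raw x:  SSE = {sse_lin:.3f}")
print(f"x^2:    a = {a_sq:.3f}, b = {b_sq:.3f}, SSE = {sse_sq:.3f}")
```

The regression remains linear in the transformed variable, so the same line-fitting procedure applies; only the interpretation of the slope changes.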

2. Fitting a line using OLS
The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation: $\min \sum_i (y_i - \hat{y}_i)^2$
Where:
$y_i$ = observed value of y
$\hat{y}_i$ = predicted value of $y_i$ = the value on the line of best fit corresponding to $x_i$

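Regression estimates of a, b using Ordinary Least Squares (OLS): Solving the min[error sum of squares] problem yields estimates of the slope b and y-intercept a of the straight line: $b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$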
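As a minimal sketch of the OLS estimates, the standard closed-form solution for one explanatory variable can be computed directly. The function name `ols_fit` and the data are assumptions made for illustration; they do not come from the lecture's SPSS output.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (a, b) minimising sum((y_i - (a + b*x_i))**2).

    b = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)**2)
    a = y_bar - b * x_bar
    """
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_fit(x, y)
predicted = [a + b * xi for xi in x]                 # values on the line of best fit
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"a = {a:.3f}, b = {b:.3f}")
```

With an intercept in the model, the residuals sum to zero (up to rounding), which is a quick check that the fit has been computed correctly.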

3. Inference in Regression: Hypothesis tests on the slope coefficient
Regressions are usually run on samples, so what can we say about the population relationship between x and y? Repeated samples would yield a range of values for estimates of b: $b \sim N(\beta, s_b)$, i.e. b is normally distributed with mean $\beta$, the population value (the value of b if the regression were run on the population). If there is no relationship in the population between x and y, then $\beta = 0$, and this is our $H_0$.

What does the standard error mean? Returning to our IQ example:

Hypothesis test on b:
(1) $H_0$: $\beta = 0$ (i.e. the slope coefficient, if the regression were run on the population, would equal 0); $H_1$: $\beta \neq 0$
(2) $\alpha$ = 0.05 or 0.01 etc.
(3) Reject $H_0$ iff $P < \alpha$ (N.B. Rule of thumb: P < 0.05 if $t_c \geq 2$, and P < 0.01 if $t_c \geq 2.6$)
(4) Calculate P and conclude.
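The test statistic in this procedure can be sketched as the estimated slope divided by its standard error. The block below uses the usual textbook formula for the standard error of the slope in a one-variable regression and then applies the rule of thumb from step (3); the function name `slope_t_statistic` and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(x, y):
    """Return (b, s_b, t_c) for the regression of y on x.

    s_b = sqrt( [SSE / (n - 2)] / sum((x_i - x_bar)**2) )
    t_c = b / s_b   (tests H0: beta = 0)
    Note: a perfect fit gives s_b = 0, so t_c is undefined in that case.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_b = sqrt(sse / (n - 2) / sxx)
    return b, s_b, b / s_b

# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b, s_b, t_c = slope_t_statistic(x, y)

# Rule of thumb from the slide: P < 0.05 if t_c is at least 2.
reject_h0 = abs(t_c) >= 2
print(f"b = {b:.3f}, s_b = {s_b:.4f}, t_c = {t_c:.2f}, reject H0: {reject_h0}")
```

The rule of thumb is only an approximation; with a sample as small as this one, an exact P value from the t distribution with n - 2 degrees of freedom would be the safer basis for a conclusion.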

Floor Area Example:
You run a regression of house price on floor area which yields the following output. Use this output to answer the following questions:
Q/ What is the “Constant”? What does its value mean here?
Q/ What is the slope coefficient and what does it tell you here?
Q/ What is the estimated value of an extra square metre?
Q/ How would you test for the existence of a relationship between purchase price and floor area?
Q/ How much is a 200 m² house worth?
Q/ How much is a 100 m² house worth?
Q/ On average, how much is the slope coefficient likely to vary from sample to sample?
NB Write down your answers: you’ll need them later!

Floor area example:
(1) $H_0$: no relationship between house price and floor area. $H_1$: there is a relationship.
(2), (3), (4): P = 1 - CDF.T(24.469, 554) ≈ 0, so reject $H_0$.
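The P value on this slide can be checked roughly outside SPSS. The sketch below replaces the t distribution used by `CDF.T` with the standard normal distribution, which is an approximation introduced here for convenience; it is close when the degrees of freedom are as large as 554. The variable names are illustrative.

```python
from statistics import NormalDist

t_c = 24.469   # t statistic from the slide
df = 554       # degrees of freedom from the slide (unused by the normal approximation)

# Normal approximation to 1 - CDF.T(t_c, df); reasonable because df is large.
p_one_tailed = 1 - NormalDist().cdf(t_c)

# H1 ("there is a relationship") is two-sided, so a two-tailed P doubles this.
p_two_tailed = 2 * p_one_tailed

alpha = 0.05
reject_h0 = p_two_tailed < alpha
print(f"one-tailed P = {p_one_tailed:.6f}, reject H0: {reject_h0}")
```

With a t statistic this large the conclusion does not depend on the choice between one and two tails, or between the normal and t distributions: P is zero to any displayed precision.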

4. Omitted Variables & $R^2$
Q/ Is floor area the only factor?
Q/ How much of the variation in Price does it explain?

R-square
R-square tells you what proportion of the variation in y is explained by the explanatory variable x:
– $0 \leq R^2 \leq 1$ (NB: you want $R^2$ to be near 1).
– If there is more than one explanatory variable, use Adjusted $R^2$.
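Both measures can be computed from the observed and predicted values. The sketch below uses the standard definitions, $R^2 = 1 - SSE/SST$ and Adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$ with k explanatory variables; the function names and the numbers are made up for illustration.

```python
from statistics import mean

def r_squared(y, predicted):
    """Proportion of the variation in y explained by the fitted model."""
    y_bar = mean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                   # total variation
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Penalise R-square for the number of explanatory variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up observed values and model predictions (not from the lecture).
y = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(y, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
print(f"R-square = {r2:.3f}, Adjusted R-square = {adj_r2:.3f}")
```

Adjusted $R^2$ is never larger than $R^2$, and the gap widens as explanatory variables are added without a matching improvement in fit.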

House Price Example cont’d: Two explanatory variables
Now add number of bathrooms as an extra explanatory variable…
Q/ How has the estimated value of an extra square metre changed?
Q/ Do a hypothesis test for the existence of a relationship between price and number of bathrooms.
Q/ How much will an extra bathroom typically add to the value of a house?
Q/ What is the value of a 200 m² house with one bathroom? Compare your estimate with that from the previous model.
Q/ What is the value of a 100 m² house with one bathroom? Compare your estimate with that from the previous model.
Q/ What is the value of a 100 m² house with two bathrooms? Compare your estimate with that from the previous model.
Q/ On average, how much is the slope coefficient on floor area likely to vary from sample to sample?

Scatter plot (with floor spikes)

3D Surface Plots: Construction, Price & Unemployment Q = P P U + 3 U 2

Construction Equation in a Slump Q = P - 73 U + 5 U 2

Summary:
1. Linear & Non-linear Relationships
2. Fitting a line using OLS
3. Inference in Regression
4. Omitted Variables & $R^2$

Reading: Regression Analysis
– *Pryce, chapter on relationships.
– *Field, A., chapters on regression.
– *Moore and McCabe, chapters on regression.
– Kennedy, P., “A Guide to Econometrics”.
– Bryman, Alan, and Cramer, Duncan (1999), “Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists”, Chapters 9 and 10.
– Achen, Christopher H. (1982), “Interpreting and Using Regression” (London: Sage).