1 Lecture 8 Regression: Relationships between continuous variables. Slides available from the Statistics & SPSS page of www.gpryce.com. Social Science Statistics Module I, Gwilym Pryce

2 Notices: Register

3 Plan: 1. Linear & Non-linear Relationships 2. Fitting a line using OLS 3. Inference in Regression 4. Omitted Variables & $R^2$ 5. Summary

4 1. Linear & Non-linear relationships between variables Often of greatest interest in social science is investigation into relationships between variables: –is social class related to political perspective? –is income related to education? –is worker alienation related to job monotony? We are also interested in the direction of causation, but this is more difficult to prove empirically: –our empirical models are usually structured assuming a particular theory of causation

5 Relationships between scale variables The most straightforward way to investigate evidence for a relationship is to look at scatter plots. It is traditional to: –put the dependent variable (i.e. the “effect”) on the vertical axis, or “y axis”; –put the explanatory variable (i.e. the “cause”) on the horizontal axis, or “x axis”.

6 Scatter plot of IQ and Income:

7 We would like to find the line of best fit: Predicted values (i.e. values of y lying on the line of best fit) are given by: $\hat{y}_i = a + b x_i$, where a is the y-intercept and b is the slope.

8 What does the output mean?

9 Sometimes the relationship appears non-linear:

10 … straight line of best fit is not always very satisfactory:

11 Could try a quadratic line of best fit:

12 We can simulate a non-linear relationship by first transforming one of the variables:

13 e.g. squaring IQ and taking the natural log of IQ:
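
The idea of transforming a variable before fitting a straight line can be sketched in a few lines of Python. This is only an illustration: the data are made up (not the IQ and income data from the slides), and `ols` is a hypothetical helper that fits y = a + b*x by least squares.

```python
import math

def ols(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data with a curved relationship: y = 2 + 3*x^2 exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 + 3.0 * xi ** 2 for xi in x]

# Transform the explanatory variable, then fit a straight line to the result.
x_squared = [xi ** 2 for xi in x]
ln_x = [math.log(xi) for xi in x]

a_sq, b_sq = ols(x_squared, y)   # straight line in x^2, i.e. a curve in x
a_ln, b_ln = ols(ln_x, y)        # straight line in ln(x)

print("y on x^2:  ", a_sq, b_sq)
print("y on ln(x):", a_ln, b_ln)
```

Because y here was built as a linear function of x squared, the regression on the squared variable recovers the intercept and slope used to generate the data, while the regression on ln(x) is only an approximation.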

14 … or a cubic line of best fit (over-fitted?):

15 Or could try two linear lines: “structural break”

16 2. Fitting a line using OLS The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation: $\min \sum_i (y_i - \hat{y}_i)^2$ Where: $y_i$ = observed value of y; $\hat{y}_i$ = predicted value of $y_i$ = the value on the line of best fit corresponding to $x_i$.

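17 Regression estimates of a, b using Ordinary Least Squares (OLS): Solving the min[error sum of squares] problem yields estimates of the slope b and y-intercept a of the straight line: $b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$, $a = \bar{y} - b\bar{x}$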
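
A minimal sketch of the OLS calculation, assuming the standard closed-form solution for a single explanatory variable (slope = sum of cross-deviations divided by sum of squared x-deviations; intercept = mean of y minus slope times mean of x). The data and the function names `ols_estimates` and `error_sum_of_squares` are illustrative, not taken from the slides.

```python
def ols_estimates(x, y):
    """Closed-form OLS estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

def error_sum_of_squares(x, y, a, b):
    """Sum of squared deviations between observed y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up observations for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_estimates(x, y)
best = error_sum_of_squares(x, y, a, b)

# Any other line should give a larger error sum of squares.
shifted = error_sum_of_squares(x, y, a + 0.1, b)
tilted = error_sum_of_squares(x, y, a, b - 0.1)
print(a, b, best, shifted, tilted)
```

Comparing `best` with `shifted` and `tilted` shows the sense in which the OLS line is the "best fit": moving the intercept or the slope away from the estimates increases the error sum of squares.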

18 3. Inference in Regression: Hypothesis tests on the slope coefficient: Regressions are usually run on samples, so what can we say about the population relationship between x and y? Repeated samples would yield a range of values for estimates of b: $b \sim N(\beta, s_b)$, i.e. b is normally distributed with mean $\beta$ (the value of b if the regression were run on the whole population) and standard error $s_b$. If there is no relationship in the population between x and y, then $\beta = 0$, and this is our $H_0$.

19 What does the standard error mean? Returning to our IQ example:

20 Hypothesis test on b: (1) $H_0$: $\beta = 0$ (i.e. the slope coefficient, if the regression were run on the population, would be 0); $H_1$: $\beta \neq 0$. (2) $\alpha$ = 0.05 or 0.01 etc. (3) Reject $H_0$ iff P < $\alpha$ (N.B. rule of thumb: P < 0.05 if $|t_c| \geq 2$, and P < 0.01 if $|t_c| \geq 2.6$). (4) Calculate P and conclude.
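
The four steps can be sketched in Python using the rule of thumb from the slide rather than an exact P value. This is an illustration under stated assumptions: the data are made up, `slope_t_statistic` is a hypothetical helper, and the standard error of the slope is computed with the usual single-regressor formula (residual variance with n - 2 degrees of freedom, divided by the sum of squared x-deviations). The rule of thumb is intended for reasonably large samples.

```python
import math

def slope_t_statistic(x, y):
    """Return (b, s_b, t) for the OLS slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return b, s_b, b / s_b                 # t tests H0: beta = 0

# Made-up sample of 30 observations: y = 1 + 2x plus a repeating disturbance.
noise = [0.0, 2.0, -1.0, 1.0, -2.0]
x = [float(i) for i in range(1, 31)]
y = [1.0 + 2.0 * xi + noise[i % 5] for i, xi in enumerate(x)]

b, s_b, t_c = slope_t_statistic(x, y)

# Rule of thumb from the slide: |t| >= 2 -> P < 0.05, |t| >= 2.6 -> P < 0.01.
if abs(t_c) >= 2.6:
    decision = "reject H0 at the 1% level"
elif abs(t_c) >= 2:
    decision = "reject H0 at the 5% level"
else:
    decision = "do not reject H0"

print(b, s_b, t_c, decision)
```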

21 Floor Area Example: You run a regression of house price on floor area which yields the following output. Use this output to answer the following questions: Q/ What is the “Constant”? What does its value mean here? Q/ What is the slope coefficient and what does it tell you here? Q/ What is the estimated value of an extra square metre? Q/ How would you test for the existence of a relationship between purchase price and floor area? Q/ How much is a 200 m² house worth? Q/ How much is a 100 m² house worth? Q/ On average, how much is the slope coefficient likely to vary from sample to sample? NB Write down your answers – you’ll need them later!

22 Floor area example: (1) $H_0$: no relationship between house price and floor area. $H_1$: there is a relationship. (2), (3), (4): P = 1 - CDF.T(24.469, 554) = 0.000000, so reject $H_0$.
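
The SPSS expression 1 - CDF.T(24.469, 554) is the upper-tail probability of a t distribution with 554 degrees of freedom. A rough way to check it without SPSS is to integrate the t density numerically. The sketch below uses only the Python standard library; `t_pdf` and `t_upper_tail` are hypothetical helper names, and the integration is truncated at a finite upper limit, which is adequate for large degrees of freedom but not for very small ones.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, width=60.0, steps=20000):
    """Approximate P(T > t) with Simpson's rule on [t, t + width].

    steps must be even; the tail beyond t + width is ignored.
    """
    h = width / steps
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_pdf(t + k * h, df)
    return total * h / 3

# Values from the floor area slide: t = 24.469 with 554 degrees of freedom.
p_one_tailed = t_upper_tail(24.469, 554)
print(f"{p_one_tailed:.6f}")   # zero to six decimal places, as on the slide
```

Doubling the result gives the two-tailed probability, which is still zero to six decimal places here, so the conclusion (reject the null hypothesis) does not change.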

23 4. Omitted Variables & $R^2$ Q/ Is floor area the only factor? Q/ How much of the variation in price does it explain?

24 R-square R-square tells you what proportion of the variation in y is explained by the explanatory variable x: –$0 \leq R^2 \leq 1$ (NB: you want $R^2$ to be near 1). –If there is more than one explanatory variable, use Adjusted $R^2$.
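
As a rough illustration, R-square can be computed as one minus the ratio of the error sum of squares to the total sum of squares, and the usual adjustment penalises extra explanatory variables. The sketch below assumes those standard definitions; the data and the fitted line (intercept 0.05, slope 1.99) are made up for the example and are not the house price results.

```python
def r_squared(y, y_hat):
    """Proportion of the variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjust R-square for k explanatory variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up data and a made-up fitted line y_hat = 0.05 + 1.99*x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [0.05 + 1.99 * xi for xi in x]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
print(r2, adj_r2)
```

With one explanatory variable the adjustment is small; it matters more when comparing models with different numbers of explanatory variables, as in the two-variable house price example.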

25 House Price Example cont’d: Two explanatory variables Now add number of bathrooms as an extra explanatory variable… Q/ How has the estimated value of an extra square metre changed? Q/ Do a hypothesis test for the existence of a relationship between price and number of bathrooms. Q/ How much will an extra bathroom typically add to the value of a house? Q/ What is the value of a 200 m² house with one bathroom? Compare your estimate with that from the previous model. Q/ What is the value of a 100 m² house with one bathroom? Compare your estimate with that from the previous model. Q/ What is the value of a 100 m² house with two bathrooms? Compare your estimate with that from the previous model. Q/ On average, how much is the slope coefficient on floor area likely to vary from sample to sample?

26 Scatter plot (with floor spikes)

27 3D Surface Plots: Construction, Price & Unemployment $Q = -246 + 27P - 0.2P^2 - 73U + 3U^2$

28 Construction Equation in a Slump $Q = 315 + 4P - 73U + 5U^2$
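
To get a feel for the two fitted surfaces, the equations on slides 27 and 28 can be evaluated at chosen values of P and U. The coefficients below are copied from the slides; the function names and the input values are hypothetical, and the units of Q, P and U are not given in the transcript.

```python
def construction(p, u):
    """Slide 27 equation: Q = -246 + 27P - 0.2P^2 - 73U + 3U^2."""
    return -246 + 27 * p - 0.2 * p ** 2 - 73 * u + 3 * u ** 2

def construction_slump(p, u):
    """Slide 28 equation: Q = 315 + 4P - 73U + 5U^2."""
    return 315 + 4 * p - 73 * u + 5 * u ** 2

# Hypothetical inputs chosen only to illustrate the arithmetic.
p, u = 10, 2
q_normal = construction(p, u)
q_slump = construction_slump(p, u)
print(q_normal, q_slump)

# In the slide 27 equation, Q is quadratic in P: holding U fixed, the price
# at which Q peaks solves dQ/dP = 27 - 0.4P = 0.
p_peak = 27 / 0.4
print(p_peak)
```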

29 Summary 1. Linear & Non-linear Relationships 2. Fitting a line using OLS 3. Inference in Regression 4. Omitted Variables & $R^2$

30 Reading: Regression Analysis: –*Pryce chapter on relationships. –*Field, A. chapters on regression. –*Moore and McCabe Chapters on regression. –Kennedy, P. ‘A Guide to Econometrics’ –Bryman, Alan, and Cramer, Duncan (1999) “Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists”, Chapters 9 and 10. –Achen, Christopher H. Interpreting and Using Regression (London: Sage, 1982).

