# Multivariate Data/Statistical Analysis SC504/HS927 Spring Term 2008 Week 18: Relationships between variables: simple ordinary least squares (OLS) regression.


2 Outline
- What is regression analysis?
- Scatter plots
- Linear regression
- Terminology and notation
- Interpreting a regression equation
- Putting it into practice

3 What is regression analysis? A statistical technique for:
- analysing the association between variables (e.g. how is alcohol consumption related to income, on average?)
- making conditional predictions (e.g. what do we expect to happen to smoking behaviour if tobacco taxes increase?)
- testing hypotheses about the nature of conditional relationships (e.g. on average, do crime rates vary in proportion to unemployment rates?)
- summarising/describing data on two or more variables

4 Scatterplot of suicide against unemployment rates

5 How do we summarise the relationship between suicide and unemployment rates?
- Assume a straight-line (linear) relationship between suicide rate (y) and unemployment rate (x): $y = a + bx$
- Estimate a and b by applying ordinary least squares regression to the data in the scatter plot:
  - estimate of a = 1.435
  - estimate of b = 0.324

6 Method of Least Squares
- A method of finding the line that best fits the data
- The line of ‘best fit’ is the one that, of all possible lines, gives the smallest sum of squared vertical distances (residuals) between the observed data points and the line
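As a minimal sketch of the least squares idea, the code below computes the intercept and slope that minimise the sum of squared residuals for a single predictor, using the standard closed-form formulas. The function names `ols_fit` and `sum_squared_residuals` and the five data points are hypothetical illustrations; they are not the suicide/unemployment data behind the slides.

```python
import numpy as np


def ols_fit(x, y):
    """Return (a, b) for the line y = a + b*x that minimises squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    # Slope: co-variation of x and y divided by the variation in x.
    b = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    # Intercept: forces the line through the point of means (x-bar, y-bar).
    a = y.mean() - b * x.mean()
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between the data and the line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (a + b * x)) ** 2))


# Hypothetical illustrative data (NOT the data used in the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.4, 3.2, 3.1, 4.0]

a, b = ols_fit(x, y)
best = sum_squared_residuals(x, y, a, b)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, SSR = {best:.4f}")
```

Any other choice of intercept or slope for the same data gives a sum of squared residuals at least as large as `best`, which is what 'best fit' means here.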

7 Scatter with fitted line

8 Interpretation: $y = 1.435 + 0.324x$
- If unemployment (x) is zero, the suicide rate is predicted to be 1.435 per 100,000 population.
- Each 1 percentage point increase in unemployment raises the predicted suicide rate by 0.324 per 100,000.
- The relationship between y and x is not exact, so we usually write $y = a + bx + e$.
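The interpretation of the intercept and slope can be checked with a few lines of arithmetic. This is a small sketch using the estimates reported in the slides (1.435 and 0.324); the names `A_HAT`, `B_HAT` and `predicted_suicide_rate` are hypothetical, chosen only for illustration.

```python
# Estimates reported in the slides.
A_HAT = 1.435  # intercept: predicted suicide rate when unemployment is zero
B_HAT = 0.324  # slope: change in predicted rate per 1-point rise in unemployment


def predicted_suicide_rate(unemployment_rate):
    """Predicted suicide rate (per 100,000) for an unemployment rate (per 100)."""
    return A_HAT + B_HAT * unemployment_rate


# The intercept is the prediction at x = 0.
at_zero = predicted_suicide_rate(0.0)

# The slope is the difference between predictions one unit of x apart,
# whatever the starting value of x (5.0 here is an arbitrary choice).
change_per_point = predicted_suicide_rate(6.0) - predicted_suicide_rate(5.0)

print(f"prediction at x = 0: {at_zero:.3f}")
print(f"change per 1-point increase: {change_per_point:.3f}")
```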

9 Terminology and notation: $y_i = a + b x_i + e_i$
- $x_i$ and $y_i$ are variables that take a different value for each individual/observation; they vary across the cases in the dataset (i refers to case, or individual, i)
- y = dependent variable
- x = independent variable
- a and b are unknown (unobserved) constants
- a and b are population parameters
- a and b are to be estimated from sample data
- $e_i$ is the error/disturbance/residual term for case i

10 a is the y-axis intercept
[Figure: a straight line on x and y axes, meeting the y-axis (x = 0) at height a]

11 b is the slope or coefficient of x
[Figure: the same line on x and y axes, starting at a and rising by b for each 1-unit increase in x]

12 A note on causality
- Writing $y_i = a + b x_i + e_i$ does not mean that x causes y
- Suppose y = income and x = whether or not someone is an owner-occupier:
  - would turning renters into homeowners increase their incomes?
  - or is it that you need a good income to be able to purchase a home?
  - or that people on low incomes are more likely to be eligible for social rented housing?

13 What is the relationship between suicide and unemployment?
- Which is your ‘dependent’ variable?
- Use Graphs > Scatter > Simple > Define > OK
- Double-click on the chart. Go to: Elements > Fit Line at Total. You can also change the axes by going to: Edit > Select Y [X] Axis
- For the values, use Analyse > Regression > Linear

14 SPSS Output
- R = .702 (simple correlation between suicide and unemployment)
- R² = .493 (unemployment rates can account for 49% of the variation in suicide rates)
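With a single predictor, R² is the square of the simple correlation R, which is why .702 and .493 go together. The sketch below first checks that arithmetic with the reported figures, then illustrates the identity by computing R² as the explained share of variation on a small hypothetical dataset (not the data from the slides).

```python
import numpy as np

# Figures reported in the SPSS output.
reported_r = 0.702
reported_r_squared_check = round(reported_r ** 2, 3)  # compare with the reported .493

# Hypothetical illustrative data (NOT the data used in the slides).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.4, 3.2, 3.1, 4.0])

# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

# Fit the OLS line; np.polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x

# R squared = 1 - (unexplained variation / total variation).
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 from the fit = {r_squared:.4f}")
```

The identity R² = r² holds only for simple regression with one independent variable and an intercept; with several predictors R is the multiple correlation instead.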

15 a = intercept (constant) = 1.435; b = gradient (coefficient on the unemployment rate, per 100) = .324
- In 1997, the unemployment rate was 1 (per 100), therefore: suicide rate = 1.435 + .324 × 1 = 1.759 (per 100,000)
