Faculty of Social Sciences Induction Block: Maths & Statistics Lecture 3 Precise & Approximate Relationships Between Variables Dr Gwilym Pryce.

Plan:
1. Introduction
2. Precise Relationships
3. Approximate Relationships
4. Relationships between categorical variables

A token of transatlantic friendship… the relationship between variables:

1. Introduction to relationships between variables

Often of greatest interest in social science is the investigation of relationships between variables:
– is social class related to political perspective?
– is income related to education?
– is work alienation related to job monotony?

We are also interested in the direction of causation, but this is more difficult to prove empirically:
– our empirical models are usually structured assuming a particular theory of causation.

Exercise:

Q/ Does the main research question that interests you involve a relationship between variables?

Think about:
– what the variables are
– the direction of causation
– the rationale for this causation
– whether it is a precise or approximate relationship

2. Precise relationships

No random or error component:
– Circumference = 3.14 × Diameter (linear)
– Fahrenheit = 32 + (9/5) × Centigrade (linear)
– F = ma (non-linear), where F = force; m = mass; a = acceleration
– E = mc² (non-linear), where E = energy; m = mass; c = speed of light

– linear relationships have straight-line graphical representations
– non-linear relationships have curved graphical representations

Precise Linear Relationships

Exercise:
– Write a column of integers from 0 to 10 and call this variable ‘C’
– Then construct a new column called ‘F’ where F = 32 + (9/5) × C
– Then plot F and C on a graph with F on the vertical axis and C on the horizontal axis.
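The exercise above can be sketched in a few lines of code. This is a minimal illustration, assuming the garbled formula on the slide was the Fahrenheit conversion F = 32 + (9/5)C introduced earlier:

```python
# Build C = 0, 1, ..., 10 and F = 32 + 9/5 * C (the Fahrenheit
# conversion; assumed to be the formula intended on the slide).
C = list(range(11))
F = [32 + 9 / 5 * c for c in C]

for c, f in zip(C, F):
    print(f"C = {c:2d}  ->  F = {f:5.1f}")
# Plotting F (vertical axis) against C (horizontal axis) gives a
# straight line with intercept 32 and slope 1.8: a precise linear
# relationship with no scatter around the line.
```

Every point lies exactly on the line, which is what distinguishes a precise relationship from the approximate ones discussed below.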

Equation of a straight line:

It is traditional to:
– call the dependent variable “y”, i.e. the variable that’s being determined or explained
– call the explanatory variable “x”, i.e. the determinant of y; the factor that explains the variation in y

y = a + bx, where:
– a is the vertical intercept
  » measures how much y would be if x were zero
  » changes in a simply move the line up or down in parallel shifts
– b is the slope coefficient
  » measures how much y increases for every unit increase in x
  » the greater the value of b, the steeper the slope and the more sensitive y is to x.

Graphing exact relationships

Axes:
– put the dependent variable y on the vertical axis
– put the explanatory variable x on the horizontal axis

The equation is fully summarised by a line.

3. Approximate relationships

In social science/epidemiology/history we don’t tend to get precise relationships:
– e.g. the relationship between heart disease and smoking
– e.g. educational achievement and social class of parents
– e.g. rate of teenage pregnancy and area deprivation

Modelling approximate relationships:

Such relationships can sometimes be approximated/summarised by a precise relationship plus an error term:
– Linear: Risk of heart disease = a + b × (no. of cigarettes) + e, i.e. y = a + bx + e
– Multivariate: y = a + bx + cz + e
– Non-linear: y = a + bx² + e
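The "precise relationship plus error term" idea can be simulated directly. Here is a small sketch with invented coefficients and Gaussian noise standing in for the error term e:

```python
import random

random.seed(42)

# Hypothetical illustration: generate y = a + b*x + e, where e is a
# random error term (the numbers are invented, not from the lecture).
a_true, b_true = 2.0, 0.5
x = [float(i) for i in range(50)]
y = [a_true + b_true * xi + random.gauss(0, 1) for xi in x]

# Unlike a precise relationship, the observed points scatter around
# the underlying line rather than lying exactly on it.
residuals = [yi - (a_true + b_true * xi) for xi, yi in zip(x, y)]
print(max(abs(r) for r in residuals) > 0)  # True: the error term is non-zero
```

Plotting y against x for data like this produces exactly the kind of scatter plot shown on the following slides.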

Graphing approximate relationships

The most straightforward way to investigate evidence for a relationship is to look at scatter plots. Again, it is traditional to:
– put the dependent variable (i.e. the “effect”) on the vertical axis, or “y axis”
– put the explanatory variable (i.e. the “cause”) on the horizontal axis, or “x axis”

Scatter plot of IQ and Income:

We would like to find the line of best fit:

Sometimes the relationship appears non-linear:

… and so a straight line of best fit is not always very satisfactory:

Could try a quadratic line of best fit:

… or a cubic line of best fit: (overfitted?)

Could try two linear lines: “structural break”

Q/ How do we best fit a straight line?

A/ Regression analysis
– The most popular algorithm for drawing the line of best fit
– minimises the sum of squared deviations from the line to each observation, Σᵢ (yᵢ − ŷᵢ)²
– also called ‘Ordinary Least Squares’ (OLS)

Where:
– yᵢ = observed value of y
– ŷᵢ = predicted value of yᵢ, i.e. the value on the line of best fit corresponding to xᵢ
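The OLS estimates have simple closed forms: b = cov(x, y)/var(x) and a = ȳ − b·x̄. A minimal sketch (the data values are invented for illustration):

```python
# Ordinary Least Squares fit via the closed-form formulas:
#   b = sum((x_i - mean_x)(y_i - mean_y)) / sum((x_i - mean_x)^2)
#   a = mean_y - b * mean_x
def ols(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# On data lying exactly on y = 1 + 2x, OLS recovers a = 1 and b = 2
# with zero residuals.
a, b = ols([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

These two formulas are exactly what minimises the sum of squared deviations described on the slide.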

Regression estimates of a, b:

This algorithm yields estimates of the slope b and y-intercept a of the straight line:
– b is usually the parameter of most interest, since it tells us what happens to y if x increases by 1.

But sometimes the line of best fit doesn’t seem to explain the variation in y very well: Q/ Why do you think this might be?

Is floor area the only factor? What other variables determine purchase price?

Omitted explanatory variables:

If the line of best fit doesn’t seem to explain much of the variation in y, this might be because there are other factors determining y.

Scatter plot (with floor spikes)

Fitting non-linear lines of best fit:

Regression analysis can be used to summarise non-linear relationships, both bivariate and multivariate:
– e.g. y = a + bx² + cz², multivariate and quadratic in both x and z
– e.g. y = a + bx + cz², multivariate: linear relationship between y and x but quadratic relationship between y and z
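A model like y = a + bx² is non-linear in x but still linear in the parameters, so it can be fitted with the same OLS machinery by regressing y on the transformed variable z = x². A small sketch with invented numbers:

```python
# OLS helper (closed-form simple regression of y on one regressor).
def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 + 0.5 * xi ** 2 for xi in xs]  # exact y = 2 + 0.5*x^2 (invented)

z = [xi ** 2 for xi in xs]             # transform the regressor: z = x^2
a, b = ols(z, ys)
print(a, b)  # 2.0 0.5
```

The same trick underlies the quadratic and cubic lines of best fit shown in the earlier scatter-plot slides.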

3D Surface Plots: Construction, Price & Unemployment

Construction Equation in a Slump => new construction has a linear relationship with Price, but a quadratic relationship with unemployment.

4. Relationships between categorical variables:

The easiest way to represent relationships between categorical variables is to use contingency tables:
– also called cross-tabulations or cross-tabs
– also called two-way tables

They show the number of observations (or % of observations) in particular categories, and naturally lead to a test of independence which has a chi-square (χ²) distribution.
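The χ² test of independence compares each observed cell count with the count expected if the two variables were independent. A minimal sketch on an invented 2×2 table:

```python
# Invented 2x2 contingency table, e.g. rows = social class,
# columns = political perspective (counts are illustrative only).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence:
#   row total * column total / grand total
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

print(round(chi2, 3))  # 16.667
```

A large χ² statistic relative to its distribution (here with (2−1)×(2−1) = 1 degree of freedom) is evidence against independence; SPSS reports this automatically in its cross-tabs output.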

Contingency Tables in SPSS:

The most basic cross-tab just lists the count in each category. You can add the % in each category by returning to the cross-tabs window, selecting the Cells button, and choosing which percentages you want:

If you select all three (row, column and total), you will end up with: