Faculty of Social Sciences Induction Block: Maths & Statistics Lecture 3 Precise & Approximate Relationships Between Variables Dr Gwilym Pryce.


1 Faculty of Social Sciences Induction Block: Maths & Statistics Lecture 3 Precise & Approximate Relationships Between Variables Dr Gwilym Pryce

2 Plan:
  1. Introduction
  2. Precise Relationships
  3. Approximate Relationships
  4. Relationships between categorical variables

3 A token of transatlantic friendship… the relationship between variables:

4

5 1. Introduction to relationships between variables
• Often of greatest interest in social science is the investigation of relationships between variables:
  – is social class related to political perspective?
  – is income related to education?
  – is work alienation related to job monotony?
• We are also interested in the direction of causation, but this is more difficult to prove empirically:
  – our empirical models are usually structured assuming a particular theory of causation

6 Exercise:
• Q/ Does the main research question that interests you involve a relationship between variables?
• Think about:
  – what the variables are
  – the direction of causation
  – the rationale for this causation
  – whether it is a precise or approximate relationship

7 2. Precise relationships
• No random or error component:
  – Circumference = π × Diameter, with π ≈ 3.14 (linear)
  – Fahrenheit = 32 + (9/5) × Centigrade (linear)
  – $F = ma$ (non-linear: F depends on the product of m and a), where F = force; m = mass; a = acceleration
  – $e = mc^2$ (non-linear in c; because c is a constant, e is proportional to m), where e = energy; m = mass; c = speed of light

8 –linear relationships have straight line graphical representations –non-linear relationships have curved graphical representations

9 Precise Linear Relationships
• Exercise:
  – Write a column of integers from 0 to 10 and call this variable ‘C’
  – Then construct a new column called ‘F’ where F = 32 + 2C
  – Then plot F and C on a graph with F on the vertical axis and C on the horizontal axis.
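The exercise above can also be tabulated in a few lines of Python. This is only a minimal sketch: the names `c_values` and `f_values` are illustrative and do not come from the lecture.

```python
# Column C: the integers 0 to 10 (the explanatory variable).
c_values = list(range(11))

# Column F: the precise linear relationship F = 32 + 2C from the exercise.
f_values = [32 + 2 * c for c in c_values]

# Print the two columns side by side. Plotting F (vertical axis) against
# C (horizontal axis) would put every point on one straight line.
for c, f in zip(c_values, f_values):
    print(f"{c:>2}  {f:>3}")
```

Because there is no error component, each step of 1 in C raises F by the same amount.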

10

11

12 Equation of a straight line:
• Traditional to:
  – call the dependent variable “y”, i.e. the variable that is being determined or explained
  – call the explanatory variable “x”, i.e. the determinant of y; the factor that explains the variation in y

13 $y = a + bx$ where:
• a is the vertical intercept
  » measures how much y would be if x is zero
  » changes in a simply move the line up or down in parallel shifts
• b is the slope coefficient
  » measures how much y increases for every unit increase in x
  » the greater the value of b, the steeper the slope and the more sensitive y is to x.
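The roles of a and b can be checked numerically. The sketch below assumes hypothetical values a = 32 and b = 2; the function name `line` is also just an illustration.

```python
def line(a, b, x):
    """Return y on the straight line y = a + b*x."""
    return a + b * x

# Hypothetical intercept and slope, chosen only for illustration.
a, b = 32, 2

# The value of y when x is zero is the vertical intercept a.
intercept = line(a, b, 0)

# The rise in y for a one-unit increase in x is the slope b.
slope = line(a, b, 6) - line(a, b, 5)

# Raising a by 10 lifts the line by 10 at any x: a parallel shift.
shift = line(a + 10, b, 5) - line(a, b, 5)

print(intercept, slope, shift)
```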

14 Graphing exact relationships
• Axes:
  – put the dependent variable y on the vertical axis
  – put the explanatory variable x on the horizontal axis
• The equation is fully summarised by a line

15

16

17

18 3. Approximate relationships
• In social science/epidemiology/history we don’t tend to get precise relationships
  – e.g. relationship between heart disease and smoking
  – e.g. educational achievement and social class of parents
  – e.g. rate of teenage pregnancy and area deprivation

19 Modelling approximate relationships:
• Such relationships can sometimes be approximated/summarised by a precise relationship plus an error term:
  – Linear: Risk of heart disease = a + b × (no. of cigarettes) + e, i.e. $y = a + bx + e$
  – Multivariate: $y = a + bx + cz + e$
  – Non-linear: $y = a + bx^2 + e$
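To make "a precise relationship plus an error term" concrete, the linear case can be simulated. Everything numeric in this sketch is an assumption made for illustration: the values of a and b, the range of x, and the choice of normally distributed errors.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

a, b = 1.0, 0.5       # hypothetical intercept and slope
x = list(range(20))   # hypothetical explanatory variable

# Error term e: random noise with mean 0 (normal errors are an assumption).
e = [random.gauss(0, 1) for _ in x]

# Approximate relationship: the precise part a + b*x plus the error term.
y = [a + b * xi + ei for xi, ei in zip(x, e)]

# Subtracting the precise part from y recovers the error term.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
```

In real data a, b and e are not observed; only x and y are, which is why the line has to be estimated.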

20 Graphing approximate relationships
• The most straightforward way to investigate evidence for a relationship is to look at scatter plots:
  – Again, traditional to:
    » put the dependent variable (i.e. the “effect”) on the vertical axis, or “y axis”
    » put the explanatory variable (i.e. the “cause”) on the horizontal axis, or “x axis”

21 Scatter plot of IQ and Income:

22 We would like to find the line of best fit:

23 Sometimes the relationship appears non-linear:

24 … and so a straight line of best fit is not always very satisfactory:

25 Could try a quadratic line of best fit:

26 … or a cubic line of best fit (overfitted?):

27 Could try two straight lines: “structural break”

28 Q/ How do we best fit a straight line?
• A/ Regression analysis
  – the most popular algorithm for drawing the line of best fit
  – minimises the sum of squared deviations from the line to each observation: $\sum_i (y_i - \hat{y}_i)^2$
  – also called ‘Ordinary Least Squares’ (OLS)
Where:
  $y_i$ = observed value of y
  $\hat{y}_i$ = predicted value of $y_i$ = the value on the line of best fit corresponding to $x_i$

29 Regression estimates of a, b:
• This algorithm yields estimates of the slope b and y-intercept a of the straight line
  – b is usually the parameter of most interest, since it tells us how much y is predicted to change when x increases by one unit.
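For a single explanatory variable, the least-squares estimates of a and b have a standard closed form, sketched below. The function names and the small data set are hypothetical and used only to illustrate the calculation; statistical packages such as SPSS report the same quantities.

```python
def ols_fit(x, y):
    """Return (a, b) minimising the sum of squared deviations of y from a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_deviations(x, y, a, b):
    """Sum over observations of (observed y - predicted y) squared."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, roughly following a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = ols_fit(x, y)
best = sum_squared_deviations(x, y, a_hat, b_hat)
print(a_hat, b_hat, best)
```

Any other choice of intercept or slope gives a larger sum of squared deviations than `best`, which is what "line of best fit" means here.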

30 But sometimes the line of best fit doesn’t seem to explain the variation in y very well: Q/ Why do you think this might be?

31 Is floor area the only factor? What other variables determine purchase price?

32 Omitted explanatory variables:
• If the line of best fit doesn’t seem to explain much of the variation in y, this might be because there are other factors determining y:

33 Scatter plot (with floor spikes)

34 Fitting non-linear lines of best fit:
• Regression analysis can be used to summarise non-linear relationships, both bivariate and multivariate:
  – e.g. $y = a + bx^2 + cz^2$: multivariate and quadratic in x and z
  – e.g. $y = a + bx + cz^2$: multivariate, with a linear relationship between y and x but a quadratic relationship between y and z
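One way to see why regression can handle such curves is that they are still linear in the coefficients: squaring the variable first turns the problem back into fitting a straight line. The sketch below shows only the simpler bivariate case $y = a + bx^2$ with made-up data; the multivariate examples above would need a multiple regression routine, which is not shown.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical data generated exactly from y = 4 + 0.5 * x**2.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [4 + 0.5 * xi ** 2 for xi in x]

# Transform the explanatory variable, then fit a straight line in w = x**2.
w = [xi ** 2 for xi in x]
a_hat, b_hat = ols_fit(w, y)
print(a_hat, b_hat)
```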

35 3D Surface Plots: Construction, Price & Unemployment

36 Construction Equation in a Slump => new construction has a linear relationship with price, but a quadratic relationship with unemployment.

37 4. Relationships between categorical variables:
• The easiest way to represent relationships between categorical variables is to use contingency tables
  – also called cross-tabulations or cross tabs
  – also called two-way tables
• They show the number of observations (or % of observations) in particular categories and naturally lead to a test of independence, whose test statistic has a Chi-square (or “χ²”) distribution.
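The statistic behind the test of independence can be sketched as follows. This is the standard Pearson chi-square calculation, comparing each observed count with the count expected if the two variables were unrelated; the 2×2 table is invented for illustration.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical cross-tab: rows and columns are two categorical variables.
table = [[10, 20],
         [20, 10]]

stat, dof = chi_square_statistic(table)
print(stat, dof)
```

The larger the statistic relative to the chi-square distribution with `dof` degrees of freedom, the stronger the evidence against independence.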

38 Contingency Tables in SPSS:

39

40
• The most basic cross tab just lists the count in each category:
• You can add the % in each category by returning to the cross-tabs window, selecting the Cells button, and choosing which percentages you want:

41
• If you select all three (row, column and total), you will end up with:
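The three kinds of percentage differ only in what each count is divided by. The sketch below is not SPSS output, just the same arithmetic in Python on an invented table, with a hypothetical function name.

```python
def crosstab_percentages(table):
    """Return row, column and total percentages for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Row %: each count as a share of its row total.
    row_pct = [[100 * c / row_totals[i] for c in row]
               for i, row in enumerate(table)]
    # Column %: each count as a share of its column total.
    col_pct = [[100 * c / col_totals[j] for j, c in enumerate(row)]
               for row in table]
    # Total %: each count as a share of all observations.
    total_pct = [[100 * c / n for c in row] for row in table]
    return row_pct, col_pct, total_pct


# Hypothetical counts for two categorical variables.
table = [[10, 30],
         [20, 40]]

row_pct, col_pct, total_pct = crosstab_percentages(table)
print(row_pct, col_pct, total_pct)
```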

42

