Sociology 601, Class17: October 27, 2009 Linear relationships. A & F, chapter 9.1 Least squares estimation. A & F 9.2 The linear regression model (9.3)



Example of a linear relationship
[Chart: scatterplot illustrating a linear relationship; image not captured in the transcript.]

Equation for a linear relationship
A linear relationship is a relationship between two variables Y and X that can be defined by the equation:
Y = α + βX
Y is the value of the response variable
X is the value of the explanatory variable
α is the Y-intercept
β is the slope

Example of a linear relationship
Change over time in attitudes about gender: Y = α + βX
Y (the response variable) = % who disagree that men make better politicians than women
X (the explanatory variable) = year of survey
α (the Y-intercept) = value of Y when X = 0
β (the slope) = change in Y per unit of X

Example of a linear relationship
Change over time in attitudes about gender: Ŷ = a + b × year (fitted from the survey data)
[Table: observed vs. predicted % disagreeing, by survey year; the predicted values rise from 47.6% to 76.7% across the 15 survey years. The years, observed percentages, and fitted coefficients were garbled in transcription.]

The Dangers of Extrapolation
Change over time in attitudes about gender: Ŷ = a + b × year
[Table: extending the fitted line beyond the observed years gives a prediction of 92.6%, and 108.4% for 2020, an impossible value for a percentage.]

Example of a linear relationship
[Chart: scatterplot; image not captured in the transcript.]

Key terms for linear relationships
Explanatory variable: a variable that we think of as explaining or “causing” the value of another variable (also called the independent variable). We reserve X to denote the explanatory variable.
Response variable: a variable that we think of as being explained or “caused” by the value of another variable (also called the dependent variable). We reserve Y to denote the response variable.
(Q: what happens if both variables explain each other?)

More key terms for linear relationships
β: the slope of a linear relationship
β is the increment in Y per one unit of X.
o If β > 0, the relationship between the explanatory and response variables is positive.
o If β < 0, the relationship between the explanatory and response variables is negative.
o If β = 0, the explanatory and response variables are said to be independent.
If x is multiplied by 12 (e.g., months rather than years), then β′ = ?
If x is divided by 10 (e.g., decades rather than years), then β′ = ?
If y is multiplied by 100 (e.g., percentage points rather than proportions), then β′ = ?
If you subtract 1974 from x, then β′ = ?
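The β′ questions above can be checked numerically. A minimal Python sketch (hypothetical toy data, not the lecture's survey data) computes a least-squares slope and shows how it changes when x is rescaled:

```python
# Toy illustration (hypothetical data): how rescaling x changes the slope b.
def slope(xs, ys):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

years = [0, 1, 2, 3, 4]
y = [10.0, 12.0, 13.5, 16.0, 18.0]

b = slope(years, y)                            # slope per year
b_months = slope([12 * x for x in years], y)   # x in months: slope shrinks by a factor of 12
b_decades = slope([x / 10 for x in years], y)  # x in decades: slope grows by a factor of 10
print(b, b_months, b_decades)
```

Multiplying x by a constant divides the slope by that constant, and vice versa; adding a constant to x (e.g., subtracting 1974) leaves the slope unchanged.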

More key terms for linear relationships
α: the Y-intercept of a linear relationship
α is the value of Y when x = 0.
o This is sometimes a meaningless value of x, far beyond its observed range.
α determines the height of the line up or down on the Y-axis.
If x is multiplied by 12 (e.g., months rather than years), then α′ = ?
If x is divided by 10 (e.g., decades rather than years), then α′ = ?
If y is multiplied by 100 (e.g., percentage points rather than proportions), then α′ = ?
If you subtract 1974 from x, then α′ = ?
(Note: α and β are both population parameters, like μ.)

More key terms for linear relationships
Model: a formula that provides a simple approximation for the relationship between variables. The linear function is the simplest model for a relationship between two interval-scale variables.
Regression analysis: using linear models to study…
o the form of a relationship between variables
o the strength of a relationship between variables
o whether a statistically significant relationship exists between variables

Another example of a linear relationship
[Chart: scatterplot; image not captured in the transcript.]

9.2 Predicting Y-scores using least squares regression
Next, we study relationships between two variables where there are multiple cases of X, and Y scores do not always line up on a straight line. There is some scatter to the data points.
The objective is still to predict a value of Y, given a value of X.

Linear prediction: an example
Chaves, M., and D. E. Cann. “Regulation, Pluralism, and Religious Market Structure.” Rationality and Society 4(3).
Observations for 18 countries.
Outcome variable: weekly percent attending religious services (variable name: “attend”)
Explanatory variable: level of state regulation of religion (variable name: “regul”); not really interval scale, but an ordinal ranking.

Plotting a linear relationship in Stata:

. plot attend regul

[ASCII scatterplot of attend (vertical axis, topping out near 82) against regul (horizontal axis, 0 to 6); the text-mode plot did not survive transcription.]

Solving a least squares regression, using Stata:

. regress attend regul

[Stata output: ANOVA table (Model, Residual, Total SS), Number of obs = 18, F(1, 16) = 9.65, R-squared, Root MSE, and a coefficient table for regul and _cons; most numeric entries were lost in transcription.]

b is the coefficient for “regul”. a is the coefficient for “_cons”. (Ignore all the other output for now.)
%attend = 36.8 - 5.4 * regul

Finding the predicted values of religious attendance (ŷ) for each observed level of regulation (x)

%attend = 36.8 - 5.4 * regul

regul   calculation     ŷ = predicted attendance
0       36.8 - 5.4*0    36.8%
1       36.8 - 5.4*1    31.5%
2       36.8 - 5.4*2    26.1%
3       36.8 - 5.4*3    20.8%
4       36.8 - 5.4*4    15.4%
5       36.8 - 5.4*5    10.0%
6       36.8 - 5.4*6    4.7%

(The predicted values come from the unrounded coefficient estimates, so the last digit can differ from a calculation with the rounded 36.8 and 5.4.)
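The arithmetic in the prediction table can be reproduced with a few lines of Python, using the rounded coefficients (36.8 and -5.4), so the last digit differs slightly from the slide's ŷ column, which comes from the unrounded estimates:

```python
# Predictions from the rounded fitted equation: %attend = 36.8 - 5.4 * regul.
a, b = 36.8, -5.4  # rounded intercept and slope from the Stata output
preds = {regul: a + b * regul for regul in range(7)}
for regul, yhat in preds.items():
    print(f"regul={regul}: predicted attendance = {yhat:.1f}%")
```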

Finding the predicted values for each observed level of X, using Stata:

. predict pattend
(option xb assumed; fitted values)
. tabulate pattend regul

[Cross-tabulation of the fitted values by regul; numeric entries were lost in transcription.]

You can use the predict command only after you have used regress to estimate the regression function.

Plotting the predicted values for each observed level of X, in Stata:

. plot pattend regul

[ASCII plot of the fitted values (vertical axis) against regul (horizontal axis, 0 to 6): the predictions fall on a straight, downward-sloping line. The text-mode plot did not survive transcription.]

Interpreting a regression
When the data are scattered, we need to ask two questions:
o Are the data suitable for a linear model?
o If so, how do we draw a line through them?
Checking suitability (i.e., assumptions):
scattergrams
crosstabs (including means and s.d.’s by x-levels)
The assumptions of a linear regression are violated if
o the plot / crosstab suggests a nonlinear relationship
o there are severe outliers (extreme x or y scores)
o there is evidence of heteroskedasticity (the amount of “scatter” of the dots depends on the x-score)

Possible prediction methods
Once you have decided that a linear model is appropriate, how do you choose a linear equation for a scattered mess of dots?
A.) Calculate a slope from any two points?
B.) Calculate the average slope of all the points (with the least error)?
C.) Calculate the slope with the least squared error?
All of these solutions may be technically unbiased, but C is generally accepted as the most efficient. (C gives a slope that is, on average, closest to the slope of the population.)

Least squares prediction: formal terms
Population equation for a linear model: Y = α + βX + ε
Equation for a given observation: Y_i = a + bX_i + e_i, where Y_i and X_i are observed values of Y and X, and e_i is the error in observation Y_i.
Prediction for a given value of X, based on a sample: Ŷ = a + bX, where Ŷ is the predicted value of Y.
Note that Y_i − Ŷ_i = e_i = the residual for observation i.

Least squares prediction: equation for b
Goal for a given sample: estimate b and a such that Σ(Y_i − Ŷ_i)² is as small as possible.
(To derive the solution: start with Q = Σ(Y_i − a − bX_i)², take partial derivatives of Q with respect to a and b, and solve for the relative minima. This will not be tested in class!)
Solution:
b = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²
a = Ȳ − bX̄
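A minimal Python sketch of these formulas (toy, hypothetical data): it computes b and a and checks the familiar property that least-squares residuals sum to zero.

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared errors sum((y_i - a - b*x_i)^2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar  # the fitted line passes through (xbar, ybar)
    return a, b

# Hypothetical data with some scatter around a line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b, sum(residuals))  # least-squares residuals sum to (nearly) zero
```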

Least squares prediction: more terms
Σ(Y_i − Ŷ_i)² is also called the sum of squared errors, or SSE. (Also called the residual sum of squares: the squared errors in the response variable left over after you control for variation due to the explanatory variable.)
The method that calculates b and a to produce the smallest possible SSE is the method of least squares.
b and a are least squares estimates.
The prediction line Ŷ = a + bX is the least squares line.

Least squares prediction: still more terms
For a given observation, the prediction error e_i = Y_i − Ŷ_i is called the residual.
An atypical X or Y score, or a large residual, can be called an outlier.
o Outliers can bias an estimate of a slope.
o Outliers can increase the possibility of a type I error of inference.
o Outlier Y scores are especially troublesome when they are associated with extreme values of X.
o Outliers sometimes belong in the data, sometimes not.
o Q: DC homicide rates?

Calculating the residuals for each observation, using Stata:

. predict rattend, residuals
. summarize attend pattend rattend if country=="Ireland"

[Summary table (Obs, Mean, Std. Dev., Min, Max) for attend, pattend, and rattend; numeric entries were lost in transcription.]

Reminder: you can only use the predict command after you have used regress to estimate the regression function.

Plotting the residuals for each observed level of X, using Stata:

. plot rattend regul

[ASCII plot of the residuals (vertical axis) against regul (horizontal axis, 0 to 6); the text-mode plot did not survive transcription.]

Do you notice the residual that is an outlier?

More on sums of squares
Sum of squares refers to the act of taking each ‘error’, squaring it, and adding it to all the other squared errors in the sample. This operation is analogous to calculating a variance, without dividing by n − 1.
Sum of Squares Total (SST) sums the squared differences between each score Y_i and the overall mean Ȳ: Σ(Y_i − Ȳ)²
Sum of Squares Error (SSE), also called Sum of Squares Residual (SSR), sums the squared differences between each score Y_i and the corresponding prediction from the regression line, Ŷ_i: Σ(Y_i − Ŷ_i)²
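The two sums of squares can be computed directly. A small Python sketch with hypothetical observed values and predictions:

```python
# Hypothetical observed scores and predictions from some fitted line.
ys    = [3.0, 5.0, 7.0, 9.0]
yhats = [3.5, 4.5, 7.5, 8.5]
ybar = sum(ys) / len(ys)

sst = sum((y - ybar) ** 2 for y in ys)                # total variation around the mean
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))  # variation left after the fit
print(sst, sse)  # SSE is smaller than SST when the line predicts better than the flat mean
```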

9.3 The linear regression model
The conceptual problem: the linear model Y = α + βX has limited use because it is deterministic and cannot account for variability in Y-values among observations with the same X-value.
The conceptual solution: the linear regression model E(Y) = α + βX is a probabilistic model better suited to the variable data of social science research.
A regression function describes how the mean of the response variable changes according to the value of an explanatory variable. For example, we don’t expect all college graduates to earn more than all high school graduates, but we expect the mean earnings of college graduates to be greater than the mean earnings of high school graduates.

A standard deviation for the linear regression model
A new problem: how do we describe variation about the means of a regression line?
A solution: the conditional standard deviation σ refers to the variability of Y values about the conditional population mean E(Y) = α + βX for subjects with the same value of X. It is estimated by
σ̂ = sqrt( SSE / (n − 2) ) = sqrt( Σ(Y_i − Ŷ_i)² / (n − 2) )
Q: why n − 2?
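A sketch of this estimate in Python (hypothetical observed and fitted values; σ̂ is just the square root of SSE over n − 2):

```python
import math

def conditional_sd(ys, yhats):
    """Estimated conditional standard deviation: sqrt(SSE / (n - 2))."""
    n = len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))
    return math.sqrt(sse / (n - 2))  # n - 2 because both a and b were estimated

# Hypothetical observed and fitted values for 6 observations: SSE = 8,
# so sigma-hat = sqrt(8 / 4) = sqrt(2).
ys    = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
yhats = [ 9.0, 11.0, 13.0, 14.0, 15.0, 18.0]
print(conditional_sd(ys, yhats))
```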

The linear regression model: example of conditional standard deviation
Church attendance and state control problem:
SSE (also called SSR) = [value lost in transcription]; n = 18, n − 2 = 16

Solving a least squares regression, using Stata:

. regress attend regul

[The same Stata output as before: Number of obs = 18, F(1, 16) = 9.65, Root MSE, and the coefficient table for regul and _cons; most numeric entries were lost in transcription.]

b is the coefficient for “regul”. a is the coefficient for “_cons”. (Ignore all the other output for now.)
%attend = 36.8 - 5.4 * regul

Interpreting the conditional standard deviation
Church attendance and state control problem:
For every level of state control of religion, the standard deviation of church attendance around the predicted mean is [value lost in transcription] percentage points. (Draw chart on board.)
By the assumptions of the regression model, this is true for every level of state control. (Is that assumption valid in this case?)

Conditional standard deviation and marginal standard deviation
Degrees of freedom are different: n − 2 versus n − 1.
E(Y) is different: Ȳ versus Ŷ.
The conditional s.d. is usually smaller than the marginal s.d.