Social Science Models What elements (at least two) are necessary for a “social science model”?

Why Regression? - 1 Measures of Association (e.g., correlation) only tell us the strength of the relationship between X and Y, NOT the MAGNITUDE of the relationship. Regression tells us the MAGNITUDE of the relationship (i.e., how MUCH the dependent variable changes for a specified amount of change in the independent variable).
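
To see the distinction concretely, here is a minimal Python sketch with invented numbers (not course data): two relationships of identical strength (r) but very different magnitudes (slope).

```python
import numpy as np

# Two hypothetical relationships: both perfectly linear (r = 1),
# but y changes 20 times faster with x in the first one.
x = np.arange(10, dtype=float)
y1 = 2.0 * x + 5.0    # y rises 2.0 units per unit of x
y2 = 0.1 * x + 5.0    # y rises 0.1 units per unit of x

for y in (y1, y2):
    r = np.corrcoef(x, y)[0, 1]   # strength of association
    b = np.polyfit(x, y, 1)[0]    # regression slope: the magnitude
    print(f"r = {r:.2f}, slope = {b:.2f}")
# Both lines print r = 1.00; only the slope reveals how MUCH y changes.
```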

Why Regression? - 2

Why Regression? - 3 Properties we want our estimate of the magnitude to have: 1. Unbiased – on average, our estimate will equal the “true” value of the magnitude.

Why Regression? - 4 2. Efficient – among all unbiased estimation procedures, our procedure will have the least variation among its estimates. 3. Consistent – as the sample size increases, the accuracy of our estimate of the magnitude increases.
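
A small simulation sketch of properties 1 and 3, using invented data (nothing here comes from the course examples): the average of many OLS slope estimates sits at the true value, and their spread shrinks as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_B = 2.0  # the "true" magnitude we are trying to estimate

def ols_slope(n):
    """Draw one sample of size n and return the estimated slope."""
    x = rng.uniform(0, 10, n)
    y = 1.0 + TRUE_B * x + rng.normal(0, 5, n)  # noisy linear relation
    return np.polyfit(x, y, 1)[0]

for n in (25, 400):
    est = [ols_slope(n) for _ in range(2000)]
    print(f"n = {n:3d}: mean = {np.mean(est):.3f}, sd = {np.std(est):.3f}")
# The mean stays near 2.0 (unbiasedness) while the sd shrinks as n grows
# (consistency).
```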

Linear Regression I: Scatterplots and Regression Lines SAT scores and graduation rates Looking at a scatterplot Fitting a regression line What does it mean? Other factors that affect graduation rates Confound: reputation Other independent variables

SAT Scores and Graduation Rates - 1 To test the hypothesis that colleges with smarter students have higher graduation rates, I will look at data from 148 colleges in the United States. SAT Scores → Graduation Rates

SAT Scores and Graduation Rates - 2

SAT Scores and Graduation Rates - 3 We can summarize the direction and strength of a relationship between two variables by calculating “r,” the correlation. If we want to know more about the relationship, we can fit a “regression line” to the scatterplot.

SAT Scores and Graduation Rates - 4

SAT Scores and Graduation Rates - 5 A regression line summarizes how much the dependent variable (graduation rates) changes when the independent variable (SAT scores) increases. The line gives you a “predicted value” of graduation rates for a college with a given average SAT score. To make this prediction as good as possible, the line minimizes the sum of the squared vertical distances between the line and the data points.

SAT Scores and Graduation Rates - 6 – DON’T WRITE THE FORMULA! Like all lines, a regression line can be summarized with this formula: y = a + bx where: b = slope of line or “regression coefficient” a = the intercept, or the value of y when x=0
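
For illustration only, here is how a and b could be estimated in Python; the five SAT/graduation-rate pairs below are invented, not drawn from the 148-college dataset.

```python
import numpy as np

# Hypothetical colleges: average SAT score and graduation rate (%).
sat  = np.array([900, 1000, 1100, 1200, 1300], dtype=float)
grad = np.array([ 38,   45,   50,   58,   62], dtype=float)

b, a = np.polyfit(sat, grad, 1)  # slope b and intercept a of y = a + bx
print(f"grad rate = {a:.1f} + {b:.3f} * SAT")            # the fitted line
print(f"prediction at SAT = 1150: {a + b * 1150:.1f}%")  # plug into a + bx
```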

Regression Theory - 1 We want to “explain” the variation in graduation rates among universities. We could just use the mean graduation rate as a prediction of each university’s graduation rate. However, even with many cases, we often do not observe multiple cases at the same value of x. Thus, we cannot simply compute an average graduation rate for each particular average SAT score.

Regression Theory - 2 We might think the value of “y” (graduation rates) we observe is conditional on the value of “x” (SAT scores). Take the mean of y at each value of x. We then essentially have a frequency distribution for the values y can take on at each value of x.

E(Y | x_i) The one time we observe y at a given x_i, it is likely to be close to the mean of its conditional probability distribution.
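
A minimal sketch of the conditional-mean idea, assuming (unrealistically) that we observe several y values at each x; the numbers are invented.

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
y = np.array([4., 5., 6., 7., 9., 10., 11., 12., 13.])

# Mean of y within each observed value of x: the conditional mean E(y | x).
for xv in np.unique(x):
    print(f"E(y | x = {xv}) = {y[x == xv].mean():.2f}")
# With real data we rarely see many y's at the same x, so the regression
# line stands in for these conditional means.
```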

SAT Scores and Graduation Rates - 7 The regression coefficient for the effect of SAT scores on graduation rates is .06. This is the predicted effect on graduation rates when SAT scores go up by one unit. This means that comparing one college to another college where average SATs were 100 points higher should lead to a graduation rate that is 6 percentage points higher (100 x .06 = 6).

Other Factors Affecting Grad Rates - 1 The school’s reputation could be a confound: a school with a good rep might attract smart students and want to keep its reputation high by graduating them.

Other Factors Affecting Grad Rates - 2 – Which Relationship is Negative?

Other Factors Affecting Grad Rates - 3 [path diagram: Year of Founding, SAT Scores, Tuition, and Student/Faculty Ratio as predictors of Graduation Rates]

Multiple Regression - 1 The impact of one independent variable on the dependent variable may change as other independent variables are included in the model (i.e., the same equation). The following example will demonstrate this.
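
As a sketch of why this happens, the following simulation (invented data, not the Senate data) fits the same outcome with and without a second, correlated predictor; the coefficient on x1 changes sharply.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)  # x2 correlated with x1
y  = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)  # both truly matter

# Bivariate: the slope on x1 absorbs part of x2's effect (about 2.6 here).
print("x1 alone  :", round(np.polyfit(x1, y, 1)[0], 2))

# Multiple regression by least squares: coefficients near the true 1.0, 2.0.
X = np.column_stack([np.ones(n), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("x1 with x2:", round(coefs[1], 2), "| x2:", round(coefs[2], 2))
```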

Multiple Regression - 2 Tax = percentage of times the senator voted in favor of federal tax changes where over 50% of the benefits went to households earning less than the median family income, measured over 76 amendments to the Tax Reform Act of 1976 (my first attempt at looking at the impact of political parties on income inequality – too long ago!). This is the dependent variable in the analysis ahead.

Multiple Regression - 3 Cons = Percentage of times the senator voted for positions favored by the Americans for Constitutional Action (a conservative interest group) Note: What assumption about vote value does using a percentage measure make?

Multiple Regression - 4 Party = Senator’s party affiliation (1 = Democrat; 0 = Republican) Stinc = median household income in the senator’s state in thousands of dollars (i.e., $10,200 is entered as 10.2) What’s the “median”?

Multiple Regression - 5 Regression of tax on cons in Stata [most numeric values were not transcribed; blanks below]:

      Source |       SS       df       MS       Number of obs =  100
-------------+------------------------------    F(1, 98)      =
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+------------------------------    Adj R-squared =
       Total |                                  Root MSE      =

         tax |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------------
        cons |
       _cons |

Note: _cons is the y-intercept. Can you interpret the above Stata output? Why might multiple regression be useful?

Multiple Regression - 6 Example from the 300 Reader. Value of “b”:
(1) if you use the senator’s conservatism to explain tax voting:
(2) if you use the senator’s party to explain tax voting:
(3) if you use the median family income in the senator’s state to explain tax voting:
CAN YOU INTERPRET EACH “b”? [the three bivariate b values were not transcribed]

Regression of Tax on Cons, Party and Stinc in Stata [values not transcribed are left blank]:

      Source |       SS       df       MS       Number of obs =   100
-------------+------------------------------    F(3, 96)      =
       Model |    54936        3                Prob > F      =
    Residual |    26840       96                R-squared     = 0.671
-------------+------------------------------    Adj R-squared =
       Total |    81776       99                Root MSE      =

         tax |      Coef.   Std. Err.      t    P>|t|     Beta
-------------+------------------------------------------------
        cons |      -.644
       party |     11.207
       stinc |      -.560
       _cons |     59.981

Interpret both the unstandardized (“Coef.” column) and standardized (“Beta” column) coefficients. Karl Marx’s thoughts on this?

Multiple Regression - Interpretation Notice how much smaller the impact of senator party identification is when senator ideology is in the same equation. Also, note that the sign (i.e., direction of the relationship) for state median family income changes from positive to negative once all three independent variables are in the same equation.

Multiple Regression – Prediction - 1 From the previous output we know the following: “a” = 59.981, the impact of senator conservatism = -.644, the impact of senator party affiliation = 11.207 and the impact of the median household income in the senator’s state = -.560. Senator #1’s scores on the three independent variables are as follows: conservatism = 26, party affiliation = 1 and state median household income = 7.4 (i.e., $7,400 in 1970).

Multiple Regression – Prediction - 2 To predict the score on “tax” for senator #1 the computer works the following equation: 59.981 + (26)(-.644) + (1)(11.207) + [(7.4)(-.560)] = 59.981 – 16.744 + 11.207 – 4.144 = 50.3
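
The same arithmetic as a few lines of Python, using the coefficients shown above:

```python
# Coefficients from the regression output and senator #1's scores.
a, b_cons, b_party, b_stinc = 59.981, -0.644, 11.207, -0.560
cons, party, stinc = 26, 1, 7.4

predicted_tax = a + b_cons * cons + b_party * party + b_stinc * stinc
print(round(predicted_tax, 1))  # -> 50.3
```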

Multiple Regression – Prediction - 3 Senator #1 is “predicted” to support the poor 50.3% of the time. Since senator #1 “actually” supported the poor on 54% of their tax votes, the prediction error (“e” or “residual”) for senator #1 is: 54 – 50.3 = 3.7. The computer then squares this value (i.e., 3.7 x 3.7 = 13.69). The computer performs this same operation for all 100 senators. The sum of the squared prediction errors for all 100 senators is 26,840.

Multiple Regression – Prediction - 4 If any of the values of the coefficients (i.e., 59.981, -.644, 11.207, or -.560) were changed, the sum of the squared prediction errors would have been greater than 26,840. This is known as the “least squared errors” principle.

Regression Model Performance - 1 Let’s see how well our regression model performed. From the following we know that the mean score on “tax” is 46.5 (i.e., the average senator supported the poor/middle class 46.5% of the time).

    Variable |  Obs        Mean    Std. Dev.
         tax |  100       46.54

Regression Model Performance - 2 We also know that senator #1 supported the poor/middle class 54% of the time. If we subtract the average score from senator #1’s score, we obtain senator #1’s deviation from the mean: 54 – 46.54 = 7.46. If we square this deviation, we obtain the squared deviation from the mean for senator #1 (7.46 x 7.46 = 55.65).

Regression Model Performance - 3 If we repeat this process for all remaining 99 senators and add up the results, we obtain the total variation in the dependent variable that we could explain: 81,776. From the previous discussion we know that the total of the squared prediction errors equals 26,840. If we take 1 – (26,840/81,776) = 1 – .329 = .671, we find that variation in senator conservatism, party affiliation and state median household income explained 67.1% of the variation in senatorial voting on tax legislation.
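
The same computation in Python, using the two sums reported on this slide:

```python
sse = 26_840  # sum of squared prediction errors (unexplained variation)
tss = 81_776  # total squared deviations of tax from its mean

r_squared = 1 - sse / tss
print(round(r_squared, 2))  # -> 0.67: about 67% of the variation explained
```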

Multicollinearity An independent variable may be statistically insignificant because it is highly correlated with one, or more, of the other independent variables. For example, perhaps state median family income is highly correlated with senator conservatism (e.g., if wealthier states elected more conservative senators). Multicollinearity is a lack of information rather than a lack of data.

Visualizing Multicollinearity - 1

Visualizing Multicollinearity - 2

Visualizing Multicollinearity - 3

Multicollinearity Check in Stata 1 – 1/VIF yields the proportion of the variation in one independent variable explained by all the other independent variables.

    Variable |    VIF     1/VIF
        cons |
       party |
       stinc |    1.36    0.738

What would Karl Marx think now? [the VIF values for cons and party were not transcribed]
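
A sketch of the quantity Stata’s vif command reports, computed by hand on invented predictors (not the Senate variables): regress one independent variable on the others, take that R-squared, and use VIF = 1/(1 – R²).

```python
import numpy as np

def vif(target, others):
    """VIF of one predictor: R-squared from regressing it on the others."""
    X = np.column_stack([np.ones(len(target))] + list(others))
    fitted = X @ np.linalg.lstsq(X, target, rcond=None)[0]
    r2 = 1 - ((target - fitted) ** 2).sum() / ((target - target.mean()) ** 2).sum()
    return 1 / (1 - r2)

# x3 is built largely from x1 and x2, so its VIF is inflated.
rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
x3 = 0.9 * x1 + 0.5 * x2 + rng.normal(scale=0.3, size=100)

v = vif(x3, [x1, x2])
print(f"VIF = {v:.2f}, 1 - 1/VIF = {1 - 1/v:.2f}")  # share of x3 explained
```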

Multicollinearity - Interpretation Unfortunately for Karl Marx, only 26% of the variation in state median family income is explained by the variation in senator conservatism and senator party affiliation (1 – 1/VIF = 1 – .738 = .262). Since this is low (i.e., well below the .70 threshold mentioned in the readings), Marx can’t legitimately claim that high multicollinearity undermined his hypothesis.

Bread and Peace Model - 1 The Bread and Peace Model explains presidential voting on the basis of the percentage change in real disposable income and U.S. casualties in post-WWII wars: a = 46.2 (y-intercept); b1 = 3.6 (weighted-average per capita real income growth, with an annual lag weight of .91); b2 = (thousands of post-WWII casualties).

Bread and Peace Model - 2

Government Benefits - 1 The following slide contains the percentage of people who (a) benefit from various programs, and (b) claim in response to a government survey that they 'have not used a government social program.’ Government social programs are stigmatized as “welfare.” But many people benefit from such programs without realizing it. This results in a likely underprovision of such benefits.

Government Benefits - 2
529 or Coverdell
Home mortgage interest deduction
Hope or Lifetime Learning Tax Credit
Student Loans
Child and Dependent Tax Credit
Earned income tax credit
Pell Grants – 43.1
Medicare – 39.8
Food Stamps – 25.4

Regression in Value Added Teacher Evaluations – LA Times – 3/28/11 The general formula for the “linear mixed model” used in her district is a string of symbols and letters more than 80 characters long: y = Xβ + Zv + ε, where β is a p-by-1 vector of fixed effects; X is an n-by-p matrix; v is a q-by-1 vector of random effects; Z is an n-by-q matrix; E(v) = 0, Var(v) = G; E(ε) = 0, Var(ε) = R; Cov(v, ε) = 0. V = Var(y) = Var(y − Xβ) = Var(Zv + ε) = ZGZ^T + R. In essence, value-added analysis involves looking at each student’s past test scores to predict future scores. The difference between the prediction and the student’s actual score each year is the estimated “value” that the teacher added – or subtracted.
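
A much simpler sketch of the “in essence” description (plain regression rather than the district’s linear mixed model; all scores and teacher labels are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
last = rng.normal(500, 80, 200)                  # prior-year test scores
this = 50 + 0.9 * last + rng.normal(0, 20, 200)  # current-year scores
teacher = rng.integers(0, 4, 200)                # four hypothetical teachers

b, a = np.polyfit(last, this, 1)  # predict this year's score from last year's
residual = this - (a + b * last)  # actual minus predicted

# A teacher's "value added" is the average residual of her students.
for t in range(4):
    print(f"teacher {t}: value added = {residual[teacher == t].mean():+.1f}")
```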

California Election Given the correlations below, what should you expect in the regression table on the next slide, where the dependent variable is “boxer10” (percent of county vote for Boxer in 2010)?

. correlate boxer10 brown10 coll00 medinc08
(obs=58)

             |  boxer10  brown10   coll00 medinc08
     boxer10 |
     brown10 |
      coll00 |
    medinc08 |

[the correlation values were not transcribed]