POSC 202A Lecture: Substantive Significance, Relationship between Variables

Substantive Significance 2

Statistical Significance vs. Substantive Importance Statistical significance speaks to how confident or certain we are that the results are not the product of chance. Substantive importance speaks to how large the results are and the extent to which they matter. 3

Statistical Significance vs. Substantive Importance Recall that it is easier to attain statistical significance as the sample we take gets larger. The denominator in significance tests is the standard error, which shrinks as the sample size grows. Dividing by a smaller denominator leads to a larger test statistic (Z score). 4

Statistical Significance vs. Substantive Importance But, none of this speaks to whether the size of the effect we observe is large or small, or whether it matters for some social process. All we are saying with a statistically significant result is that the results are not likely the product of chance. Statistical significance is about confidence NOT importance. 5

Statistical Significance vs. Substantive Importance In the context of regression, the question of substantive significance is always: Is the slope big?   What’s big or not is always to some degree subjective. But the question is only to some degree subjective. It is also partly objective. The challenge of assessing substantive significance is to make a correct assessment with respect to the objective part and a reasonable assessment with respect to the subjective part. 6

Relationships among Variables We began this course by talking about how to describe data, where we were examining one variable. Measures of central tendency Measures of dispersion Now we can apply these same concepts to relationships among and between variables. 7

Relationships among Variables Association- Variables are associated if larger (or smaller) values of one variable occur more frequently with larger (or smaller) values of other variables. 8

Relationships among Variables How to describe relationships? Tables, graphs, and summary statistics. 9

Relationships among Variables Tables describe simple relationships. Usually, if you don't see a relationship in a simple table, you won't find it using more complex methods. Look to the diagonals. [Slide shows two Income-by-Education (Low/High) tables: expected cell percentages under the null hypothesis versus observed percentages under the alternative hypothesis.] 10

Relationships among Variables Tables Relationships are often summarized using the chi-squared statistic: χ² = Σ (Observed − Expected)² / Expected, where the observed and expected counts are calculated for each cell and the result is summed across all cells. We treat this statistic as we would a Z score, but use the chi-squared distribution to determine significance (page T-20). 11

Relationships among Variables So for the table of income and education we would get the following result: To find the area in the tail, multiply the number of rows minus 1 by the number of columns minus 1, or (r-1)*(c-1), to find the degrees of freedom, and use that line in Table F on page T-20. 12
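As a sketch of the arithmetic, the chi-squared statistic and its degrees of freedom can be computed cell by cell; the 2x2 counts below are made up for illustration (they are not the slide's income/education data).

```python
# Chi-squared by hand for a hypothetical 2x2 income-by-education table.
observed = [[30, 20],   # low income:  low / high education
            [20, 30]]   # high income: low / high education

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand  # expected count under the null
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)*(c-1)
print(chi2, df)  # 4.0 1
```

With df = 1, a chi-squared of 4.0 would then be looked up in Table F to find the tail area.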

Relationships among Variables Potential Problems: The unit of analysis can conceal important factors. Simpson’s Paradox- The direction of a relationship can change when the level of data analysis goes from individual to group levels of association. 13

Relationships among Variables University Admissions Decisions

         Male   Female
Admit    3500     2000
Deny     4500     4000
Total    8000     6000

Acceptance Rates: Male: 3500/8000 = 44%; Female: 2000/6000 = 33%. Is there bias? 14

Relationships among Variables University Admissions Decisions by College

Sciences:
         Male   Female
Admit    3000     1000
Deny     3000     1000
Total    6000     2000
Acceptance rates: Male: 50%, Female: 50%.

Humanities:
         Male   Female
Admit     500     1000
Deny     1500     3000
Total    2000     4000
Acceptance rates: Male: 25%, Female: 25%.

Is there bias? What is going on? 15
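The slide's admissions numbers can be verified directly. This sketch computes the within-college and pooled acceptance rates from the tables above:

```python
# (admits, total applicants) for each group, from the slide's tables.
admits = {"Sciences":   {"Male": (3000, 6000), "Female": (1000, 2000)},
          "Humanities": {"Male": (500, 2000),  "Female": (1000, 4000)}}

# Within each college, the acceptance rates are identical...
for college, groups in admits.items():
    rates = {g: a / t for g, (a, t) in groups.items()}
    print(college, rates)

# ...but pooling across colleges makes men look favored.
for g in ("Male", "Female"):
    a = sum(admits[c][g][0] for c in admits)
    t = sum(admits[c][g][1] for c in admits)
    print(g, round(a / t, 2))  # Male 0.44, Female 0.33
```

Identical rates within each college, yet pooling reverses the picture: the mix of applicants across colleges, not the admissions decisions themselves, drives the gap.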

Relationships among Variables Lurking Variables: A variable not included in the analysis but that affects the result. In this case, the lurking variable was choice of college: men and women preferred different fields, and men preferred the one that was easier to get into. 16

Relationships among Variables Lurking Variable: A variable not included in the analysis but that affects the result. X Y 17

Relationships among Variables Lurking Variable: A variable not included in the analysis but that affects the result. X Y Z 18

Relationships among Variables We can examine the strength of relationships graphically. Scatter plots- Show the relationship between two variables when the data are measured on the ordinal, interval, or ratio scales. 19

Relationships among Variables Scatter plot- (Example here) Graphs stolen from fivethirtyeight.com 20

Relationships among Variables They can also be used to compare across events. 2008 2010 21

Relationships among Variables We can examine the strength of relationships Statistically. Measures of Association- 22

Relationships among Variables We can examine the strength of relationships statistically. Measures of Association- We mentioned the chi-squared statistic. The workhorse is the correlation coefficient. 23

Correlation Coefficient

Relationships among Variables Correlation Coefficient- Tells us the linear association between two variables. Ranges from -1.0 (perfect negative association) to +1.0 (perfect positive association). Abbreviated as 'r'. Answers the question: how far are the points, on average, from the line? 25
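A minimal sketch of r computed from its definition (deviations from the means), on made-up data:

```python
# Pearson correlation coefficient r, computed by hand on illustrative data.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # co-movement of X and Y
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))            # spread of X
sy = math.sqrt(sum((yi - my) ** 2 for yi in y))            # spread of Y
r = cov / (sx * sy)
print(round(r, 3))  # 0.775
```

A value of about .77 would count as a very strong positive association by the rule of thumb on the next slide.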

Relationships among Variables 26

Relationships among Variables As a general rule: r = .7 is a very strong relationship; r = .5 is a strong relationship; r = .3 is a weak relationship; r = 0 is no relationship. But it varies depending on how noisy the data are. 27

Relationships among Variables Weaknesses: Does not tell us the magnitude. Example: correlation between education and income =.8. Should you get an MBA? How can we account for intervening variables? 28

Regression Tells us not only the direction but the precise strength of the relationship: how much increasing one variable changes another variable. 29

Regression To clarify this concept we need to be more precise in how we define our variables. Dependent Variable (Y)- The thing being explained. Usually the phenomenon we are interested in studying. Think of this as “the effect” 30

Regression Independent Variable (X)- The thing that affects what we seek to explain. In overly simplistic terms think of this as “the cause” or “an influence” 31

Regression Regression tells us in precise terms the strength of the relationship between the dependent and independent variable. How? By fitting the line through the data that minimizes the squared errors. 32

Regression OLS picks the line with the smallest sum of squared errors. 33

Regression Regression is a process that evaluates the relationship between two variables and selects the line that minimizes the sum of squared errors around the line: Y = a + bX + e. Where: Y = dependent variable; a = intercept; b = slope; X = independent variable; e = residual. 34
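The least-squares line has a closed-form solution: b = cov(X, Y) / var(X) and a = mean(Y) - b * mean(X). A sketch on made-up data:

```python
# Least-squares slope (b) and intercept (a) for Y = a + bX + e,
# on illustrative data (not the lecture's example).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# Slope: co-movement of X and Y relative to the spread of X.
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
# Intercept: forces the line through the point of means.
a = my - b * mx
print(round(a, 3), round(b, 3))  # 2.2 0.6
```

No other line through these points has a smaller sum of squared vertical errors.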

Regression The relationship between the independent and dependent variable is summarized by the regression coefficient which tells us the angle or slope of the line. 35

Regression Regression Coefficient (b)- Tells us how much Y changes for a one-unit change in X. This is called the slope, which reflects the angle of the line. 36

Regression Intercept- Tells us what the value of Y is when X is zero. Also called the constant. 37

Regression R squared (R2) Tells us how much of the variation in the dependent variable (Y) our model explained. It's how well a line fits or describes the data. Ranges from 0 to 100%. 38
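R squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares; a sketch on the same made-up data (the a and b below are the least-squares values for those points):

```python
# R-squared: share of the variation in Y explained by the fitted line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares intercept and slope for these data

my = sum(y) / len(y)
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # errors around the line
ss_tot = sum((yi - my) ** 2 for yi in y)                        # errors around the mean
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))  # 0.6
```

Note that for a simple (one-X) regression, R² is just the correlation coefficient squared: .775² ≈ .6.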

Regression What is the relationship between the vote for Ross Perot and Bob Dole in Florida in 1996? ALWAYS begin by graphing your data. 39

Regression OLS picks the line with the smallest sum of squared errors. 40

Regression Intercept Slope (b) R2 41

Regression: Interpretation Three main results: The slope: for every additional vote Bob Dole received, Perot got .18 more votes, or for every 100 for Dole, Perot got 18. Where Dole got no votes, we expect Perot to get 1055. The model explains about 84% of the variation in Ross Perot’s vote. 42

Regression Slope (b) = .184; R2 = .843; Intercept = 1055 43
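Plugging these estimates into the fitted equation, Perot = 1055 + .184 × Dole, gives predicted values; a small sketch (the helper function name is mine):

```python
# Predictions from the fitted Florida 1996 model: Perot = 1055 + 0.184 * Dole.
intercept, slope = 1055, 0.184

def predicted_perot(dole_votes):
    """Predicted Perot vote in a county given the Dole vote there."""
    return intercept + slope * dole_votes

print(predicted_perot(0))             # 1055.0 -> the intercept: expected Perot vote where Dole got none
print(round(predicted_perot(10000)))  # 2895   -> 1055 + 1840
```

This makes the slope's interpretation concrete: 10,000 more Dole votes predict 1,840 more Perot votes.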

High vs. Low R2 [two scatter plots: R2 = .06 vs. R2 = .45] Used to compare how well different models explain the data. Higher R2 indicates a better fit. 44

Regression Standard Error To this point, we have assumed that we know the standard deviation of the population. In practice, we seldom know this, so we estimate it (just as we estimate the population mean). This estimate is called the standard error. 45

Regression We estimate a standard error for the slope and the intercept and use it like we did the standard deviation—to perform significance tests. 46

Regression Here we are conducting a significance test of whether the observed slope and intercept differ from the null hypothesis. This statistic follows a T distribution which accounts for the additional uncertainty that comes from estimating the standard error. 47

Regression Intercept Slope (b) R2 Standard Error Standard Error 48

Regression We can simply plug the figures in from the Stata output. This statistic follows a T distribution, which accounts for the additional uncertainty that comes from estimating the standard error (see inside back cover of M&M). To determine the degrees of freedom, subtract the number of variables in the model (2) from the number of observations (67). This is abbreviated as n - k. 49

Regression We can simply plug the figures in from the Stata output. Or, we can use our rule of thumb: if the T statistic is greater than 2, it is significant! 50
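The test statistic is simply the estimate divided by its standard error. In the sketch below the standard error is hypothetical, since the actual value appears only in the Stata output:

```python
# T statistic for the slope: estimate / standard error.
b = 0.184      # fitted slope from the Perot/Dole regression
se_b = 0.010   # HYPOTHETICAL standard error, for illustration only
n, k = 67, 2   # observations and variables in the model

t = b / se_b
df = n - k     # degrees of freedom, n - k = 65
print(round(t, 1), df, t > 2)  # 18.4 65 True
```

By the rule of thumb, a t statistic this far above 2 is clearly significant; the exact p value comes from the t table with 65 degrees of freedom.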

Regression If we look to the table, we see that the p value is less than .0001: fewer than 1 time in 10,000 would we see a result as big as .184 if the true value were zero. Stata also reports these p values for us. 51

Regression: Interpreting the Rest T stats; P values; F statistic (tests the null hypothesis that all coefficients jointly equal 0). 52

Regression: Residuals OK, let's interpret this. 53

Regression: Residuals Next up: residuals, regression assumptions, multiple regression. 54