Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe.


Key Goals of the Week
- What is regression?
- When is regression used?
- Formal statement of the linear model equation
- Identify components of the linear model
- Interpret regression results:
  - decomposition of variance and goodness of fit
  - estimated regression coefficients
  - significance tests for coefficients
MG461, Week 3 Seminar 2

Who has studied regression or linear models before? 1.Taken before 2.Not taken before

Which group are you in? 1.Group 1 2.Group 2 3.Group 3 4.Group 4 5.Group 5 6.Group 6 7.Group 7 8.Group 8

Regression is a set of statistical tools to model the conditional expectation… 1.of one variable on another variable. 2.of one variable on one or more other variables.

LINEAR MODEL BACKGROUND MG461, Week 3 Seminar 6

Recap: Theoretical System MG461, Week 3 Seminar 7

What is regression?
- Regression is the study of relationships between variables
- It provides a framework for testing models of relationships between variables
- Regression techniques are used to assess the extent to which the outcome variable of interest, Y, changes in response to changes in the independent variable(s), X
MG461, Week 3 Seminar 8

When to use Regression
- We want to know whether the outcome, y, varies depending on x
- We can use regression to study correlation (not causation) or make predictions
- Continuous variables (but many exceptions)
MG461, Week 3 Seminar 10

Questions we might care about
- Do higher-paid employees contribute more to organizational success?
- Do large companies earn more? Do they have lower tax rates?
- Does a change in X lead to a change in Y?
MG461, Week 3 Seminar 11

How to answer the questions?
Observational (Field) Study
- Collect data on income and a measure of contributions to the organization
- Collect data on corporate profits in various regions (which companies, which regions?)
Experimental Study
- Random assignment to different contribution levels or levels of pay (?)
- Random assignment to country and/or tax rate, quasi-experiment?
MG461, Week 3 Seminar 12

When to use Regression
- We want to know whether the outcome, y, varies depending on x
- Continuous variables (but many exceptions)
- Observational data (mostly)
MG461, Week 3 Seminar 13

Example 1: Pay and Performance
Concepts: Performance and Pay, measured here as Runs and Yearly Salary
MG461, Week 3 Seminar 14

Scatterplot: Salaries vs. Runs MG461, Week 3 Seminar 15

Scatterplot: Salaries vs. Runs (with Δx and Δy marked) MG461, Week 3 Seminar 16

What is the equation for a line? 1. y = ax² 2. y = ax 3. y = ax + b 4. y = x + b

Equation of a (Regression) Line
y = β₀ + β₁x, with intercept β₀ and slope β₁
But… x and y are random variables, so we need an equation that accounts for both signal and noise
β₀ and β₁ are population parameters
MG461, Week 3 Seminar 18

Simple Linear Model
y_i = β₀ + β₁x_i + ε_i
where y_i is the dependent variable, x_i the independent variable, β₀ the intercept, β₁ the coefficient (slope), ε_i the error, and the observation (data point) index i goes from 1…n
MG461, Week 3 Seminar 19
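As a minimal sketch of what this model describes (not taken from the lecture; all parameter values are assumptions for illustration), the following Python snippet simulates data from a simple linear model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, assumed purely for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 100
x = rng.uniform(0, 10, size=n)        # independent variable x_i
eps = rng.normal(0, sigma, size=n)    # error term eps_i
y = beta0 + beta1 * x + eps           # dependent variable y_i = beta0 + beta1*x_i + eps_i

print(x[:3], y[:3])
```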

The relationship must be LINEAR
- The linear model assumes a LINEAR relationship
- You can get results even if the relationship is not linear
- LOOK at the data! Check for linearity
MG461, Week 3 Seminar 20

When to use Regression
- We want to know whether the outcome, y, varies depending on x
- Continuous variables (but many exceptions)
- Observational data (mostly)
- The relationship between x and y is linear
MG461, Week 3 Seminar 21

Understanding the key points
- What is regression?
- When is regression used?
- Formal statement of the linear model equation
MG461, Week 3 Seminar 22

Understand what regression is… 1.Strongly Agree 2.Agree 3.Disagree 4.Strongly Disagree (Mean = , Median = )

Understand when to use regression 1.Strongly Agree 2.Agree 3.Disagree 4.Strongly Disagree (Mean = , Median = )

Know how to make formal statement of linear model 1.Strongly Agree 2.Agree 3.Disagree 4.Strongly Disagree (Mean = , Median = )

Discussion
- What is regression?
- When is regression used?
- Formal statement of the linear model equation
MG461, Week 3 Seminar 26

MODEL ESTIMATION MG461, Week 3 Seminar 27

Which model parameter do we NOT need to estimate? 1. β₀ 2. β₁ 3. x_i 4. σ²

Goal: Estimate the Relationship between X and Y
Estimate the population parameters β₀ and β₁; the estimates are written β̂₀ ("beta hat zero") and β̂₁ ("beta hat one")
We can also estimate the error variance σ²
MG461, Week 3 Seminar 29
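The estimator itself is not reproduced in the transcript; as a standard reference formula for the usual simple-regression setup (an assumption, not a quote from the slide), the error variance is typically estimated from the residuals as:

```latex
\hat{\sigma}^2 \;=\; \frac{\mathrm{RSS}}{n-2}
\;=\; \frac{\sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2}{n-2}
```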

What would “good” estimates do? 1.Minimize explained variance 2.Minimize distance to outliers 3.Minimize unexplained variance

Finding the Best Line: Unexplained Variance
(figure: scatterplot with the residuals e_i marked between each point and the fitted line)
MG461, Week 3 Seminar 31

Ordinary Least Squares (OLS) Criteria
Choose β̂₀ ("beta hat zero") and β̂₁ ("beta hat one") to minimize the "noise" (unexplained variance), defined as the residual sum of squares (RSS)
MG461, Week 3 Seminar 32
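A minimal Python sketch (not the lecture's code) of the quantity being minimized, using simulated data like the earlier example; the OLS line should give a smaller RSS than any other candidate line:

```python
import numpy as np

def rss(b0, b1, x, y):
    """Residual sum of squares for a candidate intercept b0 and slope b1."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Simulated data, as in the earlier sketch (values are assumptions for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)

print(rss(0.0, 1.0, x, y))                 # an arbitrary guessed line
b1_hat, b0_hat = np.polyfit(x, y, deg=1)   # np.polyfit returns [slope, intercept] for deg=1
print(rss(b0_hat, b1_hat, x, y))           # the least-squares line has a smaller RSS
```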

OLS Estimates of Beta-hat MG461, Week 3 Seminar 33

Mean of x and y MG461, Week 3 Seminar 34

Variance of x MG461, Week 3 Seminar 35

Variance of y MG461, Week 3 Seminar 36

Covariance of x and y MG461, Week 3 Seminar 37

OLS Estimates of Beta-hat
Note the similarity between β̂₁ and the slope of a line: change in y over change in x (rise over run)
MG461, Week 3 Seminar 38
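The closed-form estimates referred to on these slides are not reproduced in the transcript; a hedged Python sketch of the standard formulas, built from the sample means and the sums of squares and cross-products of x and y (simulated data as before):

```python
import numpy as np

# Simulated data, as in the earlier sketch (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)

x_bar, y_bar = x.mean(), y.mean()             # means of x and y

S_xx = np.sum((x - x_bar) ** 2)               # sum of squares of x about its mean
S_xy = np.sum((x - x_bar) * (y - y_bar))      # sum of cross-products (covariance numerator)

beta1_hat = S_xy / S_xx                       # slope: "rise over run" on average
beta0_hat = y_bar - beta1_hat * x_bar         # intercept: line passes through (x_bar, y_bar)

print(beta0_hat, beta1_hat)                   # close to the assumed 2.0 and 0.5
```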

Why squared residuals?
Geometric intuition: X and Y are vectors; find the shortest distance between them (figure: vectors X and Y)
MG461, Week 3 Seminar 39

Decomposing the Variance
As in ANOVA, we now have: Explained Variation + Unexplained Variation = Total Variation
This decomposition of variance provides one way to think about how well the estimated model fits the data
MG461, Week 3 Seminar 40
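Written out in standard notation (rather than the slide's own), with fitted values ŷ_i and sample mean ȳ, the decomposition is:

```latex
\underbrace{\sum_{i=1}^{n}(y_i-\bar{y})^2}_{\text{total (TSS)}}
\;=\;
\underbrace{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}_{\text{explained (ESS)}}
\;+\;
\underbrace{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}_{\text{unexplained (RSS)}}
```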

Total Squared Residuals ( SYY ) MG461, Week 3 Seminar 41

Explained vs. Unexplained Squared Residuals MG461, Week 3 Seminar 42

R² and goodness of fit
- RSS (residual sum of squares) = unexplained variation
- TSS (total sum of squares) = SYY, the total variation of the dependent variable y
- ESS (explained sum of squares) = explained variation (TSS − RSS)
MG461, Week 3 Seminar 43
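A short Python sketch (assumed, not from the slides) that computes these quantities and R² on the simulated data used above:

```python
import numpy as np

# Simulated data and OLS fit, as in the earlier sketches (illustrative assumptions)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

y_hat = b0_hat + b1_hat * x                   # fitted values

rss_val = np.sum((y - y_hat) ** 2)            # unexplained variation (RSS)
tss_val = np.sum((y - y.mean()) ** 2)         # total variation (TSS = SYY)
ess_val = tss_val - rss_val                   # explained variation (ESS)

r_squared = 1 - rss_val / tss_val             # equivalently ess_val / tss_val
print(r_squared)
```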

Examples of high and low R² (Graph 1, Graph 2) MG461, Week 3 Seminar 44

Which graph had a high R²? 1.Graph 1 2.Graph 2

Recall that R² = ESS/TSS or 1 − (RSS/TSS). What values can R² take on? 1.Can be any number 2.Any number between -1 and 1 3.Any number between 0 and 1 4.Any number between 1 and 100

Examples of high and low R²: R² = 0.29 and R² = 0.87 MG461, Week 3 Seminar 47

Interpretation of β-hats
- β̂₀: intercept, the value of y_i when x_i is 0
- β̂₁: the average or expected change in y_i for every 1-unit change in x_i
MG461, Week 3 Seminar 48

Visualization of Coefficients
(figure: regression line with intercept β₀; a step of Δx = 1 gives Δy = β₁)
MG461, Week 3 Seminar 49

OLS estimates: Pay for Runs
(table of estimates: columns Coefficient, s.e., t, p-value (sig); rows Intercept and Runs; plus R² and n)
Salary_i = β̂₀ + β̂₁ × Runs_i + error_i
MG461, Week 3 Seminar 50
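A sketch of how a table like this can be produced in Python with statsmodels; the data here are simulated stand-ins rather than the lecture's actual salary and runs figures, so the numbers are purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in data; the relationship (runs -> salary) mirrors the example only
rng = np.random.default_rng(0)
runs = rng.uniform(0, 120, size=200)
salary = 1000 + 25 * runs + rng.normal(0, 500, size=200)   # arbitrary illustrative values

X = sm.add_constant(runs)                 # adds the intercept column
results = sm.OLS(salary, X).fit()
print(results.summary())                  # coefficients, s.e., t, p-values, R-squared, n
```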

OLS estimates of Regression Line
Salary = β̂₀ + β̂₁ × Runs (fitted line plotted over the scatterplot)
MG461, Week 3 Seminar 51

Interpretation of β-hats
- β̂₀: players with no runs don't get paid (not really – come back to this next week!)
- β̂₁: each additional run translates into $27,470 in salary per year. OR: a difference of almost $1 million/year between a player with an average (median) number of runs and an above-average (80th percentile) number of runs
y (Salary) = β̂₀ + β̂₁ × x (Runs)
MG461, Week 3 Seminar 52

Significance of Results
Model Significance
- H0: None of the 1 (or more) independent variables covary with the dependent variable
- HA: At least one of the independent variables covaries with the d.v.
- Application: compare two fitted models
- Test: ANOVA/F-test (**assumes errors e_i are normally distributed)
Coefficient Significance
- H0: β₁ = 0, there is no relationship (covariation) between x and y
- HA: β₁ ≠ 0, there is a relationship (covariation) between x and y
- Application: a single estimated coefficient
- Test: t-test (**assumes errors e_i are normally distributed)
MG461, Week 3 Seminar 53
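The test statistic for the coefficient test is not written out in the transcript; in the standard form of the usual setup (an assumption, not a quote from the slide) it is:

```latex
t \;=\; \frac{\hat{\beta}_1 - 0}{\widehat{\mathrm{SE}}(\hat{\beta}_1)},
\qquad
\widehat{\mathrm{SE}}(\hat{\beta}_1) \;=\; \sqrt{\frac{\hat{\sigma}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
```

which is compared against a t distribution with n − 2 degrees of freedom.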

OLS estimates: Pay for Runs
(same table of estimates: Coefficient, s.e., t, p-value (sig) for Intercept and Runs; R²; n)
Assuming normality, we can derive estimated standard errors for the coefficients
MG461, Week 3 Seminar 54

OLS estimates: Pay for Runs
(same table of estimates: Coefficient, s.e., t, p-value (sig) for Intercept and Runs; R²; n)
And using these, calculate a t-statistic and test whether or not the coefficients are equal to zero
MG461, Week 3 Seminar 55

OLS estimates: Pay for Runs
(same table of estimates: Coefficient, s.e., t, p-value (sig) for Intercept and Runs; R²; n)
And finally, the probability of being wrong (a Type 1 error) if we reject H₀
MG461, Week 3 Seminar 56

Plotting Confidence Intervals MG461, Week 3 Seminar 57
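A possible way to draw such a plot (not the lecture's code; it reuses the simulated data from the earlier sketches and the usual formula for the confidence interval of the mean response):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulated data and OLS fit, as in the earlier sketches (illustrative assumptions)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

n = len(x)
sigma2_hat = np.sum((y - (b0_hat + b1_hat * x)) ** 2) / (n - 2)   # estimated error variance
S_xx = np.sum((x - x.mean()) ** 2)

grid = np.linspace(x.min(), x.max(), 100)
mean_fit = b0_hat + b1_hat * grid
# standard error of the estimated mean response at each grid point
se_fit = np.sqrt(sigma2_hat * (1 / n + (grid - x.mean()) ** 2 / S_xx))
t_crit = stats.t.ppf(0.975, df=n - 2)        # 95% two-sided critical value

plt.scatter(x, y, s=10)
plt.plot(grid, mean_fit)
plt.fill_between(grid, mean_fit - t_crit * se_fit, mean_fit + t_crit * se_fit, alpha=0.3)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```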

Agree or Disagree, “The lecture was clear and easy to follow” 1.Strongly Agree 2.Agree 3.Somewhat Agree 4.Neutral 5.Somewhat Disagree 6.Disagree 7.Strongly Disagree

Next time…
- Multiple independent variables
- OLS assumptions
- What to do when OLS assumptions are violated
MG461, Week 3 Seminar 59

Team Scores