Learning Objectives Describe the linear regression model

Slides:



Advertisements
Similar presentations
EcoTherm Plus WGB-K 20 E 4,5 – 20 kW.
Advertisements

Números.
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
AGVISE Laboratories %Zone or Grid Samples – Northwood laboratory
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
PDAs Accept Context-Free Languages
EuroCondens SGB E.
Worksheets.
STATISTICS Linear Statistical Models
STATISTICS HYPOTHESES TEST (I)
STATISTICS INTERVAL ESTIMATION Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.
Addition and Subtraction Equations
Add Governors Discretionary (1G) Grants Chapter 6.
CALENDAR.
CHAPTER 18 The Ankle and Lower Leg
Chapter 7 Sampling and Sampling Distributions
The 5S numbers game..
Simple Linear Regression 1. review of least squares procedure 2
A Fractional Order (Proportional and Derivative) Motion Controller Design for A Class of Second-order Systems Center for Self-Organizing Intelligent.
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Stationary Time Series
Break Time Remaining 10:00.
The basics for simulations
Factoring Quadratics — ax² + bx + c Topic
PP Test Review Sections 6-1 to 6-6
Regression with Panel Data
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Progressive Aerobic Cardiovascular Endurance Run
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
When you see… Find the zeros You think….
2011 WINNISQUAM COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=1021.
Before Between After.
2011 FRANKLIN COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=332.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
Chapter 10 Correlation and Regression
Static Equilibrium; Elasticity and Fracture
Ch 14 實習(2).
Resistência dos Materiais, 5ª ed.
Copyright © 2013 Pearson Education, Inc. All rights reserved Chapter 11 Simple Linear Regression.
Lial/Hungerford/Holcomb/Mullins: Mathematics with Applications 11e Finite Mathematics with Applications 11e Copyright ©2015 Pearson Education, Inc. All.
Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation.
Simple Linear Regression Analysis
Multiple Regression and Model Building
9. Two Functions of Two Random Variables
A Data Warehouse Mining Tool Stephen Turner Chris Frala
1 Dr. Scott Schaefer Least Squares Curves, Rational Representations, Splines and Continuity.
Chart Deception Main Source: How to Lie with Charts, by Gerald E. Jones Dr. Michael R. Hyman, NMSU.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
Schutzvermerk nach DIN 34 beachten 05/04/15 Seite 1 Training EPAM and CANopen Basic Solution: Password * * Level 1 Level 2 * Level 3 Password2 IP-Adr.
Forecasting Using the Simple Linear Regression Model and Correlation
EPI 809/Spring Probability Distribution of Random Error.
Simple Linear Regression and Correlation
Statistics for Business and Economics
Pengujian Parameter Koefisien Korelasi Pertemuan 04 Matakuliah: I0174 – Analisis Regresi Tahun: Ganjil 2007/2008.
Chapter Topics Types of Regression Models
Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building.
© 2000 Prentice-Hall, Inc. Chap Forecasting Using the Simple Linear Regression Model and Correlation.
Chapter 7 Forecasting with Simple Regression
Statistics for Business and Economics Chapter 10 Simple Linear Regression.
OPIM 303-Lecture #8 Jose M. Cruz Assistant Professor.
12a - 1 © 2000 Prentice-Hall, Inc. Statistics Multiple Regression and Model Building Chapter 12 part I.
© 2003 Prentice-Hall, Inc.Chap 13-1 Basic Business Statistics (9 th Edition) Chapter 13 Simple Linear Regression.
© 2001 Prentice-Hall, Inc. Statistics for Business and Economics Simple Linear Regression Chapter 10.
Multiple Regression Learning Objectives n Explain the Linear Multiple Regression Model n Interpret Linear Multiple Regression Computer Output n Test.
11-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Statistics for Managers using Microsoft Excel 3rd Edition
Statistics for Business and Economics
PENGOLAHAN DAN PENYAJIAN
Statistics for Business and Economics
Presentation transcript:

Learning Objectives Describe the linear regression model State the regression modeling steps Explain least squares Compute regression coefficients Describe residual analysis Predict the response variable Understand correlational analysis As a result of this class, you will be able to... 2

Probabilistic Models Hypothesize 2 components Deterministic Random error Example: Sales volume is 10 times advertising spending plus random error Y = 10X + e Random error may be due to factors other than advertising 6

Types of Probabilistic Models 7

Regression Models Answer ‘What is the relationship between the variables?’ Equation used 1 numerical dependent (response) variable What is to be predicted 1 or more numerical or categorical independent (explanatory) variables Used mainly for prediction 8

Regression Modeling Steps Define problem or question Specify model Collect data Do descriptive data analysis Estimate unknown parameters Evaluate model Use model for prediction 9

Problem Definition Most critical step What are the model objectives? Don’t want right answer to wrong question What are the model objectives? Who will use the model? What will be the benefits? Are resources available (data etc.)? How will the results be implemented? 12

Specifying the Model Define variables Conceptual (e.g., advertising, price) Empirical (e.g., list price, regular price) Measurement (e.g., $, units) Hypothesize nature of relationship Expected effects (i.e., coefficients’ signs) Functional form (linear or non-linear) Interactions 15

Model Specification Is Based on Theory Economic & business theory Mathematical theory Previous research ‘Common sense’ 16

Types of Regression Models This teleology is based on the number of explanatory variables & nature of relationship between X & Y. 25

Linear Equations High School Teacher © 1984-1994 T/Maker Co. 28

Linear Regression Model Relationship between variables is a linear function Population Y-Intercept Population Slope Random Error Dependent (Response) Variable Independent (Explanatory) Variable 29

Sample Linear Regression Model ei = Random error Unsampled observation Observed value 36

Scatter Diagram Plot of all (Xi, Yi) pairs Suggests how well model will fit 39

Thinking Challenge How would you draw a line through the points? How do you determine which line ‘fits best’? Alone Group Class 42

Least Squares ‘Best fit’ means difference between actual Y values & predicted Y values are a minimum But positive differences off-set negative LS minimizes the sum of the squared differences (or errors) 51

Least Squares Graphically 52

Coefficient Equations Sample regression equation # (Xi, Yi) pairs Sample slope Average Xi’s, then square Sample Y-intercept 53

Computation Table 54

Interpretation of Coefficients Slope (b1) Estimated Y changes by b1 for each 1 unit increase in X Example: If b1 = 2, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X) Y-Intercept (b0) Average value of Y when X = 0 Example: If b0 = 4, then average Sales (Y) is expected to be 4 when Advertising (X) is 0 55

Parameter Estimation Example You’re a marketing analyst for Hasbro Toys. You gather the following data: Ad $ Sales (Units) 1 1 2 1 3 2 4 2 5 4 What is the relationship between sales & advertising? 56

Scatter Diagram Sales vs. Advertising 57

Parameter Estimation Solution Table 58

Coefficient Interpretation Solution Slope (b1) Sales Volume (Y) is expected to increase by .7 units for each $1 increase in Advertising (X) Y-Intercept (b0) Average value of Sales Volume (Y) is -.10 units when Advertising (X) is 0 Difficult to explain to Marketing Manager Expect some sales without advertising 60

Parameter Estimation Excel Output bP b0 b1 61

Evaluating the Model How well does the model describe the relationship between the variables? Closeness of ‘best fit’ Closer the points to the line the better Assumptions met Significance of parameter estimates 71

Evaluating Model Steps Examine variation measures Do residual analysis Test coefficients for significance 72

Random Error Variation Variation of actual Y from predicted Y Measured by standard error of estimate Sample standard deviation of e Denoted SYX Affects several factors Parameter significance Prediction accuracy 75

Standard Error of Estimate The mean error is 0. 76

Measures of Variation in Regression Total sum of squares (SST) Measures variation of observed Yi around the mean`Y Explained variation (SSR) Variation due to relationship between X & Y Unexplained variation (SSE) Variation due to other factors 77

Variation Measures Yi Unexplained sum of squares (Yi - Yi)2 ^ Total sum of squares (Yi -`Y)2 Explained sum of squares (Yi -`Y)2 ^ 78

Coefficient of Determination Proportion of variation ‘explained’ by relationship between X & Y 0 £ r2 £ 1 79

r 2 Examples r2 = 1 r2 = 1 r2 = .8 r2 = 0 80

Adjusted Coefficient of Determination Proportion of variation ‘explained’ by relationship between X & Y Reflects Sample size Number of independent variables 81

Coef. of Determination Excel Output r2 adjusted for number of explanatory variables & sample size SYX 86

Residual Analysis Graphical analysis of residuals Purposes Plot residuals vs. Xi values Residuals are also called errors Difference between actual Yi & predicted Yi Purposes Examine functional form (linear vs. non-linear model) Evaluate violations of assumptions 89

Linear Regression Assumptions Normality Y values are normally distributed for each X Probability distribution of error is normal Homoscedasticity (constant variance) Independence of errors Linearity 90

Residual Plot for Functional Form Add X2 Term Correct Specification 92

Residual Plot for Homoscedasticity Heteroscedasticity Correct Specification Fan-shaped. Standardized residuals used typically. 93

Residual Plot for Independence Not Independent Correct Specification Plots reflect sequence data were collected. 94

Residual Analysis Excel Output The plot is standardized (student) residuals for each observation. For observation 5, the standardized residual is large. You can save the residuals & do descriptive analysis on them, including a normal probability plot. There are not enough observations here to make further analysis meaningful. 95

Residual Plot Excel Output

Test of Slope Coefficient Tests if there is a linear relationship between X & Y Involves population slope b1 Hypotheses H0: b1 = 0 (No linear relationship) H1: b1 ¹ 0 (Linear relationship) Theoretical basis is sampling distribution of slopes 101

Test of Slope Parameter Solution H0: b1 = 0 H1: b1 ¹ 0 a = .05 df = 5 - 2 = 3 Critical Value(s): Test Statistic: Decision: Conclusion: Reject at a = .05 There is evidence of a relationship 109

Test Statistic Solution 110

Test of Slope Parameter Excel Output ‘Standard Error’ is the estimated standard deviation of the sampling distribution, sbP. bP Sb t = bP /Sb P P P-Value 111

Prediction With Regression Models Types of predictions Point estimates Interval estimates What is predicted Population mean response (mYX) for given X Point on population regression line Individual response (Yi) for given X 114

What Is Predicted 115

Factors Affecting Interval Width Level of confidence (1 - a) Width increases as confidence increases Data dispersion (SYX) Width increases as variation increases Sample size Width decreases as sample size increases Distance of Xgiven from mean`X Width increases as distance increases 117

Regression Cautions Violated assumptions Relevancy of historical data Level of significance Extrapolation Cause & effect Relevancy of Historical Data Even if interpolating, conditions may have changed. Level of Significance r2 may be high, but at what level? Extrapolation Prediction Outside the Range of X Values Used to Develop Equation Interpolation Prediction Within the Range of X Values Used to Develop Equation Based on smallest & largest X Values Cause & Effect The # of teachers is highly correlated with liquor consumption due to population size! 126

Extrapolation Extrapolation Prediction Outside the Range of X Values Used to Develop Equation Interpolation Prediction Within the Range of X Values Used to Develop Equation Based on smallest & largest X Values 127

Cause & Effect Liquor Consumption # Teachers The # of teachers is highly correlated with liquor consumption due to population size! # Teachers 128

Types of Probabilistic Models 130

Correlation Models Answer ‘How strong is the linear relationship between 2 variables?’ Coefficient of correlation used Population correlation coefficient denoted r (rho) Values range from -1 to +1 Measures degree of association Used mainly for understanding 131

Sample Coefficient of Correlation Pearson Product-Moment Coefficient of Correlation: 132

Correlation & Regression Line 141

Test of Correlation Coefficient Shows if there is a linear relationship between 2 numerical variables Same conclusion as testing population slope b1 Hypotheses H0: r = 0 (No correlation) H1: r ¹ 0 (Correlation) 142

Conclusion Described the linear regression model Stated the regression modeling steps Explained least squares Computed regression coefficients Described residual analysis Predicted the response variable As a result of this class, you will be able to... 143

Learning Objectives Explain the linear multiple regression model Interpret linear multiple regression computer output Explain multicollinearity As a result of this class, you will be able to... 2

Multiple Regression Models 10

Linear Multiple Regression Model Relationship between 1 dependent & 2 or more independent variables is a linear function Population Y-intercept Population slopes Random error Dependent (response) variable Independent (explanatory) variables 11

Population Multiple Regression Model Bivariate model 12

Sample Multiple Regression Model Bivariate model 13

Regression Modeling Steps Define problem or question Specify model Collect data Do descriptive data analysis Estimate unknown parameters Evaluate model Use model for prediction 14

Linear Multiple Regression Model Parameter Estimation Linear Multiple Regression Model 15

Multiple Linear Regression Equations Too complicated by hand! Ouch! 16

Interpretation of Estimated Coefficients Slope (bP) Estimated Y changes by bP for each 1 unit increase in XP holding all other variables constant Example: If b1 = 2, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X1) given the Number of Sales Rep’s (X2) Y-Intercept (b0) Average value of Y when XP = 0 17

Parameter Estimation Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00). You’ve collected the following data: Resp Size Circ 1 1 2 4 8 8 1 3 1 3 5 7 2 6 4 4 10 6 Is this model specified correctly? What other variables could be used (color, photo’s etc.)? 18

Parameter Estimation Excel Output bP b0 b2 b1 19

Interpretation of Coefficients Solution Slope (b1) # Responses to Ad is expected to increase by .2049 (20.49) for each 1 sq. in. increase in Ad Size holding Circulation constant Slope (b2) # Responses to Ad is expected to increase by .2805 (28.05) for each 1 unit (1,000) increase in Circulation holding Ad Size constant Y-intercept is difficult to interpret. How can you have any responses with no circulation? 20

Evaluating the Model 21

Regression Modeling Steps Define problem or question Specify model Collect data Do descriptive data analysis Estimate unknown parameters Evaluate model Use model for prediction F 22

Evaluating Multiple Regression Model Steps Examine variation measures Do residual analysis Test parameter significance Overall model Portions of model Individual coefficients Test for multicollinearity 23

Coef. of Determination Excel Output r2Y.12 r2adj means 95.61% of variation in Y is due to Ad Size & Circulation SYX 29

Coefficient of Partial Determination Proportion of variation in Y ‘explained’ by variable XP holding all others constant Must estimate separate models Denoted r2Y1.2 in two X variables case Coefficient of partial determination of X1 with Y holding X2 constant Useful in selecting X variables 30

r 2Y1.2 Excel Output ANOVA df SS Regression 2 9.2497 Residual 3 0.2503 Total 5 9.5000 32

Testing Parameters 33

Evaluating Multiple Regression Model Steps Expanded! Examine variation measures Do residual analysis Test parameter significance Overall model Portions of model Individual coefficients Test for multicollinearity F New! New! New! 34

Testing Overall Significance Shows if there is a linear relationship between all X variables together & Y Uses F test statistic Hypotheses H0: b1 = b2 = ... = bP = 0 No linear relationship H1: At least one coefficient is not 0 At least one X variable affects Y Less chance of error than separate t-tests on each coefficient. Doing a series of t-tests leads to a higher overall Type I error than a. 35

Overall Significance Excel Output n - P -1 MSR / MSE n - 1 P-value 36

Testing Model Portions Examines the contribution of a set of X variables to the relationship with Y Null hypothesis: Variables in set do not improve significantly the model when all other variables are included Must estimate separate models Used in selecting X variables 37

Testing Model Portions Test Statistic Test H0: b1 = 0 in a 2 variable model From ANOVA section of regression for From ANOVA section of regression for 38

Multicollinearity 39

Evaluating Multiple Regression Model Steps Expanded! Examine variation measures Do residual analysis Test parameter significance Overall model Portions of model Individual coefficients Test for multicollinearity New! New! New! F 40

Multicollinearity High correlation between X variables Coefficients measure combined effect Leads to unstable coefficients depending on X variables in model Always exists; matter of degree Example: Using both Sales & Profit as explanatory variables in same model 41

Detecting Multicollinearity Examine correlation matrix Correlations between pairs of X variables are more than with Y variable Examine variance inflation factor (VIF) If VIFj > 5, multicollinearity exists Few remedies Obtain new sample data Eliminate one correlated X variable 42

Correlation Matrix Excel Output rY1 rY2 r12 43

VIF Excel Output Regress X1 on X2 44

This Class... Please take a moment to answer the following questions in writing: What was the most important thing you learned in class today? What do you still have questions about? How can today’s class be improved? As a result of this class, you will be able to... 144 10