
Published by Kasey Oldford. Modified over 3 years ago.

1
**Learning Objectives**

As a result of this class, you will be able to:

- Describe the linear regression model
- State the regression modeling steps
- Explain least squares
- Compute regression coefficients
- Describe residual analysis
- Predict the response variable
- Understand correlational analysis

2
**Probabilistic Models**

Hypothesize 2 components:

- Deterministic
- Random error

Example: Sales volume is 10 times advertising spending plus random error, $Y = 10X + \varepsilon$. The random error may be due to factors other than advertising.
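The two components can be made concrete with a small simulation. This is only a sketch: the slide does not say how the random error is distributed, so the normal error with standard deviation `noise_sd`, the seed, and the function name `simulate_sales` are all assumptions made for illustration.

```python
import random

def simulate_sales(ad_spend, noise_sd=2.0, seed=0):
    """Return (deterministic, error, observed) triples for Y = 10X + e.

    Assumption: the error is drawn from a normal distribution with mean 0.
    """
    rng = random.Random(seed)
    rows = []
    for x in ad_spend:
        deterministic = 10 * x            # the part explained by advertising
        error = rng.gauss(0.0, noise_sd)  # everything else
        rows.append((deterministic, error, deterministic + error))
    return rows

rows = simulate_sales([1, 2, 3, 4, 5])
for det, err, y in rows:
    print(f"deterministic={det:5.1f}  error={err:+6.2f}  observed={y:6.2f}")
```

Each observed value differs from 10X only by the error term, which is what the regression later has to separate out.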

3
**Types of Probabilistic Models**


4
**Regression Models**

- Answer ‘What is the relationship between the variables?’
- Equation used
  - 1 numerical dependent (response) variable: what is to be predicted
  - 1 or more numerical or categorical independent (explanatory) variables
- Used mainly for prediction

5
**Regression Modeling Steps**

1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Use model for prediction

6
**Problem Definition**

Most critical step: you don’t want the right answer to the wrong question.

- What are the model objectives?
- Who will use the model?
- What will be the benefits?
- Are resources available (data, etc.)?
- How will the results be implemented?

7
**Specifying the Model**

- Define variables
  - Conceptual (e.g., advertising, price)
  - Empirical (e.g., list price, regular price)
  - Measurement (e.g., $, units)
- Hypothesize nature of relationship
  - Expected effects (i.e., coefficients’ signs)
  - Functional form (linear or non-linear)
  - Interactions

8
**Model Specification Is Based on Theory**

- Economic & business theory
- Mathematical theory
- Previous research
- ‘Common sense’

9
**Types of Regression Models**

This classification is based on the number of explanatory variables and the nature of the relationship between X and Y.

10
**Linear Equations**

[Figure: ‘High School Teacher’ clip art, © T/Maker Co.]

11
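**Linear Regression Model**

The relationship between the variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $Y_i$ is the dependent (response) variable, $X_i$ is the independent (explanatory) variable, $\beta_0$ is the population Y-intercept, $\beta_1$ is the population slope, and $\varepsilon_i$ is the random error.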

12
**Sample Linear Regression Model**

[Figure: sample regression line, with callouts for the random error $e_i$, an unsampled observation, and an observed value.]

13
**Scatter Diagram**

- Plot of all $(X_i, Y_i)$ pairs
- Suggests how well the model will fit

14
**Thinking Challenge**

- How would you draw a line through the points?
- How do you determine which line ‘fits best’?

Work on this alone, then in a group, then as a class.

15
**Least Squares**

- ‘Best fit’ means the differences between actual Y values and predicted Y values are a minimum
- But positive differences offset negative ones, so the raw differences cannot simply be added up
- Least squares (LS) minimizes the sum of the squared differences (or errors)
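A short sketch of why the differences are squared. The data are hypothetical, chosen only for illustration, and the two candidate lines are picked by hand: a flat line at the mean of Y and a sloped line. Both have differences that sum to zero, so the plain sum cannot tell them apart, while the sum of squared differences can.

```python
def residuals(b0, b1, xs, ys):
    """Actual Y minus predicted Y for the line Y-hat = b0 + b1 * X."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def summarize(errors):
    """Return (sum of differences, sum of squared differences)."""
    return sum(errors), sum(e * e for e in errors)

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

flat_sum, flat_sse = summarize(residuals(2.0, 0.0, xs, ys))   # horizontal line at mean of Y
tilt_sum, tilt_sse = summarize(residuals(-0.1, 0.7, xs, ys))  # a sloped candidate line

print(f"flat line:   sum of differences={flat_sum:.2f}  sum of squares={flat_sse:.2f}")
print(f"sloped line: sum of differences={tilt_sum:.2f}  sum of squares={tilt_sse:.2f}")
```

The sloped line has the smaller sum of squares, so by the least squares criterion it is the better fit of the two.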

16
**Least Squares Graphically**


17
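**Coefficient Equations**

Sample regression equation:

$$\hat{Y}_i = b_0 + b_1 X_i$$

Sample slope:

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$$

Sample Y-intercept:

$$b_0 = \bar{Y} - b_1 \bar{X}$$

Here $n$ is the number of $(X_i, Y_i)$ pairs, and $\bar{X}^2$ means average the $X_i$’s, then square.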
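The slope and intercept can be computed directly from sums over the data. The sketch below assumes the usual computational form of the least squares estimates, $b_1 = (\sum X_i Y_i - n\bar{X}\bar{Y}) / (\sum X_i^2 - n\bar{X}^2)$ and $b_0 = \bar{Y} - b_1\bar{X}$; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for Y-hat = b0 + b1 * X."""
    n = len(xs)                                # number of (Xi, Yi) pairs
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

b0, b1 = fit_line(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Note that the denominator uses the square of the average X, not the average of the squared X values.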

18
**Computation Table**

19
**Interpretation of Coefficients**

- Slope ($b_1$): estimated Y changes by $b_1$ for each 1 unit increase in X
  - Example: If $b_1 = 2$, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X)
- Y-intercept ($b_0$): average value of Y when X = 0
  - Example: If $b_0 = 4$, then average Sales (Y) is expected to be 4 when Advertising (X) is 0

20
**Parameter Estimation Example**

You’re a marketing analyst for Hasbro Toys. You gather data on advertising (Ad $) and sales (units). What is the relationship between sales & advertising?

21
**Scatter Diagram Sales vs. Advertising**


22
**Parameter Estimation Solution Table**


23
**Coefficient Interpretation Solution**

- Slope ($b_1$): Sales Volume (Y) is expected to increase by .7 units for each \$1 increase in Advertising (X)
- Y-intercept ($b_0$): average value of Sales Volume (Y) is $b_0$ units when Advertising (X) is 0
  - Difficult to explain to Marketing Manager
  - Expect some sales without advertising

24
**Parameter Estimation Excel Output**

[Excel output: coefficient estimates $b_P$, with callouts for $b_0$ and $b_1$.]

25
**Evaluating the Model**

How well does the model describe the relationship between the variables?

- Closeness of ‘best fit’: the closer the points to the line, the better
- Assumptions met
- Significance of parameter estimates

26
**Evaluating Model Steps**

1. Examine variation measures
2. Do residual analysis
3. Test coefficients for significance

27
**Random Error Variation**

- Variation of actual Y from predicted Y
- Measured by standard error of estimate
  - Sample standard deviation of $e$
  - Denoted $S_{YX}$
- Affects several factors
  - Parameter significance
  - Prediction accuracy

28
**Standard Error of Estimate**

The mean error is 0.
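The slide's formula is not reproduced in this transcript, so the sketch below assumes the standard definition for a model with one X variable, $S_{YX} = \sqrt{SSE / (n - 2)}$. The data are hypothetical, and the coefficients passed in are the least squares estimates for that data.

```python
import math

def standard_error_of_estimate(b0, b1, xs, ys):
    """S_YX = sqrt(SSE / (n - 2)) for a linear model with one X variable."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical data and its least squares line (b0 = -0.1, b1 = 0.7).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

s_yx = standard_error_of_estimate(-0.1, 0.7, xs, ys)
mean_error = sum(y - (-0.1 + 0.7 * x) for x, y in zip(xs, ys)) / len(xs)
print(f"S_YX = {s_yx:.4f}, mean error = {mean_error:.4f}")
```

The divisor is n - 2 rather than n because two parameters, the intercept and the slope, were estimated from the same data.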

29
**Measures of Variation in Regression**

- Total sum of squares (SST): measures variation of observed $Y_i$ around the mean $\bar{Y}$
- Explained variation (SSR): variation due to relationship between X & Y
- Unexplained variation (SSE): variation due to other factors

30
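**Variation Measures**

[Figure: a single observation $Y_i$ plotted against the fitted line and $\bar{Y}$, with the three deviations labeled.]

- Total sum of squares: $(Y_i - \bar{Y})^2$
- Explained sum of squares: $(\hat{Y}_i - \bar{Y})^2$
- Unexplained sum of squares: $(Y_i - \hat{Y}_i)^2$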

31
**Coefficient of Determination**

Proportion of variation ‘explained’ by relationship between X & Y, with $0 \le r^2 \le 1$.
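A minimal sketch of how the variation measures combine into $r^2$, assuming the standard definition $r^2 = SSR / SST$ (explained variation over total variation). The data are hypothetical, and the coefficients are the least squares estimates for that data.

```python
def variation_measures(b0, b1, xs, ys):
    """Return (SST, SSR, SSE) for the line Y-hat = b0 + b1 * X."""
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)              # total
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    return sst, ssr, sse

# Hypothetical data and its least squares line (b0 = -0.1, b1 = 0.7).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

sst, ssr, sse = variation_measures(-0.1, 0.7, xs, ys)
r_squared = ssr / sst
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  r^2={r_squared:.4f}")
```

For a least squares line, SST equals SSR plus SSE, which is why $r^2$ always falls between 0 and 1.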

32
**$r^2$ Examples**

[Figure: four scatter plots, labeled $r^2 = 1$, $r^2 = 1$, $r^2 = .8$, and $r^2 = 0$.]

33
**Adjusted Coefficient of Determination**

- Proportion of variation ‘explained’ by relationship between X & Y
- Reflects
  - Sample size
  - Number of independent variables
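The slide's formula did not survive in this transcript. The usual definition, given here as an assumption about what the slide showed, adjusts $r^2$ using the sample size $n$ and the number of independent variables $P$:

```latex
r^2_{adj} = 1 - \left(1 - r^2\right)\frac{n - 1}{n - P - 1}
```

Adding an X variable always raises $r^2$ (or leaves it unchanged), but it raises $r^2_{adj}$ only if the gain in fit outweighs the loss of a degree of freedom.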

34
**Coef. of Determination Excel Output**

[Excel output, with callouts for $r^2$ adjusted for number of explanatory variables & sample size, and for $S_{YX}$.]

35
**Residual Analysis**

- Graphical analysis of residuals
  - Plot residuals vs. $X_i$ values
  - Residuals are also called errors: the difference between actual $Y_i$ & predicted $Y_i$
- Purposes
  - Examine functional form (linear vs. non-linear model)
  - Evaluate violations of assumptions
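To show how residuals can expose the wrong functional form, the sketch below fits a straight line to hypothetical data that actually follow $Y = X^2$. The helper `fit_line` and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for Y-hat = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data generated from a curve, Y = X squared.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # actual minus predicted

for x, e in zip(xs, resid):
    print(f"X={x}  residual={e:+.1f}")
```

The residuals are positive at both ends and negative in the middle. A systematic curve like this, rather than a random scatter around zero, suggests the linear form is missing a term such as $X^2$.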

36
**Linear Regression Assumptions**

- Normality
  - Y values are normally distributed for each X
  - Probability distribution of error is normal
- Homoscedasticity (constant variance)
- Independence of errors
- Linearity

37
**Residual Plot for Functional Form**

[Figure: two residual plots, labeled ‘Add $X^2$ Term’ and ‘Correct Specification’.]

38
**Residual Plot for Homoscedasticity**

[Figure: two residual plots, labeled ‘Heteroscedasticity’ (fan-shaped) and ‘Correct Specification’.]

Standardized residuals are typically used.

39
**Residual Plot for Independence**

[Figure: two residual plots, labeled ‘Not Independent’ and ‘Correct Specification’.]

The plots reflect the sequence in which the data were collected.

40
**Residual Analysis Excel Output**

The plot shows the standardized (Studentized) residual for each observation. For observation 5, the standardized residual is large. You can save the residuals & do descriptive analysis on them, including a normal probability plot. There are not enough observations here to make further analysis meaningful.

41
**Residual Plot Excel Output**

42
**Test of Slope Coefficient**

- Tests if there is a linear relationship between X & Y
- Involves population slope $\beta_1$
- Hypotheses
  - $H_0: \beta_1 = 0$ (no linear relationship)
  - $H_1: \beta_1 \ne 0$ (linear relationship)
- Theoretical basis is sampling distribution of slopes

43
**Test of Slope Parameter Solution**

- $H_0: \beta_1 = 0$
- $H_1: \beta_1 \ne 0$
- $\alpha = .05$
- $df = n - 2 = 3$
- Critical value(s):
- Test statistic:
- Decision: Reject at $\alpha = .05$
- Conclusion: There is evidence of a relationship
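The test statistic can be sketched as follows, assuming the standard form $t = b_1 / S_{b_1}$ with $S_{b_1} = S_{YX} / \sqrt{\sum (X_i - \bar{X})^2}$. The data are hypothetical (five pairs, so df = 3 as on the slide), and the critical value 3.182 is the two-tailed t-table value for $\alpha = .05$ and df = 3, hard-coded here rather than computed.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta1 = 0 in a one-X linear model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                       # sample slope
    b0 = y_bar - b1 * x_bar              # sample intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))      # standard error of estimate
    s_b1 = s_yx / math.sqrt(sxx)         # standard error of the slope
    return b1 / s_b1, n - 2

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

T_CRIT = 3.182  # t table: two-tailed, alpha = .05, df = 3
t_stat, df = slope_t_statistic(xs, ys)
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these illustrative numbers the statistic exceeds the critical value, so the null hypothesis of no linear relationship would be rejected at the .05 level.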

44
**Test Statistic Solution**


45
**Test of Slope Parameter Excel Output**

‘Standard Error’ is the estimated standard deviation of the sampling distribution, $S_{b_P}$.

[Excel output, with callouts for $b_P$, $S_{b_P}$, $t = b_P / S_{b_P}$, and the P-value.]

46
**Prediction With Regression Models**

- Types of predictions
  - Point estimates
  - Interval estimates
- What is predicted
  - Population mean response ($\mu_{YX}$) for given X: a point on the population regression line
  - Individual response ($Y_i$) for given X

47
**What Is Predicted**

48
**Factors Affecting Interval Width**

- Level of confidence ($1 - \alpha$): width increases as confidence increases
- Data dispersion ($S_{YX}$): width increases as variation increases
- Sample size: width decreases as sample size increases
- Distance of $X_{given}$ from mean $\bar{X}$: width increases as distance increases
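These four factors all appear in the usual interval formulas. The sketch assumes the standard half-widths for a one-X model: $t \cdot S_{YX}\sqrt{1/n + (X_{given} - \bar{X})^2 / \sum (X_i - \bar{X})^2}$ for the mean response, with an extra 1 under the square root for an individual response. The values of `S_YX` and `T_CRIT` are hypothetical inputs.

```python
import math

def interval_half_widths(x_given, xs, s_yx, t_crit):
    """Return (mean-response half-width, individual-response half-width)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    h = 1 / n + (x_given - x_bar) ** 2 / sxx   # grows with distance from X-bar
    mean_hw = t_crit * s_yx * math.sqrt(h)
    indiv_hw = t_crit * s_yx * math.sqrt(1 + h)
    return mean_hw, indiv_hw

# Hypothetical inputs, not taken from the slides.
xs = [1, 2, 3, 4, 5]
S_YX = 0.6055    # data dispersion
T_CRIT = 3.182   # t table: two-tailed, alpha = .05, df = 3

near = interval_half_widths(3, xs, S_YX, T_CRIT)  # at the mean of X
far = interval_half_widths(5, xs, S_YX, T_CRIT)   # at the edge of the data
print(f"X=3: mean +/-{near[0]:.3f}, individual +/-{near[1]:.3f}")
print(f"X=5: mean +/-{far[0]:.3f}, individual +/-{far[1]:.3f}")
```

Both intervals are narrowest at the mean of X, and the interval for an individual response is always wider than the one for the mean response at the same X.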

49
**Regression Cautions**

- Violated assumptions
- Relevancy of historical data: even if interpolating, conditions may have changed
- Level of significance: $r^2$ may be high, but at what level?
- Extrapolation: prediction outside the range of X values used to develop the equation (interpolation is prediction within that range, based on the smallest & largest X values)
- Cause & effect: the # of teachers is highly correlated with liquor consumption, but only because both grow with population size

50
**Extrapolation**

- Extrapolation: prediction outside the range of X values used to develop the equation
- Interpolation: prediction within the range of X values used to develop the equation
- The range is based on the smallest & largest X values

51
**Cause & Effect**

[Figure: scatter plot of liquor consumption vs. # teachers.]

The # of teachers is highly correlated with liquor consumption due to population size!

52
**Types of Probabilistic Models**


53
**Correlation Models**

- Answer ‘How strong is the linear relationship between 2 variables?’
- Coefficient of correlation used
  - Population correlation coefficient denoted $\rho$ (rho)
  - Values range from -1 to +1
  - Measures degree of association
- Used mainly for understanding

54
**Sample Coefficient of Correlation**

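Pearson Product-Moment Coefficient of Correlation:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$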
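A minimal sketch of the sample coefficient, assuming the standard product-moment definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson product-moment coefficient of correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

In a model with one X variable, the square of r equals the coefficient of determination, and r has the same sign as the sample slope.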

55
**Correlation & Regression Line**


56
**Test of Correlation Coefficient**

- Shows if there is a linear relationship between 2 numerical variables
- Same conclusion as testing population slope $\beta_1$
- Hypotheses
  - $H_0: \rho = 0$ (no correlation)
  - $H_1: \rho \ne 0$ (correlation)

57
**Conclusion**

- Described the linear regression model
- Stated the regression modeling steps
- Explained least squares
- Computed regression coefficients
- Described residual analysis
- Predicted the response variable

58
**Learning Objectives**

As a result of this class, you will be able to:

- Explain the linear multiple regression model
- Interpret linear multiple regression computer output
- Explain multicollinearity

59
**Multiple Regression Models**


60
**Linear Multiple Regression Model**

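The relationship between 1 dependent & 2 or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_P X_{Pi} + \varepsilon_i$$

where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{Pi}$ are the independent (explanatory) variables, $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_P$ are the population slopes, and $\varepsilon_i$ is the random error.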

61
**Population Multiple Regression Model**

Bivariate model

62
**Sample Multiple Regression Model**

Bivariate model

63
**Regression Modeling Steps**

1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Use model for prediction

64
**Parameter Estimation: Linear Multiple Regression Model**

65
**Multiple Linear Regression Equations**

Too complicated by hand! Ouch!
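By hand the equations are tedious, but in code they reduce to solving one small linear system. The sketch below assumes the standard least squares normal equations, $(X^\top X)\,b = X^\top Y$, and solves them with plain Gaussian elimination. The function names and the data are hypothetical; the Y values are generated without error from $Y = 1 + 2X_1 + 3X_2$ so the result can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """rows: one [x1, x2, ...] list per observation. Returns [b0, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]         # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data with a known, error-free relationship.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

b0, b1, b2 = fit_multiple(rows, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

In practice a statistics package or spreadsheet does this step, as the Excel output slides in this deck show. Solving the normal equations directly is fine for a small illustration but is not the most numerically stable approach.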

66
**Interpretation of Estimated Coefficients**

- Slope ($b_P$): estimated Y changes by $b_P$ for each 1 unit increase in $X_P$, holding all other variables constant
  - Example: If $b_1 = 2$, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising ($X_1$), given the Number of Sales Reps ($X_2$)
- Y-intercept ($b_0$): average value of Y when every $X_P = 0$

67
**Parameter Estimation Example**

You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00). You’ve collected data on responses (Resp), ad size (Size), and circulation (Circ).

Is this model specified correctly? What other variables could be used (color, photos, etc.)?

68
**Parameter Estimation Excel Output**

[Excel output: coefficient estimates $b_P$, with callouts for $b_0$, $b_1$, and $b_2$.]

69
**Interpretation of Coefficients Solution**

- Slope ($b_1$): # Responses to Ad is expected to increase by (20.49) for each 1 sq. in. increase in Ad Size, holding Circulation constant
- Slope ($b_2$): # Responses to Ad is expected to increase by (28.05) for each 1 unit (1,000) increase in Circulation, holding Ad Size constant
- Y-intercept is difficult to interpret. How can you have any responses with no circulation?

70
**Evaluating the Model**

71
**Regression Modeling Steps**

1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Use model for prediction

72
**Evaluating Multiple Regression Model Steps**

1. Examine variation measures
2. Do residual analysis
3. Test parameter significance
   - Overall model
   - Portions of model
   - Individual coefficients
4. Test for multicollinearity

73
**Coef. of Determination Excel Output**

[Excel output, with callouts for $r^2_{Y.12}$, $r^2_{adj}$, and $S_{YX}$.]

$r^2_{adj}$ means 95.61% of variation in Y is due to Ad Size & Circulation.

74
**Coefficient of Partial Determination**

- Proportion of variation in Y ‘explained’ by variable $X_P$ holding all others constant
- Must estimate separate models
- Denoted $r^2_{Y1.2}$ in two X variables case: coefficient of partial determination of $X_1$ with Y holding $X_2$ constant
- Useful in selecting X variables

75
**$r^2_{Y1.2}$ Excel Output**

ANOVA

| | df | SS |
|---|---|---|
| Regression | 2 | 9.2497 |
| Residual | 3 | 0.2503 |
| Total | 5 | 9.5000 |
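The sums of squares in this ANOVA output are enough to recompute $r^2$ and the adjusted $r^2$. The sketch assumes the standard formulas $r^2 = SSR / SST$ and $r^2_{adj} = 1 - (1 - r^2)(n - 1)/(n - P - 1)$, and reads $n - 1 = 5$ and $P = 2$ from the df column. Variable names are illustrative.

```python
# Values read from the ANOVA output on this slide.
ssr, df_regression = 9.2497, 2   # explained variation, P = 2 X variables
sse, df_residual = 0.2503, 3     # unexplained variation, n - P - 1
sst, df_total = 9.5000, 5        # total variation, n - 1

r_squared = ssr / sst
r_squared_adj = 1 - (1 - r_squared) * df_total / df_residual

print(f"r^2 = {r_squared:.4f}")
print(f"adjusted r^2 = {r_squared_adj:.4f}")
```

The adjusted value rounds to the 95.61% quoted for Ad Size & Circulation in the earlier coefficient of determination output.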

76
**Testing Parameters**

77
**Evaluating Multiple Regression Model Steps**

1. Examine variation measures
2. Do residual analysis
3. Test parameter significance
   - Overall model
   - Portions of model
   - Individual coefficients
4. Test for multicollinearity

78
**Testing Overall Significance**

- Shows if there is a linear relationship between all X variables together & Y
- Uses F test statistic
- Hypotheses
  - $H_0: \beta_1 = \beta_2 = \dots = \beta_P = 0$ (no linear relationship)
  - $H_1$: At least one coefficient is not 0 (at least one X variable affects Y)
- Less chance of error than separate t-tests on each coefficient: doing a series of t-tests leads to a higher overall Type I error than $\alpha$

79
**Overall Significance Excel Output**

[Excel ANOVA output, with callouts for the degrees of freedom $n - P - 1$ and $n - 1$, the F statistic MSR / MSE, and the P-value.]
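The F statistic is the ratio of the two mean squares. The sketch below reuses the sums of squares and degrees of freedom from the ANOVA output shown earlier in the deck (Regression 9.2497 on 2 df, Residual 0.2503 on 3 df). The critical value 9.55 is the F-table value for $\alpha = .05$ with 2 and 3 degrees of freedom, hard-coded here as an assumption about the level used.

```python
def overall_f(ssr, sse, p, n):
    """F = MSR / MSE, with P and n - P - 1 degrees of freedom."""
    msr = ssr / p              # mean square regression
    mse = sse / (n - p - 1)    # mean square error
    return msr / mse

# From the ANOVA output: total df = n - 1 = 5, so n = 6; P = 2.
f_stat = overall_f(ssr=9.2497, sse=0.2503, p=2, n=6)

F_CRIT = 9.55  # F table: alpha = .05, df = (2, 3)
reject_h0 = f_stat > F_CRIT
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

A statistic this far above the critical value indicates that at least one of the X variables is linearly related to Y.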

80
**Testing Model Portions**

- Examines the contribution of a set of X variables to the relationship with Y
- Null hypothesis: variables in the set do not significantly improve the model when all other variables are included
- Must estimate separate models
- Used in selecting X variables

81
**Testing Model Portions Test Statistic**

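Test $H_0: \beta_1 = 0$ in a 2 variable model:

$$F = \frac{SSR(X_1 \text{ and } X_2) - SSR(X_2)}{MSE(X_1 \text{ and } X_2)}$$

$SSR(X_1 \text{ and } X_2)$ and $MSE(X_1 \text{ and } X_2)$ come from the ANOVA section of the regression for the model with both variables; $SSR(X_2)$ comes from the ANOVA section of the regression for the model with $X_2$ alone.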

82
**Multicollinearity**

83
**Evaluating Multiple Regression Model Steps**

1. Examine variation measures
2. Do residual analysis
3. Test parameter significance
   - Overall model
   - Portions of model
   - Individual coefficients
4. Test for multicollinearity

84
**Multicollinearity**

- High correlation between X variables
- Coefficients measure combined effect
- Leads to unstable coefficients depending on X variables in model
- Always exists; matter of degree
- Example: Using both Sales & Profit as explanatory variables in same model

85
**Detecting Multicollinearity**

- Examine correlation matrix
  - Correlations between pairs of X variables are more than with Y variable
- Examine variance inflation factor (VIF)
  - If $VIF_j > 5$, multicollinearity exists
- Few remedies
  - Obtain new sample data
  - Eliminate one correlated X variable
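A sketch of the VIF check for the two-variable case, assuming the standard definition $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other X variables. With only two X variables, $R_j^2$ is simply the squared correlation between them. The function names and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation between two equally long lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_variables(x1, x2):
    """VIF when the model has exactly two X variables (same for both)."""
    r12 = pearson_r(x1, x2)
    return 1 / (1 - r12 ** 2)

# Hypothetical explanatory variables, not taken from the slides.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

vif = vif_two_variables(x1, x2)
print(f"r12 = {pearson_r(x1, x2):.4f}, VIF = {vif:.4f}, exceeds 5: {vif > 5}")
```

Here the two variables are fairly strongly correlated, yet the VIF stays under the rule-of-thumb threshold of 5 used on the slide. With more than two X variables, each $R_j^2$ would need its own auxiliary regression.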

86
**Correlation Matrix Excel Output**

[Excel correlation matrix output, with callouts for $r_{Y1}$, $r_{Y2}$, and $r_{12}$.]

87
**VIF Excel Output**

Regress $X_1$ on $X_2$.

88
**This Class...**

Please take a moment to answer the following questions in writing:

- What was the most important thing you learned in class today?
- What do you still have questions about?
- How can today’s class be improved?
