Published by Felipe Gatliff. Modified about 1 year ago.

1
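Types of regression models
Regression models are either simple (one independent variable) or multiple (several independent variables). Each type can be first order, second order or higher order; multiple models can also include interaction terms.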

2
A quadratic (second-order) model
$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$
Interpretation of model parameters:
–$\beta_0$: y-intercept, the value of $E(Y)$ when $x = 0$
–$\beta_1$: the shift parameter
–$\beta_2$: the rate of curvature

3
Example with quadratic terms
The true model, supposedly unknown, is $Y_i = 2 + x_i^2 + \varepsilon_i$, with $\varepsilon_i \sim N(0, 2)$.
Data: (x, y). See SQM.sav

4
Model 1: $E(Y) = \beta_0 + \beta_1 x$

5

6
Model 2: $E(Y) = \beta_0 + \beta_1 x^2$
Smaller variance and SE

7

8
Model 3: $E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$
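The comparison of the three candidate models can be tried out on simulated data. The following is a minimal sketch, not the SQM.sav data: it assumes that $N(0, 2)$ means variance 2, and the sample size, the range of x and the seed are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Simulate from the "true" model Y = 2 + x^2 + eps.
# Assumption: N(0, 2) is read as variance 2, so the standard deviation is sqrt(2).
n = 200                                  # hypothetical sample size
x = rng.uniform(-3.0, 3.0, size=n)       # hypothetical range for x
y = 2.0 + x**2 + rng.normal(0.0, np.sqrt(2.0), size=n)


def fit(columns):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns the coefficient vector and the sum of squared errors (SSE).
    """
    X = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


beta1, sse1 = fit([x])          # Model 1: b0 + b1 x
beta2, sse2 = fit([x**2])       # Model 2: b0 + b1 x^2
beta3, sse3 = fit([x, x**2])    # Model 3: b0 + b1 x + b2 x^2

for name, beta, sse in [("Model 1", beta1, sse1),
                        ("Model 2", beta2, sse2),
                        ("Model 3", beta3, sse3)]:
    print(name, np.round(beta, 3), "SSE =", round(sse, 1))
```

Model 3 contains every term of Models 1 and 2, so its SSE can never be larger than theirs; whether the extra linear term is worth keeping is a separate question.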

10
A third-order model with 1 IV
$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$
[Figure: two panels, $\beta_3 < 0$ and $\beta_3 > 0$]
Use with caution, given the numerical problems that could arise.

12
First-order model in k quantitative variables
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
Interpretation of model parameters:
–$\beta_0$: y-intercept, the value of $E(Y)$ when $x_1 = x_2 = \dots = x_k = 0$
–$\beta_1$: change in $E(Y)$ for a 1-unit increase in $x_1$ when $x_2, \dots, x_k$ are held fixed
–$\beta_2$: change in $E(Y)$ for a 1-unit increase in $x_2$ when $x_1, x_3, \dots, x_k$ are held fixed
–...

13
A bivariate model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Changing $x_2$ changes only the y-intercept.
In the first-order model, a 1-unit change in one independent variable has the same effect on the mean value of y regardless of the values of the other independent variables.

14
A bivariate model

15
Example: executive salaries
–Y = Annual salary (in dollars)
–$x_1$ = Years of experience
–$x_2$ = Years of education
–$x_3$ = Gender: 1 if male; 0 if female
–$x_4$ = Number of employees supervised
–$x_5$ = Corporate assets (in millions of dollars)
Data: ExecSal.sav
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4 + \beta_5 x_5$
Do not consider $x_3$ (Gender) for the moment.

16
Executive salaries: computer output
Model summary (multiple regression)
R      R-squared   Adjusted R-squared   Std. error of the estimate
.870   .757        …                    ….309
Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
Model summary (simple regression)
R      R-squared   Adjusted R-squared   Std. error of the estimate
.783   .613        …                    ….006
Predictors: (Constant), Years of Experience

17
Coefficient of determination
The coefficient $R^2$ is computed exactly as in the simple regression case, from the decomposition SST (Total) = SSR (Regression) + SSE (Error): $R^2 = SSR/SST = 1 - SSE/SST$.
A drawback of $R^2$: it never decreases when variables are added, even if these are NOT relevant to the problem.

18
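A solution: adjusted $R^2$
$R^2_{adj} = 1 - \frac{n-1}{n-(k+1)} \cdot \frac{SSE}{SST}$
–Each additional variable reduces adjusted $R^2$, unless SSE decreases enough to compensate
Adjusted $R^2$ and the estimate of the variance $\sigma^2$: an unbiased estimator of $\sigma^2$ is computed as $s^2 = MSE = \frac{SSE}{n-(k+1)}$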

19
Executive salaries: computer output (2)
Coefficients (dependent variable: Annual salary in $)
Term                              B          Std. error   Beta   t        Sig.
(Constant)                        …          ….089               -2.175   .032
Years of Experience               2696.360   173.647      .785   15.528   .000
Years of Education                2656.017   563.476      .243   4.714    .000
Number of Employees supervised    41.092     7.807        .272   5.264    .000
Corporate assets (in million $)   244.569    83.420       .149   2.932    .004
The t and Sig. columns report the t-tests for the individual variables.
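Each t statistic in the coefficients table is the unstandardized coefficient B divided by its standard error. The short check below recomputes the four slope t-values from the B and standard-error figures reported above; the dictionary layout is just an illustrative choice.

```python
# (B, standard error, reported t) for each slope in the coefficients table above
rows = {
    "Years of Experience": (2696.360, 173.647, 15.528),
    "Years of Education": (2656.017, 563.476, 4.714),
    "Number of Employees supervised": (41.092, 7.807, 5.264),
    "Corporate assets (in million $)": (244.569, 83.420, 2.932),
}

# t = B / SE for every slope
t_stats = {name: b / se for name, (b, se, _) in rows.items()}

for name, (b, se, t_reported) in rows.items():
    print(f"{name}: t = {t_stats[name]:.3f} (reported {t_reported})")
```

Any small differences from the reported values come from the rounding of B and the standard error in the printed table.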

20
Testing overall significance: the F-test
1.Shows whether there is a linear relationship between all X variables together and Y
2.Uses the F test statistic
3.Hypotheses
–$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (no linear relationship)
–$H_a$: at least one coefficient is not 0 (at least one X variable affects Y)
The F-test for a single coefficient is equivalent to the t-test.

21
Anova table
ANOVA (dependent variable: Annual salary in $)
Source       Sum of squares   df   Mean square   F        Sig.
Regression   4.766E10         4    1.192E10      74.045   .000
Residual     1.529E10         95   1.609E8
Total        6.295E10         99
Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
Annotations:
–F: the F-statistic
–Residual mean square: MSE (mean square error), the estimate of the variance
–Regression df = k, the number of regression slopes
–Total df = n − 1, where n is the number of observations
–Sig.: p-value of the F-test
Decision: reject $H_0$, i.e. the model as a whole is statistically useful
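The quantities in the ANOVA table are linked by simple arithmetic. The sketch below recomputes MSE, the F-statistic, $R^2$ and adjusted $R^2$ from the sums of squares and degrees of freedom reported above; n = 100 is inferred from the total df of 99, and the variable names are illustrative.

```python
# Values read from the ANOVA table above (executive salaries, multiple regression)
ss_regression = 4.766e10
ss_residual = 1.529e10   # SSE
ss_total = 6.295e10      # SST
k = 4                    # number of regression slopes
n = 100                  # observations, since total df = n - 1 = 99

df_residual = n - (k + 1)
ms_regression = ss_regression / k
mse = ss_residual / df_residual          # estimate of the error variance
f_stat = ms_regression / mse

r_squared = 1 - ss_residual / ss_total
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / (n - 1))

print(f"F = {f_stat:.2f}, MSE = {mse:.3e}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
```

The recomputed F and MSE agree with the table up to the rounding of the printed sums of squares.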

22
Interaction (second-order) model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Interpretation of model parameters:
–$\beta_0$: y-intercept, the value of $E(Y)$ when $x_1 = x_2 = 0$
–$\beta_1 + \beta_3 x_2$: change in $E(Y)$ for a 1-unit increase in $x_1$ when $x_2$ is held fixed
–$\beta_2 + \beta_3 x_1$: change in $E(Y)$ for a 1-unit increase in $x_2$ when $x_1$ is held fixed
–$\beta_3$: controls the rate of change of the surface

23
Interaction (second-order) model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Contour lines are not parallel.
The effect of one variable depends on the level of the other.
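The difference between the first-order and the interaction model can be made concrete by computing the slope in $x_1$ at several values of $x_2$. This is a small sketch with hypothetical coefficients chosen only for illustration; the function names are not from the slides.

```python
def slope_x1_first_order(b1):
    """Slope of E(Y) in x1 for E(Y) = b0 + b1*x1 + b2*x2: it does not depend on x2."""
    return b1


def slope_x1_interaction(b1, b3, x2):
    """Slope of E(Y) in x1 for E(Y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b1 + b3 * x2


# Hypothetical coefficients, for illustration only
b1, b3 = 2.0, 0.5
for x2 in (0, 1, 2):
    print(x2, slope_x1_first_order(b1), slope_x1_interaction(b1, b3, x2))
```

With $\beta_3 = 0$ the two functions coincide, which is why testing $\beta_3$ is a test for the presence of interaction.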

24
Example: Antique grandfather clocks auction
Clocks are sold at auction by competitive bidding. Data are:
–Y: auction price in dollars
–$x_1$: age of clocks
–$x_2$: number of bidders
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Data: GFCLOCKS.sav

25
Data summaries
If data are Normal, skewness is 0.
If data are Normal, (excess) kurtosis is 0.
Note: skewness and kurtosis are not enough to establish Normality.

26
P-P plot for Normality
If data are Normal, points should lie along the straight line. In this example the situation is fairly good.

27
Bivariate scatter-plots

28
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

29
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

30
Interpreting interaction models
The coefficient for the interaction term is significant.
If an interaction term is present, the corresponding first-order terms must also be included to interpret the model correctly.
In the example, a careless analyst could conclude that the effect of Bidders is negative, since the estimate $b_2$ is negative. Because an interaction term is present, the slope estimate for Bidders ($x_2$) is $b_2 + b_3 x_1$.
For $x_1 = 150$ (age), the estimated slope for Bidders is $b_2 + b_3(150) = \dots$
Note: $b = \hat{\beta}$

31
Models with qualitative X’s
Regression models can also include qualitative (or categorical) independent variables (QIV). The categories of a QIV are called levels.
Since the levels of a QIV are not measured on a natural numerical scale, we need a specific type of coding to avoid introducing fictitious linear relations in the model.
Coding is done with IVs that take only two values, 0 or 1. These coded IVs are called dummy variables.

32
Models with QIV
Suppose we want to model Income (Y) as a function of Sex (x). Use a coded, or dummy, variable: x = 1 if Male, x = 0 if Female.
$E(Y) = \beta_0 + \beta_1 x$
–$E(Y) = \beta_0 + \beta_1$ if x = 1, i.e. Male
–$E(Y) = \beta_0$ if x = 0, i.e. Female
$\beta_0$ is the mean of the base level, i.e. Female is the reference category.
$\beta_1$ is the additional effect if Male.
In this simple model, only the means for the two groups are modeled.

33
QIV with q levels
As a general rule, if a QIV has q levels we need q − 1 dummies for coding. The uncoded level is the reference one.
Example: a QIV has three levels, A, B and C. Define
–$x_1 = 1$ if level A, $x_1 = 0$ if not
–$x_2 = 1$ if level B, $x_2 = 0$ if not
–C is the reference level
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Interpreting the β’s:
–$\beta_0 = \mu_C$ (mean for base level C)
–$\beta_1 = \mu_A - \mu_C$ (additional effect with respect to C if level A)
–$\beta_2 = \mu_B - \mu_C$ (additional effect with respect to C if level B)
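The interpretation of the β’s as a base-level mean plus differences in means can be checked numerically. The sketch below uses a small hypothetical data set and a hypothetical helper, `dummy_code`, to build the q − 1 dummies; the least-squares coefficients are then compared with the group means.

```python
import numpy as np


def dummy_code(levels, reference):
    """Return (names, matrix) with one 0/1 column per non-reference level."""
    names = sorted(set(levels) - {reference})
    matrix = np.array([[1.0 if lv == name else 0.0 for name in names]
                       for lv in levels])
    return names, matrix


# Hypothetical data: a QIV with levels A, B, C and a response y
levels = ["A", "A", "B", "B", "B", "C", "C", "C", "C"]
y = np.array([10.0, 12.0, 7.0, 8.0, 9.0, 4.0, 5.0, 6.0, 5.0])

names, D = dummy_code(levels, reference="C")      # q - 1 = 2 dummies
X = np.column_stack([np.ones(len(y)), D])         # intercept + dummies
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Group means, for comparison with the fitted coefficients
means = {lv: y[np.array(levels) == lv].mean() for lv in "ABC"}
print(dict(zip(["intercept"] + names, np.round(beta, 3))))
print(means)
```

The intercept reproduces the mean of the reference level C, and each dummy coefficient reproduces the difference between its level's mean and the mean of C.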

34
Models with dummies
Dummies can be combined with other dummies and with quantitative X’s to build models with first-order effects (main effects) and interactions, and so to test hypotheses of interest.
Although models containing only dummy variables in practice just estimate the means of the various groups, the testing machinery of the regression setup is still useful for group comparisons.
To define dummies in SPSS, see “Computing dummy vars in SPSS.ppt”.

35
Example: executive salaries
A management consulting firm has developed a regression model to analyze the salary structure of executives.
–Y = Annual salary (in dollars)
–$x_1$ = Years of experience
–$x_2$ = Years of education
–$x_3$ = Gender: 1 if male; 0 if female
–$x_4$ = Number of employees supervised
–$x_5$ = Corporate assets (in millions of dollars)
Data: ExecSal.sav

36
A simple model: $E(Y) = \beta_0 + \beta_3 x_3$
This model estimates the means of the two groups (M, F).
We want to test whether the difference in means is significant, i.e. not due to chance.
[Figure labels: Male group, Female group]

37
Regression output
–The salary difference between groups is significant
–The Gender coefficient is the mean increment for Male
–Its confidence interval is the C.I. for the mean increment

38
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$
Model 2 assumes the same slope but different intercepts. It seems that the two groups are separated.
–If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1$
–If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_3) + \beta_1 x_1$

39
Computer output for Model 2
–R-squared improved greatly
–The new intercept for Male is significant
–In this model the effect of experience is assumed equal for the two groups

40
Model 3: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1 x_3$
With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.
–If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1$
–If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) x_1$, with a new intercept and a new slope for males
Remark: running the regression for the two groups together gives more degrees of freedom (n) for estimating the parameters and the model variance.
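The remark about pooling can be illustrated on simulated data: a model with both a dummy and its interaction with $x_1$ gives the same two fitted lines as two separate simple regressions, while using all n observations to estimate a single error variance. The sketch below is an illustration under assumed data (hypothetical coefficients, sample size and seed), not the ExecSal.sav data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: x1 = years of experience, x3 = gender dummy (1 = male)
n = 60
x1 = rng.uniform(0, 30, size=n)
x3 = (np.arange(n) % 2).astype(float)
y = 50 + 2.0 * x1 + 10 * x3 + 0.5 * x1 * x3 + rng.normal(0, 3, size=n)


def ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Pooled fit of Model 3: b0 + b1*x1 + b3*x3 + b4*x1*x3
b0, b1, b3, b4 = ols(np.column_stack([np.ones(n), x1, x3, x1 * x3]), y)

# Separate simple regressions for each group
female, male = x3 == 0, x3 == 1
f_int, f_slope = ols(np.column_stack([np.ones(female.sum()), x1[female]]), y[female])
m_int, m_slope = ols(np.column_stack([np.ones(male.sum()), x1[male]]), y[male])

print("female line:", (b0, b1), "vs", (f_int, f_slope))
print("male line:  ", (b0 + b3, b1 + b4), "vs", (m_int, m_slope))
```

The point estimates coincide; what the pooled fit adds is a common variance estimate and a direct t-test on $\beta_4$ for the difference in slopes.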

41
Model 3: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1 x_3$
Model 3 allows different slopes and different intercepts.

42
Computer output for Model 3
There is evidence that salaries for the two groups grow at different rates with experience.
Estimated lines:
–$\hat{Y} = \dots + \dots \cdot$ (Years of Experience) for female
–$\hat{Y} = \dots + \dots \cdot$ (Years of Experience) for male

43
A complete second-order model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Interpretation of model parameters:
–$\beta_0$: y-intercept, the value of $E(Y)$ when $x_1 = x_2 = 0$
–$\beta_1$ and $\beta_2$: shifts along the $x_1$ and $x_2$ axes
–$\beta_3$: rotation of the surface
–$\beta_4$ and $\beta_5$: control the rate of curvature

44
Back to executive salaries
What if we suspect that the rate of growth changes with experience and has opposite signs for M and F?
$x_1$ = Years of experience, $x_3$ = Gender (1 if Male)
Model 4: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2$
Model 5: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2 + \beta_5 x_3 x_1^2$
Note: $x_3^2 = x_3$ since it is a dummy

45
Comparing Model 4 and 5
Model 4: different intercept and slope for M and F, but same curvature
–If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$
–If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \beta_4 x_1^2$
Model 5: different intercept, slope and curvature for M and F
–If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$
–If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + (\beta_4 + \beta_5) x_1^2$

46
Model 5: computer output
Model summary
R      R-squared   Adjusted R-squared   Std. error of the estimate
.875   .766        …                    ….735
ANOVA (dependent variable: Annual salary in $)
Source       Sum of squares   df   Mean square   F        Sig.
Regression   4.824E10         5    9.648E9       61.673   .000
Residual     1.471E10         94   1.564E8
Total        6.295E10         99
Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen

47
Model 5: computer output
Coefficients (dependent variable: Annual salary in $)
Term                  B          Std. error   Beta    t        Sig.
(Constant)            52391.…    ….971                8.063    .000
Years of Experience   3373.…     ….248        .982    2.895    .005
Gender                21122.…    ….802        .399    2.549    .012
ExpGen                -2081.…    ….842        -.724   -1.426   .157
ExpSqu                -53.181    45.001       -.422   -1.182   .240
Exp2Gen               112.836    54.950       .904    2.053    .043
Which model is preferable: Model 3 or Model 5?
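The group-specific curvatures implied by the Model 5 output follow directly from the two quadratic coefficients. The sketch below assumes, from the variable names, that ExpSqu is $x_1^2$ (coefficient $\beta_4$) and Exp2Gen is $x_3 x_1^2$ (coefficient $\beta_5$).

```python
# Coefficients reported in the Model 5 output above
b_exp_squ = -53.181    # ExpSqu: assumed to be the coefficient of x1^2 (beta_4)
b_exp2_gen = 112.836   # Exp2Gen: assumed to be the coefficient of x3 * x1^2 (beta_5)

curvature_female = b_exp_squ                # x3 = 0: beta_4
curvature_male = b_exp_squ + b_exp2_gen     # x3 = 1: beta_4 + beta_5

print(f"female: {curvature_female:.3f}, male: {curvature_male:.3f}")
```

The point estimates have opposite signs for the two groups, but ExpSqu is not individually significant (Sig. = .240), so the quadratic terms still need to be tested jointly.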

48
A test for comparing nested models
Two models are nested if one model contains all the terms of the other model and at least one additional term.
The more complex of the two models is called the complete (or full) model. The other is called the reduced (or restricted) model.
Example: Model 1 is nested in Model 2
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
To compare the two models we test $H_0: \beta_4 = \beta_5 = 0$ vs. $H_1$: at least one of $\beta_4$, $\beta_5$ differs from 0

49
F-test for comparing nested models
Reduced model: $E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g$
Complete model: $E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g + \beta_{g+1} x_{g+1} + \dots + \beta_k x_k$
To test $H_0: \beta_{g+1} = \dots = \beta_k = 0$ against $H_1$: at least one of the parameters being tested is not 0
Reject $H_0$ when $F > F_\alpha$, where $F_\alpha$ is the level-α critical point of an F distribution with $(k-g,\ n-(k+1))$ d.f. Compute
$F = \frac{(SSE_R - SSE_C)/(k-g)}{SSE_C/(n-(k+1))} = \frac{(SSE_R - SSE_C)/(k-g)}{MSE_C}$

50
F-test for nested models
Where:
–$SSE_R$ = sum of squared errors for the reduced model
–$SSE_C$ = sum of squared errors for the complete model
–$MSE_C$ = mean square error for the complete model
Remark:
–$k - g$ = number of parameters tested
–$k + 1$ = number of parameters in the complete model
–n = total sample size
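The statistic can be written as a small function of the two SSE values and the counts k, g and n defined above. This is a minimal sketch: the function name `partial_f` and the numbers in the usage line are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def partial_f(sse_reduced, sse_complete, k, g, n):
    """Partial F statistic for H0: beta_{g+1} = ... = beta_k = 0.

    k: number of slopes in the complete model
    g: number of slopes in the reduced model
    n: total sample size
    Returns (F, numerator df, denominator df).
    """
    df_num = k - g
    df_den = n - (k + 1)
    mse_complete = sse_complete / df_den
    f_stat = ((sse_reduced - sse_complete) / df_num) / mse_complete
    return f_stat, df_num, df_den


# Hypothetical sums of squares, for illustration only
f_stat, df_num, df_den = partial_f(sse_reduced=120.0, sse_complete=90.0, k=5, g=3, n=36)
print(f_stat, df_num, df_den)
```

The returned degrees of freedom are the ones needed to look up the critical value $F_\alpha$ in an F table.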

51
Compute partial F-tests with SPSS
1.Enter your complete model in the Regression dialog box
–choose the Method “Enter”
2.Click on “Next”
3.In the new box for Independent variables, enter those you want to remove (i.e. those you’d like to test)
–choose the Method “Remove”
4.In the “Statistics” option select “R squared change”
5.Click OK

52
Applying the F-test
Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example.
Model 3: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3$
Model 5: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2 + \beta_5 x_3 x_1^2$
Note that Model 3 is nested in Model 5.
Apply the F-test for $H_0: \beta_4 = \beta_5 = 0$

53
Computer output
Variables entered/removed (dependent variable: Annual salary in $)
Model   Variables entered                                           Variables removed   Method
1       Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen (a)                        Enter
2                                                                   Exp2Gen, ExpSqu (b) Remove
a. All requested variables entered. b. All requested variables removed.
Model summary
Model   R      R-squared   Adjusted R-squared   Std. error of the estimate   R-squared change   F change   df1   df2   Sig. F change
1       .875   .766        …                    ….735                        .766               61.673     5     94    .000
2       .868   .754        …                    ….080                        -.012              2.488      2     94    .089
Model 1 predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
Model 2 predictors: (Constant), Gender, Years of Experience, ExpGen
The F change for model 2 (2.488) is the F-statistic and its Sig. (.089) is the p-value of the F-test.
Do NOT reject $H_0: \beta_4 = \beta_5 = 0$, i.e. Model 3 is preferred
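Because both models share the same SST, the partial F statistic can also be written in terms of the two R-squared values, which is what the “R squared change” option reports. The sketch below applies that form to the rounded R-squared values in the model summary above; the function name `f_change` is illustrative, and n = 100 is inferred from the total df of 99.

```python
def f_change(r2_complete, r2_reduced, k, g, n):
    """Partial F statistic written in terms of R-squared values."""
    numerator = (r2_complete - r2_reduced) / (k - g)
    denominator = (1 - r2_complete) / (n - (k + 1))
    return numerator / denominator


# R-squared values from the model summary above; n = 100 because total df = 99
f_stat = f_change(r2_complete=0.766, r2_reduced=0.754, k=5, g=3, n=100)
print(round(f_stat, 2))
```

The result is roughly 2.4 rather than the reported 2.488 because the R-squared values are rounded to three decimals in the printed output; the conclusion is the same.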

54
A quadratic model example: Shipping costs
Although a regional delivery service bases the charge for shipping a package on the package weight and distance shipped, its profit per package depends on the package size (volume of space it occupies) and the size and nature of the delivery truck.
The company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.
It is suspected that nonlinear effects may be present.
–Y: cost of shipment in dollars
–$x_1$: package weight in pounds
–$x_2$: distance shipped in miles
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Data: Express.sav

55
Scatter plots
In multiple regression, scatter plots often convey little information.

56
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Distance squared ($x_2^2$) is not significant; try to eliminate it.

57
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2$

58
Applying the F-test: Shipping costs
A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.
–Y: cost of shipment in dollars
–$x_1$: package weight in pounds
–$x_2$: distance shipped in miles
It is suspected that nonlinear effects may be present; use the F-test for nested models to decide between
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Data: Express.sav

59
ANOVA tables: full model and reduced model

60
F-statistic
To test $H_0: \beta_4 = \beta_5 = 0$, from the ANOVA tables we obtain F = 9.92.
The critical value $F_\alpha$ (at the 5% level) for an F distribution with 2 and 14 d.f. is 3.74.
Since F (9.92) > $F_\alpha$ (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.

61
Computer output
The output reports the F-statistic and the p-value of the F-test.
Reject $H_0: \beta_4 = \beta_5 = 0$

62
Executive salaries: a final model (?)
–Y = Annual salary (in dollars)
–$x_1$ = Years of experience
–$x_2$ = Years of education
–$x_3$ = Gender: 1 if male; 0 if female
–$x_4$ = Number of employees supervised
–$x_5$ = Corporate assets (in millions of dollars)
Try adding other variables to Model 3:
Model 6: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_4 + \beta_6 x_5$

63
Computer output: Model 6
Model summary
R      R-squared   Adjusted R-squared   Std. error of the estimate
.963   .927        …                    ….089
ANOVA
Source       Sum of squares   df   Mean square   F         Sig.
Regression   5.836E10         6    9.727E9       197.384   .000
Residual     4.583E9          93   4.928E7
Total        6.295E10         99
Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

64
Computer output: Model 6
Coefficients (dependent variable: Annual salary in $)
Term                              B           Std. error   Beta   t        Sig.
(Constant)                        -38331.…    ….238               -4.021   .000
Years of Experience               2178.964    171.979      .634   12.670   .000
Gender                            13203.…     ….775        .249   4.208    .000
ExpGender                         669.546     209.042      .233   3.203    .002
Years of Education                2689.594    311.914      .246   8.623    .000
Number of Employees supervised    53.239      4.470        .353   11.910   .000
Corporate assets (in million $)   180.310     46.600       .110   3.869    .000

65
Executive salaries: comparison of models
Mod.   Predictors                Adj. R²   Standard error   F-stat
1      x1, x2, x4, x5            …         …                …
…      x1, x3                    …         …                …
…      x1, x3, x1·x3             …         …                …
…      x1, x3, x1·x3, x4, x5     …         …                …
