1 Chapter 9 Supplement: Model Building
2 Introduction
Regression analysis is one of the most commonly used techniques in statistics. It is considered powerful for several reasons:
– It can cover a variety of mathematical models:
  – linear relationships,
  – non-linear relationships,
  – nominal independent variables.
– It provides efficient methods for model building.
3 Polynomial Models
There are models in which the independent variables ($x_i$) are functions of a smaller number of predictor variables. Polynomial models are one such example.
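4 Polynomial Models with One Predictor Variable
The general model with p independent variables
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$
becomes, when each independent variable is a power of a single predictor x ($x_i = x^i$), the pth-order polynomial model
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon$$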
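5 Polynomial Models with One Predictor Variable
First-order model (p = 1): $y = \beta_0 + \beta_1 x + \varepsilon$
Second-order model (p = 2): $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$, a parabola that opens downward when $\beta_2 < 0$ and upward when $\beta_2 > 0$
[Figure: y versus x for the first-order model (a straight line) and for the second-order model with $\beta_2 < 0$ and with $\beta_2 > 0$.]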
6 Polynomial Models with One Predictor Variable
Third-order model (p = 3): $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$
[Figure: y versus x for the third-order model with $\beta_3 < 0$ and with $\beta_3 > 0$.]
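Each polynomial model above is still linear in the coefficients $\beta_0, \dots, \beta_p$, so it can be estimated by ordinary least squares once the columns $1, x, x^2, \dots, x^p$ are built. A minimal sketch, assuming NumPy is available; the function names and the noise-free data are hypothetical, for illustration only.

```python
import numpy as np


def polynomial_design_matrix(x, p):
    """Return the columns 1, x, x**2, ..., x**p for a pth-order model."""
    x = np.asarray(x, dtype=float)
    # increasing=True puts the constant column first, matching beta_0, ..., beta_p
    return np.vander(x, N=p + 1, increasing=True)


def fit_polynomial(x, y, p):
    """Least-squares estimates b_0, ..., b_p of beta_0, ..., beta_p."""
    X = polynomial_design_matrix(x, p)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Hypothetical, noise-free data from a second-order model with beta_2 < 0
x = np.linspace(0.0, 10.0, 21)
y = 2.0 + 3.0 * x - 0.5 * x**2
b = fit_polynomial(x, y, p=2)
print(b)  # b is approximately (2.0, 3.0, -0.5)
```

Raising p adds one column per extra power; with real (noisy) data the choice of p is a modeling decision, not something the fit decides for you.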
7 Polynomial Models with Two Predictor Variables
First-order model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
[Figure: response planes over $(x_1, x_2)$ for different signs of $\beta_1$ ($\beta_1 < 0$, $\beta_1 > 0$) and of $\beta_2$ ($\beta_2 > 0$, $\beta_2 < 0$).]
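8 Polynomial Models with Two Predictor Variables
First-order model with two predictors and interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
Holding $x_2$ fixed gives a line in $x_1$ whose intercept and slope both depend on $x_2$:
– $x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + [\beta_1 + \beta_3(1)]x_1$
– $x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + [\beta_1 + \beta_3(2)]x_1$
– $x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + [\beta_1 + \beta_3(3)]x_1$
The two variables interact to affect the value of y: the slope with respect to $x_1$ changes with $x_2$.
First-order model without interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
– $x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + \beta_1 x_1$
– $x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + \beta_1 x_1$
– $x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + \beta_1 x_1$
The lines are parallel: the effect of one predictor variable on y is independent of the effect of the other predictor variable on y.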
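The intercept and slope formulas above can be checked numerically. The sketch below uses made-up coefficient values (not estimates from any data set) and a hypothetical helper `line_in_x1` to compute the line in $x_1$ for $x_2 = 1, 2, 3$, with and without the interaction term.

```python
def line_in_x1(betas, x2):
    """For y = b0 + b1*x1 + b2*x2 + b3*x1*x2, return (intercept, slope) of y in x1 at a fixed x2."""
    b0, b1, b2, b3 = betas
    intercept = b0 + b2 * x2
    slope = b1 + b3 * x2  # depends on x2 only when b3 != 0
    return intercept, slope


# Hypothetical coefficient values, chosen only for illustration
with_interaction = (10.0, 2.0, 1.0, 0.5)
without_interaction = (10.0, 2.0, 1.0, 0.0)  # b3 = 0 gives the first-order model

for x2 in (1, 2, 3):
    print(x2, line_in_x1(with_interaction, x2), line_in_x1(without_interaction, x2))
```

With the interaction term the slopes are 2.5, 3.0 and 3.5; without it every slope is 2.0, so only the intercepts (11, 12, 13) move and the lines stay parallel.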
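9 Polynomial Models with Two Predictor Variables
Second-order model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \varepsilon$
Holding $x_2$ fixed gives parabolas in $x_1$ that differ only in their constant term:
– $x_2 = 1$: $y = [\beta_0 + \beta_2(1) + \beta_4(1^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
– $x_2 = 2$: $y = [\beta_0 + \beta_2(2) + \beta_4(2^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
– $x_2 = 3$: $y = [\beta_0 + \beta_2(3) + \beta_4(3^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
Second-order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$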
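Collecting terms for a fixed $x_2$ turns the second-order model into a quadratic in $x_1$. With the interaction term, the coefficient of $x_1$ becomes $\beta_1 + \beta_5 x_2$, so the parabolas then differ in more than their constant term. The sketch below (hypothetical function names and coefficient values) does this bookkeeping and checks it against the full model.

```python
import math


def full_model(betas, x1, x2):
    """Second-order model with interaction (error term omitted)."""
    b0, b1, b2, b3, b4, b5 = betas
    return b0 + b1 * x1 + b2 * x2 + b3 * x1**2 + b4 * x2**2 + b5 * x1 * x2


def quadratic_in_x1(betas, x2):
    """Return (constant, coefficient of x1, coefficient of x1**2) at a fixed x2."""
    b0, b1, b2, b3, b4, b5 = betas
    constant = b0 + b2 * x2 + b4 * x2**2
    linear = b1 + b5 * x2  # equals b1 when b5 = 0 (no interaction)
    return constant, linear, b3


# Hypothetical coefficients; set b5 = 0.0 to get the model without interaction
betas = (5.0, 1.0, 2.0, -0.5, 0.25, 0.3)
for x2 in (1, 2, 3):
    c, l, q = quadratic_in_x1(betas, x2)
    x1 = 4.0
    # The collected form must agree with the full model at any point
    assert math.isclose(c + l * x1 + q * x1**2, full_model(betas, x1, x2))
    print(x2, (c, l, q))
```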
10 Selecting a Model
Several models have been introduced. How do we select the right one?
– Use your knowledge of the problem (the variables involved and the nature of the relationships among them) to propose a model.
– Test the model using statistical techniques.
11 Selecting a Model; Example
Example: The location of a new restaurant
– A fast food restaurant chain tries to identify new locations that are likely to be profitable.
– The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).
– Which regression model should be proposed to predict the profitability of new locations?
12 Selecting a Model; Example
Solution
– The dependent variable will be gross revenue.
– Quadratic relationships between revenue and each predictor variable should be expected. Why?
  – Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
  – Families with very young or older children will not visit the restaurant as frequently as families whose children are in the mid-range ages.
[Figures: revenue versus income (low, middle, high) and revenue versus children's age (low, middle, high).]
13 Selecting a Model; Example
Solution
– The quadratic regression model built is
$$\text{SALES} = \beta_0 + \beta_1\,\text{INCOME} + \beta_2\,\text{AGE} + \beta_3\,\text{INCOME}^2 + \beta_4\,\text{AGE}^2 + \beta_5\,(\text{INCOME})(\text{AGE}) + \varepsilon$$
where
SALES = annual gross sales
INCOME = median annual household income in the neighborhood
AGE = mean age of children in the neighborhood
– Include the interaction term when in doubt, and test its relevance later.
14 Selecting a Model; Example
To verify the validity of the proposed model for recommending the location of a new fast food restaurant, 25 areas with fast food restaurants were randomly selected.
– Each area included one of the firm's restaurants and three competing restaurants.
– Data collected (Xm9-01.xls) included:
  – previous year's annual gross sales,
  – mean annual household income,
  – mean age of children.
15 Selecting a Model; Example
[Spreadsheet Xm9-01.xls: the collected data and the added data.]
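The quadratic model needs columns that are not in the raw data: $\text{INCOME}^2$, $\text{AGE}^2$ and the interaction $(\text{INCOME})(\text{AGE})$. A minimal sketch of building those columns and fitting the model by least squares, assuming NumPy. The actual observations are in Xm9-01.xls and are not reproduced here, so the values below are synthetic and the coefficients are hypothetical (the negative squared-term coefficients give the inverted-U shapes argued for in the example).

```python
import numpy as np


def quadratic_design_matrix(income, age):
    """Columns: 1, INCOME, AGE, INCOME^2, AGE^2, INCOME*AGE."""
    income = np.asarray(income, dtype=float)
    age = np.asarray(age, dtype=float)
    return np.column_stack([
        np.ones_like(income),  # intercept (beta_0)
        income,                # beta_1
        age,                   # beta_2
        income**2,             # beta_3
        age**2,                # beta_4
        income * age,          # beta_5: interaction term
    ])


# Synthetic data for 25 areas (NOT the Xm9-01.xls data)
rng = np.random.default_rng(0)
income = rng.uniform(20.0, 80.0, size=25)  # assumed units: thousands of dollars
age = rng.uniform(2.0, 16.0, size=25)      # years
true_beta = np.array([-1000.0, 40.0, 90.0, -0.4, -4.0, 0.1])  # hypothetical

X = quadratic_design_matrix(income, age)
sales = X @ true_beta  # noise-free, so least squares should recover true_beta
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(np.round(beta_hat, 4))
```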
16 The Quadratic Relationships – Graphical Illustration
17 Model Validation
This is a valid model that can be used to make predictions. But…
18 Model Validation
The model can be used to make predictions, but multicollinearity is a problem. The t-tests may be distorted; therefore, do not interpret the individual coefficients or test them.
To examine the correlations among the independent variables in Excel: Tools > Data Analysis > Correlation.
Reducing multicollinearity is addressed under Stepwise Regression.
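Squared and interaction terms are built from the same raw variables, so they tend to be strongly correlated with them. The sketch below mirrors the Excel correlation step in Python with NumPy's `np.corrcoef`, on synthetic predictor values (not the Xm9-01.xls data); the ranges chosen for INCOME and AGE are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
income = np.linspace(20.0, 80.0, 25)                   # synthetic, evenly spaced
age = np.linspace(4.0, 12.0, 25)[rng.permutation(25)]  # synthetic, shuffled

names = ["INCOME", "AGE", "INCOME^2", "AGE^2", "INCOME*AGE"]
predictors = np.column_stack([income, age, income**2, age**2, income * age])

# rowvar=False: each column is a variable, each row an observation
corr = np.corrcoef(predictors, rowvar=False)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]:>10} vs {names[j]:<10} r = {corr[i, j]: .3f}")
```

Here INCOME and INCOME^2 (and AGE and AGE^2) come out with correlations close to 1, which is the kind of pattern that distorts the individual t-tests.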
19 Nominal Independent Variables
In many real-life situations one or more independent variables are nominal. Nominal variables are included in a regression model by means of indicator variables. An indicator variable (I) can assume one of two values, 0 or 1:
I = 1 if the first of two conditions is met
I = 0 if the second of two conditions is met
Examples:
– I = 1 if the data were collected before 1980; I = 0 if the data were collected after 1980
– I = 1 if the temperature was below 50°; I = 0 if the temperature was 50° or more
– I = 1 if the degree earned is in Finance; I = 0 if the degree earned is not in Finance
20 Nominal Independent Variables; Example: Auction Price of Cars
A car dealer wants to predict the auction price of a car (Xm9-02a_supp).
– The dealer now believes that the odometer reading and the car color are variables that affect a car's price.
– Three color categories are considered: white, silver, and other colors.
Note: Color is a nominal variable.
21 Nominal Independent Variables; Example: Auction Price of Cars
Revised data (Xm9-02b_supp):
$I_1$ = 1 if the color is white; 0 if the color is not white
$I_2$ = 1 if the color is silver; 0 if the color is not silver
The category "Other colors" is defined by $I_1 = 0$ and $I_2 = 0$.
22 How Many Indicator Variables?
Note: To represent three possible colors we need only two indicator variables.
Conclusion: To represent a nominal variable with m possible categories, we must create m − 1 indicator variables. (An m-th indicator would be redundant: the m indicators would always sum to 1, duplicating the intercept.)
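The m − 1 rule can be sketched as a small Python function (a hypothetical helper, not an Excel feature): every category except a chosen baseline gets its own 0/1 column, and the baseline is the row with all indicators equal to 0. Using the car colors with "Other" as the baseline reproduces $I_1$ (white) and $I_2$ (silver).

```python
def indicator_columns(values, categories, baseline):
    """Code a nominal variable with m categories as m - 1 indicator (0/1) columns.

    Returns (column_names, rows); the baseline category is coded as all zeros.
    """
    if baseline not in categories:
        raise ValueError("baseline must be one of the categories")
    columns = [c for c in categories if c != baseline]  # m - 1 columns
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        rows.append([1 if v == c else 0 for c in columns])
    return columns, rows


columns, rows = indicator_columns(
    ["White", "Other", "Silver", "White"],
    categories=["White", "Silver", "Other"],
    baseline="Other",
)
print(columns)  # ['White', 'Silver'] -> I_1, I_2
print(rows)     # [[1, 0], [0, 0], [0, 1], [1, 0]]
```

The same helper applied to four undergraduate-degree groups with "Other" as the baseline would produce three indicator columns.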
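23 Nominal Independent Variables; Example: Auction Car Price
Solution
– The proposed model is
$$y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$$
– The data: [Spreadsheet excerpt with rows for a white car, a car of another color, and a silver car.]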
24 Example: Auction Car Price; The Regression Equation
From Excel we get the regression equation
$$\text{PRICE} = 16701 - 0.0555(\text{Odometer}) + 90.48 I_1 + 295.48 I_2$$
– A white car sells, on average, for $90.48 more than a car of the "other color" category.
– A silver car sells, on average, for $295.48 more than a car of the "other color" category.
– For each additional mile, the auction price decreases by 5.55 cents.
25 Example: Auction Car Price; The Regression Equation
From Excel (Xm9-02b_supp) we get the regression equation
$$\text{PRICE} = 16701 - 0.0555(\text{Odometer}) + 90.48 I_1 + 295.48 I_2$$
Substituting the indicator values for each color category:
– "Other color" car ($I_1 = 0$, $I_2 = 0$): $\text{Price} = 16701 - 0.0555(\text{Odometer}) + 90.48(0) + 295.48(0) = 16701 - 0.0555(\text{Odometer})$
– White car ($I_1 = 1$, $I_2 = 0$): $\text{Price} = 16701 - 0.0555(\text{Odometer}) + 90.48(1) + 295.48(0) = 16791.48 - 0.0555(\text{Odometer})$
– Silver car ($I_1 = 0$, $I_2 = 1$): $\text{Price} = 16701 - 0.0555(\text{Odometer}) + 90.48(0) + 295.48(1) = 16996.48 - 0.0555(\text{Odometer})$
[Figure: price versus odometer reading, with one line for each color category.]
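Substituting the indicator values is simple arithmetic. The sketch below plugs the coefficients reported above into a hypothetical `predicted_price` function and recovers the three intercepts. The slope, −0.0555 per mile, is the same for every color, which is why the three lines are parallel.

```python
# Coefficients as reported in the Excel output above
B0, B_ODOMETER, B_WHITE, B_SILVER = 16701.0, -0.0555, 90.48, 295.48


def predicted_price(odometer, i1, i2):
    """PRICE = 16701 - 0.0555(Odometer) + 90.48*I1 + 295.48*I2."""
    return B0 + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


categories = {"other": (0, 0), "white": (1, 0), "silver": (0, 1)}
for name, (i1, i2) in categories.items():
    intercept = predicted_price(0, i1, i2)  # price at Odometer = 0
    print(f"{name:>6}: Price = {intercept:.2f} - 0.0555(Odometer)")
```

For example, `predicted_price(36000, 0, 1)` gives 16701 − 1998 + 295.48 = 14998.48 for a silver car with 36,000 miles.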
26 Example: Auction Car Price; The Regression Equation
From the Excel output (Xm9-02b_supp):
– There is insufficient evidence to infer that a white car and a car of the "other color" category sell for different auction prices.
– There is sufficient evidence to infer that a silver car sells for a higher price than a car of the "other color" category.
27 Nominal Independent Variables; Example: MBA Program Admission (MBA II)
The Dean wants to evaluate applications for the MBA program by predicting the future performance of the applicants. The following three predictors were suggested:
– undergraduate GPA
– GMAT score
– years of work experience
It is now believed that the type of undergraduate degree should be included in the model.
Note: The undergraduate degree is nominal data.
28 Nominal Independent Variables; Example: MBA Program Admission
$I_1$ = 1 if B.A.; 0 otherwise
$I_2$ = 1 if B.B.A.; 0 otherwise
$I_3$ = 1 if B.Sc. or B.Eng.; 0 otherwise
The category "Other group" is defined by $I_1 = 0$, $I_2 = 0$, $I_3 = 0$.
29 Nominal Independent Variables; Example: MBA Program Admission
[Excel worksheet: MBA-II]
30 Applications in Human Resources Management: Pay-Equity
Pay equity can be handled in two different forms:
– equal pay for equal work,
– equal pay for work of equal value.
Regression analysis is extensively employed in cases of equal pay for equal work.
31 Human Resources Management: Pay-Equity
Example (Xm9-03_supp)
– Is there sex discrimination against female managers in a large firm?
– A random sample of 100 managers was selected, and the following data were collected: annual salary, years of education, years of experience, and gender.
32 Human Resources Management: Pay-Equity
Solution
– Construct the following multiple regression model:
$$y = \beta_0 + \beta_1\,\text{Education} + \beta_2\,\text{Experience} + \beta_3\,\text{Gender} + \varepsilon$$
– Note the nature of the variables:
  Education – interval
  Experience – interval
  Gender – nominal (Gender = 1 if male; 0 otherwise)
33 Human Resources Management: Pay-Equity
Solution – Continued (Xm9-03)
Analysis and Interpretation
– The model fits the data quite well.
– The model is very useful.
– Experience is a variable strongly related to salary.
– Once education and experience are accounted for, there is no evidence of sex discrimination.
34 Human Resources Management: Pay-Equity
Solution – Continued (Xm9-03)
Analysis and Interpretation
Further study of the data shows:
– Average experience is 12 years for female managers and 17 years for male managers.
– Average salary is $76,189 for female managers and $97,832 for male managers.
Because experience is strongly related to salary, the gap in average salaries reflects, at least in part, the gap in average experience.
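A raw comparison of average salaries can mislead when experience differs between groups. The sketch below uses a tiny synthetic sample (not the Xm9-03 data) in which salary depends only on experience, with women averaging 12 years and men 17 as on the slide. The raw gap in mean salary is large, yet the least-squares coefficient on Gender is zero. Education is left out to keep the example short, and the salary formula is hypothetical; NumPy is assumed.

```python
import numpy as np

experience = np.array([10.0, 12.0, 14.0, 15.0, 17.0, 19.0])
gender = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # 1 if male, 0 otherwise
salary = 40_000.0 + 3_000.0 * experience            # hypothetical: no gender effect

raw_gap = salary[gender == 1].mean() - salary[gender == 0].mean()

X = np.column_stack([np.ones_like(experience), experience, gender])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

print(f"raw gap in mean salary: {raw_gap:,.0f}")  # 15,000 = 3,000 per year * 5 years
print(f"Gender coefficient:     {coef[2]:,.2f}")  # approximately 0
```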
35 Stepwise Regression
Multicollinearity may prevent the study of the relationship between the dependent and independent variables. The correlation matrix may fail to detect multicollinearity, because the variables may relate to one another in various ways (for example, one variable may be close to a combination of several others rather than to any single one). To reduce multicollinearity we can use stepwise regression: variables are added to or deleted from the model one at a time, based on their contribution to the current model.
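Statistical packages implement stepwise regression with their own entry and removal rules. The sketch below shows only the forward half of the idea, under stated assumptions: starting from an intercept-only model, it adds, one at a time, the variable that most reduces the sum of squared errors (SSE), as long as that variable's partial F statistic is at least a hypothetical F-to-enter threshold of 4.0. A full stepwise procedure would also re-check, after each addition, whether a variable already in the model should be deleted. NumPy and the synthetic data are assumptions.

```python
import numpy as np


def sse(X, y):
    """Sum of squared errors of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def forward_selection(X, y, f_to_enter=4.0):
    """Add predictors (column indices of X) one at a time while the partial F >= f_to_enter."""
    n, k = X.shape
    selected = []
    current = np.ones((n, 1))  # start with the intercept-only model
    sse_current = sse(current, y)
    remaining = list(range(k))
    while remaining:
        # Candidate whose addition gives the smallest SSE
        best_j = min(remaining, key=lambda j: sse(np.column_stack([current, X[:, j]]), y))
        sse_new = sse(np.column_stack([current, X[:, best_j]]), y)
        df_resid = n - (current.shape[1] + 1)  # n minus parameters in the larger model
        f_stat = (sse_current - sse_new) / (sse_new / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = np.column_stack([current, X[:, best_j]])
        sse_current = sse_new
    return selected


# Synthetic data: y depends on columns 0 and 2 only
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = 3.0 * X[:, 0] + 5.0 * X[:, 2] + rng.normal(scale=0.5, size=50)
print(forward_selection(X, y))
```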
36 Model Building
– Identify the dependent variable, and clearly define it.
– List potential predictors.
  – Bear in mind the problem of multicollinearity.
  – Consider the cost of gathering, processing, and storing data.
  – Be selective in your choice (try to use as few variables as possible).
37 Model Building (continued)
– Identify several possible models.
  – A scatter diagram of the dependent variable against each independent variable can be helpful in formulating the right model.
  – If you are uncertain, start with first-order and second-order models, with and without interaction.
  – Try other relationships (transformations) if the polynomial models fail to provide a good fit.
– Gather the required observations (have at least six observations for each independent variable).
– Use statistical software to estimate the model.
38 Model Building (continued)
– Determine whether the required conditions are satisfied. If not, attempt to correct the problem.
– Select the best model.
  – Use the statistical output.
  – Use your judgment!
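The model-building steps can be sketched as follows: fit the four candidate models (first and second order, with and without interaction) and compare them using adjusted R², one common piece of statistical output for comparing models with different numbers of predictors. The code also checks the six-observations-per-variable guideline. NumPy, the data-generating formula, and the helper names are assumptions for illustration; a real choice should also weigh the required conditions and your own judgment.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 for a least-squares fit; X includes a leading column of ones."""
    n, cols = X.shape
    k = cols - 1  # number of independent variables
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


def candidate_models(x1, x2):
    one = np.ones_like(x1)
    return {
        "first order": np.column_stack([one, x1, x2]),
        "first order + interaction": np.column_stack([one, x1, x2, x1 * x2]),
        "second order": np.column_stack([one, x1, x2, x1**2, x2**2]),
        "second order + interaction": np.column_stack([one, x1, x2, x1**2, x2**2, x1 * x2]),
    }


def enough_observations(n, k, per_variable=6):
    """Guideline from the slides: at least six observations per independent variable."""
    return n >= per_variable * k


rng = np.random.default_rng(3)
n = 60
x1 = rng.uniform(-3.0, 3.0, size=n)
x2 = rng.uniform(-3.0, 3.0, size=n)
# Synthetic response from a second-order model without interaction
y = 1.0 + 2.0 * x1 + x2 - 1.5 * x1**2 - 0.8 * x2**2 + rng.normal(scale=0.5, size=n)

scores = {}
for name, X in candidate_models(x1, x2).items():
    k = X.shape[1] - 1
    scores[name] = adjusted_r2(X, y)
    print(f"{name:<28} k={k}  enough data: {enough_observations(n, k)}  adj. R^2 = {scores[name]:.3f}")
```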