Lecture 22 – Thurs., Nov. 25 Nominal explanatory variables (Chapter 9.3) Inference for multiple regression (Chapter 10.1-10.2)


1 Lecture 22 – Thurs., Nov. 25 Nominal explanatory variables (Chapter 9.3) Inference for multiple regression (Chapter 10.1-10.2)

2 Nominal Variables To incorporate nominal variables in multiple regression analysis, we use indicator variables. An indicator variable distinguishes between two groups. –The time of onset (early vs. late) is a nominal variable. To incorporate it into multiple regression analysis, we used the indicator variable early, which equals 1 if onset is early and 0 if it is late.

3 Nominal Variables with More than Two Categories To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k-1 indicator variables.

4 Nominal Explanatory Variables Example: Auction Car Prices A car dealer wants to predict the auction price of a car. –The dealer believes that odometer reading and the car color are variables that affect a car’s price (data from sample of cars in auctionprice.JMP) –Three color categories are considered: White Silver Other colors Note: Color is a nominal variable.

5 Indicator Variables in Auction Car Prices
$I_1$ = 1 if the color is white, 0 if the color is not white
$I_2$ = 1 if the color is silver, 0 if the color is not silver
The category “Other colors” is defined by $I_1 = 0$ and $I_2 = 0$.
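The indicator coding for the three color categories can be sketched in a few lines of Python. This is only an illustration: the function name `color_indicators` and the color strings are assumptions, not part of the lecture or of JMP.

```python
def color_indicators(color):
    """Return (I1, I2) for a car color, with "other colors" as the baseline.

    I1 = 1 for white cars, I2 = 1 for silver cars; any other color gets (0, 0).
    With k = 3 categories we need only k - 1 = 2 indicator variables.
    """
    c = color.strip().lower()
    i1 = 1 if c == "white" else 0
    i2 = 1 if c == "silver" else 0
    return i1, i2


# Hypothetical colors, used only to show the coding.
for color in ["White", "Silver", "Red"]:
    print(color, color_indicators(color))
```

A third indicator for "other colors" would be redundant, because it is always equal to 1 - I1 - I2.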

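6 Auction Car Price Model Solution –The proposed model is Price = $\beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$ –The data: each car in the sample is a white car, a silver car, or a car of another color.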

7 Example: Auction Car Price The Regression Equation
From JMP we get the regression equation PRICE = 16701 - .0555(Odometer) + 90.48($I_1$) + 295.48($I_2$)
The equation for an “other color” car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer)
The equation for a white color car: Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer)
The equation for a silver color car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer)
[Figure: Price plotted against Odometer, with one line for each of the three color categories.]

8 Example: Auction Car Price The Regression Equation From JMP we get the regression equation PRICE = 16701 - .0555(Odometer) + 90.48($I_1$) + 295.48($I_2$). For the same odometer reading, a white car sells, on average, for $90.48 more than a car of the “Other color” category. For the same odometer reading, a silver color car sells, on average, for $295.48 more than a car of the “Other color” category. For each additional mile on the odometer, the mean auction price decreases by 5.55 cents, whatever the color.
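To check the interpretation of the coefficients, the fitted equation reported from JMP can be evaluated directly. The sketch below hard-codes the slide's coefficients; the function name `predicted_price` and the odometer reading of 36,000 miles are assumptions chosen for illustration.

```python
# Coefficients copied from the fitted equation on the slide.
INTERCEPT = 16701.0
B_ODOMETER = -0.0555
B_WHITE = 90.48    # coefficient of I1
B_SILVER = 295.48  # coefficient of I2


def predicted_price(odometer, color):
    """Predicted mean auction price for a given odometer reading and color."""
    c = color.strip().lower()
    i1 = 1 if c == "white" else 0
    i2 = 1 if c == "silver" else 0
    return INTERCEPT + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


miles = 36000  # hypothetical odometer reading
for color in ["other", "white", "silver"]:
    print(color, round(predicted_price(miles, color), 2))

# The gap between colors does not depend on the odometer reading,
# which is what makes the three fitted lines parallel.
print(round(predicted_price(miles, "silver") - predicted_price(miles, "other"), 2))
```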

9 Example: Auction Car Price The Regression Equation There is insufficient evidence to infer that a white color car and a car of “other color” sell for a different auction price. There is sufficient evidence to infer that a silver color car sells for a higher price than a car of the “other color” category. Xm18-02b

10 Shorthand Notation for Nominal Variables Shorthand Notation for regression model with Nominal Variables. Use all capital letters for nominal variables –Parallel Regression Lines model: –Separate Regression Lines model:
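As a hedged illustration of this shorthand, the two models might be written as follows for the auction example, with the nominal variable COLOR in capitals. The use of price and odometer here is an assumption made for illustration; the slide's own formulas may have used a different example.

```latex
% Parallel regression lines: one common slope, a separate intercept per color
\mu\{\text{price} \mid \text{odometer}, \text{COLOR}\} = \text{odometer} + \text{COLOR}

% Separate regression lines: the slope is also allowed to differ by color
\mu\{\text{price} \mid \text{odometer}, \text{COLOR}\}
  = \text{odometer} + \text{COLOR} + \text{odometer} \times \text{COLOR}
```

In this notation, a capitalized term stands for the whole set of k-1 indicator variables, and a product term stands for the corresponding set of interaction terms.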

11 Nominal Variables in JMP It is not necessary to create indicator variables yourself to represent a nominal variable. Nominal variables in JMP:
–Make sure that the nominal variable’s modeling type is in fact nominal.
–Include the nominal variable in the Construct Model Effects box in Fit Model.
–JMP will create the indicator variables. The brackets indicate the category of the nominal variable for which the indicator variable is 1.
–JMP will leave out the level which is highest alphabetically or numerically.

12 Specially Constructed Explanatory Variables Types of specially constructed explanatory variables: –Powers of variables –Products of variables (interactions) –Indicator variables to represent nominal variables –Transformations of variables (e.g., log) Use a matrix of pairwise scatterplots to initially examine the data and look for needed transformations and powers of variables.

13 Inference for Multiple Regression
Chapter 10.2 –Tests for single coefficients –Confidence intervals for single coefficients –Confidence intervals for the mean response at specified values of the explanatory variables –Prediction intervals for a future response at specified values of the explanatory variables
Chapter 10.3 –F-test for overall significance of regression –F-test for joint significance of several terms (will not cover)
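For reference, the standard forms of the single-coefficient test and confidence interval are sketched below. The symbols are assumptions for illustration: $n$ is the number of observations and $p$ is the number of regression coefficients, including the intercept.

```latex
% t-ratio for testing H_0: \beta_j = 0, compared to a t-distribution on n - p degrees of freedom
t = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}

% 95% confidence interval for a single coefficient
\hat{\beta}_j \pm t_{n-p}(0.975)\,\mathrm{SE}(\hat{\beta}_j)
```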

14 Case Study 10.1.2
Question: Do echolocating bats expend more energy than nonecholocating bats after accounting for body size?
Data: Body mass and flight energy expenditure for 4 nonecholocating bats, 12 nonecholocating birds and 4 echolocating bats.
Strategy: Build a multiple regression model for mean energy expended as a function of type of flying vertebrate (echolocating bat, nonecholocating bat, nonecholocating bird) and body size. –Explore (resolve need for transformation) –Test for interaction –If no interaction, answer question with the three parallel lines model

15 Coded Scatterplots To construct a coded scatterplot, create the columns energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat. Each column should contain the energies for its own type only and a blank for all other types; for example, energy nonecholocating bat contains only the energies for nonecholocating bats. Click graph, then overlay plot, and put energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat in Y and mass in X.
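The column construction described here can be sketched outside JMP as well. In the Python sketch below, the function name `coded_columns` is an assumption, `None` plays the role of a blank cell, and the energy values are made up purely to show the layout; they are not the case study data.

```python
def coded_columns(types, energies):
    """Split one energy column into one column per type.

    Each output column holds the energy for rows of its own type and
    None (a blank) for every other row, so all columns keep the same
    length and stay aligned with the mass column.
    """
    groups = sorted(set(types))
    return {
        g: [e if t == g else None for t, e in zip(types, energies)]
        for g in groups
    }


# Made-up rows for illustration only.
types = ["nonecholocating bat", "nonecholocating bird", "echolocating bat"]
energies = [10.0, 20.0, 30.0]

for name, column in coded_columns(types, energies).items():
    print("energy", name, column)
```

Plotting each of these columns against mass with a different symbol gives the coded scatterplot.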

16 Coded Scatterplots

17 Separate/Parallel Regression Lines Model Separate regression lines model: Parallel regression lines model:
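One way to write the two models for the bat data is sketched below. Everything in the notation is an assumption made for illustration: $\text{lenergy}$ and $\text{lmass}$ denote log energy and log mass, $\text{bird}$ and $\text{ebat}$ are indicator variables for nonecholocating birds and echolocating bats, nonecholocating bats are taken as the reference category, and the numbering of the coefficients is arbitrary.

```latex
% Parallel regression lines: common slope, three intercepts
\mu\{\text{lenergy} \mid \text{lmass}, \text{TYPE}\}
  = \beta_0 + \beta_1\,\text{lmass} + \beta_2\,\text{bird} + \beta_3\,\text{ebat}

% Separate regression lines: interaction terms let the slope differ by type
\mu\{\text{lenergy} \mid \text{lmass}, \text{TYPE}\}
  = \beta_0 + \beta_1\,\text{lmass} + \beta_2\,\text{bird} + \beta_3\,\text{ebat}
  + \beta_4\,(\text{lmass} \times \text{bird}) + \beta_5\,(\text{lmass} \times \text{ebat})
```

Under this parameterization, the parallel lines model is the special case of the separate lines model in which both interaction coefficients are zero.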

18 Inferences for Echolocating Bats Is the parallel regression lines model appropriate? Test the interaction terms between type and body size. There is no evidence against the parallel regression lines model, so we go ahead and use it to answer the question of interest: do echolocating bats use less energy than nonecholocating bats of the same body size and than nonecholocating birds of the same body size?

19 Inferences for Echolocating Bats Cont. No strong evidence that echolocating bats use less energy than either nonecholocating bats (p-value = 0.35) or nonecholocating birds (p-value = 0.77) of the same body size. 95% confidence interval for the difference in mean log energy for nonecholocating bats and echolocating bats of the same body size: (-0.51, 0.35). Exponentiating the endpoints, this means that the 95% confidence interval for the ratio of median energy for nonecholocating bats and echolocating bats of the same body size is $(e^{-0.51}, e^{0.35}) \approx (0.60, 1.42)$. Summary of findings: Although there is no strong evidence that echolocating bats use less energy than nonecholocating bats of the same body size, it is still plausible that they use quite a bit less energy (60% as much at the median). The study is inconclusive.
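The step from a confidence interval on the log scale to one for a ratio of medians is just exponentiation of the endpoints, assuming natural logs were used. A minimal Python check, with the function name `ratio_interval` chosen only for illustration:

```python
import math


def ratio_interval(lower, upper):
    """Back-transform a confidence interval for a difference in mean
    natural-log response into an interval for the ratio of medians."""
    return math.exp(lower), math.exp(upper)


# Endpoints of the log-scale interval reported on the slide.
lo, hi = ratio_interval(-0.51, 0.35)
print(f"ratio of medians: ({lo:.2f}, {hi:.2f})")
```

Because the interval for the ratio contains 1, the data are consistent with no difference, but the lower endpoint shows that a fairly large difference is also consistent with the data.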

20 Prediction Intervals To find a 95% prediction interval for the log energy of a flying vertebrate of a given type and mass, –Fit the multiple regression model –Click red triangle next to response log energy, click save columns, click predicted values and also click indiv confid interval. This saves the predicted value, lower 95% prediction interval endpoint and upper 95% prediction interval endpoint for each observation in the data set. –To get a prediction interval for X’s that are not in the data set, enter a row with those X’s and then exclude the observation, so that it is not used in fitting the model.
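For orientation, the usual form of a 95% prediction interval in multiple regression is sketched below. The notation is an assumption for illustration: $X_0$ is the set of explanatory variable values of interest, $\hat{\mu}\{Y \mid X_0\}$ is the estimated mean response there, $\hat{\sigma}^2$ is the estimated residual variance, and $n - p$ is the residual degrees of freedom.

```latex
\hat{\mu}\{Y \mid X_0\} \;\pm\; t_{n-p}(0.975)\,
  \sqrt{\hat{\sigma}^2 + \mathrm{SE}\!\left[\hat{\mu}\{Y \mid X_0\}\right]^2}
```

The extra $\hat{\sigma}^2$ under the square root is what makes a prediction interval for an individual response wider than the confidence interval for the mean response at the same $X_0$.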

