Managerial Economics & Decision Sciences Department | tyler realty ◄ old faithful ◄ | business analytics II | © 2016 kellogg school of management

1 Managerial Economics & Decision Sciences Department | business analytics II | assignment two - solutions: tyler realty ◄ old faithful ◄ | © 2016 kellogg school of management

2 assignment two - solutions: the linear regression model

learning objectives
► statistics & econometrics
   • test hypotheses for linear regression parameters
   • build, run and interpret results for linear regression
   • visualize data through graphs/diagrams
   • run a linear regression
   • perform tests for linear regression parameters

readings
► (MSN) Chapter 3
► (CS) Tyler Realty, Old Faithful

3 Tyler Realty: visualize data and regression

► The first step in analyzing a two-dimensional data problem is to visualize the relation between the variables. To find the parameters of the linear fit (intercept and slope) we run the regression of price on sqfoot.

    twoway (scatter price sqfoot) (lfit price sqfoot)
    regress price sqfoot

Figure 1. Graphical relation: price and sqfoot

       price |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    ---------+----------------------------------------------------------------
      sqfoot |  .0442925      .00162   27.34    0.000    .0410776    .0475074
       _cons |  11.54127     3.83751    3.01    0.003    3.925861    19.15669

Figure 2. Results for linear regression of price on sqfoot

Remark. The linear regression provides estimates b₀ and b₁ of the true parameters β₀ and β₁, which are assumed to describe the relation between the mean of price and sqfoot at the population level:

    population level:  E[price] = β₀ + β₁ · sqfoot
    sample-based:      Est. E[price] = b₀ + b₁ · sqfoot

► The estimated parameters are b₀ = 11.541 and b₁ = 0.044. STATA stores these values as _b[_cons] and _b[sqfoot] respectively, and they can be referred to by those names in subsequent calculations.

4 Tyler Realty: interpret the linear regression

► The estimated regression line is:

    Est. E[price] = 11.541 + 0.044 · sqfoot

► How do we interpret the regression results? Price is measured in thousands of dollars, so each additional square foot of space in a new home raises the estimated average selling price by 0.0442925 · 1,000 dollars, that is, by about $44. The interpretation of the estimated constant term would be: the average selling price of a new home of size 0 square feet is estimated to be $11,541. As an aside, notice that a house with 0 square feet is nonsensical here. In this application one would not be interested in the estimated constant term alone, only as a piece that helps estimate the average selling prices for larger homes.

    regress price sqfoot
    generate pricehat = _b[_cons] + _b[sqfoot]*sqfoot
    twoway (scatter price sqfoot) (connected pricehat sqfoot, sort msymbol(i))

• In the first step we run the regression, which provides the estimates of the linear coefficients. We then generate the fitted value of price for each available observation of sqfoot. We usually name this fitted value varnamehat, and its calculation is fairly intuitive: use the estimated coefficients and "plug in" the values of sqfoot.
• Add msymbol(i) as an option to the connected graph in order to remove the markers along the line.

Figure 3. Linear regression graphical representation: Est. E[price] = 11.54127 + 0.0442925 · sqfoot

5 Tyler Realty: prediction

► For a 2,000 square feet house the predicted average price according to the estimated regression is:

    Est. E[price | sqfoot = 2000] = 11.54127 + 0.0442925 · 2000 = 100.1263 (thousands)

• Use display _b[_cons] + _b[sqfoot]*2000 to get the result.

Figure 4. Prediction: graphical representation (predicted average price for sqfoot = 2000)

Remark. Graphically, the predicted price lies on the fitted line corresponding to the estimated regression. Why?
• Whenever you "plug" values of the independent variables into the regression equation you are picking points on the regression line.
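
The prediction above can be cross-checked outside STATA. The following is a minimal Python sketch, not part of the original solution: the coefficients are copied from the regression table, and the helper name predict_price is made up for illustration.

```python
# Coefficients copied from the regression table (price is in thousands of dollars).
b0 = 11.54127    # intercept, stored by STATA as _b[_cons]
b1 = 0.0442925   # slope, stored by STATA as _b[sqfoot]


def predict_price(sqfoot):
    """Return the estimated mean price (thousands of dollars) for a given size.

    Hypothetical helper: it simply plugs sqfoot into the estimated line,
    which is why every prediction lies on the fitted line.
    """
    return b0 + b1 * sqfoot


pred_2000 = predict_price(2000)
print(round(pred_2000, 4))  # 100.1263
```

The same function reproduces any point of the fitted line, for example predict_price(0) returns the intercept.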

6 Tyler Realty: hypothesis testing

► We are trying to evaluate the claim that "adding 10 square feet will be associated with an increase in the average price of more than $440". Before we even state the null/alternative hypotheses, let's make sure we know what the claim refers to in terms of our regression results.

► The claim is about the change in the y variable (price) relative to the change in an x variable (sqfoot). This is captured by the coefficient of that x variable. For our regression, at the population level this is given by β₁, while the regression-based estimate of this parameter is b₁.

► Thus the null/alternative hypotheses are about β₁ while the test will be based on b₁. The benchmark X₀ is 0.044 (notice the units of measurement: $440 per 10 square feet is $44 per square foot, and price is recorded in thousands of dollars).

► Let's first use the claim as the null hypothesis:

    set hypotheses:   H₀: β₁ ≥ 0.044    Hₐ: β₁ < 0.044
    calculate:        ttest = (b₁ − X₀) / se(b₁)
    calculate:        (left tail) p-value = Pr[T < ttest]
    decision:         reject the null hypothesis if p-value < α

Remark. Notice that the way we calculate the ttest is consistent across all applications.
• The degrees of freedom are set equal to n − k − 1, where n is the number of observations, k is the number of x variables and 1 stands for the constant used in the regression (if any).
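
The unit conversion behind the benchmark and the degrees-of-freedom rule can be written out as a short Python sketch. It is an illustration only; in particular n = 100 is an assumption inferred from the 98 degrees of freedom used in ttail(98, ...), since the sample size is not stated directly.

```python
# Translate the verbal claim into the units of the regression coefficient.
claim_dollars = 440.0       # claimed increase in average price, in dollars
claim_sqfeet = 10.0         # size change the claim refers to, in square feet
dollars_per_unit = 1000.0   # price is recorded in thousands of dollars

# Benchmark X0: claimed change in price (thousands) per one square foot.
benchmark = claim_dollars / claim_sqfeet / dollars_per_unit

# Degrees of freedom: n - k - 1.
n = 100   # ASSUMPTION: inferred from the 98 degrees of freedom, not stated in the text
k = 1     # one x variable (sqfoot)
df = n - k - 1

print(benchmark, df)
```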

7 Tyler Realty: hypothesis testing

► The regression table provides all the information needed to calculate the ttest: the Coef. entry for sqfoot is _b[sqfoot] and its Std. Err. entry is _se[sqfoot].

       price |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    ---------+----------------------------------------------------------------
      sqfoot |  .0442925      .00162   27.34    0.000    .0410776    .0475074
       _cons |  11.54127     3.83751    3.01    0.003    3.925861    19.15669

Figure 5. Results for linear regression of price on sqfoot

► We can "manually" calculate the ttest as ttest = (b₁ − 0.044) / se(b₁):

    scalar t_test_sqfoot = (_b[sqfoot] - 0.044)/_se[sqfoot]
    display t_test_sqfoot
    display 1 - ttail(98,t_test_sqfoot)

► The calculated results:

    (left tail) p-value = Pr[T < ttest] = 1 − ttail(98,t_test_sqfoot) = 0.57145687

► Obviously we cannot reject the stated null H₀: β₁ ≥ 0.044 for α = 5% (in fact we cannot reject the null for any choice of α up to about 57%).
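
As a rough cross-check of the STATA output, the same test statistic and left-tail p-value can be computed with the Python standard library. This is a sketch under stated assumptions: the helpers t_pdf and t_cdf are hypothetical names, the t distribution is integrated numerically with Simpson's rule instead of calling STATA's ttail, and the standard error is the rounded value printed in the table, so the last digits can differ slightly from STATA's.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """Pr[T < t], using Simpson's rule on [0, |t|] and symmetry around 0.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


b1 = 0.0442925      # _b[sqfoot], from the regression table
se_b1 = 0.00162     # _se[sqfoot], as rounded in the regression table
benchmark = 0.044   # X0
df = 98             # n - k - 1

t_test = (b1 - benchmark) / se_b1
p_left = t_cdf(t_test, df)   # plays the role of 1 - ttail(98, t_test_sqfoot)

print(round(t_test, 3), round(p_left, 3))
```

The statistic comes out close to 0.18, which is why the left-tail area is only a little above one half.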

8 Tyler Realty: hypothesis testing

► What if we use the claim as the alternative hypothesis?

    set hypotheses:   H₀: β₁ ≤ 0.044    Hₐ: β₁ > 0.044
    calculate:        ttest = (b₁ − X₀) / se(b₁)
    calculate:        (right tail) p-value = Pr[T > ttest]
    decision:         reject the null hypothesis if p-value < α

► We already calculated the ttest as (b₁ − 0.044) / se(b₁), stored in t_test_sqfoot.

► The (right tail) p-value = Pr[T > ttest] = ttail(98,t_test_sqfoot) = 0.42854313

► Again we cannot reject the stated null H₀: β₁ ≤ 0.044 for α = 5% (in fact we cannot reject the null for any choice of α up to about 43%).

9 Tyler Realty: hypothesis testing

► This is an interesting situation: we started with the claim as the null and concluded that we cannot reject it; then we "flipped" the null/alternative and concluded that we cannot reject the "opposite" of the claim either.

► Is our analysis flawed? The diagram below shows the calculated ttest and the associated p-value for both sets of null/alternative hypotheses. Since the calculated ttest is close to 0, the left and right tail areas are both close to 50%, which means that based on the provided sample we can reject neither the null nor its opposite beyond any reasonable doubt. We do not have enough evidence against either the claim or the opposite of the claim.

Figure 6. Calculated ttest and p-value
    area to the left of ttest:  57.15%   (H₀: β₁ ≥ 0.044, Hₐ: β₁ < 0.044)
    area to the right of ttest: 42.85%   (H₀: β₁ ≤ 0.044, Hₐ: β₁ > 0.044)

Remark. Consider now the null with equality: H₀: β₁ = 0.044, Hₐ: β₁ ≠ 0.044.
• Running a two-tail ttest, you find p-value = Pr[|T| > |ttest|] = 2*ttail(98,abs(t_test_sqfoot)) = 0.85708626
• You cannot reject the null β₁ = 0.044.
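
The relation between the three reported p-values can be checked with a few lines of Python. This sketch only uses the numbers reported on the slides; the function name reject and the default α = 0.05 are illustrative choices.

```python
# p-values as reported in the solution.
p_left = 0.57145687    # Pr[T < ttest], claim used as the null
p_right = 0.42854313   # Pr[T > ttest], claim used as the alternative
p_two = 0.85708626     # Pr[|T| > |ttest|], null with equality


def reject(p_value, alpha=0.05):
    """Decision rule from the slides: reject the null if p-value < alpha."""
    return p_value < alpha


# The two one-sided areas split the whole distribution, so they sum to 1.
total = p_left + p_right

# The two-tail p-value is twice the smaller one-sided tail.
two_from_tail = 2 * min(p_left, p_right)

decisions = [reject(p) for p in (p_left, p_right, p_two)]
print(total, two_from_tail, decisions)
```

None of the three nulls is rejected at the 5% level, which matches the conclusion that the sample is uninformative about whether the slope is above or below 0.044.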

10 Old Faithful: the estimated regression

► From the regression table we can immediately infer the estimated regression line.

     Time |      Coef.    Std. Err.        t    P>|t|
    ------+--------------------------------------------
      Dur |    9.79006   1.29990618   7.5314    0.000
    _cons |   31.01311   4.41658492    7.022    0.000

Figure 7. Regression results

► The estimated regression equation is

    Est. mean TIME = 31.013 + 9.79 · DUR

► The interpretation of the intercept estimate is that if the last eruption lasted zero minutes, the estimated average time until the next eruption is a little more than 31 minutes. The interpretation of the slope estimate is that for each additional minute that an eruption lasts, the average time until the next eruption is estimated to increase by 9.79 minutes.

► At a 5% significance level both the intercept and the coefficient of the DUR variable are significant. This may be seen by noting that the reported p-values in the corresponding rows are 0.000 (which is clearly less than 0.05). Recall that an intercept being significant means that you can reject the hypothesis that it is equal to zero, and a variable being significant means that you can reject the hypothesis that its coefficient (or slope) is equal to zero.

► The estimate for the average time until the next eruption, when DUR = 3, is 31.013 + 9.79 · 3 = 60.38 minutes.
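
The Old Faithful numbers can be verified the same way. The sketch below is an illustration using only the values printed in the regression table; it recomputes the t column as coefficient divided by standard error and evaluates the fitted line at DUR = 3.

```python
# Values copied from the Old Faithful regression table.
b0, se_b0 = 31.01311, 4.41658492   # _cons: coefficient and standard error
b1, se_b1 = 9.79006, 1.29990618    # Dur: coefficient and standard error

# The t column tests each coefficient against zero: t = coefficient / std. error.
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# Estimated average time (minutes) until the next eruption when DUR = 3.
pred_dur3 = b0 + b1 * 3

print(round(t_b0, 3), round(t_b1, 3), round(pred_dur3, 2))
```

Both ratios are far from zero, consistent with the reported p-values of 0.000.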

