business analytics II ▌assignment one - solutions autoparts 


1 business analytics II ▌assignment one - solutions autoparts 
Managerial Economics & Decision Sciences Department
Developed for business analytics II
week 1 ▌assignment one - solutions
week 2 • autoparts • mba demographics
week 3
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

2 learning objectives
assignment one - solutions
statistical models: hypotheses, tests & confidence intervals
learning objectives
► statistics
• null and alternative hypotheses
• testing a hypothesis
• p-value
• test significance level and test power: type I and type II errors
• confidence intervals: construction and interpretation
• load, modify and save data
• basic statistical tools and graphics
• perform tests and build confidence intervals
► readings
• (MSN) Chapter 2
• (CS) Autoparts, MBA Demographics

3 Autoparts: hypothesis, test and decision
► The claim is that the auto-parts store ("Skokie Auto") had lower sales last year than one would expect for an end-cap location, due to the construction of a new building that partially obstructs the view of the store from the street.
► Available information: last year's sales for Skokie Auto and for another 24 auto-parts stores in end-cap locations.
► To keep notation consistent with the notes, let $X$ (a random variable) be last year's sales for all Illinois auto-parts stores in end-cap locations. Let $X_0$ be last year's sales for Skokie Auto, thus $X_0$ = $1,883,000.
► Skokie Auto's claim is thus $E[X] > X_0$, and our analysis can be organized as follows:
hypothesis: $H_0: E[X] \ge X_0$, $H_a: E[X] < X_0$
test: calculate $t = (\bar X - X_0)/s_{\bar X}$ based on the sample of 24 comparable stores
decision: reject the null hypothesis (and therefore Skokie Auto's claim) if p-value < α
Remark. Here $\bar X$ stands for the sample mean and $s_{\bar X}$ for the standard error of the sample mean.
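To make the test statistic concrete, here is a minimal Python sketch of the one-sample t statistic $t = (\bar X - X_0)/s_{\bar X}$, with $s_{\bar X} = s/\sqrt{n}$ and $n - 1$ degrees of freedom. The function name `one_sample_t` and the sales figures are hypothetical placeholders (in $ thousands), not the assignment's data.

```python
import math
import statistics


def one_sample_t(sample: list[float], x0: float) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for testing E[X] against benchmark x0."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the sample mean
    return (mean - x0) / se, n - 1


# Hypothetical sales (in $ thousands) for a few comparable end-cap stores.
sales = [2100.0, 2350.0, 1980.0, 2240.0, 2410.0, 2050.0]
t_stat, df = one_sample_t(sales, x0=1883.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

A positive t means the comparable stores averaged more than Skokie Auto, which is the direction of Skokie Auto's claim.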

4 Autoparts: hypothesis, test and decision
hypothesis: $H_0: E[X] \ge X_0$, $H_a: E[X] < X_0$
Remark. Skokie Auto's claim is that its sales are lower than otherwise expected, i.e., lower than those of similar stores also in end-cap locations.
test: ttest sales == 1883
You have to specify the variable being tested (sales) and the benchmark ($X_0$). Here the sample is the 24 comparable stores, and Skokie Auto's sales serve as the benchmark (sales are recorded in thousands of dollars, so $X_0$ = 1883).
[Figure 1. Results of ttest command: one-sample t test for sales (Obs, Mean, Std. Err., Std. Dev., 95% Conf. Interval), with mean = mean(sales), Ho: mean = 1883, the t statistic, degrees of freedom, and the p-values Pr(T < t) for Ha: mean < 1883, Pr(|T| > |t|) for Ha: mean != 1883, Pr(T > t) for Ha: mean > 1883]
decision: cannot reject the null hypothesis (and therefore Skokie Auto's claim) since p-value > α = 5%

5 Autoparts: hypothesis, test and decision
► Our conclusion is that we cannot reject Skokie Auto's claim that last year's sales were lower than one would expect for an end-cap location due to the construction of a new building that partially obstructs the view of the store from the street. However, keep in mind that this conclusion is based on the available "control" sample and a simple analysis of average sales.
[Figure 2. Graphical results of ttest command]
Remark. The left-tail p-value is calculated as 1 − ttail(df, t), which in our case is 1 − ttail(23, 5.6168), a number very close to 1. To reject the null you would need a significance level of almost 1.
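The remark's p-value can be reproduced outside STATA. Below is a minimal Python sketch, assuming only the standard library: `ttail` here is a hypothetical re-implementation of STATA's `ttail(df, t)` (upper-tail probability of Student's t), computed by integrating the t density with Simpson's rule rather than by STATA's own algorithm.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def ttail(df: float, t: float, n: int = 10_000) -> float:
    """Upper-tail probability P(T > t), the quantity STATA's ttail(df, t) returns.

    Uses symmetry: P(T > t) = 0.5 - integral of the density over [0, t] for t >= 0,
    with the integral approximated by composite Simpson's rule (n must be even).
    """
    if t < 0:
        return 1.0 - ttail(df, -t, n)
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t_stat, df = 5.6168, 23
left_p = 1 - ttail(df, t_stat)            # Pr(T < t): p-value for Ha: E[X] < X0
right_p = ttail(df, t_stat)               # Pr(T > t): p-value for Ha: E[X] > X0
two_sided_p = 2 * ttail(df, abs(t_stat))  # Pr(|T| > |t|)
print(f"Pr(T < t) = {left_p:.6f}, Pr(|T| > |t|) = {two_sided_p:.6f}, Pr(T > t) = {right_p:.6f}")
```

The three printed values correspond to the three alternatives STATA lists under the ttest output; only the left-tail one matches the hypotheses used here.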

6 Autoparts: confidence interval
► The confidence interval for $E[X]$ is calculated from the sample mean $\bar X$ and the standard error of the sample mean $s_{\bar X}$ as
$$\left[\bar X - t_{\alpha/2,\,df}\, s_{\bar X},\; \bar X + t_{\alpha/2,\,df}\, s_{\bar X}\right]$$
and contains $E[X]$ with probability $1 - \alpha$.
ci means sales, level(90)
You have to specify the statistic for which you want the confidence interval, i.e., include "means" if you want a confidence interval for the mean of the variable of interest.
[Figure 3. Results of ci command: Variable, Obs, Mean, Std. Err. and 90% Conf. Interval for sales]
[Figure 4. The confidence interval: t distribution with 90.00% of the area between −1.714 and 1.714 and 5.00% in each tail]
Remark. The interval was calculated for a level α = 10%, so the area under the t distribution between −1.714 and 1.714 is 90%. This leaves an area of α/2 = 5% in each tail. The cutoffs are calculated from the degrees of freedom (df) and the level α as invttail(df, α/2), that is invttail(23, 0.05) = 1.714 (and the negative of this number).
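The construction above can be sketched in Python. This is an illustration under stated assumptions, not STATA's implementation: `ttail` integrates the t density numerically, and the hypothetical `invttail` inverts it by bisection to find the cutoff $t_{\alpha/2,df}$. The 24 sales values are made-up placeholders.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def ttail(df: float, t: float, n: int = 2_000) -> float:
    """P(T > t) for t >= 0, via composite Simpson's rule on [0, t] (n even)."""
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def invttail(df: float, p: float) -> float:
    """Cutoff c with P(T > c) = p for 0 < p < 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if ttail(df, mid) > p:
            lo = mid  # too much area beyond mid: the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2


def ci_mean(sample: list[float], level: float = 0.90) -> tuple[float, float]:
    """t-based confidence interval for E[X]: mean +/- invttail(df, alpha/2) * SE."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    crit = invttail(n - 1, (1 - level) / 2)
    return mean - crit * se, mean + crit * se


# Hypothetical sales (in $ thousands) for 24 comparable stores; not the assignment data.
sales = [2000.0 + 15.0 * ((7 * i) % 24) for i in range(24)]
print(f"invttail(23, 0.05) = {invttail(23, 0.05):.3f}")
low, high = ci_mean(sales, level=0.90)
print(f"90% CI for E[sales]: [{low:.1f}, {high:.1f}]")
```

The bisection bracket [0, 50] is a simplifying choice that works for moderate degrees of freedom such as the 23 used here.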

7 MBA Demographics: sample-based inference
► The sequence of commands below solves the requirements. The output is specific to each generated sample.
generate random = runiform()
sort random
sample 40, count
drop Indx
drop random
ttest Age == 28.8
ci means Age, level(90)
• The first five commands generate a random sample of 40 observations after shuffling the original data set of 500 observations. The two drop commands are not strictly necessary; they "clean" the resulting sample so that only the relevant variables remain.
• The ttest command reports the t statistic and the three p-values for the three possible null/alternative pairs. Choose the p-value that corresponds to the null/alternative pair you are actually using.
• Finally, the ci command reports the confidence interval at the desired level (here 90%). Note that you have to specify that the confidence interval is for means (try, for example, variances to see the result).

8 MBA Demographics: different samples
► The diagrams below show the sample means and confidence intervals you submitted (left) and those for samples generated at the 90% level using the sequence from the previous slide (right). We can now compare the inference based on these samples with the true mean of the original data set (the red line in the two diagrams).
[Figure 5. Graphical results of class-generated samples]
[Figure 6. Graphical results of 800 generated samples]
(legend: sample average, true mean, upper bound, lower bound)
Remark. Notice that some intervals do not contain the true mean! This is why a confidence interval holds only with a given probability: the 90% procedure captures the true mean in roughly 90% of samples, not in every one.
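The coverage idea behind Figure 6 can be checked with a small simulation. This Python sketch assumes a hypothetical population of 500 ages, draws 800 samples of 40, builds a 90% t interval for each, and counts how many contain the true mean. The critical value 1.685 is invttail(39, 0.05), rounded.

```python
import math
import random
import statistics

# invttail(39, 0.05) is about 1.685: the 90% two-sided critical value for n = 40 (df = 39).
T_CRIT_39 = 1.685


def ci90(sample: list[float]) -> tuple[float, float]:
    """90% t-based confidence interval for the mean of a sample of size 40."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - T_CRIT_39 * se, mean + T_CRIT_39 * se


rng = random.Random(2016)
population = [rng.gauss(28.8, 3.0) for _ in range(500)]  # hypothetical ages, N = 500
true_mean = statistics.mean(population)

reps = 800
hits = 0
for _ in range(reps):
    low, high = ci90(rng.sample(population, 40))  # simple random sample, no replacement
    hits += low <= true_mean <= high
coverage = hits / reps
print(f"{hits} of {reps} intervals contain the true mean ({coverage:.1%})")
```

Expect a share near, but not exactly, 90%; sampling 40 out of only 500 without replacement tends to push it slightly above 90%.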

9 MBA Demographics: by categories
► First, let's look at the summary statistics for the specific sample that was generated.
summarize
[Figure 7. Results of summarize command: Obs, Mean, Std. Dev., Min, Max for Age and Gender]
► In the specific sample from file SampleMBA.dta, the average age is the Age mean shown in Figure 7 and the female proportion is 32.5% (because Gender is coded 1 for female and 0 for male, its mean equals the share of females). To find the exact number of females you can use:
count if Gender == 1
► There are 13 females (similarly, count if Gender == 0 shows that there are 27 males in the sample). The mean age reported by summarize is computed over all observations in the sample.
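Why the mean of Gender is the female proportion can be seen in a few lines of Python. The list of codes below is a hypothetical stand-in built to match the counts reported above (13 females, 27 males).

```python
# Hypothetical Gender codes for a 40-person sample (1 = female, 0 = male),
# chosen to match the counts reported above: 13 females and 27 males.
gender = [1] * 13 + [0] * 27

females = sum(1 for g in gender if g == 1)  # like: count if Gender == 1
males = sum(1 for g in gender if g == 0)    # like: count if Gender == 0
share_female = sum(gender) / len(gender)    # mean of a 0/1 variable = share of 1s
print(females, males, share_female)         # prints: 13 27 0.325
```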

10 MBA Demographics: by categories
► We can now summarize the variable Age by Gender (remember that Gender = 1 means female):
by Gender, sort: summarize Age
• With the prefix by varname, sort: STATA splits the sample according to the values of varname and runs the command that follows on each part.
• As expected, the number of observations is 13 for females and 27 for males.
• Note that the output now reports the average age separately for males and for females.
[Figure 8. Results of by command: summarize Age (Obs, Mean, Std. Dev., Min, Max) for Gender = 0 and for Gender = 1]
► How different is the true mean age for males from that of females? Note that the two average ages above are sample-based averages!
► Let $X_M$ denote male age and $X_F$ female age. Then $E[X_M]$ is the true (population-level) mean for males and $E[X_F]$ the true mean for females. The null/alternative hypotheses are:
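A rough Python analogue of `by Gender, sort: summarize Age` groups the rows by category and reports Obs, Mean, Std. Dev., Min and Max per group. The helper `summarize_by` and the five rows are hypothetical illustrations, not the SampleMBA.dta data.

```python
import statistics
from collections import defaultdict


def summarize_by(rows: list[dict], group_key: str, value_key: str) -> dict:
    """Per-group Obs, Mean, Std. Dev., Min, Max, with groups in sorted order."""
    groups: dict = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    out = {}
    for g in sorted(groups):  # sorted, like the `sort` option of `by`
        vals = groups[g]
        out[g] = {
            "Obs": len(vals),
            "Mean": statistics.mean(vals),
            "Std. Dev.": statistics.stdev(vals) if len(vals) > 1 else float("nan"),
            "Min": min(vals),
            "Max": max(vals),
        }
    return out


# Hypothetical rows (Gender: 0 = male, 1 = female).
rows = [
    {"Gender": 0, "Age": 28.0}, {"Gender": 0, "Age": 30.0}, {"Gender": 0, "Age": 26.0},
    {"Gender": 1, "Age": 27.0}, {"Gender": 1, "Age": 29.0},
]
summary = summarize_by(rows, "Gender", "Age")
for gender, stats in summary.items():
    print(f"-> Gender = {gender}: {stats}")
```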

11 MBA Demographics: mean age for males (Gender = 0) minus mean age for females (Gender = 1)
hypothesis: $H_0: E[X_M] = E[X_F]$, $H_a: E[X_M] \ne E[X_F]$
• Conveniently, STATA's ttest command can test a variable by category. STATA sets up the null in terms of mean age for males (Gender = 0) minus mean age for females (Gender = 1), i.e., in increasing order of the values of the category variable.
• There are only 38 = 40 − 2 degrees of freedom because two means (the mean age for males and the mean age for females) are estimated from the same 40 observations.
test: ttest Age, by(Gender)
[Figure 9. Results of ttest command: two-sample t test for Age by Gender (Obs, Mean, Std. Err., Std. Dev., 95% Conf. Interval for groups 0, 1, combined and diff), with diff = mean(0) − mean(1), Ho: diff = 0, the t statistic, degrees of freedom, and the p-values Pr(T < t), Pr(|T| > |t|), Pr(T > t) for Ha: diff < 0, Ha: diff != 0, Ha: diff > 0]
decision: cannot reject the null hypothesis (that the two means are equal) since p-value > α = 5%
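Here is a minimal Python sketch of the equal-variance statistic behind `ttest Age, by(Gender)`: the difference in group means divided by the pooled standard error, with $n_1 + n_2 - 2$ degrees of freedom. The helper `pooled_t` and the two age lists are hypothetical; only the group sizes (27 and 13) come from the text.

```python
import math
import statistics


def pooled_t(x1: list[float], x2: list[float]) -> tuple[float, int]:
    """Equal-variance two-sample t statistic for diff = mean(x1) - mean(x2), and its df."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / se
    return t, n1 + n2 - 2


# Hypothetical ages: 27 males (Gender = 0) and 13 females (Gender = 1).
males = [28.0 + (i % 5) for i in range(27)]
females = [27.5 + (i % 4) for i in range(13)]
t_stat, df = pooled_t(males, females)
print(f"t = {t_stat:.3f}, degrees of freedom = {df}")
```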

12 MBA Demographics: by categories
► At the population level:
by Gender, sort: summarize Age
[Figure 10. Results of by command: summarize Age (Obs, Mean, Std. Dev., Min, Max) for Gender = 0 and for Gender = 1, full data set]
► At the population level the proportion of females is 32.2% (run summarize), and the two means are fairly close to each other (see the table above).

13 Appendix: Two-Sample t-test
► There are two ways to run a t-test for the means of two samples, depending on your assumption about the standard deviations of the two populations from which the samples come.
ttest Age, by(Gender)
ttest Age, by(Gender) unequal
• The first command assumes that the standard deviation is the same for both populations. At the sample level, this common standard deviation is estimated by pooling the two samples, i.e., combining the two within-sample variances weighted by their degrees of freedom.
• The second command allows the standard deviations to differ between the populations. At the sample level, the standard deviations are estimated separately for each sample.
► In both cases the t statistic is
$$t = \frac{\bar X_1 - \bar X_2}{s_{\bar X_1 - \bar X_2}}$$
but the standard error and the degrees of freedom differ:
• equal std. dev. assumption:
$$s_{\bar X_1 - \bar X_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad df = n_1 + n_2 - 2$$
• unequal std. dev. assumption:
$$s_{\bar X_1 - \bar X_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
► In the formulas above the subscripts 1 and 2 refer to samples 1 and 2 respectively, while p refers to the pooled sample.
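The two formulas can be compared side by side in a short Python sketch. The helper `two_sample_se_df` is hypothetical; with `equal_var=False` it uses the Satterthwaite degrees-of-freedom formula shown above, which is what the `unequal` option reports.

```python
import math
import statistics


def two_sample_se_df(x1: list[float], x2: list[float], equal_var: bool = True) -> tuple[float, float]:
    """Standard error of mean(x1) - mean(x2) and the matching degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    if equal_var:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        return math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2
    a, b = v1 / n1, v2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Satterthwaite
    return se, df


# Hypothetical samples with very different spreads.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [10.0, 20.0, 30.0]
print("equal:  ", two_sample_se_df(x1, x2, equal_var=True))
print("unequal:", two_sample_se_df(x1, x2, equal_var=False))
```

When the two samples have equal sizes and equal variances the two versions coincide; otherwise the Satterthwaite degrees of freedom fall between min(n1, n2) − 1 and n1 + n2 − 2.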

14 Appendix: Two-Sample t-test
[Figure 11. Results of ttest command: equal variances (rows for groups 0, 1, combined and diff; diff = mean(0) − mean(1); Ho: diff = 0; degrees of freedom; p-values for Ha: diff < 0, Ha: diff != 0, Ha: diff > 0)]
[Figure 12. Results of ttest command: unequal variances (same layout, with Satterthwaite's degrees of freedom)]

