MBA7025_04.ppt/Jan 27, 2015/Page 1 Georgia State University - Confidential MBA 7025 Statistical Business Analysis Descriptive Statistics Jan 27, 2015.

1 MBA 7025 Statistical Business Analysis: Descriptive Statistics (Jan 27, 2015)

2 Agenda: Central Limit Theorem; Descriptive Summary Measures (1. Measures of Central Location: Mean, Median, Mode; 2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation; 3. Measures of Association: Coefficient of Variation); Confidence Interval

3 Mean
1. It is the Arithmetic Average of data values. Sample Mean: x̄ = (x_1 + x_2 + ... + x_n) / n
2. The Most Common Measure of Central Tendency
3. Affected by Extreme Values (Outliers)
[Figure: two number lines, one running from 0 to 10 with Mean = 5 and one extended to 14 with Mean = 6]

4 Median
1. Important Measure of Central Tendency
2. In an ordered array, the median is the "middle" number. If n is odd, the median is the middle number. If n is even, the median is the average of the 2 middle numbers.
3. Not Affected by Extreme Values
[Figure: the same two number lines, one running from 0 to 10 and one extended to 14; Median = 5 in both]
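
The contrast between the mean and the median can be checked with a short sketch. The two data sets below are hypothetical: the slides do not list the plotted values, so these were chosen only so that the results match the Mean = 5, Mean = 6 and Median = 5 labels.

```python
from statistics import mean, median

# Hypothetical data (not taken from the slides), chosen to reproduce the
# Mean = 5 / Mean = 6 / Median = 5 labels shown on the figures.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # same data with the largest value moved to 14

mean_a, mean_b = mean(without_outlier), mean(with_outlier)
median_a, median_b = median(without_outlier), median(with_outlier)

print("means:  ", mean_a, mean_b)      # the mean moves with the extreme value
print("medians:", median_a, median_b)  # the median does not
```

Only the mean shifts when the extreme value is introduced, which is the point of item 3 on both slides.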

5 Mode
1. A Measure of Central Tendency
2. Value that Occurs Most Often
3. Not Affected by Extreme Values
4. There May Not be a Mode
5. There May be Several Modes
6. Used for Either Numerical or Categorical Data
[Figure: two number lines, one running from 0 to 14 with Mode = 9 and one running from 0 to 6 with No Mode]
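
The mode rules above (none, one, or several modes, for numerical or categorical data) can be sketched as follows. The helper name `modes` and the sample values are illustrative assumptions, not part of the course material.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:              # every value occurs once: treat as "no mode"
        return []
    return [value for value, count in counts.items() if count == top]

# Hypothetical examples
single_mode = modes([0, 1, 2, 9, 9, 9, 14])                      # one mode
no_mode = modes([0, 1, 2, 3, 4, 5, 6])                           # no mode
several = modes(["red", "blue", "red", "blue", "green"])         # categorical, two modes
print(single_mode, no_mode, several)
```

Treating "all values occur once" as no mode is a convention adopted here to match item 4; some libraries instead report every value as a mode in that case.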

6 Shape
Describes How Data Are Distributed
Measures of Shape: Symmetric or skewed
Left-Skewed: typically Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: typically Mode < Median < Mean

7 Agenda: Central Limit Theorem; Descriptive Summary Measures (1. Measures of Central Location: Mean, Median, Mode; 2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation; 3. Measures of Association: Coefficient of Variation); Confidence Interval

8 The Range
Measure of Variation
Difference Between Largest & Smallest Observations: Range = x_largest - x_smallest
Ignores How Data Are Distributed
[Figure: two differently distributed data sets on a number line from 7 to 12; in both, Range = 12 - 7 = 5]

9 Percentile
1. Arrange data in ascending order.
2. The middle number is the median.
3. The number halfway to the median is the first quartile.
4. The number halfway past the median is the 3rd quartile.
5. A number with (no more than) 66% of the values less than it is the 66th percentile, and so forth.

10 Percentile
2008 Olympic Medal Tally for top 55 nations.
Obs  Medals   Obs  Medals   Obs  Medals   Obs  Medals   Obs  Medals
  1     110    12      24    23      10    34       6    45       3
  2     100    13      19    24       9    35       6    46       3
  3      72    14      18    25       8    36       6    47       2
  4            15      18    26       8    37       5    48       2
  5      46    16            27       7    38       5    49       2
  6      41    17      15    28       7    39       5    50       2
  7      40    18      14    29       7    40       4    51       2
  8      31    19      13    30       6    41       4    52       1
  9      28    20      11    31       6    42       4    53       1
 10      27    21      10    32       6    43       4    54       1
 11      25    22      10    33       6    44       3    55       1
What is the percentile score for a country with 9 medals? What is the 50th percentile?

11 Percentile Solutions
Order all data (ascending or descending).
1. Country with 9 medals ranks 24th out of 55. There are 31 nations (56.36%) below it and 23 nations (41.82%) above it. Hence it can be considered a 57th or 58th percentile score.
2. The medal tally that corresponds to a 50th percentile is the one in the middle of the group, or the 28th country, with 7 medals. Hence the 50th percentile (Median) is 7.
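
The arithmetic in the percentile solution can be reproduced from the ranks alone. This is a minimal sketch; the function name is an assumption, and it presumes the country is not tied with another at the same medal count (true for the nation with 9 medals).

```python
def share_below_and_above(rank, n):
    """Percent of nations below and above a given rank.

    rank counts from the top (1 = most medals) among n nations.
    """
    below = n - rank      # nations with fewer medals
    above = rank - 1      # nations with more medals
    return 100 * below / n, 100 * above / n

n_nations = 55
pct_below, pct_above = share_below_and_above(rank=24, n=n_nations)
median_rank = (n_nations + 1) // 2   # middle position when n is odd

print(f"below: {pct_below:.2f}%  above: {pct_above:.2f}%")
print("median is the nation ranked", median_rank)
```

The percent below (about 56.4%) and the complement of the percent above (about 58.2%) bracket the "57th or 58th percentile" reading given in the solution.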

12 Box Plot
[Figure: box plot marking, from left to right, Smallest, Q1, Median, Q3, Largest]

13 Variance
Important Measure of Variation
Shows Variation About the Mean:
For the Population: σ² = Σ(x_i - μ)² / N (use N in the denominator)
For the Sample: s² = Σ(x_i - x̄)² / (n - 1) (use n - 1 in the denominator)

14 Standard Deviation
Most Important Measure of Variation
Shows Variation About the Mean:
For the Population: σ = √[Σ(x_i - μ)² / N] (use N in the denominator)
For the Sample: s = √[Σ(x_i - x̄)² / (n - 1)] (use n - 1 in the denominator)

15 Computing Standard Deviation
Computing Sample Variance and Standard Deviation
Mean of X = 6
X    Deviation From Mean    Squared
3            -3                9
4            -2                4
6             0                0
8             2                4
9             3                9
Sum of Squares (SS) = 26
Variance = SS/(n - 1) = 6.50
Stdev = Sqrt(Variance) = 2.55
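
The worked table can be reproduced step by step in Python. This sketch uses the five X values from the slide and cross-checks the manual result against the standard library's `statistics.variance`, which also divides by n - 1.

```python
from statistics import mean, variance

x = [3, 4, 6, 8, 9]                    # values from the slide

x_bar = mean(x)                         # mean of X
deviations = [xi - x_bar for xi in x]   # deviation from mean
ss = sum(d ** 2 for d in deviations)    # sum of squares
sample_var = ss / (len(x) - 1)          # sample variance: n - 1 in the denominator
sample_sd = sample_var ** 0.5           # sample standard deviation

print(x_bar, ss, sample_var, round(sample_sd, 2))
print("library check:", variance(x))
```

Dividing by N instead of n - 1 (the population formula) would give 26 / 5 = 5.2 for the same data.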

16 The Normal Distribution
A property of normally distributed data is as follows:
Distance from Mean           Percent of observations included in that range
± 1 standard deviation       Approximately 68%
± 2 standard deviations      Approximately 95%
± 3 standard deviations      Approximately 99.7%
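
These coverage percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `coverage` is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def coverage(k):
    """Share of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} standard deviation(s): {coverage(k):.2%}")
```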

17 Comparing Standard Deviations
[Figure: three data sets plotted on a number line from 11 to 21, all with Mean = 15.5. Data A: s = 3.338. Data B: s = .9258. Data C: s = 4.57]

18 Outliers
Typically, a number beyond a certain number of standard deviations from the mean is considered an outlier.
In many cases, a number beyond 3 standard deviations (about 0.3% chance of occurring in normally distributed data) is considered an outlier.
If identifying an outlier is more critical, one can make the rule more stringent, and consider 2 standard deviations as the limit.
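
A minimal sketch of this rule is shown below. The function name and the data are hypothetical, and the sample standard deviation is used because the population value is assumed unknown.

```python
from statistics import mean, stdev

def flag_outliers(values, limit=3.0):
    """Return the values lying more than `limit` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > limit * s]

# Hypothetical data: twenty values near 50 plus one extreme value
data = [48, 49, 50, 51, 52] * 4 + [120]

flagged_at_3 = flag_outliers(data, limit=3.0)
flagged_at_2 = flag_outliers(data, limit=2.0)   # the more stringent rule
print(flagged_at_3, flagged_at_2)
```

Note that the extreme value itself inflates the mean and standard deviation, so in very small samples a single outlier may not exceed the 3 standard deviation limit.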

19 Agenda: Central Limit Theorem; Descriptive Summary Measures (1. Measures of Central Location: Mean, Median, Mode; 2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation; 3. Measures of Association: Coefficient of Variation); Confidence Interval

20 Coefficient of Variation
Measure of Relative Variation
Always a %
Shows Variation Relative to Mean
Used to Compare 2 or More Groups
Formula (for Sample): CV = (s / x̄) × 100%

21 Computing Coefficient of Variation
Stock A: Average Price last year = $50, Standard Deviation = $5
Stock B: Average Price last year = $100, Standard Deviation = $5
Coefficient of Variation:
Stock A: CV = 5/50 = 10%
Stock B: CV = 5/100 = 5%
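
The stock comparison amounts to one division per stock, sketched here with an assumed helper name.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B
print(f"Stock A: {cv_a}%  Stock B: {cv_b}%")
```

Both stocks have the same standard deviation in dollars, but relative to its average price Stock A varies twice as much as Stock B.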

22 Agenda: Central Limit Theorem; Descriptive Summary Measures; Confidence Interval

23 Central Limit Theorem
Regardless of the population distribution, the distribution of the sample means is approximately normal for sufficiently large sample sizes (n >= 30), with μ_x̄ = μ and σ_x̄ = σ / √n.
For Sample Sizes of 30 or More, the Distribution of the Sample Mean Will Be Approximately Normal, with
– mean of sample means = population mean, and
– standard error = [population deviation] / [sqrt(n)]
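
A small simulation can illustrate both claims. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a deliberately skewed, non-normal choice that is not from the slides), and the seed and number of samples are arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 30                      # sample size
n_samples = 2000            # number of samples drawn

# Assumed population: exponential with rate 1, so mean = 1 and sd = 1
population_mean = 1.0
population_sd = 1.0

sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = mean(sample_means)
observed_se = stdev(sample_means)
theoretical_se = population_sd / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"observed standard error: {observed_se:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying population is strongly right-skewed.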

24 Level of Significance & Level of Confidence
Level of Significance, α (alpha), equals the maximum allowed percent of error. If the maximum allowed error is 5%, then α = 0.05.
Level of Confidence is the desired degree of certainty. A 95% Confidence Level is the most common.
A 95% Confidence Level would correspond to a 95% Confidence Interval of the Mean. Loosely, this states that we are 95% confident the calculated interval contains the actual population mean: about 95% of intervals constructed this way from repeated samples would contain it.
A 95% Confidence Level corresponds to a 5% level of significance, or α = 0.05. The Confidence Level therefore equals 1 - α.

25 Why Does Central Limit Theorem Work?
As Sample Size Increases:
1. Most Sample Means will be Close to Population Mean.
2. Some Sample Means will be Either Relatively Far Above or Below Population Mean.
3. A Few Sample Means will be Either Very Far Above or Below Population Mean.

26 Agenda: Confidence Interval; Descriptive Summary Measures; Central Limit Theorem

27 Confidence Intervals
The population mean μ is within 2 Standard Errors (SE) of the sample mean x̄, 95% of the time.
Thus, μ is in the range defined by x̄ ± 2*SE, about 95% of the time.
(2*SE) is also called the Margin of Error (MOE). 95% is called the confidence level.
Sample Mean ± Margin of Error (MOE) is called a Confidence Interval.

28 The Standard Normal Distribution
[Figure: standard normal curve with the central 68%, 95% and 99.7% regions marked]

29 Confidence Interval for Mean
In general, the confidence interval for μ is given by x̄ ± z·σ_x̄.
x̄ is the sample mean.
z is the confidence factor. It is the number of standard errors one has to go from the mean in order to include a certain percent of observations. For 95% confidence the value is 1.96 (approximately 2.00).
σ_x̄ = σ / √n is the standard error of the sample means.
In Excel, compute z with 95% confidence level (i.e. level of significance = 0.05): z score = normsinv(1-0.05/2) = 1.96
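
For readers without Excel, the same z score can be obtained in Python. This sketch uses `statistics.NormalDist.inv_cdf` from the standard library (Python 3.8 or later); the function name `z_critical` is an assumption.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided confidence factor z for a given confidence level."""
    alpha = 1 - confidence                      # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)  # same quantity as normsinv(1 - alpha/2)

z_95 = z_critical(0.95)
z_90 = z_critical(0.90)
print(f"95%: z = {z_95:.3f}   90%: z = {z_90:.3f}")
```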

30 Confidence Interval for Mean
Since σ is generally not known, we substitute the sample standard deviation, 's'.
This changes the distribution of the sample means from z (standard normal) to a t-distribution, a close relative: the interval becomes x̄ ± t·(s / √n).
The t value is slightly larger than the z for a given confidence level, thereby increasing the margin of error. That is the price of using s in place of σ.

31 Confidence Interval for Mean (Example 1) Gas Price
A sample of 49 gas stations nationwide shows an average price of unleaded of $3.87 and a standard deviation of $0.15. Estimate the mean price of gas nationwide with 95% confidence.
In Excel, compute t with 5% error and (n-1), or 48 degrees of freedom: =tinv(0.05,48) = 2.010635, rounded to 2.01.
95% CI for the Mean is: x̄ ± t·(s / √n) = 3.87 ± [2.01 * (0.15/√49)] = $3.87 ± 0.043
Thus, $3.827 < μ < $3.913
Interpret the result!
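
The gas price interval can be recomputed as follows. The Python standard library has no t-distribution quantile function, so this sketch takes the t value reported by Excel on the slide as a given input rather than computing it.

```python
from math import sqrt

n = 49            # gas stations sampled
x_bar = 3.87      # sample mean price
s = 0.15          # sample standard deviation
t_crit = 2.010635 # value of =tinv(0.05,48) quoted on the slide, not computed here

standard_error = s / sqrt(n)
moe = t_crit * standard_error          # margin of error
lower, upper = x_bar - moe, x_bar + moe

print(f"MOE = {moe:.3f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

One way to read the result: based on this sample, the nationwide mean price is estimated to lie between about $3.83 and $3.91 with 95% confidence.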

32 Confidence Interval for Mean (Example 2) Federal Aid Problem
Suppose a census tract with 5000 families is eligible for aid under program HR-247 if average income of families of 4 is between $7500 and $8500 (those lower than 7500 are eligible in a different program). A random sample of 12 families yields data below.
Representative Sample: 7,300 7,700 8,100 8,400 7,800 8,300 8,500 7,600 7,400 7,800 8,300 8,600

33 Confidence Interval for Mean (Example 2) Federal Aid Problem
Representative Sample: 7,300 7,700 8,100 8,400 7,800 8,300 8,500 7,600 7,400 7,800 8,300 8,600
In Excel, compute t with 5% error and (n-1), or 11 degrees of freedom: =tinv(0.05,11) = 2.201.

34 Confidence Interval for Mean (Example 2) Federal Aid Problem
In Excel, compute t with 5% error and (n-1), or 11 degrees of freedom: =tinv(0.05,11) = 2.201.
95% CI for the Mean is: x̄ ± t·(s / √n) = 7,983 ± MOE = 7,983 ± [2.201 * (441/√12)] = 7,983 ± 280
Thus, $7,703 < μ < $8,263
Interpretation of Confidence Interval:
95% Confident that Interval $7,983 ± $280 Contains Unknown Population (Not Sample) Mean Income.
If We Selected 1,000 Samples of Size 12 and Constructed 1,000 Confidence Intervals, about 950 Would Contain Unknown Population Mean and 50 Would Not.
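
The sample mean, the sample standard deviation and the interval for the Federal Aid data can be recomputed from the 12 incomes. As a sketch, the t value is again taken from the Excel result quoted on the slide, and the final eligibility check against the $7500 to $8500 band is one possible reading of how the interval would be used.

```python
from math import sqrt
from statistics import mean, stdev

incomes = [7300, 7700, 8100, 8400, 7800, 8300,
           8500, 7600, 7400, 7800, 8300, 8600]   # sample from the slide

n = len(incomes)
x_bar = mean(incomes)
s = stdev(incomes)          # sample standard deviation (n - 1 denominator)
t_crit = 2.201              # value of =tinv(0.05,11) quoted on the slide

moe = t_crit * s / sqrt(n)
lower, upper = x_bar - moe, x_bar + moe

# Assumed decision rule: the whole interval should sit inside the aid band
inside_band = 7500 <= lower and upper <= 8500

print(f"mean = {x_bar:.0f}, s = {s:.0f}, MOE = {moe:.0f}")
print(f"95% CI: {lower:.0f} to {upper:.0f}; inside 7500-8500 band: {inside_band}")
```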

35 Confidence Interval for Proportions
For proportions, p = population proportion and p̂ = sample proportion.
Confidence Interval for p is given by p̂ ± z·√[p̂(1 - p̂) / n].

36 Confidence Interval for Proportions (Example 1) Presidential Election
The Wall Street Journal for Sept 10, 2008 reports that a poll of 860 people shows 46% support for Sen. Obama as President. Find the 95% CI for the proportion of the population that supports him.
In Excel, compute z with 95% confidence level (i.e. level of significance = 0.05): z score = normsinv(1-0.05/2) = 1.960
95% CI for the Proportion is: p̂ ± z·√[p̂(1 - p̂) / n] = 0.46 ± [1.960 * √(0.46 * 0.54 / 860)] = 0.46 ± 0.033
Thus, .427 < p < .493
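
The poll interval can be checked with a short function. This is a sketch of the normal-approximation interval for a proportion; the function name `proportion_ci` is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - moe, p_hat + moe, moe

poll_lower, poll_upper, poll_moe = proportion_ci(p_hat=0.46, n=860)
print(f"0.46 ± {poll_moe:.3f}  ->  {poll_lower:.3f} < p < {poll_upper:.3f}")
```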

37 Confidence Interval for Proportions (Example 2) Japan Business Survey
Is Japan the Foremost Economic Power Today?
n = 200 Californians: Yes = 116, No = 84

38 Confidence Interval for Proportions (Example 2) Japan Business Survey
In Excel, compute z with 95% confidence level (i.e. level of significance = 0.05): z score = normsinv(1-0.05/2) = 1.960
95% CI for the Proportion is: p̂ ± MOE = 0.58 ± 0.068. Thus, .512 < p < .648
In Excel, compute z with 90% confidence level (i.e. level of significance = 0.10): z score = normsinv(1-0.10/2) = 1.645
90% CI for the Proportion is: p̂ ± MOE = 0.58 ± 0.057. Thus, .523 < p < .637
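
The effect of the confidence level on the margin of error in this survey can be seen by computing both margins from the raw counts (116 "Yes" answers out of 200). This is a minimal sketch; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 116, 200
p_hat = yes / n                              # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)           # standard error of the proportion

# Margin of error at each confidence level
margins = {
    level: NormalDist().inv_cdf(1 - (1 - level) / 2) * se
    for level in (0.90, 0.95)
}

for level, moe in margins.items():
    print(f"{level:.0%} CI: {p_hat - moe:.3f} < p < {p_hat + moe:.3f} (MOE {moe:.3f})")
```

Lowering the confidence level from 95% to 90% narrows the interval, at the cost of a higher chance that the interval misses the population proportion.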

39 Sample Means versus Sample Proportion
Mean:
– Income/Loss
– Time to Complete Loan Papers
– Number of Fat Calories in Burger
– Breaking Strength of Cellular Phone Housing
Proportion of:
– Americans Who Believe that Japan is #1 Economic Power
– Circuit Boards with One or More Failed Solder Connections
– African-Americans Who Pass CPA
Means and Proportions Not the Same!

40 Similarities and Differences Between Sample Means and Proportions
Sample Means (Measured):
– Computed from Data that Are Measured.
– Estimate Population Means.
Sample Proportions (Counted):
– Computed from Data that Are Counted.
– Estimate Population Proportions.

