Chapter 3 – Descriptive Statistics

Chapter 3 – Descriptive Statistics Numerical Measures

Chapter Outline Measures of Central Location: Mean, Median, Mode, Percentile (Quartile, Quintile, etc.). Measures of Variability: Range, Variance (Standard Deviation, Coefficient of Variation).

Recall A sample is a subset of a population. Numerical measures calculated for sample data are called sample statistics. Numerical measures calculated for population data are called population parameters. A sample statistic is referred to as the point estimator of the corresponding population parameter.

Mean As a measure of central location, the mean is simply the arithmetic average of all the data values. The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Sample Mean $\bar{x} = \frac{\sum x_i}{n}$ The symbol $\sum$ (called sigma) means ‘sum up’. $x_i$ is the value of the $i$th observation in the sample. $n$ is the number of observations in the sample.

Population Mean $\mu = \frac{\sum x_i}{N}$ The symbol $\sum$ (called sigma) means ‘sum up’. $x_i$ is the value of the $i$th observation in the population. $N$ is the number of observations in the population. $\mu$ is pronounced ‘mu’.

Sample Mean Example: Sales of Starbucks Stores Fifty Starbucks stores are randomly chosen in NYC. The table below shows the sales of those stores in December 2012.

Sample Mean Example: Sales of Starbucks Stores

Median The median of a data set is the value in the middle when the data items are arranged in ascending order. Whenever a data set has extreme values, the median is the preferred measure of central location. The median is the measure of location most often reported for annual income and property value data. A few extremely large incomes or property values can inflate the mean since the calculation of mean uses all the data items.

Median For an odd number of observations: 26 18 27 12 14 27 19. In ascending order: 12 14 18 19 26 27 27. The median is the middle value. Median = 19

Median For an even number of observations: 26 18 27 12 14 27 19 30. In ascending order: 12 14 18 19 26 27 27 30. The median is the average of the middle two values. Median = (19 + 26)/2 = 22.5

Mean vs. Median As noted, extreme values can change the mean remarkably, while the median might not be affected much by them. In that regard, the median is a better representative of central location. For the previous example (12 14 18 19 26 27 27 30), the median is 22.5 and the mean is 21.6. If we add one large number (280) to the data, giving 12 14 18 19 26 27 27 30 280, the median becomes 26 (the value in the middle), but the mean becomes 50.3. In this case we prefer the median to the mean as a measure of central location.
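The comparison can be reproduced with a few lines of Python. This is a minimal sketch using the standard-library statistics module; the variable names are chosen here for illustration and are not part of the slides.

```python
from statistics import mean, median

# Data from the even-count median example on the slides
data = [12, 14, 18, 19, 26, 27, 27, 30]
# Same data with one extreme value added
with_outlier = data + [280]

# The mean moves a lot, the median only a little
print(round(mean(data), 1), median(data))                  # 21.6 22.5
print(round(mean(with_outlier), 1), median(with_outlier))  # 50.3 26
```

Adding the single value 280 more than doubles the mean, while the median shifts only from 22.5 to 26.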

Mode The mode of a data set is the value that occurs most frequently. The greatest frequency can occur at two or more different values. If the data have exactly two modes, the data are bimodal. If the data have more than two modes, the data are multimodal. Caution: If the data are bimodal or multimodal, Excel’s MODE function will incorrectly identify a single mode.

Mode 12 14 18 19 26 27 27 30 For the example above, 27 shows up twice while all the other data values show up once. So, the mode is 27.
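Because a single-mode function can hide additional modes, it may help to ask for all of them. The sketch below assumes Python 3.8 or later, where statistics.multimode is available; the bimodal list is a made-up example, not data from the slides.

```python
from statistics import multimode

# Data from the slide: 27 appears twice, everything else once
data = [12, 14, 18, 19, 26, 27, 27, 30]
print(multimode(data))     # [27]

# Hypothetical bimodal data (not from the slides): 14 and 27 both appear twice
bimodal = [12, 14, 14, 18, 27, 27, 30]
print(multimode(bimodal))  # [14, 27]
```

multimode returns every value that shares the highest frequency, in the order first encountered, so bimodal and multimodal data are reported in full.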

Percentiles A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. Admission test scores for colleges and universities are frequently reported in terms of percentiles. The pth percentile of a data set is a value such that at least p percent of the items are less than or equal to this value and at least (100 - p) percent of the items are more than or equal to this value. The 50th percentile is simply the median.

Percentiles To compute the pth percentile: Arrange the data in ascending order. Compute index i, the position of the pth percentile: i = (p/100)n. If i is not an integer, round up; the pth percentile is the value in the ith position. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

Percentiles Find the 75th percentile of the following data: 12 14 18 19 26 27 29 30. Note: The data are already in ascending order. i = (p/100)n = (75/100)8 = 6. So, averaging the 6th and 7th data values: 75th percentile = (27 + 29)/2 = 28

Percentiles Find the 20th percentile of the following data: 12 14 18 19 26 27 29 30. Note: The data are already in ascending order. i = (p/100)n = (20/100)8 = 1.6, which is rounded up to 2. So, the 20th percentile is simply the 2nd data value, i.e. 14.
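The percentile rule used in these examples can be sketched as a small function. The function name and the restriction to integer p strictly between 0 and 100 are assumptions made here so that the index stays in exact integer arithmetic. Statistical packages often use other interpolation rules and may return slightly different values.

```python
def percentile(data, p):
    """p-th percentile (integer p, 0 < p < 100) using the position rule i = (p/100)n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                   # arrange the data in ascending order
    n = len(values)
    whole, remainder = divmod(p * n, 100)   # i = (p/100)n without floating point
    if remainder:
        # i is not an integer: round up to position whole + 1 (list index whole)
        return values[whole]
    # i is an integer: average positions i and i + 1 (list indices i - 1 and i)
    return (values[whole - 1] + values[whole]) / 2


data = [12, 14, 18, 19, 26, 27, 29, 30]
print(percentile(data, 75))  # 28.0
print(percentile(data, 20))  # 14
```

The two calls reproduce the 75th and 20th percentile examples worked on the slides.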

Quartiles Quartiles are specific percentiles. First Quartile = 25th percentile Second Quartile = 50th percentile = Median Third Quartile = 75th percentile

Measures of Variability It is often desirable to consider measures of variability (dispersion), as well as measures of central location. For example, suppose two stocks provide the same average return of 5% a year, but stock A’s return is very stable (close to 5%) while stock B’s return is volatile (it could be as low as –10%). Are you indifferent with regard to which stock to invest in? As another example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability Range Interquartile Range Variance/Standard Deviation Coefficient of Variation

Range The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.

Range Example: 12 14 18 19 26 27 29 30 Range = largest value - smallest value = 30 – 12 = 18

Interquartile Range The interquartile range of a data set is the difference between the 3rd quartile and the 1st quartile. It is the range of the middle 50% of the data. It overcomes the sensitivity to extreme data values.

Interquartile Range Example: 12 14 18 19 26 27 29 30 3rd Quartile (Q3) = 75th percentile = 28 1st Quartile (Q1) = 25th percentile = 16 Interquartile Range = Q3 – Q1 = 28 – 16 = 12
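Both the range and the interquartile range for this data set can be checked in code. The sketch below is self-contained; the helper quartile_position_rule is a hypothetical name and simply applies the position rule i = (p/100)n from the percentile slides.

```python
def quartile_position_rule(values, p):
    """p-th percentile of already sorted values via the rule i = (p/100)n."""
    whole, remainder = divmod(p * len(values), 100)
    if remainder:                                    # i not an integer: round up
        return values[whole]
    return (values[whole - 1] + values[whole]) / 2   # i integer: average i and i + 1


data = sorted([12, 14, 18, 19, 26, 27, 29, 30])

data_range = max(data) - min(data)     # largest value minus smallest value
q1 = quartile_position_rule(data, 25)  # first quartile
q3 = quartile_position_rule(data, 75)  # third quartile
iqr = q3 - q1                          # spread of the middle 50% of the data

print(data_range, q1, q3, iqr)         # 18 16.0 28.0 12.0
```

The range depends only on the two most extreme values, whereas the interquartile range ignores the lowest and highest quarters of the data.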

Variance The variance is a measure of variability that utilizes all the data. It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population). The variance is useful in comparing the variability of two or more variables.

Variance The variance is the average of the squared differences between each data value and the mean. The variance is calculated as follows: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample; $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population.

Standard Deviation The standard deviation of a data set is the positive square root of the variance. It is measured in the same units as the data, making it easier to interpret than the variance.

Standard Deviation The standard deviation is computed as follows: $s = \sqrt{s^2}$ for a sample; $\sigma = \sqrt{\sigma^2}$ for a population.

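Variance and Standard Deviation Example 12 14 18 19 26 27 29 30 Treating the data as a sample, $\bar{x} = 175/8 = 21.875$. Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{342.875}{7} \approx 48.98$. Standard Deviation: $s = \sqrt{48.98} \approx 7.00$.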
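The sample and population versions differ only in the divisor, which is easy to see in code. This is a minimal sketch with Python's statistics module, where variance and stdev divide by n - 1 and pvariance and pstdev divide by N. Whether the slide intended the data as a sample or a population is not stated, so both are shown.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [12, 14, 18, 19, 26, 27, 29, 30]

# Treating the data as a sample: divide the squared deviations by n - 1
s2 = variance(data)
s = stdev(data)

# Treating the data as a whole population: divide by N
sigma2 = pvariance(data)
sigma = pstdev(data)

print(round(s2, 2), round(s, 2))          # 48.98 7.0
print(round(sigma2, 2), round(sigma, 2))  # 42.86 6.55
```

The sample variance is always the larger of the two, because the same sum of squared deviations is divided by a smaller number.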

Coefficient of Variation The coefficient of variation indicates how large the standard deviation is in relation to the mean. In a comparison between two data sets with different units or with the same units but a significant difference in magnitude, coefficient of variation should be used instead of variance.

Coefficient of Variation The coefficient of variation is computed as follows: $\left( \frac{s}{\bar{x}} \times 100 \right)\%$ for a sample; $\left( \frac{\sigma}{\mu} \times 100 \right)\%$ for a population.

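Coefficient of Variation Example 12 14 18 19 26 27 29 30 Treating the data as a sample, $\bar{x} = 21.875$ and $s \approx 7.00$, so the coefficient of variation is $\left( \frac{7.00}{21.875} \times 100 \right)\% \approx 32.0\%$.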

Coefficient of Variation Example: Height vs. Weight In a class of 30 students, the average height is 5’5’’ (65 inches) with a standard deviation of 3’’, and the average weight is 120 lbs with a standard deviation of 20 lbs. Question: in which measure (height or weight) do the students differ more? Since height and weight don’t have the same unit, we have to use the coefficient of variation to remove the units before comparing the variation in height and weight. Height: (3/65 × 100)% ≈ 4.6%. Weight: (20/120 × 100)% ≈ 16.7%. So students’ weight is more variable than their height.
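The height and weight comparison comes down to two divisions. The sketch below uses a hypothetical helper, coefficient_of_variation, and converts 5'5'' to 65 inches so that the mean and standard deviation of height share a unit.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (unit-free)."""
    return std_dev / mean * 100


# Height: mean 5'5'' = 65 inches, standard deviation 3 inches
height_cv = coefficient_of_variation(3, 65)
# Weight: mean 120 lbs, standard deviation 20 lbs
weight_cv = coefficient_of_variation(20, 120)

print(round(height_cv, 1), round(weight_cv, 1))  # 4.6 16.7
```

Because the units cancel in each ratio, the two percentages can be compared directly even though inches and pounds cannot.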

Measures of Distribution Shape, Relative Location, and Detecting Outliers z-Scores Chebyshev’s Theorem Empirical Rule Detecting Outliers

Distribution Shape: Skewness An important measure of the shape of a distribution is called skewness. The formula for the skewness of sample data is $\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$. Skewness can be easily computed using statistical software.
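As an illustration of what such software computes, here is a short sketch of one common sample skewness formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. That this is the formula the slide showed is an assumption, since the formula did not survive in the transcript; the function name and the test data are also made up for illustration.

```python
from statistics import mean, stdev


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    cubed = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


# Hypothetical data sets chosen to show the sign of the measure
print(sample_skewness([1, 2, 3, 4, 5]))      # symmetric: approximately 0
print(sample_skewness([1, 2, 3, 4, 20]))     # long right tail: positive
print(sample_skewness([1, 17, 18, 19, 20]))  # long left tail: negative
```

The sign of the result indicates the direction of the longer tail, and a value near zero indicates rough symmetry.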

Distribution Shape: Skewness Symmetric (not skewed): Skewness is zero. Mean and median are equal. [Figure: relative frequency histogram, skewness = 0]

Distribution Shape: Skewness Skewed to the left: Skewness is negative. Mean is usually less than the median. [Figure: relative frequency histogram, skewness = -.33]

Distribution Shape: Skewness Skewed to the right: Skewness is positive. Mean is usually more than the median. [Figure: relative frequency histogram, skewness = .31]

Z-Scores The z-score is often called the standardized value. It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \frac{x_i - \bar{x}}{s}$. Excel’s STANDARDIZE function can be used to compute the z-score.

Z-Scores An observation’s z-score is a measure of the relative location of the observation in a data set. A data value less than the sample mean has a negative z-score. A data value greater than the sample mean has a positive z-score. A data value equal to the sample mean has a z-score of zero.

Z-Scores Example 12 14 18 19 26 27 29 30    
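The z-scores for this data set can be computed as follows. This is a minimal sketch that treats the data as a sample, so the standard deviation divides by n - 1; the function name is chosen here for illustration.

```python
from statistics import mean, stdev


def z_scores(data):
    """Number of sample standard deviations each value lies from the sample mean."""
    x_bar = mean(data)
    s = stdev(data)
    return [(x - x_bar) / s for x in data]


data = [12, 14, 18, 19, 26, 27, 29, 30]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))
```

Values below the mean of 21.875 get negative z-scores and values above it get positive ones, and the z-scores always sum to zero.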

Chebyshev’s Theorem At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, i.e. between $\bar{x} - zs$ and $\bar{x} + zs$, where $z$ is any value greater than 1. Chebyshev’s theorem requires $z > 1$, but $z$ need not be an integer.

Chebyshev’s Theorem At least 55.6% of the data values must be within z = 1.5 standard deviations of the mean. At least 89% of the data values must be within z = 3 standard deviations of the mean. At least 94% of the data values must be within z = 4 standard deviations of the mean.    

Chebyshev’s Theorem Example: Given that $\bar{x} = 10$ and $s = 2$, at least what percentage of all the data values fall within 2 standard deviations of the mean? At least $(1 - 1/2^2) = 1 - 1/4 = 75\%$ of all the data values must be between 6 and 14: $\bar{x} - 2s = 10 - 2(2) = 6$ and $\bar{x} + 2s = 10 + 2(2) = 14$.
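The bound and the corresponding interval are simple enough to wrap in two helper functions. The names below are hypothetical, and the sketch only restates the theorem's formula, 1 - 1/z^2 for z > 1.

```python
def chebyshev_bound(z):
    """Minimum share of values within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def chebyshev_interval(mean, std_dev, z):
    """Endpoints mean - z*s and mean + z*s of the interval the bound refers to."""
    return mean - z * std_dev, mean + z * std_dev


print(chebyshev_bound(2))            # 0.75
print(chebyshev_interval(10, 2, 2))  # (6, 14)
for z in (1.5, 3, 4):
    print(z, round(chebyshev_bound(z) * 100, 1))
```

The loop reproduces the figures for z = 1.5, 3, and 4: roughly 55.6%, 88.9%, and 93.8%.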

Empirical Rule When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean. The empirical rule is based on the normal distribution, which is covered in Chapter 6.

Empirical Rule For data having a bell-shaped distribution: About 68% of values of a normal random variable are between $\mu - \sigma$ and $\mu + \sigma$. About 95% of values of a normal random variable are between $\mu - 2\sigma$ and $\mu + 2\sigma$. About 99.7% (nearly all) of values of a normal random variable are between $\mu - 3\sigma$ and $\mu + 3\sigma$.

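Empirical Rule [Figure: bell-shaped curve over x, with the horizontal axis marked at $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - 1\sigma$, $\mu + 1\sigma$, $\mu + 2\sigma$, and $\mu + 3\sigma$; nearly all values lie between $\mu - 3\sigma$ and $\mu + 3\sigma$.]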
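The empirical rule's percentages come straight from the normal distribution, which can be checked numerically. This sketch assumes Python 3.8 or later for statistics.NormalDist; the helper name share_within is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))  # 68.3, 95.4, 99.7
```

These values apply only to bell-shaped data. For arbitrary data, Chebyshev's theorem gives the weaker but always valid guarantee.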

Detecting Outliers An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than –3 or greater than +3 might be considered an outlier. It might be: an incorrectly recorded data value; a data value that was incorrectly included in the data set; or a correctly recorded data value that belongs in the data set.
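The z-score screen for outliers can be sketched in a few lines. The function name, the default threshold argument, and the 17-value data set are assumptions made for illustration; the data set repeats the earlier example values twice and adds 280.

```python
from statistics import mean, stdev


def find_outliers(data, threshold=3.0):
    """Values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - x_bar) / s) > threshold]


# Hypothetical data: the earlier example values twice, plus one extreme value
data = [12, 14, 18, 19, 26, 27, 27, 30] * 2 + [280]
print(find_outliers(data))  # [280]
```

A flagged value still needs to be reviewed by hand, since it may be a correct observation. In very small samples no value can reach a z-score of 3, because the largest possible z-score in a sample of size n is (n - 1)/sqrt(n).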

Measures of Association Between Two Variables So far, we have examined numerical methods used to summarize the data for one variable at a time. Often a manager or decision maker is interested in the relationship between two variables. Two numerical measures of the relationship between two variables are covariance and correlation coefficient.

Covariance The covariance is a measure of the linear association between two variables. Positive values indicate a positive relationship. Negative values indicate a negative relationship.

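Covariance The covariance is computed as follows: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ for samples; $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ for populations.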

Correlation Coefficient Correlation is a measure of linear association and not necessarily causation. Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Correlation Coefficient The correlation coefficient is computed as follows: $r_{xy} = \frac{s_{xy}}{s_x s_y}$ for samples; $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ for populations.

Correlation Coefficient The correlation can take on values between –1 and +1. Values near –1 indicate a strong negative linear relationship. Values near +1 indicate a strong positive linear relationship. The closer the correlation is to zero, the weaker the relationship.
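Sample covariance and correlation can be computed directly from their definitions. The sketch below uses hypothetical function names and made-up monthly returns; these are not the SPY and AAPL figures from the slides, which are not reproduced in the transcript.

```python
from math import sqrt
from statistics import mean


def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of a series with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical monthly returns in percent (NOT the data from the slides)
market = [1.0, 5.0, 1.5, 3.5, 2.0, 2.5]
stock = [2.0, 8.5, 1.0, 6.0, 3.5, 3.0]
print(sample_covariance(market, stock))
print(sample_correlation(market, stock))
```

The covariance depends on the units of both series, while the correlation is unit-free and always falls between -1 and +1.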

Covariance and Correlation Coefficient Example: Stock Returns The table below presents the monthly returns (in percent) of the S&P 500 market index (SPY) and Apple stock (AAPL) from December 2012 to May 2013.

Covariance and Correlation Coefficient Example: Stock Returns

Covariance and Correlation Coefficient Example: Stock Returns Sample Covariance Sample Correlation Coefficient