
© 2007 Thomson South-Western. All Rights Reserved.


Chapter 3 Descriptive Statistics: Numerical Measures
• Measures of Location
• Measures of Variability
• Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Measures of Association Between Two Variables
• Weighted Mean

Measures of Location
If the measures are computed for data from a sample, they are called sample statistics. If the measures are computed for data from a population, they are called population parameters. A sample statistic is referred to as the point estimator of the corresponding population parameter.
• Mean
• Median
• Mode
• Percentiles
• Quartiles

Mean
The mean of a data set is the average of all the data values. The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Sample Mean
$\bar{x} = \frac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.

Population Mean
$\mu = \frac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.
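Both formulas perform the same arithmetic; only the interpretation differs (a sample statistic $\bar{x}$ versus a population parameter $\mu$). A minimal Python sketch, where the function name `mean` is illustrative and the seven values are the ones used in the median example:

```python
def mean(values):
    """Sum of the observations divided by how many there are (n or N)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# The 7 observations from the odd-count median example
sample = [26, 28, 27, 22, 24, 27, 12]
print(mean(sample))  # 166 / 7, about 23.71
```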

Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median is the preferred measure of central location.
The median is the measure of location most often reported for annual income and property value data.
A few extremely large incomes or property values can inflate the mean.

Median
For an odd number of observations, the median is the middle value.
Data (7 observations): 26 28 27 22 24 27 12
In ascending order: 12 22 24 26 27 27 28
Median = 26

Median
For an even number of observations, the median is the average of the middle two values.
Data (8 observations): 26 28 27 22 24 30 12 27
In ascending order: 12 22 24 26 27 27 28 30
Median = (26 + 27)/2 = 26.5
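The two cases above can be sketched as one Python function (the name `median` is illustrative): sort, then take the single middle value when n is odd or average the two middle values when n is even. It reproduces both worked examples.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)              # arrange the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the middle two

print(median([26, 28, 27, 22, 24, 27, 12]))      # 26
print(median([26, 28, 27, 22, 24, 30, 12, 27]))  # 26.5
```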

Mean vs. Median
• The mean is affected by outliers (extreme observations).
• The median is not affected by outliers: making an extreme value even more extreme leaves the middle value unchanged.
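A quick illustration using Python's standard `statistics` module: the second data set is hypothetical, imagining that the largest value 28 was mis-keyed as 280. The mean jumps by about 36, while the median does not move.

```python
from statistics import mean, median

data = [12, 22, 24, 26, 27, 27, 28]
with_outlier = [12, 22, 24, 26, 27, 27, 280]   # hypothetical: 28 mis-keyed as 280

print(mean(data), median(data))                  # mean about 23.71, median 26
print(mean(with_outlier), median(with_outlier))  # mean about 59.71, median still 26
```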

Mode
The mode of a data set is the value that occurs with greatest frequency.
The greatest frequency can occur at two or more different values.
If the data have exactly two modes, the data are bimodal.
If the data have more than two modes, the data are multimodal.
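Python's standard library has `statistics.multimode` (Python 3.8+), which returns every value tied for the greatest frequency. That makes the unimodal, bimodal, and multimodal cases easy to see. The last two data sets are made up for illustration.

```python
from statistics import multimode

# multimode returns all values tied for the greatest frequency, in order of first appearance
print(multimode([26, 28, 27, 22, 24, 27, 12]))  # [27]: a single mode
print(multimode([1, 1, 2, 2, 3]))               # [1, 2]: exactly two modes (bimodal)
print(multimode([1, 1, 2, 2, 3, 3, 4]))         # [1, 2, 3]: more than two modes (multimodal)
```

Note that when every value occurs exactly once, `multimode` returns all of them, so a literal reading would call such data multimodal.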

Percentiles
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile:
$i = (p/100)n$
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
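The three steps above can be sketched in Python as follows. The function name is hypothetical. It assumes an integer p with 0 < p < 100, because at p = 100 the integer case would need a position n + 1 that does not exist. The index is kept as an exact `Fraction` because a floating-point (p/100)n can land just below a whole number and flip the integer test.

```python
import math
from fractions import Fraction

def textbook_percentile(data, p):
    """p-th percentile by the index rule i = (p/100)n; assumes an integer 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                   # step 1: ascending order
    n = len(values)
    i = Fraction(p * n, 100)                # step 2: index i, kept exact
    if i.denominator != 1:                  # i is not an integer: round up
        return values[math.ceil(i) - 1]     # 1-based position -> 0-based list index
    i = int(i)
    return (values[i - 1] + values[i]) / 2  # integer i: average of positions i and i + 1

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(textbook_percentile(data, 25))  # i = 2, so (22 + 24) / 2 = 23.0
print(textbook_percentile(data, 90))  # i = 7.2, rounded up to 8, so 30
```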

Note on Excel's Percentile Function
The formula that Excel uses is different from the one used in the textbook! To find the position of the pth percentile, Excel uses the following formula:
$L_p = (p/100)n + (1 - p/100)$
Once the position is identified, Excel will:
1. If $L_p$ is a whole number (e.g., 12), Excel's result will be the same as the textbook's.
2. If $L_p$ is not a whole number (e.g., 12.3), Excel interpolates between the values in the two neighboring positions, so its result may differ from the textbook's.
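A sketch of the Excel-style rule in Python follows. The slide gives only $L_p$. The linear interpolation step is an assumption based on how Excel's PERCENTILE (PERCENTILE.INC) behaves, and it matches the inclusive definition used by Python's `statistics.quantiles(..., method="inclusive")`. The function name is hypothetical, and p is assumed to be an integer from 0 to 100.

```python
import math
from fractions import Fraction

def excel_style_percentile(data, p):
    """Position L_p = (p/100)n + (1 - p/100), then linear interpolation between neighbors."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    values = sorted(data)
    n = len(values)
    L = Fraction(p * n, 100) + 1 - Fraction(p, 100)  # 1-based position, kept exact
    k = math.floor(L)                                 # whole part of the position
    frac = L - k                                      # fractional part drives the interpolation
    if frac == 0:                                     # whole-number position: no interpolation
        return float(values[k - 1])
    return float(values[k - 1] + frac * (values[k] - values[k - 1]))

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(excel_style_percentile(data, 25))  # L = 2.75, so 22 + 0.75 * (24 - 22) = 23.5
```

For the same eight values, the textbook rule gives 23 for the 25th percentile, so this is a case where $L_p$ is not a whole number and the two results differ.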

Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Measures of Variability
It is often desirable to consider measures of variability (dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

Range
The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.

Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
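A small Python sketch computing the range and the interquartile range for the eight-value median example. The quartiles use the textbook percentile index rule, which is repeated here so the block runs on its own. The helper name is hypothetical.

```python
import math
from fractions import Fraction

def textbook_percentile(values, p):
    """p-th percentile by the index rule i = (p/100)n; assumes an integer 0 < p < 100."""
    ordered = sorted(values)
    i = Fraction(p * len(ordered), 100)
    if i.denominator != 1:                     # not an integer: round up
        return ordered[math.ceil(i) - 1]
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2   # average of positions i and i + 1

data = [26, 28, 27, 22, 24, 30, 12, 27]
data_range = max(data) - min(data)             # 30 - 12 = 18
q1 = textbook_percentile(data, 25)             # 23.0
q3 = textbook_percentile(data, 75)             # 27.5
iqr = q3 - q1                                  # 4.5
print(data_range, q1, q3, iqr)
```

The smallest value (12) sets the range by itself, but it has no effect on the IQR, which depends only on the middle of the sorted data.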

Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Variance
The variance is computed as follows:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population
The variance is the average of the squared differences between each data value and the mean. For a sample, the sum of squared differences is divided by $n - 1$ rather than $n$.
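A minimal Python sketch of both variance formulas (the function name and the `sample` flag are illustrative). For the eight-value example the mean is 24.5 and the squared deviations sum to 220.

```python
def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    center = sum(values) / n                              # x-bar or mu
    squared_devs = sum((x - center) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(variance(data))                # s^2 = 220 / 7, about 31.43
print(variance(data, sample=False))  # sigma^2 = 220 / 8 = 27.5
```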

Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation
The standard deviation is computed as follows:
$s = \sqrt{s^2}$ for a sample
$\sigma = \sqrt{\sigma^2}$ for a population
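In Python, the standard deviation is just the square root of the variance. This sketch (function name illustrative) recomputes the variance so it stands alone, and it returns a value in the same units as the data.

```python
import math

def std_dev(values, sample=True):
    """Positive square root of the sample (n - 1) or population (N) variance."""
    n = len(values)
    center = sum(values) / n
    squared_devs = sum((x - center) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1 if sample else n))

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(std_dev(data))                # s, about 5.61
print(std_dev(data, sample=False))  # sigma, about 5.24
```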

Coefficient of Variation
The coefficient of variation is computed as follows:
$\left(\frac{s}{\bar{x}} \times 100\right)\%$ for a sample
$\left(\frac{\sigma}{\mu} \times 100\right)\%$ for a population
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
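A short Python sketch using the standard `statistics` module (`mean`, `stdev`, `pstdev`). It assumes the mean is positive, since the ratio is not meaningful for a zero or negative mean. The function name is illustrative.

```python
from statistics import mean, stdev, pstdev

def coefficient_of_variation(values, sample=True):
    """Standard deviation as a percentage of the mean: (s / x-bar) * 100 or (sigma / mu) * 100."""
    spread = stdev(values) if sample else pstdev(values)
    return spread / mean(values) * 100

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(round(coefficient_of_variation(data), 2))  # about 22.88 (%)
```

Here the sample standard deviation (about 5.61) is roughly 23% of the mean (24.5).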

Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
• z-Scores
• Chebyshev's Theorem
• Empirical Rule
• Detecting Outliers

Distribution Shape: Skewness
• An important measure of the shape of a distribution is called skewness.
• The formula for computing skewness for a data set is somewhat complex. Skewness can be easily computed using statistical software.
• Excel's SKEW function can be used to compute the skewness of a data set.
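The slide leaves the formula out. One common choice, and the one Microsoft documents for Excel's SKEW, is the adjusted sample skewness $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$. A Python sketch under that assumption (function name illustrative):

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(sample_skewness([1, 2, 3, 4, 5]))  # essentially 0: symmetric data
print(sample_skewness(data) < 0)         # True: the low value 12 gives a long left tail
```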

Distribution Shape: Skewness
• Symmetric (not skewed)
  - Skewness is zero.
  - Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; skewness = 0]

Distribution Shape: Skewness
• Moderately Skewed Left
  - Skewness is negative.
  - Mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution; skewness = -.31]

Distribution Shape: Skewness
• Moderately Skewed Right
  - Skewness is positive.
  - Mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; skewness = .31]

z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean:
$z_i = \frac{x_i - \bar{x}}{s}$

z-Scores
An observation's z-score is a measure of the relative location of the observation in a data set.
A data value less than the sample mean will have a z-score less than zero.
A data value greater than the sample mean will have a z-score greater than zero.
A data value equal to the sample mean will have a z-score of zero.
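A minimal Python sketch of $z_i = (x_i - \bar{x})/s$ using the standard `statistics` module (the function name is illustrative). Values below the mean come out negative and values above it positive.

```python
from statistics import mean, stdev

def z_scores(values):
    """z_i = (x_i - x-bar) / s for every observation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

data = [26, 28, 27, 22, 24, 30, 12, 27]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))  # e.g. 12 -> about -2.23, 30 -> about 0.98
```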

Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

Chebyshev's Theorem
• At least 75% of the data values must be within z = 2 standard deviations of the mean ($1 - 1/2^2 = 0.75$).
• At least 88.9% (about 89%) of the data values must be within z = 3 standard deviations of the mean ($1 - 1/3^2 \approx 0.889$).
• At least 93.75% (about 94%) of the data values must be within z = 4 standard deviations of the mean ($1 - 1/4^2 = 0.9375$).
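A Python sketch that computes the Chebyshev bound and, for comparison, the fraction of a data set that actually falls within z standard deviations. The function names are illustrative, and the comparison uses the sample standard deviation.

```python
from statistics import mean, stdev

def chebyshev_bound(z):
    """Minimum fraction of any data set within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

def fraction_within(values, z):
    """Observed fraction of values with |x - mean| <= z * s."""
    m, s = mean(values), stdev(values)
    return sum(abs(x - m) <= z * s for x in values) / len(values)

for z in (2, 3, 4):
    print(z, chebyshev_bound(z))  # 0.75, about 0.889, 0.9375

data = [26, 28, 27, 22, 24, 30, 12, 27]
print(fraction_within(data, 2))   # 0.875: only 12 falls outside, so the 75% bound holds
```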

Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.

Empirical Rule
[Figure: normal curve over x with marks at μ - 3σ, μ - 2σ, μ - 1σ, μ + 1σ, μ + 2σ, μ + 3σ; 68.26%, 95.44%, and 99.72% of the values lie within ±1σ, ±2σ, and ±3σ of μ]
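The empirical-rule percentages are normal-distribution probabilities. They can be computed with the error function from Python's `math` module, since $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$. The exact values differ from the slide in the second decimal place. The slide's figures presumably come from doubling rounded normal-table areas.

```python
import math

def normal_within(k):
    """P(|X - mu| <= k * sigma) for a normal random variable, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k) * 100, 2))  # about 68.27, 95.45, 99.73
```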

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set
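A Python sketch of the z-score rule. The function name and the data are hypothetical. The rule only flags values, and each flagged value still has to be checked against the three possibilities above. With the sample standard deviation, no z-score can exceed $(n - 1)/\sqrt{n}$ in magnitude, so the ±3 rule can never trigger with 10 or fewer observations. That is why the example uses 20 values.

```python
from statistics import mean, stdev

def flag_outliers(values, cutoff=3.0):
    """Return the values whose z-score is below -cutoff or above +cutoff."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs((x - m) / s) > cutoff]

# Hypothetical data: 19 ordinary values and one suspicious 60
data = [10, 11, 12] * 6 + [11, 60]
print(flag_outliers(data))  # [60]: its z-score is about 4.2
```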

Measures of Association Between Two Variables
• Covariance
• Correlation Coefficient

Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

Covariance
The covariance is computed as follows:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ for samples
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ for populations
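A minimal Python sketch of both covariance formulas. The function name and the paired data are hypothetical. Reversing y turns a positive relationship into a negative one, and the sign of the covariance follows.

```python
def covariance(x, y, sample=True):
    """Sum of products of paired deviations, divided by n - 1 (sample) or N (population)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cross / (n - 1 if sample else n)

x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]
print(covariance(x, y))                # 1.5: positive relationship
print(covariance(x, y, sample=False))  # 1.2
print(covariance(x, y[::-1]))          # -1.5: y tends to fall as x rises
```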

Correlation Coefficient
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.

Correlation Coefficient
The correlation coefficient is computed as follows:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$ for samples
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ for populations

Correlation Coefficient
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
Correlation is a measure of linear association and not necessarily causation.

Covariance and Correlation Coefficient
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69
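The golfer's six rounds can be run through the sample formulas in Python. This sketch computes $s_{xy}$, $s_x$, and $s_y$ directly, and the function name is illustrative. By hand, the sample covariance works out to about -7.08 and r to about -0.963. The relationship is strong and negative: longer average drives go with lower scores.

```python
import math

def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # sample covariance
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score
print(round(correlation(distance, score), 4))           # about -0.9631
```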

Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Weighted Mean
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
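A minimal Python sketch of the weighted mean, applied to a hypothetical three-course transcript. The grade points are weighted by credit hours, as in the GPA example, and the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical transcript: grade points earned, weighted by credit hours
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))  # (12 + 12 + 6) / 10 = 3.0
```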

In-Class Empirical Exercises

