Yandell – Econ 216 Chap 3-1 Chapter 3 Numerical Descriptive Measures.

Chapter Goals
After completing this chapter, you should be able to:
- Compute and interpret the mean, median, and mode for a set of data
- Find the range, variance, and standard deviation and know what these values mean
- Construct and interpret a box-and-whisker plot
- Compute and explain the coefficient of variation
- Use numerical measures along with graphs, charts, and tables to describe data

Chapter Topics
- Measures of center and location: mean, median, mode, geometric mean, midrange
- Other measures of location: weighted mean, percentiles, quartiles
- Measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation
- Skewness (shape)
- Linear correlation

Summary Measures: Describing Data Numerically
- Center and location: mean, median, mode
- Other measures of location: weighted mean, percentiles, quartiles
- Variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Shape: skewness

Notation Conventions
- Population parameters are denoted with letters from the Greek alphabet: $\mu$ (mu) represents the population mean, and $\sigma$ (sigma) represents the population standard deviation.
- Sample statistics are commonly denoted with letters from the Roman alphabet: $\bar{X}$ (X-bar) represents the sample mean, and $S$ represents the sample standard deviation.

Measures of Center and Location: Overview
- Mean: the arithmetic average
- Median: the midpoint of the ranked values
- Mode: the most frequently observed value
- Weighted mean: used when values are grouped by frequency or relative importance

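Mean (Arithmetic Average)
The mean is the arithmetic average of the data values.
- Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
- Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
where n is the sample size and N is the population size.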

Mean (Arithmetic Average) (continued)
- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)
[Figure: two data sets plotted on a 0 to 10 axis; Mean = 3 for the first and Mean = 4 for the second]
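
The sample mean formula translates directly into code. This is a minimal Python sketch; the function name `sample_mean` and both data sets are hypothetical illustrations (not the data behind the slide's figure), chosen so that a single extreme value moves the mean from 3 to 4.

```python
def sample_mean(values):
    """Arithmetic average: sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical data: the second set replaces the 5 with an extreme value of 10.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

print(sample_mean(without_outlier))  # 3.0
print(sample_mean(with_outlier))     # 4.0
```

Only one value changed, yet the mean rose by a full unit, which is why the mean is described as sensitive to outliers.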

Median
- Not affected by extreme values
- In an ordered array, the median is the “middle” number (50% above, 50% below)
- If n or N is odd, the median is the middle number
- If n or N is even, the median is the average of the two middle numbers
[Figure: two data sets plotted on a 0 to 10 axis, each with Median = 3]

Median (continued)
- To find the median, rank the n values in order of magnitude.
- Find the value in the $(n+1)/2$ position of the ranked data.
- If n is even, the median is the mean of the two middle-most observations.
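
The $(n+1)/2$ position rule can be sketched in Python as follows. The function name `median` is our own choice; the example values come from this chapter (the four-observation note 2, 3, 4, 11 and the house-price example). Python's standard `statistics.median` should give the same results.

```python
def median(values):
    """Median found at the (n + 1)/2 position of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2            # 1-based position in the ranked array
    lower = ranked[int(position) - 1]
    if position.is_integer():         # n is odd: a single middle value
        return lower
    upper = ranked[int(position)]     # n is even: average the two middle values
    return (lower + upper) / 2

print(median([2, 3, 4, 11]))                                     # 3.5
print(median([2_000_000, 500_000, 300_000, 100_000, 100_000]))   # 300000
```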

Mode
- A measure of central tendency
- The value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes
[Figure: a data set on a 0 to 14 axis with Mode = 5, and a data set on a 0 to 6 axis with no mode]
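
A small sketch that returns every mode, or an empty list when no value repeats. The function name `modes` and all sample data are hypothetical. Note that Python's own `statistics.multimode` returns every value when none repeats instead of reporting "no mode", so this sketch adds that check explicitly.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); return [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    # Keep first-seen order so both numerical and categorical data work
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 3, 5, 5, 5, 7, 9]))             # [5]
print(modes([0, 1, 2, 3, 4, 5, 6]))             # []  (no mode)
print(modes([2, 2, 4, 4, 6]))                   # [2, 4]  (several modes)
print(modes(["red", "blue", "red", "green"]))   # ['red']  (categorical data)
```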

Weighted Mean
Used when values are grouped by frequency or relative importance.
Example: sample of 26 repair projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted mean days to complete:
$\bar{X}_W = \frac{\sum w_i X_i}{\sum w_i} = \frac{(4)(5) + (12)(6) + (8)(7) + (2)(8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
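
The same weighted-mean calculation for the repair-project table, as a minimal Python sketch (the function name `weighted_mean` is hypothetical).

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

days = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]   # 26 repair projects in total

print(round(weighted_mean(days, frequency), 2))  # 6.31
```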

Review Example
Five houses on a hill by the beach.
House prices: $2,000,000; 500,000; 300,000; 100,000; 100,000

Summary Statistics
House prices: $2,000,000; 500,000; 300,000; 100,000; 100,000 (sum = $3,000,000)
- Mean: $3,000,000 / 5 = $600,000
- Median: middle value of the ranked data = $300,000
- Mode: most frequent value = $100,000
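
Python's standard `statistics` module reproduces these three summary statistics for the five house prices; a quick sketch:

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(statistics.mean(house_prices))    # 600000
print(statistics.median(house_prices))  # 300000
print(statistics.mode(house_prices))    # 100000
```

The single $2,000,000 house pulls the mean to twice the median, which previews the next slide's advice on choosing a measure.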

Which Measure of Location Is the “Best”?
- The mean is generally used, unless extreme values (outliers) exist.
- In that case the median is often used, since the median is not sensitive to extreme values.
- Example: median home prices may be reported for a region because the median is less sensitive to outliers.

Note
The mean and median do not have to be values that are part of the data set.
Example (n = 4 observations): 2, 3, 4, 11
- Mean = (2 + 3 + 4 + 11)/4 = 5
- Median = 3.5 (the median position is (n + 1)/2 = 2.5, so use the midpoint of the two middle-most values, 3 and 4)

Shape of a Distribution
Describes how the data are distributed: symmetric or skewed.
- Left-skewed (longer tail extends to the left): Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-skewed (longer tail extends to the right): Mode < Median < Mean

Measuring Skewness
A number called the coefficient of skewness (SK) is commonly used to measure skewness:
$SK = \frac{3(\bar{X} - \text{Median})}{S}$
where $S$ is the sample standard deviation, $\bar{X}$ is the sample mean, and Median is the sample median.

Measuring Skewness (continued)
The magnitude of SK indicates the degree of skewness, where $-3 \le SK \le +3$:
- $SK < 0$: skewed left
- $SK = 0$: symmetric (not skewed)
- $SK > 0$: skewed right
The coefficient of skewness is calculated and reported by many statistical software packages.
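
A sketch of the SK formula in Python, applied to the five house prices from the review example. It uses `statistics.stdev`, which divides by n - 1 and so matches S. This formula is often called Pearson's second (median) skewness coefficient; many packages, Excel's SKEW function among them, report a moment-based skewness instead, so their numbers need not match SK. The function name `skewness_coefficient` is hypothetical.

```python
import statistics

def skewness_coefficient(values):
    """SK = 3 * (mean - median) / S, with S the sample standard deviation."""
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return 3 * (statistics.mean(values) - statistics.median(values)) / s

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(skewness_coefficient(house_prices))  # 1.125
```

Here S = 800,000 and the mean exceeds the median by 300,000, so SK = 3(300,000)/800,000 = 1.125 > 0: the house prices are skewed right.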

Other Measures of Location: Percentiles and Quartiles
- 1st quartile = 25th percentile
- 2nd quartile = 50th percentile = median
- 3rd quartile = 75th percentile
The pth percentile in a data array is a value such that:
- p% of the values are less than or equal to it
- (100 - p)% of the values are greater than or equal to it
(where $0 \le p \le 100$)

Percentiles
The pth percentile in an ordered array of n values is the value in the ith position, where
$i = \frac{p}{100}(n + 1)$
Example: the 60th percentile in an ordered array of 19 values is the value in the 12th position, since $i = \frac{60}{100}(19 + 1) = 12$.
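
The position formula, plus one way to read off a value at a fractional position, sketched in Python. The slides only say to take the value halfway between neighbors at a .5 position; the linear interpolation used here for other fractions, and the clamping at the ends of the array, are assumptions of this sketch. The function names are hypothetical.

```python
def percentile_position(p, n):
    """1-based position i = (p / 100)(n + 1) of the pth percentile."""
    return p * (n + 1) / 100

def percentile(values, p):
    """Value at the pth percentile position of the ranked data.

    Assumption: fractional positions are linearly interpolated between the
    two neighboring ranked values; positions below 1 or above n are clamped.
    """
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    i = percentile_position(p, n)
    if i <= 1:
        return ranked[0]
    if i >= n:
        return ranked[-1]
    whole = int(i)                     # position of the lower neighbor (1-based)
    fraction = i - whole
    lower, upper = ranked[whole - 1], ranked[whole]
    return lower + fraction * (upper - lower)

print(percentile_position(60, 19))                            # 12.0
print(percentile([11, 12, 13, 16, 16, 17, 18, 21, 22], 25))   # 12.5
```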

Quartiles
Quartiles (Q1, Q2, Q3) split the ranked data into 4 equal groups, each containing 25% of the values.
Example: find the first quartile of the following sample data in an ordered array (n = 9):
11 12 13 16 16 17 18 21 22
Q1 is the 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position. Use the value halfway between the 2nd and 3rd values, so Q1 = 12.5.

Quartile Formulas
Find a quartile by determining the value in the appropriate position of the ranked data, where:
- First quartile position: $Q_1 = (n + 1)/4$
- Second quartile position: $Q_2 = (n + 1)/2$ (the median position)
- Third quartile position: $Q_3 = 3(n + 1)/4$
and n is the number of observed values.
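
In Python 3.8 or later, `statistics.quantiles` with `method="exclusive"` (its default) places cut points on the same (n + 1) scale and interpolates between neighboring values, so for the nine-value example it reproduces Q1 = 12.5. This sketch checks the positions and the quartile values; for positions that are neither whole nor halves, the interpolation is the library's convention rather than something the slides specify.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
n = len(data)

# Positions from the quartile formulas (1-based, in the ranked data)
positions = [(n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4]
print(positions)  # [2.5, 5.0, 7.5]

# method="exclusive" also places the cut points on (n + 1) positions
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 12.5 16.0 19.5
```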

Box-and-Whisker Plot
A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
[Figure: example box-and-whisker plot, with 25% of the data in each of its four sections]

Shape of Box-and-Whisker Plots
- The box and central line are centered between the endpoints if the data are symmetric around the median.
- A box-and-whisker plot can be shown in either vertical or horizontal format.

Distribution Shape and Box-and-Whisker Plot
[Figure: box-and-whisker plots for left-skewed, symmetric, and right-skewed distributions, each marked with Q1, Q2, and Q3]

Box-and-Whisker Plot Example
Below is a box-and-whisker plot for the following data:
0 2 2 2 3 3 4 5 5 10 27
[Figure: box-and-whisker plot with Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27]
These data are very right-skewed, as the plot depicts.
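
The five-number summary behind this plot can be computed with the same (n + 1) quartile rule. A minimal sketch; `five_number_summary` is a hypothetical helper built on `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using (n + 1)-based quartile positions."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]
low, q1, q2, q3, high = five_number_summary(data)
print(low, q1, q2, q3, high)   # 0 2.0 3.0 5.0 27
print(q1 - low, high - q3)     # 2.0 22.0  (left vs. right whisker length)
```

The right whisker is eleven times as long as the left one, which is the right skew the plot shows.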

Using PHStat to Construct a Box-and-Whisker Plot
The PHStat add-in can be used to create a box-and-whisker plot easily. If you have several data sets and wish to compare them, PHStat can create multiple plots in the same display window.

Measures of Variation
- Range
- Interquartile range
- Variance: population variance, sample variance
- Standard deviation: population standard deviation, sample standard deviation
- Coefficient of variation

Variation
Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center and different variation]

Range
- Simplest measure of variation
- Difference between the largest and the smallest observations: Range = $X_{\text{largest}} - X_{\text{smallest}}$
Example: [Figure: data plotted on a 0 to 14 axis, smallest value 1 and largest value 14] Range = 14 - 1 = 13

Disadvantages of the Range
- Ignores the way in which the data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis, each with Range = 12 - 7 = 5]
- Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
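
The range is a one-line computation. This sketch rebuilds the two 25-value data sets from the slide and shows a single outlier changing the range from 4 to 119. The function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

# Eleven 1s, eight 2s, four 3s and one 4, shared by both data sets
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]

print(data_range(base + [5]))    # 4
print(data_range(base + [120]))  # 119
```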

Interquartile Range
- Some outlier problems can be eliminated by using the interquartile range.
- Eliminate the lowest 25% and highest 25% of the observations and calculate the range of the remaining middle 50%.
- Interquartile range = 3rd quartile - 1st quartile = $Q_3 - Q_1$

Interquartile Range (continued)
Example: [Figure: box-and-whisker plot with X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70; 25% of the data in each section]
Interquartile range = 57 - 30 = 27
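
By contrast, the interquartile range of the two 25-value data sets from the range slide does not move when the 5 is replaced by 120. A minimal sketch using `statistics.quantiles` with (n + 1)-based positions; `interquartile_range` is a hypothetical helper.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, with quartiles at (n + 1)-based positions."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
for data in (base + [5], base + [120]):
    print(max(data) - min(data), interquartile_range(data))
# 4 1.5
# 119 1.5
```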

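Variance
The variance is (roughly) the average of the squared deviations of the values from the mean; the sample variance divides by n - 1 rather than n.
- Sample variance: $S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$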

Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$
- Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$

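Calculation Example: Sample Standard Deviation
Sample data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, mean $\bar{X} = 16$
$S = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$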
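
The same calculation step by step in Python, checked against `statistics.stdev` (sample formula, divides by n - 1). `statistics.pstdev` applies the population formula (divides by N) and is shown only for contrast, as if these eight values were the whole population.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n                                       # 16.0
squared_deviations = sum((x - mean) ** 2 for x in data)    # 130.0
sample_variance = squared_deviations / (n - 1)             # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(round(sample_sd, 4))                  # 4.3095
print(round(statistics.stdev(data), 4))     # 4.3095 (same result)
print(round(statistics.pstdev(data), 4))    # 4.0311 (population formula, divides by N)
```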

Measuring Variation
[Figure: a distribution with a small standard deviation beside one with a large standard deviation]

Comparing Standard Deviations
[Figure: three data sets plotted on an 11 to 21 axis, all with mean 15.5: Data A, s = 3.338; Data B, s = 0.9258; Data C, s = 4.57]

Advantages of Variance and Standard Deviation
- Each value in the data set is used in the calculation.
- Values far from the mean are given extra weight, because deviations from the mean are squared.

The Empirical Rule
If the data distribution is bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample.
[Figure: bell-shaped curve with 68% of the area within $\mu \pm 1\sigma$]

The Empirical Rule (continued)
- $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample.
- $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample.
[Figure: bell-shaped curve marking the 95% and 99.7% intervals]
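
The 68 / 95 / 99.7 figures are rounded areas under a normal (bell-shaped) curve. A quick check with Python's `statistics.NormalDist` (Python 3.8 or later), using the standard normal distribution; the same shares hold for any $\mu$ and $\sigma$ because the intervals are measured in standard-deviation units.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)
shares = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```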

Coefficient of Variation
- Measures relative variation
- Always expressed as a percentage (%)
- Shows variation relative to the mean
- Can be used to compare two or more sets of data measured in different units
- Population: $CV = \frac{\sigma}{\mu} \times 100\%$
- Sample: $CV = \frac{S}{\bar{X}} \times 100\%$

Comparing Coefficients of Variation
- Stock A: average price last year = $50, standard deviation = $5, so CV = (5 / 50) × 100% = 10%
- Stock B: average price last year = $100, standard deviation = $5, so CV = (5 / 100) × 100% = 5%
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
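
The stock comparison as a tiny Python sketch; the function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, expressed in percent."""
    return 100 * sd / mean

print(coefficient_of_variation(5, 50))   # Stock A: 10.0
print(coefficient_of_variation(5, 100))  # Stock B: 5.0
```

Because the units cancel in the ratio, the same function would compare data sets measured in different units.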

Using Microsoft Excel
- Descriptive statistics are easy to obtain from Microsoft Excel.
- Excel 2013: Data / Data Analysis / Descriptive Statistics (Excel 2003 and earlier: Tools / Data Analysis / Descriptive Statistics)
- Enter the details in the dialog box.
- Open the house price worksheet, then follow the steps shown below to obtain descriptive statistics.

In Excel 2003 and Earlier
Use the menu choice: Tools / Data Analysis / Descriptive Statistics

In Excel 2010 or 2013
Use the “Data” tab: Data / Data Analysis / Descriptive Statistics

Using Excel (continued)
- Enter the dialog box details.
- Check the box for summary statistics.
- Click OK.

Excel Output
Microsoft Excel descriptive statistics output, using the house price data:
House prices: $2,000,000; 500,000; 300,000; 100,000; 100,000
[Figure: Excel descriptive statistics output table for the house prices]

Scatter Plots and Correlation
- A scatter plot (or scatter diagram) is used to show the relationship between two variables.
- Correlation analysis is used to measure the strength of the linear association (linear relationship) between two variables.
- It is concerned only with the strength of the relationship.
- No causal effect is implied.

Scatter Plot Examples
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]

Scatter Plot Examples (continued)
[Figure: scatter plots of Y against X showing no relationship]

Correlation Coefficient
- The population correlation coefficient $\rho$ (rho) measures the strength of the linear association between the variables.
- The sample correlation coefficient $r$ is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations.

Features of $\rho$ and $r$
- Unit-free
- Range between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship

Examples of Approximate r Values
[Figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]

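Calculating the Correlation Coefficient
Sample correlation coefficient:
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}}$
or the algebraic equivalent:
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$
where:
r = sample correlation coefficient
n = sample size
X = value of the independent variable
Y = value of the dependent variable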

Calculation Example
Tree Height (Y)   Trunk Diameter (X)   XY    Y²     X²
35                8                    280   1225   64
49                9                    441   2401   81
27                7                    189   729    49
33                6                    198   1089   36
60                13                   780   3600   169
21                7                    147   441    49
45                11                   495   2025   121
51                12                   612   2601   144
Sums: ΣY = 321, ΣX = 73, ΣXY = 3142, ΣY² = 14111, ΣX² = 713

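Calculation Example (continued)
[Figure: scatter plot of Tree Height, Y, against Trunk Diameter, X]
$r = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$
r = 0.886 indicates a relatively strong positive linear association between X and Y.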
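
The sums-based formula applied to the tree data, as a minimal Python sketch (the function name `correlation` is hypothetical). It follows the algebraic equivalent above term by term.

```python
import math

def correlation(x, y):
    """Sample correlation r via the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]     # X
tree_height = [35, 49, 27, 33, 60, 21, 45, 51]   # Y

print(round(correlation(trunk_diameter, tree_height), 3))  # 0.886
```

On Python 3.10 or later, `statistics.correlation(trunk_diameter, tree_height)` returns the same Pearson coefficient.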

Excel Output
Excel correlation output for Tree Height and Trunk Diameter: Data / Data Analysis / Correlation in Excel 2010 or 2013 (Tools / Data Analysis / Correlation in older menu-based versions)
[Figure: Excel correlation output table]

Chapter Summary
- Described measures of center and location: mean, median, mode
- Discussed percentiles and quartiles
- Described measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Created box-and-whisker plots
- Illustrated distribution shapes (symmetric, skewed)
- Discussed linear correlation
