Intro to Statistics Part II: Descriptive Statistics
Ernesto Diaz, Assistant Professor of Mathematics
Copyright © 2016 Brooks/Cole Cengage Learning

Descriptive Statistics

3 Descriptive statistics is concerned with the accumulation of data, measures of central tendency, and dispersion.

4 Measures of Central Tendency

5 When we add up a list of numbers in statistics, we use the symbol $\sum x$ to mean the sum of all the values that $x$ can assume. Similarly, $\sum x^2$ means to square each value that $x$ can assume and then add the results; $(\sum x)^2$ means to first add the values and then square the result. The symbol $\sum$ is the Greek capital letter sigma (chosen because S reminds us of “sum”). The measure most of us think of when we hear someone use the word average is called the mean.
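A quick check in Python makes the difference between the three sums concrete. This is a sketch only; the sample values 2, 3, and 5 are hypothetical.

```python
# Three expressions that look alike but differ: the sum of x,
# the sum of the squares of x, and the square of the sum of x.
values = [2, 3, 5]  # hypothetical sample values

sum_x = sum(values)                          # Σx: add the values
sum_x_squared = sum(x ** 2 for x in values)  # Σx²: square each value, then add
square_of_sum = sum(values) ** 2             # (Σx)²: add first, then square

print(sum_x, sum_x_squared, square_of_sum)   # 10 38 100
```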

6 Measures of Central Tendency Other statistical measures, called averages or measures of central tendency, are defined in the following box.

7 Example 3 – Mean, median, and mode for table values Consider Table 14.5, which shows the number of days one must wait for a marriage license in the various states in the United States. What are the mean, the median, and the mode for these data?
Table 14.5 Wait Time for a U.S. Marriage License

8 Example 3 – Solution Mean: To find the mean, we could, of course, add all 50 individual numbers, but instead, notice that:
0 occurs 25 times, so write 0 × 25
1 occurs 1 time, so write 1 × 1
2 occurs 1 time, so write 2 × 1
3 occurs 19 times, so write 3 × 19
4 occurs 1 time, so write 4 × 1
5 occurs 3 times, so write 5 × 3
Thus, the mean is
$$\bar{x} = \frac{0 \times 25 + 1 \times 1 + 2 \times 1 + 3 \times 19 + 4 \times 1 + 5 \times 3}{50} = \frac{79}{50} = 1.58$$

9 Example 3 – Solution Median: Since the median is the middle number and there are 50 values, the median is the mean of the 25th and 26th numbers (when they are arranged in order). The 25th term is 0 and the 26th term is 1, so the median is (0 + 1)/2 = 0.5.
Mode: The mode is the value that occurs most frequently, which is 0.
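The Example 3 calculation can be reproduced with Python's standard `statistics` module. This sketch rebuilds the 50 individual values from the counts listed in the solution (0 occurs 25 times, 1 occurs once, and so on); the dictionary and variable names are illustrative.

```python
import statistics

# Counts from Example 3: wait time in days -> number of states.
frequency = {0: 25, 1: 1, 2: 1, 3: 19, 4: 1, 5: 3}

# Expand the counts into the 50 individual values, in increasing order.
waits = [days for days, count in sorted(frequency.items()) for _ in range(count)]

mean = statistics.mean(waits)      # 79 / 50
median = statistics.median(waits)  # mean of the 25th and 26th values
mode = statistics.mode(waits)      # most frequent value

print(len(waits), mean, median, mode)
```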

10 Measures of Central Tendency When finding the mean from a frequency distribution, you are finding what is called a weighted mean.
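A weighted mean multiplies each value $x$ by its weight $w$ (here, its frequency), adds the products, and divides by the total weight: $\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$. The function below is a minimal sketch of that formula (the name `weighted_mean` is hypothetical), checked against the marriage-license counts from Example 3, where the weights are the numbers of states.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Example 3 as a frequency distribution: x = days of waiting, w = number of states.
days = [0, 1, 2, 3, 4, 5]
states = [25, 1, 1, 19, 1, 3]
print(weighted_mean(days, states))  # 79 / 50 = 1.58
```

Because the weights are frequencies, this gives the same result as averaging the 50 expanded values.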

11 Example 4 – Find a weighted mean A sociology class is studying family structures, and the professor asks each student to state the number of children in his or her family. The results are summarized in Table 14.6. What is the average number of children in the families of students in this sociology class?
Table 14.6 Family Data

12 Example 4 – Solution We need to find the weighted mean, where $x$ represents the number of children in a family and $w$ the population (the number of families with that many children):
$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w} = 2.12$$
There is an average of about two children per family.

13 Measures of Position

14 Measures of Position The median divides the data into two equal parts, with half the values above the median and half below the median, so the median is called a measure of position. Sometimes we use benchmark positions that divide the data into more than two parts. Quartiles, denoted by $Q_1$ (first quartile), $Q_2$ (second quartile), and $Q_3$ (third quartile), divide the data into four equal parts. Deciles are nine values that divide the data into ten equal parts, and percentiles are 99 values that divide the data into 100 equal parts.

15 Measures of Position Measures of position are often used to make comparisons. Two measures of position are percentiles and quartiles.

16 To Find the Quartiles of a Set of Data
1. Order the data from smallest to largest.
2. Find the median, or 2nd quartile, of the set of data. If there is an odd number of pieces of data, the median is the middle value. If there is an even number of pieces of data, the median is halfway between the two middle pieces of data.

17 To Find the Quartiles of a Set of Data (continued)
3. The first quartile, $Q_1$, is the median of the lower half of the data; that is, $Q_1$ is the median of the data less than $Q_2$.
4. The third quartile, $Q_3$, is the median of the upper half of the data; that is, $Q_3$ is the median of the data greater than $Q_2$.
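The four steps above translate directly into code. This is a sketch under one assumption: "the data less than $Q_2$" is read as the values before the middle position of the ordered list, so the middle value itself is excluded when the count is odd. Statistical software often uses other quartile conventions, so its results can differ slightly. The helper names `median` and `quartiles` are hypothetical.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]                             # odd: the middle value
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2  # even: halfway between the two middle values

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    s = sorted(data)               # Step 1: order the data
    n = len(s)
    q2 = median(s)                 # Step 2: Q2 is the median
    lower = s[: n // 2]            # values before the middle position
    upper = s[(n + 1) // 2 :]      # values after the middle position
    return median(lower), q2, median(upper)  # Steps 3 and 4

print(quartiles([2, 9, 9, 12, 13]))  # (5.5, 9, 12.5)
```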

18 Example: Quartiles The weekly grocery bills for 23 families are as follows. Determine $Q_1$, $Q_2$, and $Q_3$.

19 Example: Quartiles continued Order the data: $Q_2$ is the median of the entire data set, which is 190. $Q_1$ is the median of the numbers from 50 to 172, which is 95. $Q_3$ is the median of the numbers from 210 to 330, which is 270.

20 Example 5 – Divide exam scores into quartiles The test results for Professor Hunter’s midterm exam are summarized in Table 14.7. Divide these scores into quartiles.
Table 14.7 Grade Distribution

21 Example 5 – Solution The quartiles are the three scores that divide the data into four parts. The first quartile is the data value that separates the lowest 25% of the scores from the remaining scores; the 2nd quartile is the value that separates the lower 50% of the scores from the remainder. Note that the 2nd quartile is the same as the median, since the median divides the scores so that 50% are above and 50% are below. The 3rd quartile is the value that separates the lower 75% of the scores from the upper 25%. Begin by noting the number of scores: $n = 30$.

22 Example 5 – Solution First quartile: 0.25(30) = 7.5; rounding up, $Q_1$ (the first quartile) is the 8th lowest score. From Table 14.7, we see that this score is 69. Second quartile: 0.5(30) = 15 is a whole number, so $Q_2$, the second quartile score, is the median, which is the mean of the 15th and 16th scores from the bottom.

23 Example 5 – Solution Third quartile: 0.75(30) = 22.5, so $Q_3$ (the third quartile score) is the 23rd score from the bottom (or the 8th from the top). From Table 14.7, we see this score is 85.
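Example 5 locates quartiles by position rather than by splitting the data in halves: compute the position $0.25n$, $0.5n$, or $0.75n$; if it is not a whole number, round up and take that score from the bottom; if it is a whole number, average that score and the next one. The sketch below assumes that reading of the example (the function name `quartile_by_position` is hypothetical) and computes $q \cdot n / 4$ with integer arithmetic to avoid floating-point surprises. The usage line runs it on the hypothetical scores 1 through 30, which have the same count as Example 5.

```python
def quartile_by_position(data, q):
    """Return quartile Q_q (q = 1, 2, or 3) by the position rule:
    position = q * n / 4; if it is not whole, round up and take that value
    (counting from the bottom); if it is whole, average that value and the next.
    """
    if q not in (1, 2, 3):
        raise ValueError("q must be 1, 2, or 3")
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("no data")
    numerator = q * n                    # the position is numerator / 4
    if numerator % 4 == 0:
        k = numerator // 4               # whole-number position (1-based)
        return (s[k - 1] + s[k]) / 2     # mean of the k-th and (k+1)-th values
    k = -(-numerator // 4)               # ceiling of numerator / 4
    return s[k - 1]                      # k-th value from the bottom (1-based)

scores = list(range(1, 31))              # hypothetical scores, n = 30
print([quartile_by_position(scores, q) for q in (1, 2, 3)])  # [8, 15.5, 23]
```

With $n = 30$ this picks the 8th score, the mean of the 15th and 16th, and the 23rd, matching the positions worked out in Example 5.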

24 Measures of Dispersion

25 Measures of Dispersion The measures we’ve been discussing can help us interpret information, but they do not tell the entire story. For example, consider these sets of data:
Set A: {8, 9, 9, 9, 10}  Mean: 9  Median: 9  Mode: 9
Set B: {2, 9, 9, 12, 13}  Mean: 9  Median: 9  Mode: 9

26 Measures of Dispersion Notice that, for sets A and B, the measures of central tendency do not distinguish the data. However, if you look at the data placed on planks, as shown in Figure 14.29, you will see that the data in Set B are relatively widely dispersed along the plank, whereas the data in Set A are clumped around the mean.
Figure 14.29 Visualization of dispersion of sets of data: (a) A = {8, 9, 9, 9, 10}; (b) B = {2, 9, 9, 12, 13}

27 Measures of Dispersion We’ll consider three measures of dispersion: the range, the standard deviation, and the variance.

28 Example 6 – Find the range Find the ranges for the data sets in Figure 14.29: a. Set A = {8, 9, 9, 9, 10} b. Set B = {2, 9, 9, 12, 13} Solution: Notice from Figure 14.29 that the mean for each of these sets of data is the same.

29 Example 6 – Solution The range is the difference between the largest and smallest values in the set. a. 10 – 8 = 2 b. 13 – 2 = 11
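Computing the summary measures side by side shows the point of Example 6: sets A and B share a mean, median, and mode of 9, yet their ranges differ. A minimal Python sketch (the helper `data_range` is a hypothetical name):

```python
import statistics

set_a = [8, 9, 9, 9, 10]
set_b = [2, 9, 9, 12, 13]

def data_range(data):
    """Range: the largest value minus the smallest value."""
    return max(data) - min(data)

for name, data in (("A", set_a), ("B", set_b)):
    # Same center for both sets...
    center = (statistics.mean(data), statistics.median(data), statistics.mode(data))
    # ...but different spread: A has range 2, B has range 11.
    print(name, center, data_range(data))
```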

30 Measures of Dispersion The range is used, along with quartiles, to construct a statistical tool called a box plot. For a given set of data, a box plot consists of a rectangular box positioned above a numerical scale, drawn from $Q_1$ (the first quartile) to $Q_3$ (the third quartile). The median ($Q_2$, or second quartile) is shown as a dashed line, and a segment is extended to the left to show the distance to the minimum value; another segment is extended to the right for the maximum value.
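The five values a box plot is drawn from (minimum, $Q_1$, median, $Q_3$, maximum) can be computed as below. This sketch assumes the median-of-halves quartile method from the "To Find the Quartiles" steps; other conventions can place $Q_1$ and $Q_3$ slightly differently. The function names are hypothetical. For Set B = {2, 9, 9, 12, 13}, the box would run from 5.5 to 12.5, with whiskers out to 2 and 13.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum): the values a box plot draws."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    s = sorted(data)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]  # halves, excluding the middle value if n is odd
    return s[0], median(lower), median(s), median(upper), s[-1]

print(five_number_summary([2, 9, 9, 12, 13]))  # (2, 5.5, 9, 12.5, 13)
```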

31 Measures of Dispersion The figure below shows a box plot for the data in Example 5.
Figure: Box plot for grade distribution

32 Measures of Dispersion Sometimes a box plot is called a box-and-whisker plot. Its usefulness should be clear when you look at the figure. A box plot shows:
1. the median (a measure of central tendency);
2. the location of the middle half of the data (represented by the extent of the box);
Figure: Box plot

33 Measures of Dispersion
3. the range (a measure of dispersion);
4. the skewness (the nonsymmetry of both the box and the whiskers).
The variance and standard deviation are measures that use all the numbers in the data set to give information about the dispersion. When finding the variance, we must make a distinction between the variance of the entire population and the variance of a random sample from the population.

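34 Measures of Dispersion When the variance is based on a set of sample scores, it is denoted by $s^2$; when it is based on all scores in a population, it is denoted by $\sigma^2$ ($\sigma$ is the lowercase Greek letter sigma). The variance for a random sample is found by
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the sample mean and $n$ is the number of values in the sample.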

35 Measures of Dispersion To understand this formula for the sample variance, we will consider an example before summarizing a procedure. Again, let’s use the data sets we worked with in Example 6: Set A = {8, 9, 9, 9, 10} and Set B = {2, 9, 9, 12, 13}. The mean of each set is 9.

36 Measures of Dispersion Find the deviations by subtracting the mean from each term:
Set A: 8 – 9 = –1, 9 – 9 = 0, 9 – 9 = 0, 9 – 9 = 0, 10 – 9 = 1
Set B: 2 – 9 = –7, 9 – 9 = 0, 9 – 9 = 0, 12 – 9 = 3, 13 – 9 = 4
If we sum these deviations (to obtain a measure of the total deviation), in each case we obtain 0, because the positive and negative differences “cancel each other out.”

37 Measures of Dispersion Next we calculate the square of each of these deviations:
Set A = {8, 9, 9, 9, 10}: $(8 - 9)^2 = (-1)^2 = 1$, $(9 - 9)^2 = 0^2 = 0$, $(9 - 9)^2 = 0^2 = 0$, $(9 - 9)^2 = 0^2 = 0$, $(10 - 9)^2 = 1^2 = 1$
Set B = {2, 9, 9, 12, 13}: $(2 - 9)^2 = (-7)^2 = 49$, $(9 - 9)^2 = 0^2 = 0$, $(9 - 9)^2 = 0^2 = 0$, $(12 - 9)^2 = 3^2 = 9$, $(13 - 9)^2 = 4^2 = 16$

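38 Measures of Dispersion Finally, we find the sum of these squares and divide by one less than the number of items to obtain the variance:
Set A: $s^2 = \frac{1 + 0 + 0 + 0 + 1}{5 - 1} = \frac{2}{4} = 0.5$
Set B: $s^2 = \frac{49 + 0 + 0 + 9 + 16}{5 - 1} = \frac{74}{4} = 18.5$
The larger the variance, the more dispersion there is in the original data.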
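The sample-variance procedure (deviations, squares, sum, divide by $n - 1$), that is $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, fits in a few lines of Python. A minimal sketch (the function name `sample_variance` is hypothetical) that reproduces the values for sets A and B:

```python
def sample_variance(data):
    """s² = Σ(x - x̄)² / (n - 1): the sum of squared deviations over n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(data) / n
    deviations = [x - mean for x in data]  # these always sum to 0 (up to rounding)
    squares = [d ** 2 for d in deviations]  # squaring stops them cancelling out
    return sum(squares) / (n - 1)

print(sample_variance([8, 9, 9, 9, 10]))   # 2 / 4 = 0.5
print(sample_variance([2, 9, 9, 12, 13]))  # 74 / 4 = 18.5
```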

39 Measures of Dispersion

40 Example 8 – Find the standard deviation for a math test Suppose that Hannah received the following test scores in a math class: 92, 85, 65, 89, 96, and 71. Find $s$, the standard deviation, for her test scores.
Solution: Step 1 Find the mean:
$$\bar{x} = \frac{92 + 85 + 65 + 89 + 96 + 71}{6} = \frac{498}{6} = 83$$

41 Example 8 – Solution Steps 2–4 We summarize these steps in table format:
Score: Square of the Deviation from the Mean
92: $(92 - 83)^2 = 9^2 = 81$
85: $(85 - 83)^2 = 2^2 = 4$
65: $(65 - 83)^2 = (-18)^2 = 324$
89: $(89 - 83)^2 = 6^2 = 36$
96: $(96 - 83)^2 = 13^2 = 169$
71: $(71 - 83)^2 = (-12)^2 = 144$
Sum: 758

42 Example 8 – Solution Step 5 Divide the sum by 5 (one less than the number of scores): $\frac{758}{5} = 151.6$. This number, 151.6, is called the variance. If you do not have access to a calculator, you can use the variance as a measure of dispersion. However, we assume you have a calculator and can find the standard deviation.

43 Example 8 – Solution Step 6 Take the square root of the variance: $s = \sqrt{151.6} \approx 12.31$.
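The six steps of Example 8 can be mirrored in Python and cross-checked against the standard library, whose `statistics.variance` and `statistics.stdev` compute the sample variance and sample standard deviation with the same $n - 1$ divisor. A sketch:

```python
import math
import statistics

scores = [92, 85, 65, 89, 96, 71]  # Hannah's test scores from Example 8

n = len(scores)
mean = sum(scores) / n                   # Step 1: 498 / 6 = 83.0
deviations = [x - mean for x in scores]  # Step 2: deviation of each score from the mean
squares = [d ** 2 for d in deviations]   # Step 3: square each deviation
total = sum(squares)                     # Step 4: sum of the squares, 758.0
variance = total / (n - 1)               # Step 5: divide by n - 1
std_dev = math.sqrt(variance)            # Step 6: take the square root

print(round(variance, 1), round(std_dev, 2))  # 151.6 12.31
# Both comparisons should print True.
print(math.isclose(variance, statistics.variance(scores)),
      math.isclose(std_dev, statistics.stdev(scores)))
```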

44 Interpreting Measures of Dispersion A main use of measures of dispersion is to compare the amounts of spread in two (or more) data sets. A common technique in inferential statistics is to draw comparisons between populations by analyzing samples that come from those populations.

45 Example: Interpreting Measures Two companies, A and B, sell small packs of sugar for coffee. The mean and standard deviation for samples from each company are given below. Which company consistently provides more sugar in their packs? Which company fills its packs more consistently?
Table: Sample mean and standard deviation for Company A and Company B

46 Example: Interpreting Measures Solution We infer that Company A most likely provides more sugar than Company B (greater mean). We also infer that Company B is more consistent than Company A (smaller standard deviation).

47 © 2008 Pearson Addison-Wesley. All rights reserved.
Symmetry in Data Sets The most useful way to analyze a data set often depends on whether the distribution is symmetric or non-symmetric. In a “symmetric” distribution, as we move out from a central point, the pattern of frequencies is the same (or nearly so) to the left and right. In a “non-symmetric” distribution, the patterns to the left and right are different.

48 Some Symmetric Distributions

49 Non-symmetric Distributions A non-symmetric distribution with a tail extending out to the left, shaped like a J, is called skewed to the left. If the tail extends out to the right, the distribution is skewed to the right.

50 Some Non-symmetric Distributions

51 Chebyshev’s Theorem For any set of numbers, regardless of how they are distributed, the fraction of them that lie within $k$ standard deviations of their mean (where $k > 1$) is at least
$$1 - \frac{1}{k^2}$$

52 Example: Chebyshev’s Theorem What is the minimum percentage of the items in a data set that lie within 3 standard deviations of the mean? Solution With $k = 3$, we calculate
$$1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 0.889,$$
so at least about 88.9% of the items lie within 3 standard deviations of the mean.
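Chebyshev's bound is easy to tabulate and to check against real data. In this sketch (the function name `chebyshev_lower_bound` is hypothetical), `Fraction` keeps the bound exact, and Hannah's scores from Example 8 serve as a sanity check: every score lies within $k = 2$ standard deviations of the mean, which satisfies the guaranteed minimum of 3/4.

```python
import statistics
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum fraction of any data set lying within k standard deviations
    of the mean: 1 - 1/k**2 (Chebyshev's theorem, valid for k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / Fraction(k) ** 2

print(chebyshev_lower_bound(3))  # 8/9, about 88.9%

# Sanity check on Hannah's scores from Example 8.
scores = [92, 85, 65, 89, 96, 71]
mean, s = statistics.mean(scores), statistics.stdev(scores)
k = 2
within = sum(1 for x in scores if abs(x - mean) <= k * s) / len(scores)
print(within, within >= chebyshev_lower_bound(k))  # 1.0 True
```

The theorem only gives a floor; a particular data set can, as here, have far more of its values near the mean.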