# Sullivan – Statistics: Informed Decisions Using Data – 2nd Edition – Chapter 3 Introduction – Topic 16: Numerically Summarizing Data – Averages


## Slide 1: Topic 16 – Numerically Summarizing Data: Averages and Five-Number Summary

## Slide 2: Averages

- The arithmetic mean of a variable is often what people mean by the “average”: add up all the values and divide by how many there are.
- Compute the arithmetic mean of 6, 1, 5.
- Add up the three numbers and divide by 3: (6 + 1 + 5) / 3 = 4.0
- The arithmetic mean is 4.0.
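The same calculation as a minimal Python sketch; the function name `arithmetic_mean` is illustrative and not part of the slides.

```python
def arithmetic_mean(values):
    """Add up all the values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([6, 1, 5]))  # 4.0
```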

## Slide 3: Median

- The median of a variable is the “center”: when the data are sorted in order, the median is the middle value.
- The calculation of the median differs slightly depending on whether there is
  - an odd number of observations, or
  - an even number of observations.

## Slide 4: Median

- To calculate the median (M) of a data set:
  - Arrange the data in order.
  - Count the number of observations, n.
- If n is odd:
  - There is a value exactly in the middle.
  - That value is the median M.
- If n is even:
  - There are two values on either side of the exact middle.
  - Their mean is the median M.

## Slide 5: Median

- An example with an odd number of observations (5 observations).
- Compute the median of 6, 1, 11, 2, 11.
- Sort them in order: 1, 2, 6, 11, 11
- The middle number is 6, so the median is 6.

## Slide 6: Median

- An example with an even number of observations (4 observations).
- Compute the median of 6, 1, 11, 2.
- Sort them in order: 1, 2, 6, 11
- Take the mean of the two middle values: (2 + 6) / 2 = 4
- The median is 4.

## Slide 7: Median

- One interpretation: the median splits the data into halves.
- Data: 62, 68, 71, 74, 77, 82, 84, 88, 90, 94, so M = (77 + 82) / 2 = 79.5
- 5 on the left: 62, 68, 71, 74, 77
- 5 on the right: 82, 84, 88, 90, 94
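The median procedure from Slide 4 can be sketched in Python and checked against the three examples above. The function name `median` is an illustrative choice; Python's standard `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Median M: sort, then take the middle value (n odd)
    or the mean of the two middle values (n even)."""
    data = sorted(values)            # arrange the data in order
    n = len(data)                    # count the number of observations
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                   # n odd: one value sits exactly in the middle
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # n even: mean of the two middle values


print(median([6, 1, 11, 2, 11]))  # 6
print(median([6, 1, 11, 2]))      # 4.0

# The median splits the data into halves.
scores = [62, 68, 71, 74, 77, 82, 84, 88, 90, 94]
m = median(scores)
below = [x for x in scores if x < m]
above = [x for x in scores if x > m]
print(m, len(below), len(above))  # 79.5 5 5
```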

## Slide 8: Mode

- The mode of a variable is the most frequently occurring value.
- Find the mode of 6, 1, 2, 6, 11, 7, 3.
- The distinct values are 1, 2, 3, 6, 7, 11.
- The value 6 occurs twice; all the other values occur only once.
- The mode is 6.

## Slide 9: Mode

- Qualitative data
  - Values are one of a set of categories.
  - They cannot be added or ordered, so the mean and median do not exist.
  - The mode is the only one of these three measures that exists.
- Find the mode of blue, blue, blue, red, green.
- The mode is “blue” because it is the value that occurs most often.

## Slide 10: Mode

- Quantitative data
  - The mode can be computed, but sometimes it is not meaningful.
  - Sometimes each value occurs only once (which often happens with precise measurements).
- Find the mode of 5.1, 6.6, 6.8, 9.3, 1.9.
- Each value occurs only once.
- The mode is not a meaningful measurement.
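The three mode examples can be reproduced with a small Python sketch built on `collections.Counter`. The function name `modes` is hypothetical, and two behaviors are design assumptions rather than rules from the slides: it returns every value tied for the highest count (so a data set can have several modes), and it returns an empty list when every value occurs only once, the “not meaningful” case above.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency.

    Assumption: returns [] when each value occurs only once,
    treating that case as having no meaningful mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # each value occurs once: no meaningful mode
    return [value for value, count in counts.items() if count == top]


print(modes([6, 1, 2, 6, 11, 7, 3]))                     # [6]
print(modes(["blue", "blue", "blue", "red", "green"]))  # ['blue']
print(modes([5.1, 6.6, 6.8, 9.3, 1.9]))                 # []
```

Because `Counter` only needs values it can compare for equality, the same function works for qualitative categories and quantitative measurements.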

## Slide 11: Mean, Median and Mode

- The mean and the median are often different.
- This difference gives us clues about the shape of the distribution:
  - Is it symmetric?
  - Is it skewed left?
  - Is it skewed right?
  - Are there any extreme values?

## Slide 12: Mean, Median and Mode

- Symmetric: the mean will usually be close to the median.
- Skewed left: the mean will usually be smaller than the median.
- Skewed right: the mean will usually be larger than the median.

## Slide 13: Mean, Median and Mode

- If a distribution is symmetric, the data values above and below the mean will balance:
  - The mean will be in the “middle”.
  - The median will be in the “middle”.
- Thus, in general, the mean will be close to the median for a distribution that is symmetric.

## Slide 14: Mean, Median and Mode

- If a distribution is skewed left, there will be some data values that are smaller than the others:
  - The mean will decrease.
  - The median will not decrease as much.
- Thus, in general, the mean will be smaller than the median for a distribution that is skewed left.

## Slide 15: Mean, Median and Mode

- If a distribution is skewed right, there will be some data values that are larger than the others:
  - The mean will increase.
  - The median will not increase as much.
- Thus, in general, the mean will be larger than the median for a distribution that is skewed right.
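The rule of thumb from Slides 12 to 15 can be sketched with Python's `statistics` module. The function name `mean_median_hint` and the three small data sets are hypothetical, made up only to illustrate each case; since the slides say the mean is only *usually* smaller or larger, the function returns a hint, not a verdict.

```python
import statistics


def mean_median_hint(values):
    """Compare the mean and median as a rough clue about shape (heuristic only)."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    if mean > med:
        hint = "mean > median: possibly skewed right"
    elif mean < med:
        hint = "mean < median: possibly skewed left"
    else:
        hint = "mean = median: consistent with symmetry"
    return mean, med, hint


# Hypothetical illustration data, not from the slides.
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]        # one large value
left_tail = [1, 16, 17, 17, 18, 18, 18, 19, 19, 20]  # one small value
balanced = [1, 2, 3, 4, 5]

for data in (right_tail, left_tail, balanced):
    print(mean_median_hint(data))
```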

## Slide 16: Mean, Median and Mode

- For a mostly symmetric distribution, the mean and the median will be roughly equal.
- Many variables, such as birth weights, are approximately symmetric.

[Figure: distribution of birth weights]

## Slide 17: Mean, Median and Mode

- What if one value is extremely different from the others? Such a value is called an outlier.
- What if we made a mistake and 6, 1, 2 was recorded as 6000, 1, 2?
- The mean is now (6000 + 1 + 2) / 3 = 2001.
- The median is still 2.
- The median is “resistant to extreme values”.
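A short check of the outlier example using Python's standard `statistics.mean` and `statistics.median`, comparing the correct data with the mistyped recording from the slide.

```python
import statistics

original = [6, 1, 2]
mistyped = [6000, 1, 2]  # 6 recorded as 6000 by mistake

for label, data in [("original", original), ("mistyped", mistyped)]:
    # The mean jumps from 3 to 2001; the median stays at 2.
    print(label, statistics.mean(data), statistics.median(data))
```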

## Slide 18: Summary of the Measures of Center

- Mean
  - The center of gravity
  - Useful for roughly symmetric quantitative data
- Median
  - Splits the data into halves
  - Useful for highly skewed quantitative data
- Mode
  - The most frequent value
  - Useful for qualitative data

## Slide 19: Measures of Spread (Dispersion)

- Comparing two sets of data:
- The measures of central tendency (mean, median, mode) compare the “average” or “typical” values of two sets of data.
- The measures of dispersion in this section compare how far “spread out” the data values are.

## Slide 20: Range

- The range of a variable is the largest data value minus the smallest data value.
- Compute the range of 6, 1, 2, 6, 11, 7, 3, 3.
- The largest value is 11.
- The smallest value is 1.
- Subtracting the two: 11 − 1 = 10, so the range is 10.

## Slide 21: Range

- The range uses only two values in the data set: the largest value and the smallest value.
- The range is not resistant.
- If we made a mistake and 6, 1, 2 was recorded as 6000, 1, 2,
- the range is now 6000 − 1 = 5999.
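The range calculation and its sensitivity to a single mistyped value, as a minimal Python sketch. The name `data_range` is illustrative; it avoids shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = largest data value minus smallest data value."""
    return max(values) - min(values)


print(data_range([6, 1, 2, 6, 11, 7, 3, 3]))  # 10
print(data_range([6, 1, 2]))                   # 5
print(data_range([6000, 1, 2]))                # 5999: one bad value changes the range
```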

## Slide 22: Percentiles

- The median divides the lower 50% of the data from the upper 50%.
- The median is the 50th percentile.
- If a number divides the lower 34% of the data from the upper 66%, that number is the 34th percentile.
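One way to see the percentile idea is to compute the percentage of observations that fall below a given number. The sketch below assumes the simple convention “percent of values strictly less than x”; textbooks and software use several different percentile conventions, so treat this as an illustration only. The function name `percent_below` is hypothetical; the data are the ten scores from Slide 7.

```python
def percent_below(values, x):
    """Percentage of observations strictly less than x (one simple convention)."""
    return 100 * sum(1 for v in values if v < x) / len(values)


scores = [62, 68, 71, 74, 77, 82, 84, 88, 90, 94]
print(percent_below(scores, 79.5))  # 50.0: the median is the 50th percentile
print(percent_below(scores, 74))    # 30.0
```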

## Slide 23: Quartiles

- The quartiles are the 25th, 50th and 75th percentiles:
  - Q1 = 25th percentile
  - Q2 = 50th percentile = median
  - Q3 = 75th percentile
- Quartiles are the most commonly used percentiles.
- The 50th percentile and the second quartile Q2 are both other ways of defining the median.

## Slide 24: Quartiles

- Quartiles divide the data set into four equal parts.
- The top quarter consists of the values between Q3 and the maximum.
- The bottom quarter consists of the values between the minimum and Q1.

## Slide 25: Quartiles

- Quartiles divide the data set into four equal parts.
- The interquartile range (IQR) is the difference between the third and first quartiles: IQR = Q3 − Q1
- The IQR is a resistant measure of dispersion.

## Slide 26: Five-Number Summary

- The five-number summary is the collection of
  - the smallest value
  - the first quartile (Q1 or P25)
  - the median (M, Q2 or P50)
  - the third quartile (Q3 or P75)
  - the largest value
- These five numbers give a concise description of the distribution of a variable.

## Slide 27: Five-Number Summary

- The median
  - Information about the center of the data
  - Resistant
- The first quartile and the third quartile
  - Information about the spread of the data
  - Resistant
- The smallest value and the largest value
  - Information about the tails of the data
  - Not resistant

## Slide 28: Five-Number Summary

- Compute the five-number summary for 1, 3, 4, 7, 8, 15, 16, 19, 23, 24, 27, 31, 33, 54 (n = 14, already in order).
- Calculations
  - The minimum = 1
  - Q1 = P25 = 7, the median of the lower seven values 1, 3, 4, 7, 8, 15, 16
  - M = Q2 = P50 = (16 + 19) / 2 = 17.5, the mean of the 7th and 8th values
  - Q3 = P75 = 27, the median of the upper seven values 19, 23, 24, 27, 31, 33, 54
  - The maximum = 54
- The five-number summary is 1, 7, 17.5, 27, 54.
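The worked example can be reproduced with a Python sketch. It assumes the “median of each half” rule that matches the numbers above: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out when n is odd. Other software often interpolates between values and can report slightly different quartiles for the same data. The function names are illustrative.

```python
def _median_of_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum) using the median-of-halves rule (an assumption)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]            # lower half
    upper = data[half + n % 2:]    # upper half, skipping the middle value when n is odd
    return (data[0], _median_of_sorted(lower), _median_of_sorted(data),
            _median_of_sorted(upper), data[-1])


data = [1, 3, 4, 7, 8, 15, 16, 19, 23, 24, 27, 31, 33, 54]
minimum, q1, m, q3, maximum = five_number_summary(data)
print(minimum, q1, m, q3, maximum)  # 1 7 17.5 27 54
print("IQR =", q3 - q1)             # IQR = 20
```

Replacing the largest value 54 with 5400 changes only the maximum; Q1, M, Q3 and the IQR stay the same, which is what “resistant” means on Slides 25 and 27.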

## Slide 29: Boxplot

- The five-number summary can be illustrated using a graph called the boxplot.

[Figure: an example of a basic boxplot]

- The middle box shows Q1, Q2 and Q3.
- The horizontal lines (sometimes called “whiskers”) show the minimum and maximum.

## Slide 30: Boxplot

- To draw a basic boxplot:
  - Calculate the five-number summary.
  - Draw a horizontal number line that covers all the data, from the minimum to the maximum.
  - Draw a box with the left edge at Q1 and the right edge at Q3.
  - Draw a line inside the box at M = Q2.
  - Draw a horizontal line from the Q1 edge of the box to the minimum and one from the Q3 edge of the box to the maximum.
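The drawing steps can be sketched as a text-only boxplot in Python. This is a minimal illustration, not a plotting library: the function name `ascii_boxplot` and the 41-character width are assumptions, and values are rounded to the nearest character column, so values that are very close together can land on the same column (the median mark then overwrites a box edge).

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=41):
    """Draw a basic horizontal boxplot as one line of text from a five-number summary.

    Whiskers ('-') run from the minimum to Q1 and from Q3 to the maximum,
    the box edges ('[' and ']') sit at Q1 and Q3, and '|' marks the median.
    """
    if not minimum <= q1 <= median <= q3 <= maximum:
        raise ValueError("the five-number summary must be in increasing order")
    span = (maximum - minimum) or 1  # avoid dividing by zero when all values are equal

    def col(value):
        # map a data value onto a character column from 0 to width - 1
        return round((value - minimum) / span * (width - 1))

    row = [" "] * width
    for i in range(col(minimum), col(maximum) + 1):
        row[i] = "-"                  # line from the minimum to the maximum
    for i in range(col(q1), col(q3) + 1):
        row[i] = " "                  # hollow out the inside of the box
    row[col(q1)] = "["                # left edge of the box at Q1
    row[col(q3)] = "]"                # right edge of the box at Q3
    row[col(median)] = "|"            # line inside the box at M
    return "".join(row)


# Five-number summary from Slide 28.
print(ascii_boxplot(1, 7, 17.5, 27, 54))
```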

## Slide 31: Boxplot

[Figure: a basic boxplot drawn step by step: draw the middle box, draw the minimum and maximum, draw in the median. Voila!]

## Slide 32: Boxplot

Symmetric distributions

[Figure: a symmetric distribution beside its boxplot, with Min, Q1, M, Q3 and Max marked]

- Q1 is as far from the median as Q3 is; the median line is in the center of the box.
- The minimum is as far from the median as the maximum is; the left whisker is as long as the right whisker.

## Slide 33: Boxplot

Skewed left distributions

[Figure: a left-skewed distribution beside its boxplot, with Min, Q1, M, Q3 and Max marked]

- Q1 is farther from the median than Q3 is; the median line is to the right of center in the box.
- The minimum is farther from the median than the maximum is; the left whisker is longer than the right whisker.

## Slide 34: Boxplot

Skewed right distributions

[Figure: a right-skewed distribution beside its boxplot, with Min, Q1, M, Q3 and Max marked]

- Q1 is closer to the median than Q3 is; the median line is to the left of center in the box.
- The minimum is closer to the median than the maximum is; the left whisker is shorter than the right whisker.
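The boxplot rules of thumb from Slides 32 to 34 can be applied directly to a five-number summary. This Python sketch is a heuristic with a hypothetical function name, `boxplot_shape_hints`; it compares the two halves of the box and the two whiskers separately and reports each comparison.

```python
def boxplot_shape_hints(minimum, q1, median, q3, maximum):
    """Apply the boxplot rules of thumb to a five-number summary (heuristic only)."""

    def compare(left, right, part):
        if left > right:
            return f"{part}: left side longer, suggests skewed left"
        if left < right:
            return f"{part}: right side longer, suggests skewed right"
        return f"{part}: balanced, consistent with symmetry"

    return [
        compare(median - q1, q3 - median, "box"),       # where the median sits in the box
        compare(q1 - minimum, maximum - q3, "whiskers"),  # left whisker vs right whisker
    ]


# Five-number summary from Slide 28.
for hint in boxplot_shape_hints(1, 7, 17.5, 27, 54):
    print(hint)
```

For the Slide 28 data the two checks disagree: the median sits slightly right of center in the box (10.5 vs 9.5), while the right whisker (27) is much longer than the left (6). Real data sets often send mixed signals like this, which is why these are clues about shape rather than rules.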

## Slide 35: Boxplot

[Figure: side-by-side boxplots comparing the “flight” and “control” samples, annotated to compare their center and spread]
