Histograms! Histograms group data that is close together into “classes” and show how many or what percentage of the data fall into each “class”. No data value may belong to more than one “class”, so we must clearly label the classes on the horizontal axis of our histogram. The vertical axis must indicate whether we are showing counts or percentages and must be scaled appropriately.
How to make a histogram ◦ Divide the range of your data into equal-sized groups called classes. ◦ Define the range of each class. ◦ Count how many values fall into each class (or find the percentage in each class). ◦ Draw each bar with equal width; its height reflects the count or percentage. ◦ Do not skip classes with no values in them. The data ranges from 1.2 to 27.2, so we’ll make our classes 5 wide, starting at 0, which gives us 6 classes. We will include the bottom value in each class: 0 to <5, 5 to <10, 10 to <15, 15 to <20, 20 to <25, 25 to <30.
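The counting step can be sketched in Python as follows. This is a minimal illustration, not the slide's own method: the data values are hypothetical (chosen only so they span 1.2 to 27.2), and the helper names `class_counts` and `class_percents` are invented for this example. Each class includes its bottom value and excludes its top value, so every value lands in exactly one class, and empty classes are kept with a count of 0.

```python
import math

def class_counts(data, start, width, num_classes):
    """Count values in each class [start + k*width, start + (k+1)*width).

    The bottom value is included and the top value is excluded,
    so no data value can belong to more than one class.
    """
    counts = [0] * num_classes          # empty classes stay at 0 (never skipped)
    for x in data:
        k = math.floor((x - start) / width)  # index of the class containing x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[k] += 1
    return counts

def class_percents(counts):
    """Convert class counts into percentages of all observations."""
    total = sum(counts)
    return [100 * c / total for c in counts]

# Hypothetical data spanning 1.2 to 27.2 (not the presentation's data set)
data = [1.2, 3.5, 5.0, 7.8, 9.9, 12.4, 14.0, 16.3, 27.2, 8.1]
counts = class_counts(data, start=0, width=5, num_classes=6)
for k, (c, p) in enumerate(zip(counts, class_percents(counts))):
    lo = k * 5
    print(f"{lo} to <{lo + 5}: {c} ({p:.0f}%)")
```

With these made-up values the 20 to <25 class is empty, and it is still printed so the histogram does not skip it.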
Class Size in a Histogram Just like stemplots, we want to find the right number of classes to show a good picture of the data. ◦ Too few classes result in a “skyscraper” effect where all the data lies in just a few classes. ◦ Too many classes will “flatten” the data and give many short bars in the histogram. ◦ Use your judgment as to how many classes are needed to give a clear picture of the distribution of the data.
Warnings About Histograms ◦ Don’t confuse histograms with bar graphs. ◦ Don’t treat the counts in a frequency table as data values. ◦ Use percents instead of counts when comparing distributions with different numbers of observations. ◦ Just because a graph looks nice doesn’t make it a meaningful display of data.
Histograms on Calculators Grab your TI-83+/84+
Describing Quantitative Data with Numbers Section 1.3
The Mean The mean is the sum of all the values in the data divided by the number of observations (n) in the data set. How it’s written on the formula sheet: $\bar{x} = \frac{\sum x_i}{n}$
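The definition translates directly into code. A minimal sketch, assuming the data arrives as a plain Python list; the function name `mean` and the sample values are illustrative only.

```python
def mean(values):
    """Sum of all observations divided by the number of observations n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n

print(mean([2, 3, 7]))  # 4.0
```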
Mean as the “Balancing Point” The mean of a distribution is sometimes thought of as the “balancing point” of the distribution. ◦ The mean tells us how large each observation would be if the total were split equally among all the observations.
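Both readings of the mean can be checked numerically. The sketch below uses hypothetical observations and Python's `fractions.Fraction` to keep the arithmetic exact: the “fair share” is the total split equally among the n observations, and the distances below that value exactly cancel the distances above it, which is what makes the mean a balancing point.

```python
from fractions import Fraction

data = [2, 3, 7]  # hypothetical observations
total = sum(data)
n = len(data)

# Each observation's size if the total were split equally (this is the mean)
fair_share = Fraction(total, n)

# Deviations from the mean: negatives below it, positives above it
deviations = [x - fair_share for x in data]

print(fair_share)       # 4
print(sum(deviations))  # 0  (the deviations balance out)
```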
Ruler Activity “Hey Brian… find a better act-tiv-i-tee”
The Median (M) The median is the midpoint of the data, with half the data values below it and half above it. It is also referred to as the 50th percentile. How to find the median: ◦ Arrange the values from lowest to highest. ◦ Find the value with half the data above it and half below it: the middle number if there is an odd number of observations, or the average of the two middle numbers if there is an even number of observations.
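The two cases (odd and even n) can be sketched as a small function. The name `median` and the sample lists are illustrative assumptions, not part of the presentation.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    s = sorted(values)   # arrange from lowest to highest
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd n: the single middle number
    return (s[mid - 1] + s[mid]) / 2     # even n: average of the two middle numbers

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```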
Comparing Mean and Median The mean and median of a roughly symmetric distribution are close together. ◦ If the distribution is exactly symmetric, the mean and median are equal. In a skewed distribution, the mean is pulled farther toward the long tail than the median. ◦ Mean > Median suggests a right-skewed distribution. ◦ Mean < Median suggests a left-skewed distribution.
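To see the tail pulling the mean, the sketch below compares the two measures on three small data sets invented for illustration. It uses `statistics.mean` and `statistics.median` from Python's standard library; the helper `compare_center` is a hypothetical name. Treat the mean/median comparison as a rule of thumb rather than a formal test for skewness.

```python
import statistics

def compare_center(data):
    """Return (mean, median) so the two measures of center can be compared."""
    return statistics.mean(data), statistics.median(data)

# Hypothetical data sets, invented for illustration
right_skewed = [1, 2, 2, 3, 3, 4, 20]      # one long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 20]  # one long tail to the left
symmetric = [2, 4, 6, 8, 10]               # exactly symmetric

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    m, med = compare_center(data)
    print(f"{name}: mean={m}, median={med}")
```

In the right-skewed set the single value 20 drags the mean (5) above the median (3); in the left-skewed set the value 1 drags the mean (16) below the median (18); in the symmetric set both equal 6.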
Choosing the Best Measure of Center If the data is roughly symmetric, the mean is the preferable measure of center. If the data is skewed, the mean will be distorted by the extreme values, so the median gives a more accurate picture of the “typical” value and should be used as the measure of center.
Measuring Spread: IQR The 1st Quartile (Q1) is the median of the lower half of the data, not including the median. The 3rd Quartile (Q3) is the median of the upper half of the data, not including the median. The interquartile range (IQR) is a measure of spread that is used with the median and is found by: IQR = Q3 - Q1. This gives us the spread of the middle 50% of the data, so a single extreme value won’t have much of an effect on it, unlike the range.
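The quartile rule above (split the sorted data in half and leave out the median itself when n is odd) can be sketched as follows. The function names and sample data are hypothetical. Other software may use a different quartile convention (some interpolate between values), so its results can differ slightly from this method.

```python
def _median_of_sorted(s):
    """Median of an already sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves, excluding the median."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = s[:half]              # values below the median position
    upper = s[half + n % 2:]      # skip the middle value when n is odd
    return _median_of_sorted(lower), _median_of_sorted(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # (3, 11)
print(iqr([1, 3, 5, 7, 9, 11, 13]))        # 8
```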
The interquartile range (IQR) would be: IQR = Q3 - Q1 = 30 - 10 = 20. *Remember: the IQR always goes with the median as the measure of center and is used for skewed data.
Identifying Outliers – 1.5×IQR Rule If an observation falls more than 1.5 times the interquartile range ABOVE Q3, we call it an outlier. If an observation falls more than 1.5 times the interquartile range BELOW Q1, we call it an outlier. It is an outlier if: ◦ Data value > Q3 + 1.5(IQR) ◦ Data value < Q1 - 1.5(IQR)
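The two fences can be computed directly from Q1 and Q3. This sketch reuses Q1 = 10 and Q3 = 30 (IQR = 20) so the fences are 10 - 30 = -20 and 30 + 30 = 60; the data values checked against them are hypothetical, and the function names are invented for this example. A value exactly on a fence is not flagged, since the rule requires falling *more than* 1.5×IQR beyond the quartile.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

print(outlier_fences(10, 30))                       # (-20.0, 60.0)
print(outliers([-25, 0, 12, 60, 65], q1=10, q3=30)) # [-25, 65]
```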