 # Chapter 3 Averages and Variations

## Presentation on theme: "Chapter 3 Averages and Variations"— Presentation transcript:

3.1 Measures of Central Tendency

Mode, Median and Mean: For what kinds of data can we compute the mode, median and mean? Quantitative data can have a mode, median and mean. Qualitative data can have a mode.

Mode: The mode is the value that occurs most frequently. Some books describe the mode as the “hump,” or local high point, in a histogram, which reflects how often that value occurs.
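A minimal Python sketch of finding the mode, using the standard-library `statistics.multimode`, which returns every value tied for the highest frequency (a data set can have more than one mode). The values below are hypothetical and only for illustration; the second example shows that qualitative data can have a mode as well.

```python
from statistics import multimode

# Hypothetical pulse readings: 72 occurs most often, so it is the mode.
pulses = [68, 72, 72, 75, 80, 72, 68]
print(multimode(pulses))  # [72]

# Hypothetical qualitative data: "brown" and "blue" tie, so there are two modes.
eye_colors = ["brown", "blue", "brown", "green", "blue"]
print(multimode(eye_colors))  # ['brown', 'blue']
```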

Median: The median of a data set is the middle data value.
To find it, order the data from smallest to largest; the data value in the middle position (for a data set of $n$ values, the middle position is $(n+1)/2$) is the median. Does anyone detect a potential problem?
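The potential problem is that a data set with an even number of values has no single middle value. A common convention, and the one the class-size example in 3.4 uses (median 14.5), is to average the two middle values. A minimal Python sketch under that convention (the function name `median` is our own; the standard library's `statistics.median` follows the same rule):

```python
def median(values):
    """Return the middle value of the sorted data.

    Odd count: one middle value, at 1-based position (n + 1) / 2.
    Even count: two middle values, so average them.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2  # 0-based index of the middle (or upper-middle) value
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

print(median([3, 1, 2]))                                 # 2
print(median([10, 11, 12, 12, 14, 15, 16, 17, 19, 20]))  # 14.5
```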


Mean: You are used to the “average” of a test. The technical term is the mean. A trimmed mean is a mean computed after a percentage of the data values are discarded. A 5% trimmed mean is one where the top 5% and the bottom 5% of the values are thrown out before computing the mean.
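A minimal Python sketch of the mean and a trimmed mean. The scores are hypothetical, and the rounding rule is an assumption: when the trim percentage of $n$ is not a whole number, this sketch rounds the count down, but textbooks differ on this.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def trimmed_mean(values, proportion=0.05):
    """Mean after discarding `proportion` of the values from EACH end.

    Assumption: the number dropped per end, proportion * n, is rounded down.
    """
    data = sorted(values)
    k = int(len(data) * proportion)  # how many values to drop at each end
    kept = data[k:len(data) - k]
    return mean(kept)

scores = [55, 70, 72, 75, 78, 80, 81, 84, 88, 99]  # hypothetical test scores
print(mean(scores))                # 78.2
print(trimmed_mean(scores, 0.10))  # drops 55 and 99: 78.5
```

With only 10 scores, the default 5% trim rounds down to zero values per end, so it equals the ordinary mean; trimming matters more for larger data sets.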

Pulse Data: Let's find the mode, the median and the mean of the pulse data from the first day of class. We just found the population mean ($\mu$) rather than the sample mean ($\bar{x}$). What, then, is the difference between $\mu$ and $\bar{x}$?
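One way to see the difference: the arithmetic is identical, and only the data and the notation change. The population mean uses all $N$ values of a population; the sample mean uses the $n$ values of a sample drawn from it.

```latex
\mu = \frac{\sum x}{N} \qquad\qquad \bar{x} = \frac{\sum x}{n}
```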


Weighted Averages: Final exam grades are computed as weighted averages. How do they do that? Multiply each data value by its weight, add those products, then divide by the sum of the weights (typically 1): $\bar{x} = \sum xw / \sum w$.
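A minimal Python sketch of a weighted average. The grade categories and weights are hypothetical; they are written as percentages (summing to 100) to show that dividing by the sum of the weights works whether the weights add to 1 or to 100.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, add the products,
    then divide by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Hypothetical breakdown: homework 20%, tests 50%, final exam 30%.
grades = [92, 81, 75]
weights = [20, 50, 30]
print(weighted_mean(grades, weights))  # 81.4
```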

3.2 Measures of Variation

While knowing the mean is important, there is other information you can measure from data. These measures tell you about the spread of the data. Range: the difference between the largest and smallest values of a data distribution.
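A one-line Python sketch of the range, using the class-size data from 3.4. The function is named `data_range` (our own name) so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

class_sizes = [10, 11, 12, 12, 14, 15, 16, 17, 19, 20]  # class-size data from 3.4
print(data_range(class_sizes))  # 10
```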

Variance: a measure of how data tend to spread around an expected value (the mean).

- Each data point = $x$
- Mean = $\bar{x}$
- Deviation = $x - \bar{x}$
- Sample size = $n$
- Variance = $s^2$
- Standard deviation = $s$

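Variance (cont): Defining formula and computation formula for the sample variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \qquad \text{(defining formula)}$$

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \qquad \text{(computation formula)}$$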

Variance (cont): To find the standard deviation, take the square root of the variance: $s = \sqrt{s^2}$. The computation formula tends to be a little easier to use by hand, but we will practice both. These two formulas ARE the same: they are algebraically equivalent and always give the same variance.
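The claim that the two formulas give the same answer can be checked numerically. A minimal Python sketch (function names are our own; the data are a hypothetical sample) computes the sample variance both ways, dividing by $n - 1$, and then takes the square root for the standard deviation.

```python
import math

def variance_defining(values):
    """Sample variance via the defining formula: sum((x - xbar)^2) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def variance_computational(values):
    """Sample variance via the computation formula:
    (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
s2_def = variance_defining(data)
s2_comp = variance_computational(data)
print(s2_def, s2_comp)    # both 32/7, about 4.5714
print(math.sqrt(s2_def))  # sample standard deviation s
```

By hand, the computation formula avoids subtracting the mean from every value; on a computer with very large numbers, the defining formula is usually the more numerically stable of the two.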

Variance (cont): Let's find the variance and the standard deviation of the pulse data, using both formulas.

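Variance (cont): If an entire population is used instead of a sample, the notation is different but the method is the same, except that the population variance divides by $N$ rather than $n - 1$.

- Each data point = $x$
- Mean = $\mu$
- Deviation = $x - \mu$
- Population size = $N$
- Variance = $\sigma^2$
- Standard deviation = $\sigma$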

Variance (cont): Defining formula for the population variance:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
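A minimal Python sketch of the population variance, treating a small hypothetical data set as a complete population. It is compared with the standard library's `statistics.pvariance` (population, divides by $N$) and `statistics.variance` (sample, divides by $n - 1$).

```python
import statistics

def population_variance(values):
    """Population variance: sum((x - mu)^2) / N, dividing by N (not N - 1)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values treated as a whole population
print(population_variance(data))   # 4.0
print(statistics.pvariance(data))  # standard-library population variance; agrees
print(statistics.variance(data))   # sample variance (divides by n - 1) is larger
```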


Variance (cont): The coefficient of variation (CV) expresses the standard deviation as a percentage of the sample or population mean.

- Sample: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$
- Population: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
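A minimal Python sketch of the coefficient of variation, using the standard library's `statistics.stdev` (sample) and `statistics.pstdev` (population). The data are hypothetical; the `population` flag is our own illustrative parameter.

```python
import statistics

def coefficient_of_variation(values, population=False):
    """CV = (standard deviation / mean) * 100, as a percentage.

    population=False uses the sample formula s / xbar;
    population=True uses sigma / mu.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    return 100 * sd / mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(coefficient_of_variation(data, population=True))  # 40.0 (sigma = 2, mu = 5)
print(coefficient_of_variation(data))                   # sample CV, a bit larger
```

Because CV is a percentage of the mean, it lets you compare spread between data sets measured on different scales.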

Variance (cont): Chebyshev’s Theorem
For any data set and any $k > 1$, the proportion of the data that lies within $k$ standard deviations on either side of the mean is at least $1 - \dfrac{1}{k^2}$. So at least 75% of the data lies within 2 standard deviations of the mean, at least 88.9% within 3 standard deviations, and so on.
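A minimal Python sketch of Chebyshev's lower bound $1 - 1/k^2$, reproducing the 75% and 88.9% figures. The function name is our own.

```python
def chebyshev_min_proportion(k):
    """Chebyshev's lower bound: at least 1 - 1/k^2 of ANY data set lies
    within k standard deviations of the mean (informative only for k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{chebyshev_min_proportion(k):.1%}")
# 2 75.0%
# 3 88.9%
```

These are minimums that hold for every distribution; a particular data set (for example, a bell-shaped one) will often have a larger share within $k$ standard deviations.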

3.3 Mean/Standard Deviation
What if you use grouped data?

Grouped Data: Lots of data = TEDIOUS, whether you have a calculator or not. Often an approximate mean and standard deviation is enough. To get them, you begin with a frequency table (remember histograms?).

Grouped Data (cont): Make a frequency table.

- Find the midpoint of each class = $x$
- Compute each class frequency = $f$
- Total number of entries = $n$


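Grouped Data (cont): Mean, then the defining formula and computation formula for the standard deviation:

$$\bar{x} = \frac{\sum x f}{n}$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} \qquad \text{(defining formula)}$$

$$s = \sqrt{\frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}} \qquad \text{(computation formula)}$$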

Grouped Data (cont): By using the midpoint and the frequency, you let one value represent ALL the data values in that class, without typing in every data value. The result will be a little off, but if the data set is huge it is a reasonable way to approach the problem.
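A minimal Python sketch of the grouped-data approximation: each class midpoint stands in for every value in its class, weighted by the class frequency. The frequency table is hypothetical, and the function name is our own.

```python
import math

def grouped_mean_sd(midpoints, freqs):
    """Approximate sample mean and standard deviation from a frequency table.

    xbar = sum(x*f) / n
    s    = sqrt((sum(x^2 * f) - (sum(x*f))^2 / n) / (n - 1))   (computation formula)
    """
    n = sum(freqs)  # total number of entries
    sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))
    xbar = sum_xf / n
    s = math.sqrt((sum_x2f - sum_xf ** 2 / n) / (n - 1))
    return xbar, s

# Hypothetical frequency table with classes 60-69, 70-79, 80-89, 90-99.
midpoints = [64.5, 74.5, 84.5, 94.5]
freqs = [4, 10, 12, 4]
print(grouped_mean_sd(midpoints, freqs))
```

If every value in each class really equaled the midpoint, this would match the ungrouped formulas exactly; the "little off" error comes from values spread out within each class.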

3.4 Percentiles and Box-and-Whisker Plots

Percentiles (examples: Baby Calculator, Children’s BMI)
A percentile ranking tells you where a particular data value falls in relation to the entire population.

Percentiles (cont): The Pth percentile (1 ≤ P ≤ 99) is a value such that P% of the data falls at or below it (and (100 − P)% falls at or above it). The 60th percentile does NOT mean a score of 60%; it means that 60% of the scores fall at or below that value. A score at the 60th percentile could be 80%. Where have you seen percentiles?
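A minimal Python sketch of the "at or below" idea, using the class-size data from later in this section. This is one simple convention (function name our own); textbooks and software define percentile rank in slightly different ways.

```python
def percent_at_or_below(values, x):
    """Percentage of data values that are at or below x.

    If this is P, then x sits at roughly the Pth percentile.
    """
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

class_sizes = [10, 11, 12, 12, 14, 15, 16, 17, 19, 20]
print(percent_at_or_below(class_sizes, 15))  # 60.0
```

A class of 15 students is at the 60th percentile, even though 15 is a class size, not a percentage: the percentile describes position, not the value itself.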

Percentiles (cont): Quartiles are special percentiles that are used frequently. They divide the ordered data into fourths.

- 2nd quartile ($Q_2$): the median
- 1st quartile ($Q_1$): the median of the lower half of the data (exclude $Q_2$)
- 3rd quartile ($Q_3$): the median of the upper half of the data (exclude $Q_2$)
- Interquartile range: $IQR = Q_3 - Q_1$

Percentiles (cont): Let's find the quartiles for the following 9th-grade math class sizes: 10, 11, 12, 12, 14, 15, 16, 17, 19, 20. $Q_1 = 12$, median $= 14.5$, $Q_3 = 17$, and $IQR = 17 - 12 = 5$.
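A minimal Python sketch of the median-of-halves method for quartiles, which reproduces the class-size answer above. Function names are our own, and the sketch assumes at least two data values. Software packages (for example `statistics.quantiles`) often use interpolation rules instead and can give slightly different quartiles.

```python
def median(data):
    """Median of already-sorted data (average the two middle values if n is even)."""
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method.

    When n is odd, the middle value (Q2) is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values before the middle position
    upper = data[(n + 1) // 2 :]  # values after the middle position
    return median(lower), median(data), median(upper)

class_sizes = [10, 11, 12, 12, 14, 15, 16, 17, 19, 20]
q1, q2, q3 = quartiles(class_sizes)
print(q1, q2, q3, q3 - q1)  # 12 14.5 17 5
```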

Percentiles (cont): Let's find the quartiles for the pulse data.
Why are these values significant? They are needed to make box-and-whisker plots.

Box and Whiskers Plots

Box and Whiskers Plots (cont)
The five-number summary (lowest value, $Q_1$, median, $Q_3$, highest value) is used to make a box-and-whisker plot. Let's make a box-and-whisker plot for the class-size data.
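A minimal Python sketch of the five-number summary, using the same median-of-halves quartiles as the class-size example (function names are our own; assumes at least two values).

```python
def median(data):
    """Median of already-sorted data."""
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2

def five_number_summary(values):
    """(lowest, Q1, median, Q3, highest), quartiles by the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    return data[0], q1, median(data), q3, data[-1]

class_sizes = [10, 11, 12, 12, 14, 15, 16, 17, 19, 20]
print(five_number_summary(class_sizes))  # (10, 12, 14.5, 17, 20)
```

The box runs from $Q_1$ to $Q_3$ with a line at the median, and the whiskers extend to the lowest and highest values.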

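[Figure: box-and-whisker plot of the class-size data on an axis from 10 to 20, labeled with the lowest value, $Q_1$, the median ($Q_2$), and the highest value]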

Box and Whiskers Plots (cont)
Let's make a box-and-whisker plot for the pulse data. Outliers are data values greater than $Q_3 + 1.5\,IQR$ or less than $Q_1 - 1.5\,IQR$.
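A minimal Python sketch of the $1.5\,IQR$ outlier rule, applied to the class-size data with $Q_1 = 12$ and $Q_3 = 17$ from the quartile example. Function names are our own.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Class-size example: Q1 = 12, Q3 = 17, so IQR = 5.
class_sizes = [10, 11, 12, 12, 14, 15, 16, 17, 19, 20]
print(outlier_fences(12, 17))              # (4.5, 24.5)
print(find_outliers(class_sizes, 12, 17))  # [] (no outliers)
```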
