Published by Kelly Simmons. Modified over 8 years ago.
1
Lecture 5 Dustin Lueker
2
Mean - Arithmetic average
Median - Midpoint of the observations when they are arranged in increasing order
Mode - Most frequent value
Notation: subscripted variables
◦ n = number of units in the sample
◦ N = number of units in the population
◦ x = variable to be measured
◦ x_i = measurement of the i-th unit
STA 291 Fall 2009 Lecture 5
3
Median: the measurement that falls in the middle of the ordered sample
When the sample size n is odd, there is a middle value
◦ It has the ordered index (n+1)/2
◦ The ordered index is the position of a value when the sample is listed from smallest to largest; an index of 2 means the second-smallest value
◦ Example: 1.7, 4.6, 5.7, 6.1, 8.3
  n = 5, (n+1)/2 = 6/2 = 3, index = 3
  Median = 3rd smallest observation = 5.7
4
When the sample size n is even, average the two middle values
◦ Example: 3, 5, 6, 9
  n = 4, (n+1)/2 = 5/2 = 2.5, index = 2.5
  Median = midpoint between the 2nd and 3rd smallest observations = (5+6)/2 = 5.5
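The ordered-index rule for odd and even n can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_index` is made up here and is not part of the lecture.

```python
def median_by_index(values):
    """Median via the 1-based ordered index (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    index = (n + 1) / 2              # whole number if n is odd, ends in .5 if n is even
    lower = ordered[int(index) - 1]  # value at the index rounded down
    if index == int(index):
        return lower                 # n odd: a single middle value
    upper = ordered[int(index)]      # n even: the next value up
    return (lower + upper) / 2       # midpoint of the two middle values


print(median_by_index([1.7, 4.6, 5.7, 6.1, 8.3]))  # 5.7
print(median_by_index([3, 5, 6, 9]))               # 5.5
```

Sorting first means the input does not have to be ordered already.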
5
For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
The median usually better describes a “typical value” when the sample distribution is highly skewed
Example
◦ Monthly income for five people: 1,000 2,000 3,000 4,000 100,000
◦ Median monthly income:
◦ Does this better describe a “typical value” in the data set than the mean of 22,000?
6
The trimmed mean is a compromise between the median and the mean
◦ Calculating the trimmed mean
  Order the data from smallest to largest
  Delete a selected number of values from each end of the ordered list
  Find the mean of the remaining values
◦ The trimming percentage is the percentage of values that have been deleted from each end of the ordered list
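The three steps can be sketched as follows. The function name `trimmed_mean` and its `trim_count` parameter (the number of values deleted from each end) are illustrative choices. The data are the five monthly incomes from the skewed-distribution example.

```python
def trimmed_mean(values, trim_count):
    """Mean after deleting trim_count values from each end of the ordered list."""
    ordered = sorted(values)                        # step 1: order the data
    if trim_count < 0 or 2 * trim_count >= len(ordered):
        raise ValueError("trim_count must leave at least one value")
    kept = ordered[trim_count:len(ordered) - trim_count]  # step 2: delete from each end
    return sum(kept) / len(kept)                    # step 3: mean of what remains


incomes = [1000, 2000, 3000, 4000, 100000]
print(trimmed_mean(incomes, 0))  # 22000.0 (no trimming: the ordinary mean)
print(trimmed_mean(incomes, 1))  # 3000.0  (one of five values, i.e. 20%, trimmed from each end)
```

With five observations, deleting one value from each end corresponds to a trimming percentage of 20%.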
7
Example: Highest Degree Completed

Highest Degree                                          Frequency   Percentage
Not a high school graduate                                 38,012         21.4
High school only                                           65,291         36.8
Some college, no degree                                    33,191         18.7
Associate, Bachelor, Master, Doctorate, Professional       41,124         23.2
Total                                                     177,618          100
8
n = 177,618, (n+1)/2 = 88,809.5
Median = midpoint between the 88,809th and 88,810th smallest observations
◦ Both are in the category “High school only”
Mean wouldn’t make sense here since the variable is only ordinal
Median
◦ Can be used for interval data and for ordinal data
◦ Cannot be used for nominal data because the observations cannot be ordered on a scale
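For a frequency table, the category holding a given ordered index can be found by accumulating counts. The sketch below assumes the categories are listed in order along the ordinal scale. The helper names `category_at` and `median_categories` are hypothetical.

```python
import math


def category_at(freq_table, position):
    """Category containing the observation with the given 1-based ordered index."""
    cumulative = 0
    for category, count in freq_table:
        cumulative += count
        if position <= cumulative:
            return category
    raise ValueError("position exceeds the number of observations")


def median_categories(freq_table):
    """Categories of the two observations on either side of index (n + 1) / 2."""
    n = sum(count for _, count in freq_table)
    index = (n + 1) / 2
    return (category_at(freq_table, math.floor(index)),
            category_at(freq_table, math.ceil(index)))


degrees = [
    ("Not a high school graduate", 38012),
    ("High school only", 65291),
    ("Some college, no degree", 33191),
    ("Associate, Bachelor, Master, Doctorate, Professional", 41124),
]
print(median_categories(degrees))  # ('High school only', 'High school only')
```

When both positions fall in the same category, that category is the median. When n is odd, the two positions coincide.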
9
Mean
◦ Interval data with an approximately symmetric distribution
Median
◦ Interval data
◦ Ordinal data
The mean is sensitive to outliers; the median is not
10
Observations          Median   Mean
1, 2, 3, 4, 5         3        3
1, 2, 3, 4, 100
3, 3, 3, 3, 3
1, 2, 3, 100, 100
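The comparison in the table can be checked with a short script. The helper name `summarize` is only for illustration. The median comes from Python's standard `statistics` module and the mean is computed directly as sum over count.

```python
from statistics import median


def summarize(sample):
    """Return (median, mean) for a list of numbers."""
    return median(sample), sum(sample) / len(sample)


samples = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 100],
    [3, 3, 3, 3, 3],
    [1, 2, 3, 100, 100],
]
for sample in samples:
    med, avg = summarize(sample)
    print(sample, "median =", med, "mean =", avg)
```

The median stays at 3 for every row, while the mean moves from 3 to 22 to 41.2 as large values are added. This is the sensitivity to outliers that the table illustrates.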
11
Symmetric distribution
◦ Mean = Median
Skewed distribution
◦ The mean lies farther than the median in the direction in which the distribution is skewed
12
Disadvantage of the median
◦ Insensitive to changes within the lower or upper half of the data
◦ Example: 1, 2, 3, 4, 5 and 1, 2, 3, 100, 100 have the same median
◦ Sometimes the mean is more informative even when the distribution is skewed
13
Keeneland Sales
14
Mode: the value that occurs most frequently
◦ Does not need to be near the center of the distribution
  Not really a measure of central tendency
◦ Can be used for all types of data (nominal, ordinal, interval)
Special cases
◦ Data set {2, 2, 4, 5, 5, 6, 10, 11}: Mode =
◦ Data set {2, 6, 7, 10, 13}: Mode =
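One way to explore the special cases is to count occurrences and report every value tied for the highest count. The function name `modes` is hypothetical. Returning all tied values is a convention chosen for this sketch: when every value occurs once, some textbooks say instead that there is no mode.

```python
from collections import Counter


def modes(values):
    """All values tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([2, 2, 4, 5, 5, 6, 10, 11]))  # [2, 5]
print(modes([2, 6, 7, 10, 13]))           # [2, 6, 7, 10, 13]
```

The first data set has two values tied at frequency 2, which shows why the mode may not be unique. Since counting needs only equality, the same approach also applies to nominal data.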
15
Mean
◦ Interval data with an approximately symmetric distribution
Median
◦ Interval or ordinal data
Mode
◦ All types of data
16
The mean is sensitive to outliers
◦ The median and mode are not. Why?
In general, the median is more appropriate for skewed data than the mean
◦ Why?
In some situations, the median may be too insensitive to changes in the data
The mode may not be unique
17
“How often do you read the newspaper?”

Response                  Frequency
every day                       969
a few times a week              452
once a week                     261
less than once a week           196
never                            76
Total                         1,954

Identify the mode
Identify the median response
18
The p-th percentile (L_p) is a number such that p% of the observations take values below it, and (100-p)% take values above it
◦ 50th percentile = median
◦ 25th percentile = lower quartile
◦ 75th percentile = upper quartile
The index of L_p
◦ (n+1)p/100
19
25th percentile
◦ lower quartile
◦ Q1
◦ (approximately) median of the observations below the median
75th percentile
◦ upper quartile
◦ Q3
◦ (approximately) median of the observations above the median
20
Find the 25th percentile of this data set
◦ {3, 7, 12, 13, 15, 19, 24}
21
Interpolation: use when the index is not a whole number
Go to the value at the closest lower index, then move the fractional part of the distance toward the next value
If the index is 5.4, go to the 5th value and add 0.4 of the difference between the 5th and 6th values
◦ In essence, we are going to the 5.4th value
22
Find the 40th percentile of the same data set
◦ {3, 7, 12, 13, 15, 19, 24}
Must use interpolation
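The index rule (n+1)p/100 together with interpolation can be sketched as below. The function name `percentile` is illustrative. Note that statistical software often uses other percentile definitions, so library results may differ from this rule.

```python
def percentile(values, p):
    """p-th percentile via the 1-based index (n + 1) * p / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod((n + 1) * p, 100)  # whole part of the index and leftover
    whole = int(whole)
    fraction = remainder / 100                   # decimal part of the index
    if whole < 1 or whole > n or (whole == n and fraction > 0):
        raise ValueError("index (n + 1) * p / 100 falls outside the sample")
    value = ordered[whole - 1]                   # value at the closest lower index
    if fraction == 0:
        return value                             # whole-number index: no interpolation
    # move the fractional part of the distance toward the next value
    return value + fraction * (ordered[whole] - value)


data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))  # index 2.0 -> 7
print(percentile(data, 40))  # index 3.2 -> 12 + 0.2 * (13 - 12) = 12.2
print(percentile(data, 50))  # index 4.0 -> 13, the median
```

For this data set n = 7, so the 25th percentile has the whole-number index 2, while the 40th percentile has index 3.2 and needs interpolation between the 3rd and 4th values.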
23
Five Number Summary
◦ Minimum
◦ Lower Quartile
◦ Median
◦ Upper Quartile
◦ Maximum
Example
◦ minimum = 4
◦ Q1 = 256
◦ median = 530
◦ Q3 = 1105
◦ maximum = 320,000
What does this suggest about the shape of the distribution?
24
The Interquartile Range (IQR) is the difference between the upper and lower quartiles
◦ IQR = Q3 - Q1
◦ IQR = width of the range of values that contains the middle 50% of the data
◦ IQR increases as variability increases
Murder Rate Data
◦ Q1 = 3.9
◦ Q3 = 10.3
◦ IQR =
25
A box plot displays the five-number summary (and more) graphically
It consists of
◦ a box that contains the central 50% of the distribution (from lower quartile to upper quartile)
◦ a line within the box that marks the median
◦ whiskers that extend to the maximum and minimum values
This assumes there are no outliers in the data set
26
An observation is an outlier if it falls
◦ more than 1.5 IQR above the upper quartile or
◦ more than 1.5 IQR below the lower quartile
27
Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
If an observation is an outlier, it is marked by an x, +, or some other identifier
28
Values
◦ Min = 148
◦ Q1 = 158
◦ Median = Q2 = 162
◦ Q3 = 182
◦ Max = 204
Create a box plot
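Before drawing the plot, it helps to check whether the minimum or maximum counts as an outlier under the 1.5 IQR rule. The sketch below computes the two cutoffs from the values above. The name `outlier_limits` and the word "fence" for a cutoff are choices made for this example, not terms from the lecture.

```python
def outlier_limits(q1, q3):
    """Cutoffs ("fences") 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


minimum, q1, med, q3, maximum = 148, 158, 162, 182, 204
lower_fence, upper_fence = outlier_limits(q1, q3)
print(q3 - q1)                   # 24
print(lower_fence, upper_fence)  # 122.0 218.0
print(minimum < lower_fence or maximum > upper_fence)  # False: no outliers
```

Since 148 and 204 both lie inside the cutoffs, the whiskers extend all the way to the minimum and maximum.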
29
For right-skewed distributions, the minimum, Q1, and the median will be “bunched up”, while Q3 and the maximum will be farther away.
For left-skewed distributions, the mirror image holds: the maximum, Q3, and the median will be relatively close together compared with the distances to Q1 and the minimum.
Symmetric distributions?