1 Lecture 5 Dustin Lueker

2
Mode - Most frequent value
Notation: Subscripted variables
n = # of units in the sample
N = # of units in the population
x = Variable to be measured
x_i = Measurement of the i-th unit
Mean - Arithmetic average
Median - Midpoint of the observations when they are arranged in increasing order
STA 291 Fall 2009 Lecture 5

3
• Median: measurement that falls in the middle of the ordered sample
• When the sample size n is odd, there is a middle value
  ◦ It has the ordered index (n+1)/2
• The ordered index is where that value falls when the sample is listed from smallest to largest
• An index of 2 means the second smallest value
  ◦ Example
    • 1.7, 4.6, 5.7, 6.1, 8.3
      n = 5, (n+1)/2 = 6/2 = 3, index = 3
      Median = 3rd smallest observation = 5.7

4
• When the sample size n is even, average the two middle values
  ◦ Example
    • 3, 5, 6, 9
      n = 4, (n+1)/2 = 5/2 = 2.5, index = 2.5
      Median = midpoint between the 2nd and 3rd smallest observations = (5+6)/2 = 5.5
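The ordered-index rule for odd and even sample sizes can be sketched in a few lines of Python. The function name `median_by_index` is an illustrative choice, not something from the lecture.

```python
def median_by_index(values):
    """Median via the 1-based ordered index (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    index = (n + 1) / 2      # whole number when n is odd, ends in .5 when n is even
    lower = int(index)       # whole-number part of the ordered index
    if index == lower:       # n is odd: a single middle value exists
        return ordered[lower - 1]
    # n is even: average the two middle values
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median_by_index([1.7, 4.6, 5.7, 6.1, 8.3]))  # 5.7
print(median_by_index([3, 5, 6, 9]))               # 5.5
```

The two calls reproduce the examples from the slides; the input does not need to be sorted beforehand.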

5
• For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
• The median usually better describes a “typical value” when the sample distribution is highly skewed
• Example
  ◦ Monthly income for five people: 1,000 2,000 3,000 4,000 100,000
  ◦ Median monthly income:
    • Does this better describe a “typical value” in the data set than the mean of 22,000?
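As a quick check of the income example, the following sketch uses Python's standard `statistics` module to compare the two measures. The variable name `incomes` is only illustrative.

```python
from statistics import mean, median

# Monthly incomes of the five people from the example
incomes = [1_000, 2_000, 3_000, 4_000, 100_000]

income_mean = mean(incomes)      # pulled upward by the single large value
income_median = median(incomes)  # the 3rd smallest of the five values

print(income_mean)    # 22000
print(income_median)  # 3000
```

Four of the five incomes lie far below the mean, which is one way to see why the median is the more typical value here.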

6
• Trimmed mean is a compromise between the median and mean
  ◦ Calculating the trimmed mean
    • Order the data from smallest to largest
    • Delete a selected number of values from each end of the ordered list
    • Find the mean of the remaining values
  ◦ The trimming percentage is the percentage of values that have been deleted from each end of the ordered list
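The three steps above can be sketched as follows. The slides do not say how to handle a trimming percentage that does not give a whole number of values, so rounding down is an assumption made here, and `trimmed_mean` is a hypothetical name.

```python
def trimmed_mean(values, trim_percent):
    """Mean after deleting trim_percent% of the values from EACH end."""
    ordered = sorted(values)                 # step 1: order the data
    n = len(ordered)
    k = int(n * trim_percent / 100)          # values to delete per end (rounded down: assumption)
    if 2 * k >= n:
        raise ValueError("trimming would delete every value")
    kept = ordered[k:n - k]                  # step 2: delete k values from each end
    return sum(kept) / len(kept)             # step 3: mean of what remains


incomes = [1_000, 2_000, 3_000, 4_000, 100_000]
print(trimmed_mean(incomes, 20))  # 3000.0
print(trimmed_mean(incomes, 0))   # 22000.0
```

With 0% trimming the result is the ordinary mean; as the trimming percentage approaches 50%, it approaches the median.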

7
• Example: Highest Degree Completed

Highest Degree                                          Frequency   Percentage
Not a high school graduate                                 38,012         21.4
High school only                                           65,291         36.8
Some college, no degree                                    33,191         18.7
Associate, Bachelor, Master, Doctorate, Professional       41,124         23.2
Total                                                     177,618          100

8
• n = 177,618
• (n+1)/2 = 88,809.5
• Median = midpoint between the 88,809th smallest and 88,810th smallest observations
  ◦ Both are in the category “High school only”
• Mean wouldn’t make sense here since the variable is only ordinal
• Median
  ◦ Can be used for interval data and for ordinal data
  ◦ Cannot be used for nominal data because the observations cannot be ordered on a scale
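For ordinal data given as a frequency table, the median category can be located by running through cumulative frequencies. The sketch below assumes the categories are listed from lowest to highest; `median_categories` is an illustrative name.

```python
def median_categories(table):
    """Return the categories holding the two middle observations.

    table: list of (category, frequency) pairs, ordered from lowest to highest.
    For odd n both returned categories refer to the same observation.
    """
    n = sum(freq for _, freq in table)
    index = (n + 1) / 2                       # ordered index of the median
    low = int(index)
    high = low if index == low else low + 1   # e.g. 88,809 and 88,810

    def category_at(position):
        running = 0                           # cumulative frequency so far
        for category, freq in table:
            running += freq
            if position <= running:
                return category
        raise ValueError("position is outside the sample")

    return category_at(low), category_at(high)


degrees = [
    ("Not a high school graduate", 38_012),
    ("High school only", 65_291),
    ("Some college, no degree", 33_191),
    ("Associate, Bachelor, Master, Doctorate, Professional", 41_124),
]
print(median_categories(degrees))
# ('High school only', 'High school only')
```

If the two middle observations fell in different categories, an ordinal variable would offer no meaningful midpoint between them, so the function returns both rather than averaging.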

9
• Mean
  ◦ Interval data with an approximately symmetric distribution
• Median
  ◦ Interval data
  ◦ Ordinal data
• Mean is sensitive to outliers, median is not

10
Observations        Median   Mean
1, 2, 3, 4, 5       3        3
1, 2, 3, 4, 100
3, 3, 3, 3, 3
1, 2, 3, 100, 100

11
• Symmetric distribution
  ◦ Mean = Median
• Skewed distribution
  ◦ Mean lies more toward the direction in which the distribution is skewed

12
• Disadvantage of the median
  ◦ Insensitive to changes within the lower or upper half of the data
  ◦ Example
    • 1, 2, 3, 4, 5
    • 1, 2, 3, 100, 100
  ◦ Sometimes, the mean is more informative even when the distribution is skewed

13
• Keeneland Sales

14
• Mode: value that occurs most frequently
  ◦ Does not need to be near the center of the distribution
• Not really a measure of central tendency
  ◦ Can be used for all types of data (nominal, ordinal, interval)
• Special Cases
  ◦ Data Set
    • {2, 2, 4, 5, 5, 6, 10, 11}
    • Mode =
  ◦ Data Set
    • {2, 6, 7, 10, 13}
    • Mode =
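The two special cases can be explored with a small helper built on `collections.Counter`. Reporting "no mode" when every value occurs exactly once is a common convention but an assumption here, since the slides leave the answer blank; `modes` is an illustrative name.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []            # every value occurs once: treated as "no mode" (assumption)
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 4, 5, 5, 6, 10, 11]))  # [2, 5]
print(modes([2, 6, 7, 10, 13]))           # []
```

The first data set has two values tied for most frequent, which illustrates that the mode need not be unique.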

15
• Mean
  ◦ Interval data with an approximately symmetric distribution
• Median
  ◦ Interval or ordinal data
• Mode
  ◦ All types of data

16
• Mean is sensitive to outliers
  ◦ Median and mode are not
    • Why?
• In general, the median is more appropriate for skewed data than the mean
  ◦ Why?
• In some situations, the median may be too insensitive to changes in the data
• The mode may not be unique

17
• “How often do you read the newspaper?”

Response                  Frequency
every day                       969
a few times a week              452
once a week                     261
less than once a week           196
never                            76
TOTAL                          1954

• Identify the mode
• Identify the median response

18
• The p-th percentile (L_p) is a number such that p% of the observations take values below it, and (100-p)% take values above it
  ◦ 50th percentile = median
  ◦ 25th percentile = lower quartile
  ◦ 75th percentile = upper quartile
• The index of L_p
  ◦ (n+1)p/100

19
• 25th percentile
  ◦ lower quartile
  ◦ Q1
  ◦ (approximately) median of the observations below the median
• 75th percentile
  ◦ upper quartile
  ◦ Q3
  ◦ (approximately) median of the observations above the median

20
• Find the 25th percentile of this data set
  ◦ {3, 7, 12, 13, 15, 19, 24}

21
• Interpolation: use when the index is not a whole number
• Go to the closest lower whole-number index, then move the decimal fraction of the distance toward the next value
• If the index is found to be 5.4, go to the 5th value, then add 0.4 of the difference between the 5th value and the 6th value
  ◦ In essence we are going to the 5.4th value

22
• Find the 40th percentile of the same data set
  ◦ {3, 7, 12, 13, 15, 19, 24}
• Must use interpolation
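The index formula (n+1)p/100 and the interpolation rule can be combined in one short function. This is a sketch of the method described in these slides; statistical software often uses slightly different percentile definitions, so results may not match other tools. Returning the smallest or largest value when the index falls outside 1..n is an assumption made here, and `percentile` is an illustrative name.

```python
def percentile(values, p):
    """p-th percentile via ordered index (n + 1) * p / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty sample is undefined")
    index = (n + 1) * p / 100
    whole = int(index)            # closest lower whole-number index
    fraction = index - whole      # decimal part of the index
    if whole < 1:
        return ordered[0]         # index below the first value (assumption: clamp)
    if whole >= n:
        return ordered[-1]        # index at or past the last value (assumption: clamp)
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))            # index 2   -> 7.0
print(round(percentile(data, 40), 2))  # index 3.2 -> 12.2
```

For the 40th percentile the index is 8 × 0.4 = 3.2, so the result lies 0.2 of the way from the 3rd value (12) to the 4th value (13).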

23
• Five Number Summary
  ◦ Minimum
  ◦ Lower Quartile
  ◦ Median
  ◦ Upper Quartile
  ◦ Maximum
• Example
  ◦ minimum = 4
  ◦ Q1 = 256
  ◦ median = 530
  ◦ Q3 = 1105
  ◦ maximum = 320,000
• What does this suggest about the shape of the distribution?

24
• The Interquartile Range (IQR) is the difference between the upper and lower quartiles
  ◦ IQR = Q3 - Q1
  ◦ IQR = Range of values that contains the middle 50% of the data
  ◦ IQR increases as variability increases
• Murder Rate Data
  ◦ Q1 = 3.9
  ◦ Q3 = 10.3
  ◦ IQR =

25
• A box plot displays the five number summary (and more) graphically
• It consists of a box that contains the central 50% of the distribution (from lower quartile to upper quartile),
• a line within the box that marks the median,
• and whiskers that extend to the maximum and minimum values
• This assumes there are no outliers in the data set

26
• An observation is an outlier if it falls
  ◦ more than 1.5 IQR above the upper quartile or
  ◦ more than 1.5 IQR below the lower quartile
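The 1.5 IQR rule translates directly into a small check. The quartiles below are the murder rate values quoted earlier in the lecture; the three test observations (25, 12, and -6) are hypothetical values chosen only to exercise the rule, and `is_outlier` is an illustrative name.

```python
def is_outlier(x, q1, q3):
    """True if x lies more than 1.5 IQR beyond either quartile."""
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    return x > upper_fence or x < lower_fence


q1, q3 = 3.9, 10.3               # murder rate quartiles from the lecture
print(round(q3 - q1, 1))         # IQR: 6.4
print(is_outlier(25, q1, q3))    # True  (hypothetical observation)
print(is_outlier(12, q1, q3))    # False (hypothetical observation)
print(is_outlier(-6, q1, q3))    # True  (hypothetical observation)
```

With these quartiles the fences sit at roughly -5.7 and 19.9, so any observation outside that interval is flagged.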

27
• Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
• If an observation is an outlier, it is marked by an x, +, or some other identifier
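One way to find where the whiskers end is to keep only the observations inside the 1.5 IQR fences and take their extremes. In this sketch the quartiles are passed in rather than computed, the sample is hypothetical (invented to include one outlier), and `whisker_ends` is an illustrative name.

```python
def whisker_ends(values, q1, q3):
    """Return (lower whisker, upper whisker, outliers) for a box plot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations still inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Anything beyond the fences is drawn as a separate marker
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return min(inside), max(inside), outliers


# Hypothetical sample; Q1 = 158 and Q3 = 182 are supplied, not derived from it
sample = [148, 155, 158, 160, 162, 170, 182, 190, 204, 250]
print(whisker_ends(sample, q1=158, q3=182))  # (148, 204, [250])
```

Here the fences are 122 and 218, so the upper whisker stops at 204 and the value 250 is plotted on its own.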

28
• Values
  ◦ Min = 148
  ◦ Q1 = 158
  ◦ Median = Q2 = 162
  ◦ Q3 = 182
  ◦ Max = 204
• Create a box plot

29
• In right-skewed distributions, the minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away.
• For left-skewed distributions, the mirror image holds: the maximum, Q3, and the median will be relatively close together compared to the corresponding distances to Q1 and the minimum.
• Symmetric distributions?
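The "bunched up" description suggests a rough check on a five number summary: compare the distances on each side of the median. The rule below is only a heuristic sketch, not a formal test of skewness, and `describe_shape` and its return strings are illustrative names.

```python
def describe_shape(minimum, q1, median, q3, maximum):
    """Rough shape guess from a five number summary (heuristic only)."""
    lower_box, upper_box = median - q1, q3 - median
    lower_tail, upper_tail = median - minimum, maximum - median
    if upper_box > lower_box and upper_tail > lower_tail:
        return "right-skewed"   # min, Q1, median bunched up; Q3 and max farther away
    if upper_box < lower_box and upper_tail < lower_tail:
        return "left-skewed"    # max, Q3, median bunched up; Q1 and min farther away
    return "roughly symmetric or unclear"


# Five number summaries quoted earlier in the lecture
print(describe_shape(4, 256, 530, 1105, 320_000))  # right-skewed
print(describe_shape(148, 158, 162, 182, 204))     # right-skewed
```

When the distances on both sides of the median are about equal, the summary is consistent with a symmetric distribution, although a five number summary alone cannot confirm it.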

