

Lecture 5 Dustin Lueker

STA 291 Fall 2009 Lecture 5

• Mean: arithmetic average
• Median: midpoint of the observations when they are arranged in increasing order
• Mode: most frequent value
• Notation: subscripted variables
  ◦ n = number of units in the sample
  ◦ N = number of units in the population
  ◦ x = variable to be measured
  ◦ x_i = measurement of the i-th unit

• Median: the measurement that falls in the middle of the ordered sample
• When the sample size n is odd, there is a middle value
  ◦ It has the ordered index (n+1)/2
    ▪ The ordered index is where that value falls when the sample is listed from smallest to largest
    ▪ An index of 2 means the second smallest value
  ◦ Example
    ▪ 1.7, 4.6, 5.7, 6.1, 8.3
      n = 5, (n+1)/2 = 6/2 = 3, index = 3
      Median = 3rd smallest observation = 5.7

• When the sample size n is even, average the two middle values
  ◦ Example
    ▪ 3, 5, 6, 9
      n = 4, (n+1)/2 = 5/2 = 2.5, index = 2.5
      Median = midpoint between the 2nd and 3rd smallest observations = (5+6)/2 = 5.5
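The odd and even cases can be sketched together in a few lines of Python. This is only an illustration of the (n+1)/2 index rule from the slides; the function name `median_by_index` is made up for this example.

```python
def median_by_index(values):
    """Median located via the 1-based ordered index (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    index = (n + 1) / 2            # 3 for n = 5, 2.5 for n = 4
    lower = int(index)             # whole part of the index
    if index == lower:             # n odd: a single middle value
        return ordered[lower - 1]  # convert 1-based index to 0-based
    # n even: midpoint of the two values on either side of the index
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median_by_index([1.7, 4.6, 5.7, 6.1, 8.3]))  # 5.7
print(median_by_index([3, 5, 6, 9]))               # 5.5
```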

• For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
• The median usually better describes a “typical value” when the sample distribution is highly skewed
• Example
  ◦ Monthly income for five people: 1,000  2,000  3,000  4,000  100,000
  ◦ Median monthly income: 3,000
    ▪ Does this better describe a “typical value” in the data set than the mean of 22,000?

• The trimmed mean is a compromise between the median and the mean
  ◦ Calculating the trimmed mean
    ▪ Order the data from smallest to largest
    ▪ Delete a selected number of values from each end of the ordered list
    ▪ Find the mean of the remaining values
  ◦ The trimming percentage is the percentage of values that have been deleted from each end of the ordered list
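A minimal sketch of the three steps, assuming the number of values to delete from each end is passed in as `k`. The function names and the sample data are hypothetical and not taken from the lecture.

```python
def trimmed_mean(values, k):
    """Mean after deleting the k smallest and k largest values."""
    ordered = sorted(values)           # step 1: order the data
    n = len(ordered)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must leave at least one value")
    kept = ordered[k:n - k]            # step 2: delete k values from each end
    return sum(kept) / len(kept)       # step 3: mean of what remains


def trimming_percentage(n, k):
    """Percentage of the n values deleted from each end."""
    return 100 * k / n


data = [2, 4, 5, 7, 8, 9, 10, 12, 14, 50]  # made-up sample with one large value
print(trimmed_mean(data, 0))               # 12.1 (ordinary mean)
print(trimmed_mean(data, 1))               # 8.625
print(trimming_percentage(len(data), 1))   # 10.0
```

Trimming a single value from each end already pulls the result away from the large observation and toward the bulk of the data.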

• Example: Highest Degree Completed

Highest Degree                                          Frequency   Percentage
Not a high school graduate                              38,
High school only                                        65,
Some college, no degree                                 33,
Associate, Bachelor, Master, Doctorate, Professional    41,
Total                                                   177,618

• n = 177,618
• (n+1)/2 = 88,809.5
• Median = midpoint between the 88,809th smallest and 88,810th smallest observations
  ◦ Both are in the category “High school only”
• Mean wouldn’t make sense here since the variable is only ordinal
• Median
  ◦ Can be used for interval data and for ordinal data
  ◦ Cannot be used for nominal data because the observations cannot be ordered on a scale

• Mean
  ◦ Interval data with an approximately symmetric distribution
• Median
  ◦ Interval data
  ◦ Ordinal data
• Mean is sensitive to outliers; median is not

Observations          Median   Mean
1, 2, 3, 4, 5         3        3
1, 2, 3, 4, 100
3, 3, 3, 3, 3
1, 2, 3, 100, 100
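One way to check the rows of this table is with Python's standard `statistics` module. This is only a sketch for verifying the arithmetic; the variable name `samples` is arbitrary.

```python
from statistics import mean, median

# The four sets of observations listed in the table.
samples = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 100],
    [3, 3, 3, 3, 3],
    [1, 2, 3, 100, 100],
]

for sample in samples:
    print(sample, "median =", median(sample), "mean =", mean(sample))
```

The median is 3 in every row, while the mean moves to 22 and 41.2 in the rows that contain 100.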

• Symmetric distribution
  ◦ Mean = Median
• Skewed distribution
  ◦ The mean lies farther in the direction in which the distribution is skewed

• Disadvantage of the median
  ◦ Insensitive to changes within the lower or upper half of the data
  ◦ Example
    ▪ 1, 2, 3, 4, 5
    ▪ 1, 2, 3, 100, 100
  ◦ Sometimes, the mean is more informative even when the distribution is skewed

• Keeneland Sales

• Mode: the value that occurs most frequently
  ◦ Does not need to be near the center of the distribution
    ▪ Not really a measure of central tendency
  ◦ Can be used for all types of data (nominal, ordinal, interval)
• Special cases
  ◦ Data set
    ▪ {2, 2, 4, 5, 5, 6, 10, 11}
    ▪ Mode =
  ◦ Data set
    ▪ {2, 6, 7, 10, 13}
    ▪ Mode =
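The two special cases can be explored with a small helper. This is a sketch: the function `modes` is hypothetical, and it adopts the convention (an assumption, not stated on the slide) that a data set in which no value repeats has no mode.

```python
from collections import Counter


def modes(data):
    """Return every value that attains the highest frequency.

    Assumption: if no value repeats, the data set is treated as
    having no mode and an empty list is returned.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([2, 2, 4, 5, 5, 6, 10, 11]))  # [2, 5]
print(modes([2, 6, 7, 10, 13]))           # []
```

The first data set has two values tied for most frequent, so the mode is not unique; in the second, every value occurs exactly once.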

• Mean
  ◦ Interval data with an approximately symmetric distribution
• Median
  ◦ Interval or ordinal data
• Mode
  ◦ All types of data

• Mean is sensitive to outliers
  ◦ Median and mode are not
    ▪ Why?
• In general, the median is more appropriate for skewed data than the mean
  ◦ Why?
• In some situations, the median may be too insensitive to changes in the data
• The mode may not be unique

• “How often do you read the newspaper?”

Response                  Frequency
every day                 969
a few times a week        452
once a week               261
less than once a week     196
Never                     76
TOTAL                     1954

• Identify the mode
• Identify the median response
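For an ordinal frequency table like this one, the mode is the category with the largest frequency, and the median can be located by accumulating frequencies until the (n+1)/2 position is reached. The sketch below assumes the categories are listed in their natural order; the helper names are made up for illustration.

```python
import math

# Ordered categories and frequencies from the newspaper table.
table = [
    ("every day", 969),
    ("a few times a week", 452),
    ("once a week", 261),
    ("less than once a week", 196),
    ("never", 76),
]


def mode_category(rows):
    """Category with the largest frequency."""
    return max(rows, key=lambda row: row[1])[0]


def category_at(rows, position):
    """Category containing the observation at a 1-based ordered position."""
    cumulative = 0
    for category, frequency in rows:
        cumulative += frequency
        if position <= cumulative:
            return category
    raise ValueError("position exceeds the number of observations")


def median_categories(rows):
    """Categories of the observations on either side of index (n + 1) / 2."""
    n = sum(frequency for _, frequency in rows)
    index = (n + 1) / 2
    return (category_at(rows, math.floor(index)),
            category_at(rows, math.ceil(index)))


print(mode_category(table))      # every day
print(median_categories(table))  # ('a few times a week', 'a few times a week')
```

Here n = 1954, so the index is 977.5. The 977th and 978th ordered responses both fall just past the 969 "every day" responses, so they share a category.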

• The pth percentile (L_p) is a number such that p% of the observations take values below it, and (100-p)% take values above it
  ◦ 50th percentile = median
  ◦ 25th percentile = lower quartile
  ◦ 75th percentile = upper quartile
• The index of L_p
  ◦ (n+1)p/100

• 25th percentile
  ◦ lower quartile
  ◦ Q1
  ◦ (approximately) median of the observations below the median
• 75th percentile
  ◦ upper quartile
  ◦ Q3
  ◦ (approximately) median of the observations above the median

• Find the 25th percentile of this data set
  ◦ {3, 7, 12, 13, 15, 19, 24}

• Interpolation: use when the index is not a whole number
• Go to the closest lower index, then move the decimal fraction of the distance toward the next value
• If the index is 5.4, go to the 5th value and add 0.4 of the difference between the 5th and 6th values
  ◦ In essence we are going to the 5.4th value

• Find the 40th percentile of the same data set
  ◦ {3, 7, 12, 13, 15, 19, 24}
• Must use interpolation
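The index rule (n+1)p/100 and the interpolation step can be sketched as one function. The name `percentile` and the clamping of out-of-range indices are assumptions made for this example; software packages often use other percentile definitions and may give slightly different answers.

```python
def percentile(values, p):
    """p-th percentile using the 1-based ordered index (n + 1) * p / 100."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty sample is undefined")
    index = (n + 1) * p / 100
    # Assumption: an index outside 1..n is clamped to the nearest end.
    index = min(max(index, 1), n)
    lower = int(index)              # closest index below
    fraction = index - lower        # decimal part of the index
    if fraction == 0:               # whole-number index: no interpolation
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))             # 7   (index 2)
print(round(percentile(data, 40), 2))   # 12.2 (index 3.2)
```

For the 40th percentile the index is 8 × 0.40 = 3.2, so the result is the 3rd value plus 0.2 of the gap to the 4th value: 12 + 0.2 × (13 − 12).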

• Five Number Summary
  ◦ Minimum
  ◦ Lower Quartile
  ◦ Median
  ◦ Upper Quartile
  ◦ Maximum
• Example
  ◦ minimum = 4
  ◦ Q1 = 256
  ◦ median = 530
  ◦ Q3 = 1105
  ◦ maximum = 320,000
• What does this suggest about the shape of the distribution?

• The Interquartile Range (IQR) is the difference between the upper and lower quartiles
  ◦ IQR = Q3 – Q1
  ◦ IQR = range of values that contains the middle 50% of the data
  ◦ IQR increases as variability increases
• Murder Rate Data
  ◦ Q1 = 3.9
  ◦ Q3 = 10.3
  ◦ IQR =

• A box plot displays the five number summary (and more) graphically
• It consists of
  ◦ a box that contains the central 50% of the distribution (from lower quartile to upper quartile)
  ◦ a line within the box that marks the median
  ◦ whiskers that extend to the maximum and minimum values
• This assumes there are no outliers in the data set

• An observation is an outlier if it falls
  ◦ more than 1.5 IQR above the upper quartile or
  ◦ more than 1.5 IQR below the lower quartile
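The 1.5 IQR rule translates directly into two cutoffs, sometimes called fences. The sketch below applies it to the murder rate quartiles quoted earlier (Q1 = 3.9, Q3 = 10.3); the function names and the two test observations are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x lies more than 1.5 IQR beyond either quartile."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high


q1, q3 = 3.9, 10.3                       # murder rate quartiles
low, high = outlier_fences(q1, q3)
print(round(q3 - q1, 1))                 # IQR: 6.4
print(round(low, 1), round(high, 1))     # -5.7 19.9
print(is_outlier(25.0, q1, q3))          # True  (made-up observation)
print(is_outlier(12.0, q1, q3))          # False (made-up observation)
```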

• Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
• If an observation is an outlier, it is marked by an x, +, or some other identifier

• Values
  ▪ Min = 148
  ▪ Q1 = 158
  ▪ Median = Q2 = 162
  ▪ Q3 = 182
  ▪ Max = 204
• Create a box plot
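Before drawing, it can help to compute the pieces of the plot from the five values. The sketch below derives the box, the median line, and the 1.5 IQR fences, then checks whether the minimum and maximum fall inside the fences. The dictionary keys and function name are made up for illustration, and no plotting library is involved.

```python
summary = {"min": 148, "q1": 158, "median": 162, "q3": 182, "max": 204}


def box_plot_elements(s):
    """Pieces needed to sketch a box plot from a five number summary."""
    iqr = s["q3"] - s["q1"]
    lower_fence = s["q1"] - 1.5 * iqr
    upper_fence = s["q3"] + 1.5 * iqr
    return {
        "box": (s["q1"], s["q3"]),          # edges of the box
        "median_line": s["median"],         # line inside the box
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        # If min or max lies beyond a fence, at least one outlier exists
        # and the whisker on that side must stop short of it.
        "has_outliers": s["min"] < lower_fence or s["max"] > upper_fence,
    }


elements = box_plot_elements(summary)
print(elements["iqr"])           # 24
print(elements["fences"])        # (122.0, 218.0)
print(elements["has_outliers"])  # False
```

Since 148 and 204 both lie inside the fences, the whiskers extend all the way to the minimum and the maximum.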

• For right-skewed distributions, the minimum, Q1, and the median will be “bunched up”, while Q3 and the maximum will be farther away.
• For left-skewed distributions, the “mirror” is true: the maximum, Q3, and the median will be relatively close compared to the corresponding distances to Q1 and the minimum.
• Symmetric distributions?
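These observations can be turned into a rough check on a five number summary. The rule below is only a heuristic written for this example, not a formal test of skewness: it compares the distances above the median with the matching distances below it. The function name `shape_hint` and the "unclear" fallback are assumptions.

```python
def shape_hint(minimum, q1, median, q3, maximum):
    """Rough shape suggestion from a five number summary (heuristic only)."""
    inner_low, inner_high = median - q1, q3 - median
    outer_low, outer_high = q1 - minimum, maximum - q3
    if inner_high > inner_low and outer_high > outer_low:
        return "right-skewed"      # upper values are farther away
    if inner_high < inner_low and outer_high < outer_low:
        return "left-skewed"       # lower values are farther away
    if inner_high == inner_low and outer_high == outer_low:
        return "roughly symmetric"
    return "unclear"


print(shape_hint(4, 256, 530, 1105, 320000))  # right-skewed
print(shape_hint(148, 158, 162, 182, 204))    # right-skewed
print(shape_hint(1, 2, 3, 4, 5))              # roughly symmetric
```

The first two calls use five number summaries that appear earlier in the lecture; the third is a made-up symmetric example in which the distances on both sides of the median match.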