STA 291 Spring 2008 Lecture 5 Dustin Lueker.

Measures of Central Tendency
Mean: arithmetic average
Median: midpoint of the observations when they are arranged in increasing order
Mode: most frequent value
Notation (subscripted variables):
  n = number of units in the sample
  N = number of units in the population
  x = variable to be measured
  xi = measurement of the ith unit

Percentiles
The pth percentile (Lp) is a number such that p% of the observations take values below it and (100-p)% take values above it
  50th percentile = median
  25th percentile = lower quartile
  75th percentile = upper quartile
The index of Lp is (n+1)p/100

Quartiles
25th percentile: lower quartile Q1, (approximately) the median of the observations below the median
75th percentile: upper quartile Q3, (approximately) the median of the observations above the median

Example
Find the 25th percentile of this data set: {3, 7, 12, 13, 15, 19, 24}

Interpolation
Use when the index is not a whole number
Go to the closest lower index, then move the decimal fraction of the distance toward the next value
If the index is 5.4, go to the 5th value, then add 0.4 of the difference between the 5th and 6th values
In essence, we are going to the "5.4th" value

Example
Find the 40th percentile of the same data set: {3, 7, 12, 13, 15, 19, 24}
Must use interpolation
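
The index rule (n+1)p/100 and the interpolation step can be sketched in a few lines of Python. This is only an illustration of the lecture's rule: the function name `percentile` and the clamping of indexes that fall outside the data are assumptions, and statistical software often uses slightly different conventions.

```python
import math

def percentile(data, p):
    """Return the pth percentile using the index (n + 1) * p / 100 with linear interpolation."""
    values = sorted(data)
    n = len(values)
    index = (n + 1) * p / 100          # 1-based position in the sorted data
    # Assumption: positions outside the data are clamped to the minimum or maximum.
    if index <= 1:
        return values[0]
    if index >= n:
        return values[-1]
    lower = math.floor(index)          # closest whole index at or below the position
    fraction = index - lower           # decimal part of the index
    below = values[lower - 1]          # convert the 1-based index to 0-based
    above = values[lower]
    return below + fraction * (above - below)

data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))   # index 2.0: exactly the 2nd value
print(percentile(data, 40))   # index 3.2: 0.2 of the way from the 3rd to the 4th value
```

For the data set in the examples, the 25th percentile lands exactly on the 2nd value (7), while the 40th percentile sits 0.2 of the way from 12 to 13.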

Data Summary
Five Number Summary
  Minimum
  Lower Quartile
  Median
  Upper Quartile
  Maximum
Example
  minimum = 4, Q1 = 256, median = 530, Q3 = 1105, maximum = 320,000
  What does this suggest about the shape of the distribution?

Interquartile Range (IQR)
The interquartile range (IQR) is the difference between the upper and lower quartiles: IQR = Q3 - Q1
The IQR is the width of the range that contains the middle 50% of the data
The IQR increases as variability increases
Murder rate data: Q1 = 3.9, Q3 = 10.3, IQR = 10.3 - 3.9 = 6.4

Box Plot
Displays the five number summary (and more) graphically
Consists of
  a box that contains the central 50% of the distribution (from lower quartile to upper quartile),
  a line within the box that marks the median,
  and whiskers that extend to the maximum and minimum values
This assumes there are no outliers in the data set

Outliers
An observation is an outlier if it falls more than 1.5 IQR above the upper quartile or more than 1.5 IQR below the lower quartile
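
The 1.5 IQR rule can be sketched as a pair of cutoffs. The quartiles below are the murder rate values from the IQR slide; the function names and the list of observations are hypothetical and only illustrate the check.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) cutoffs: 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the observations that fall outside the cutoffs."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Quartiles from the murder rate slide; the observations are made up for illustration.
lower, upper = outlier_fences(3.9, 10.3)
print(round(lower, 1), round(upper, 1))
print(find_outliers([1.6, 5.0, 9.8, 20.3, 78.5], 3.9, 10.3))
```

With Q1 = 3.9 and Q3 = 10.3 the cutoffs come out near -5.7 and 19.9, so no rate can be a low outlier here (rates cannot be negative), while anything above roughly 19.9 is flagged.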

Box Plot
Whiskers extend only to the most extreme observations within 1.5 IQR beyond the quartiles
If an observation is an outlier, it is marked by an x, +, or some other identifier

Example
Create a box plot from these values:
  Min = 148
  Q1 = 158
  Median = Q2 = 162
  Max = 204

Mode
Value that occurs most frequently
Does not need to be near the center of the distribution
  Not really a measure of central tendency
Can be used for all types of data (nominal, ordinal, interval)
Special Cases
  Data set {2, 2, 4, 5, 5, 6, 10, 11}: two modes, 2 and 5
  Data set {2, 6, 7, 10, 13}: every value occurs once, so there is no single mode
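
Finding the mode is a frequency count, which also makes the special cases easy to check. A minimal sketch, assuming the convention that a data set in which nothing repeats has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return every value tied for the highest frequency, in order of first appearance."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if no value repeats, report no mode at all.
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([2, 2, 4, 5, 5, 6, 10, 11]))   # two values tie for most frequent
print(modes([2, 6, 7, 10, 13]))            # nothing repeats
print(modes(["red", "blue", "red"]))       # works for nominal data too
```

Because the function only counts occurrences, it works for nominal and ordinal data as well as numbers.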

Mean vs. Median vs. Mode
Mean: interval data with an approximately symmetric distribution
Median: interval or ordinal data
Mode: all types of data

Mean vs. Median vs. Mode
Mean is sensitive to outliers
  Median and mode are not. Why?
In general, the median is more appropriate for skewed data than the mean
In some situations, the median may be too insensitive to changes in the data
The mode may not be unique
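
A quick way to see the sensitivity is to change one extreme value and recompute both measures. The sketch below reuses the percentile example data; the second list, with the largest value replaced by 240, is a hypothetical misrecorded version.

```python
import statistics

data = [3, 7, 12, 13, 15, 19, 24]
with_outlier = [3, 7, 12, 13, 15, 19, 240]   # hypothetical: largest value misrecorded

for label, values in (("original", data), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

The median depends only on the middle position of the ordered data, so it stays at 13, while the mean uses every value and is pulled far upward by the single outlier.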

Example
“How often do you read the newspaper?”
  Response                 Frequency
  every day                      969
  a few times a week             452
  once a week                    261
  less than once a week          196
  never                           76
  TOTAL                         1954
Identify the mode
Identify the median response
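
For ordinal data in a frequency table, the mode is the category with the largest count and the median is the category that contains the middle observation. A sketch of that lookup using cumulative frequencies; the function names are hypothetical, and the categories must be listed in their natural order.

```python
from itertools import accumulate

def category_at(position, categories, frequencies):
    """Return the category holding the observation at a 1-based position in sorted order."""
    for category, cumulative in zip(categories, accumulate(frequencies)):
        if position <= cumulative:
            return category
    raise ValueError("position exceeds the number of observations")

def ordinal_mode_and_median(categories, frequencies):
    """Return (mode, (low, high)); low and high differ only if the median straddles two categories."""
    n = sum(frequencies)
    mode = categories[frequencies.index(max(frequencies))]
    # The middle position is (n + 1) / 2; for even n it falls between two observations.
    low = category_at((n + 1) // 2, categories, frequencies)
    high = category_at(n // 2 + 1, categories, frequencies)
    return mode, (low, high)

categories = ["every day", "a few times a week", "once a week",
              "less than once a week", "never"]
frequencies = [969, 452, 261, 196, 76]
mode, median = ordinal_mode_and_median(categories, frequencies)
print(mode)
print(median)
```

With n = 1954 the middle observations are the 977th and 978th. Only 969 respondents answered "every day", so both fall in "a few times a week".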

Measures of Variation
Statistics that describe variability
  Two distributions may have the same mean and/or median but different variability
  Mean and median describe only a typical value, not the spread of the data
Range
Variance
Standard Deviation
Interquartile Range
All of these can be computed for the sample or population

Range
Difference between the largest and smallest observation
Very much affected by outliers
  A misrecorded observation may lead to an outlier and affect the range
The range does not always reveal different variation about the mean
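
The last point is easiest to see with two small samples that share the same range and mean but spread differently around the mean. Both samples below are hypothetical and chosen only to make the contrast visible.

```python
import statistics

sample_a = [1, 5, 5, 5, 9]   # hypothetical: most values sit at the mean
sample_b = [1, 1, 5, 9, 9]   # hypothetical: most values sit at the extremes

def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

for label, sample in (("A", sample_a), ("B", sample_b)):
    print(label,
          "range =", value_range(sample),
          "mean =", statistics.mean(sample),
          "sd =", round(statistics.stdev(sample), 3))
```

Both samples have range 8 and mean 5, yet the standard deviation of the second is noticeably larger because the range looks only at the two extreme observations.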

Example
Sample 1
  Smallest observation: 112
  Largest observation: 797
  Range = 797 - 112 = 685
Sample 2
  Smallest observation: 15033
  Largest observation: 16125
  Range = 16125 - 15033 = 1092

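Variance and Standard Deviation
Sample
  Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  Standard deviation: $s = \sqrt{s^2}$
Population
  Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  Standard deviation: $\sigma = \sqrt{\sigma^2}$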

Deviations
The deviation of the ith observation xi from the sample mean is the difference between them, $x_i - \bar{x}$
The sum of all deviations is zero
Therefore, we use either the sum of the absolute deviations or the sum of the squared deviations as a measure of variation

Sample Variance
The variance of n observations is the sum of the squared deviations, divided by n-1

Example
  Observation   Mean   Deviation   Squared Deviation
       1          5       -4              16
       3          5       -2               4
       4          5       -1               1
       7          5        2               4
      10          5        5              25
Sum of the squared deviations = 50
n-1 = 4
Sum of the squared deviations / (n-1) = 12.5

Interpreting Variance
The variance is roughly the average of the squared deviations
  “average squared distance from the mean”
Unit: the square of the unit of the original data
  Difficult to interpret
Solution: take the square root of the variance, so the unit is the same as for the original data
  This is the standard deviation

Properties of Standard Deviation
s = 0 only when all observations are the same
If data are collected for the whole population instead of a sample, then n-1 is replaced by n
s is sensitive to outliers
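
The sample and population versions differ only in the divisor, which a short sketch makes concrete. It uses the observations from the worked variance example (1, 3, 4, 7, 10); the function name `variance` and its `population` flag are illustrative, and the result is cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

def variance(data, population=False):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n if population else n - 1
    return squared_deviations / divisor

data = [1, 3, 4, 7, 10]
mean = sum(data) / len(data)

print("sum of deviations:", sum(x - mean for x in data))        # deviations cancel out
print("sample variance:", variance(data))                       # divides by n - 1
print("sample sd:", round(math.sqrt(variance(data)), 4))
print("population variance:", variance(data, population=True))  # divides by n

# Cross-check against the standard library.
print(math.isclose(variance(data), statistics.variance(data)))
print(math.isclose(variance(data, population=True), statistics.pvariance(data)))
```

Dividing by n instead of n-1 always gives the smaller value, and the gap shrinks as the number of observations grows.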

Empirical Rule
If the data are approximately symmetric and bell-shaped, then
  About 68% of the observations are within one standard deviation of the mean
  About 95% of the observations are within two standard deviations of the mean
  About 99.7% of the observations are within three standard deviations of the mean

Example
SAT scores are scaled so that they have an approximately bell-shaped distribution with a mean of 500 and a standard deviation of 100
About 68% of the scores are between 400 and 600
About 95% of the scores are between 300 and 700
If you have a score above 700, you are in the top 2.5%
  What percentile would this be? (About the 97.5th)
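
The empirical rule percentages are rounded figures for a bell-shaped curve. As a rough check, the sketch below treats the scores as exactly normal with mean 500 and standard deviation 100, which is an assumption made here for illustration, and uses `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)   # assumption: scores follow an exact normal curve

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return sat.cdf(sat.mean + k * sat.stdev) - sat.cdf(sat.mean - k * sat.stdev)

for k in (1, 2, 3):
    low = sat.mean - k * sat.stdev
    high = sat.mean + k * sat.stdev
    print(f"within {k} sd: {low:.0f} to {high:.0f}, share = {share_within(k):.4f}")

top_share = 1 - sat.cdf(700)
print(f"share above 700: {top_share:.4f}")
```

Under this assumption the shares are close to 68%, 95%, and 99.7%, and the share above 700 comes out slightly under the 2.5% that the rounded rule gives.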

Example
According to the National Association of Home Builders, the U.S. nationwide median selling price of homes sold in 1995 was $118,000
Would you expect the mean to be larger than, smaller than, or equal to $118,000?
Which of the following is the most plausible value for the standard deviation?
  (a) -15,000, (b) 1,000, (c) 45,000, (d) 1,000,000

Population Parameters and Sample Statistics
Population mean and population standard deviation are denoted by the Greek letters μ (mu) and σ (sigma)
  They are unknown constants that we would like to estimate
Sample mean and sample standard deviation are denoted by x̄ and s
  They are random variables, because their values vary according to the random sample that has been selected