1
Measures of Central Tendency
MARE 250 Dr. Jason Turner
2
Centracidal Tendencies
The measure of central tendency indicates where along the measurement scale the sample or population is located. It can be determined via various measures.
Three most important:
Mean
Median
Mode
3
Mean Girls
Mean – the most commonly used measure of center
The sum of the observations divided by the number of observations
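The definition above can be sketched in a few lines of Python. The sample values are made up for illustration, and the function name `mean` is a choice made for this sketch.

```python
# Hypothetical sample: lengths (cm) of six fish; the values are invented.
observations = [12.0, 15.0, 11.0, 14.0, 13.0, 19.0]


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


sample_mean = mean(observations)
print(sample_mean)  # 84.0 / 6 = 14.0
```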
4
The Median
"As we were driving, we saw a sign that said 'Watch for Rocks.' Martha said it should read 'Watch for Pretty Rocks.' I told her she should write in her suggestion to the highway department, but she started saying it was a joke - just to get out of writing a simple letter! And I thought I was lazy!" – Jack Handey
The median is typically defined as the middle measurement in an ordered set of data
Separates the bottom 50% of the data from the top 50%
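As a minimal sketch of "middle measurement in an ordered set": sort the data and take the middle value. The slide does not say what to do when the number of observations is even; averaging the two middle values, as done here, is a common convention and is an assumption of this example.

```python
def median(values):
    """Middle value of the ordered data.

    For an even count there is no single middle value; this sketch
    averages the two middle values (a common convention, assumed here).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([7, 1, 5])       # ordered: 1, 5, 7
even_median = median([7, 1, 5, 3])   # ordered: 1, 3, 5, 7
print(odd_median, even_median)       # 5 4.0
```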
5
The Mode
"Oh, no way - where? Holy crap, he's with a girl! But he's the guy from Depeche Mode! That's impossible! Come on, he's in Depeche Mode!" – The Monarch
The mode is typically defined as the most frequently occurring measurement in a set of data
The mode is useful if the distribution is skewed or bimodal (having two very pronounced values around which data are concentrated)
[Figure: plot with axis "Number of Individuals"]
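One way to find the most frequently occurring measurement is to count each value. The sketch below returns every value tied for the highest count, so a bimodal sample yields two modes. The function name `modes` and the data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


one_mode = modes([1, 2, 2, 3])
two_modes = modes([2, 3, 3, 5, 7, 7, 8])  # 3 and 7 each occur twice
print(one_mode, two_modes)                # [2] [3, 7]
```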
6
You are so totally skewed!
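The mean is sensitive to extreme (very large or small) observations and the median is not
Therefore, you can gauge how skewed your data are by looking at the relationship between the median and the mean:
Mean is greater than the median: typically skewed to the right (long tail of large values)
Mean and median are equal: typically symmetric
Mean is less than the median: typically skewed to the left (long tail of small values)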
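The comparison of mean and median can be sketched as a small rule of thumb. This is a rough indicator only, not a formal skewness test; the function name `skew_direction`, its labels, and the sample data are assumptions made for illustration.

```python
import statistics


def skew_direction(values):
    """Compare mean and median as a rough indicator of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"


# One extreme large value pulls the mean (6) above the median (3).
print(skew_direction([1, 2, 3, 4, 20]))     # right-skewed
print(skew_direction([1, 2, 3, 4, 5]))      # roughly symmetric
# One extreme small value pulls the mean (15) below the median (18).
print(skew_direction([1, 17, 18, 19, 20]))  # left-skewed
```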
7
Resistance Measures
A resistant measure is not sensitive to the influence of a few extreme observations
Median – resistant measure of center
Mean – not resistant
Resistance of the mean can be improved by using trimmed means: a specified percentage of the smallest and largest observations is removed before computing the mean
We will do something like this later when exploring the data and evaluating outliers (and their effects upon the mean)
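A trimmed mean can be sketched as follows. The slide does not say how to round the number of removed observations; this example drops `percent` percent of the observations from each end, rounding down, which is an assumption. The function name and data are hypothetical.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the observations from EACH end.

    The count removed per end is rounded down (an assumption of this sketch).
    `percent` should be below 50 so that some observations remain.
    """
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # observations removed per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Made-up data with one extreme observation (98).
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 98]

plain = sum(data) / len(data)     # 16.0, pulled upward by 98
trimmed = trimmed_mean(data, 10)  # drops 2 and 98, giving 7.5
print(plain, trimmed)
```

With the extreme value removed, the trimmed mean sits much closer to the bulk of the data than the ordinary mean does.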
8
How To on Computer
On Minitab: your data must be in a single column
Go to the 'Stat' menu and select 'Basic Statistics', then 'Display Descriptive Statistics'. Select your data column in the 'Variables' box. The output will generally go to the Session window; if you select 'Graphical summary' in the 'Graphs' options, it will be given in a separate window. This will give you a number of basic descriptive statistics, though not the mode.
9
Measures of Dispersion and Variability
MARE 250 Dr. Jason Turner
10
Please Disperse!
"Alright everyone, disperse immediately. We are prepared to use force a-- what, what? We're not prepared, Eddie? Someone call 911!" – Chief Wiggum
Measure of dispersion of the data – an indication of the spread of measurements around the center of the distribution
Two of the most frequently used:
Range
Standard Deviation
11
The Range
Range – the difference between the highest and lowest values in the observations
This is useful, but may be misleading when the data have one or more outliers (single measurements that are exceptionally large or small relative to the other data)
It is not relative to the central location
Range = Max - Min
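A short sketch of Range = Max - Min, using made-up data, shows how a single outlier can dominate the result. The function name `value_range` is hypothetical.

```python
def value_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


tight = [4, 5, 6, 7, 8]
with_outlier = tight + [40]  # one exceptionally large measurement

print(value_range(tight))         # 4
print(value_range(with_outlier))  # 36
```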
12
The Variance
Variance – the average of the squared deviations from the mean
The most widely used measure of spread, and one that will be used often in various statistical applications
13
The Variance
Degrees of Freedom – the quantity (n - 1)
Used instead of n to provide an unbiased estimate of the population variance
As the sample size (n) increases (and n approaches N), the values of the population and sample variance become more similar
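The difference between dividing by n and dividing by the degrees of freedom (n - 1) can be sketched as below. The `sample` flag and the data are assumptions made for illustration.

```python
def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by the degrees of freedom (n - 1);
    sample=False divides by n, treating the data as a whole population.
    """
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1 if sample else n)


# Made-up data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = variance(data, sample=False)  # 32 / 8 = 4.0
sample_variance = variance(data)                    # 32 / 7, about 4.571
print(population_variance, sample_variance)
```

Dividing by n - 1 always gives the larger value, and the gap shrinks as n grows.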
14
Standard Deviation
Standard Deviation – the positive square root of the variance
Indicates how far (on average) the observations in the sample are from the mean of the sample
The more variation in a data set, the larger its standard deviation
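As a sketch, the standard deviation is just the square root of the variance computed with the same divisor. The function name, the `sample` flag, and the data are assumptions made for illustration.

```python
import math


def standard_deviation(values, sample=True):
    """Positive square root of the variance (n - 1 divisor when sample=True)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    variance = sum_of_squares / (n - 1 if sample else n)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]        # made-up data, mean 5
spread_out = [x * 3 for x in data]     # same shape, three times the spread

population_sd = standard_deviation(data, sample=False)  # sqrt(4.0) = 2.0
sample_sd = standard_deviation(data)                    # sqrt(32 / 7)
print(population_sd, sample_sd, standard_deviation(spread_out))
```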
15
Quartiles
The median divides data into 2 equal parts: bottom 50%, top 50%
Quartiles divide data into quarters: 4 equal parts
A dataset has 3 quartiles:
Q1 – the number that divides the bottom 25% from the top 75%
Q2 – the median; divides the bottom 50% from the top 50%
Q3 – the number that divides the bottom 75% from the top 25%
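Quartiles can be computed with Python's standard library. Several conventions exist for interpolating quartiles, and different software may give slightly different values; the sketch below uses the default ("exclusive") method of `statistics.quantiles`, which is an assumption and not necessarily the rule used in the course. The data are made up.

```python
import statistics

# Made-up ordered sample of 11 observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)                     # 3.0 6.0 9.0
print(q2 == statistics.median(data))  # Q2 is the median: True
```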
16
Quartiles
17
Interquartile Range
Interquartile Range (IQR) – the difference between the first and third quartiles
IQR = Q3 – Q1
The IQR gives you the range of the middle 50% of the data
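A minimal sketch of IQR = Q3 – Q1, assuming the default quartile method of Python's `statistics.quantiles` (one convention among several) and a made-up sample containing one extreme value:

```python
import statistics

# Made-up sample; 40 is far from the rest of the data.
data = [3, 5, 7, 8, 9, 11, 13, 15, 17, 19, 40]

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
full_range = max(data) - min(data)

print(q1, q3, iqr)  # 7.0 17.0 10.0
print(full_range)   # 37, inflated by the single value 40
```

Because the IQR only looks at the middle 50% of the data, the extreme value does not affect it, unlike the range.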
18
Outlier, Outlier
Outliers – observations that fall well outside the overall pattern of the data
Require special attention
May be the result of:
Measurement or recording error
Observation from a different population
Unusual extreme observation
19
Pants on Fire
Must deal with outliers (Yes, really!):
If the outlier is an error, it can be deleted; otherwise it is a judgment call
Can use quartiles and the IQR to identify potential outliers
20
The Outer Limits
Lower and Upper Limits:
Lower limit – the number that lies 1.5 IQRs below the first quartile
Lower Limit = Q1 - 1.5 * IQR
Upper limit – the number that lies 1.5 IQRs above the third quartile
Upper Limit = Q3 + 1.5 * IQR
21
The Outer Limits
Outlier – if a value is outside the "Outer Limits" of a dataset, it is an outlier
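The limits and the outlier rule can be sketched together. The quartiles come from the default method of Python's `statistics.quantiles`, which is an assumption (other quartile conventions can shift the limits slightly). The function name `outer_limits` and the data are hypothetical.

```python
import statistics


def outer_limits(values):
    """Return (lower, upper) limits: 1.5 IQRs below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Made-up sample: Q1 = 7, Q3 = 17, so IQR = 10.
data = [3, 5, 7, 8, 9, 11, 13, 15, 17, 19, 40]

lower, upper = outer_limits(data)
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)  # -8.0 32.0
print(outliers)      # [40]
```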
22
Five-Number Summary
5-Number Summary: Min, Q1, Q2, Q3, Max
Written in increasing order
Provides information on center and variation
Is used to construct boxplots
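A five-number summary can be assembled from the pieces already defined. This sketch assumes the default quartile method of Python's `statistics.quantiles`; the function name and data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in increasing order."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))


# Made-up sample.
data = [3, 5, 7, 8, 9, 11, 13, 15, 17, 19, 40]

summary = five_number_summary(data)
print(summary)  # (3, 7.0, 11.0, 17.0, 40)
```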
23
Boxplots
Boxplot (Box-and-Whisker Diagram):
Based on the 5-number summary
Provides a graphic display of the center and variation
[Figure: boxplot labeled Min, Q1, Q2, Q3, Max]
24
Boxplots
Modified boxplot – includes outliers
[Figure: modified boxplot with a potential outlier marked *]
Note that Min & Max are determined after outliers are removed!
25
Boxplots
26
Boxplots
Boxplots summarize information about the shape, dispersion, and center of your data. They can also help you spot outliers.
The left edge of the box represents the first quartile (Q1), while the right edge represents the third quartile (Q3). Thus the box portion of the plot represents the interquartile range (IQR), or the middle 50% of the observations.
[Figure: boxplot labeled Min, Q1, Q2, Q3, Max]
27
Boxplots
The line drawn through the box represents the median of the data
The lines extending from the box are called whiskers. The whiskers extend outward to indicate the lowest and highest values in the data set (excluding outliers)
Extreme values, or outliers, are represented by dots. A value is considered an outlier if it is outside of the box (greater than Q3 or less than Q1) by more than 1.5 times the IQR
[Figure: modified boxplot with a potential outlier marked *]
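The numbers behind a modified boxplot can be sketched without drawing anything: the box edges, the median line, whiskers that stop at the most extreme non-outlier values, and the outliers themselves. The quartile method (the default of Python's `statistics.quantiles`), the function name `boxplot_parts`, the dictionary keys, and the data are all assumptions made for illustration.

```python
import statistics


def boxplot_parts(values):
    """Compute the pieces of a modified boxplot."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the lowest and highest values that are NOT outliers.
    inside = [x for x in values if lower <= x <= upper]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in values if x < lower or x > upper),
    }


# Made-up sample with one value more than 1.5 IQRs above Q3.
data = [3, 5, 7, 8, 9, 11, 13, 15, 17, 19, 40]

parts = boxplot_parts(data)
print(parts["whisker_low"], parts["whisker_high"])  # 3 19
print(parts["outliers"])                            # [40]
```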
28
Boxplots
Use the boxplot to assess the symmetry of the data:
If the data are fairly symmetric, the median line will be roughly in the middle of the IQR box and the whiskers will be similar in length
If the data are skewed, the median may not fall in the middle of the IQR box, and one whisker will likely be noticeably longer than the other