Chapter 16: Exploratory data analysis: numerical summaries

Slides:



Advertisements
Similar presentations
DESCRIBING DISTRIBUTION NUMERICALLY
Advertisements

HS 67 - Intro Health Statistics Describing Distributions with Numbers
Chapter 2 Exploring Data with Graphs and Numerical Summaries
Descriptive Measures MARE 250 Dr. Jason Turner.
Measures of Position - Quartiles
Measures of Variation Sample range Sample variance Sample standard deviation Sample interquartile range.
Describing Quantitative Data with Numbers Part 2
Descriptive Statistics
Descriptive Statistics – Central Tendency & Variability Chapter 3 (Part 2) MSIS 111 Prof. Nick Dedeke.
Sullivan – Statistics: Informed Decisions Using Data – 2 nd Edition – Chapter 3 Introduction – Slide 1 of 3 Topic 16 Numerically Summarizing Data- Averages.
MEASURES OF SPREAD – VARIABILITY- DIVERSITY- VARIATION-DISPERSION
1 Distribution Summaries Measures of central tendency Mean Median Mode Measures of spread Range Standard Deviation Interquartile Range (IQR)
BPS - 5th Ed. Chapter 21 Describing Distributions with Numbers.
Chapter 5 – 1 Chapter 5: Measures of Variability The Importance of Measuring Variability The Range IQR (Inter-Quartile Range) Variance Standard Deviation.
5 Number Summary Box Plots. The five-number summary is the collection of The smallest value The first quartile (Q 1 or P 25 ) The median (M or Q 2 or.
Vocabulary for Box and Whisker Plots. Box and Whisker Plot: A diagram that summarizes data using the median, the upper and lowers quartiles, and the extreme.
Box and Whisker Plots and Quartiles Sixth Grade. Five Statistical Summary When describing a set of data we have seen that we can use measures such as.
Programming in R Describing Univariate and Multivariate data.
LECTURE 12 Tuesday, 6 October STA291 Fall Five-Number Summary (Review) 2 Maximum, Upper Quartile, Median, Lower Quartile, Minimum Statistical Software.
Methods for Describing Sets of Data
Exploration of Mean & Median Go to the website of “Introduction to the Practice of Statistics”website Click on the link to “Statistical Applets” Select.
LECTURE 8 Thursday, 19 February STA291 Fall 2008.
What is variability in data? Measuring how much the group as a whole deviates from the center. Gives you an indication of what is the spread of the data.
Section 1 Topic 31 Summarising metric data: Median, IQR, and boxplots.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
Measures of Dispersion How far the data is spread out.
Numerical Statistics Given a set of data (numbers and a context) we are interested in how to describe the entire set without listing all the elements.
1 Further Maths Chapter 2 Summarising Numerical Data.
BPS - 5th Ed. Chapter 21 Describing Distributions with Numbers.
Chapter 16 Exploratory data analysis: numerical summaries CIS 2033 Based on Textbook: A Modern Introduction to Probability and Statistics Instructor:
CCGPS Advanced Algebra UNIT QUESTION: How do we use data to draw conclusions about populations? Standard: MCC9-12.S.ID.1-3, 5-9, SP.5 Today’s Question:
Using Measures of Position (rather than value) to Describe Spread? 1.
What is a box-and-whisker plot? 5-number summary Quartile 1 st, 2 nd, and 3 rd quartiles Interquartile Range Outliers.
Unit 4: Probability Day 4: Measures of Central Tendency and Box and Whisker Plots.
Chapter 1 Lesson 4 Quartiles, Percentiles, and Box Plots.
Probability & Statistics Box Plots. Describing Distributions Numerically Five Number Summary and Box Plots (Box & Whisker Plots )
Exploratory Data Analysis
Methods for Describing Sets of Data
a graphical presentation of the five-number summary of data
Probability and Statistics for Computer Scientists Second Edition, By: Michael Baron Chapter 8: Introduction to Statistics CIS Computational Probability.
Describing Distributions Numerically
Descriptive Measures Descriptive Measure – A Unique Measure of a Data Set Central Tendency of Data Mean Median Mode 2) Dispersion or Spread of Data A.
Chapter 5 : Describing Distributions Numerically I
Chapter 16: Exploratory data analysis: Numerical summaries
Unit 6 Day 2 Vocabulary and Graphs Review
Averages and Variation
Bar graphs are used to compare things between different groups
CHAPTER 1 Exploring Data
Unit 4 Statistics Review
DAY 3 Sections 1.2 and 1.3.
Numerical Measures: Skewness and Location
Lecture 2 Chapter 3. Displaying and Summarizing Quantitative Data
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Quartile Measures DCOVA
The absolute value of each deviation.
Displaying Distributions with Graphs
Exploratory data analysis: numerical summaries
Data Analysis and Statistical Software I Quarter: Spring 2003
Basic Practice of Statistics - 3rd Edition
Chapter 1: Exploring Data
Day 52 – Box-and-Whisker.
Describing Distributions Numerically
Basic Practice of Statistics - 3rd Edition
MCC6.SP.5c, MCC9-12.S.ID.1, MCC9-12.S.1D.2 and MCC9-12.S.ID.3
The Five-Number Summary
Box and Whisker Plots.
Basic Practice of Statistics - 3rd Edition
Describing Distributions with Numbers
Box and Whisker Plots and the 5 number summary
Presentation transcript:

Chapter 16: Exploratory data analysis: numerical summaries CIS 3033

16.1 The center of a dataset Sample mean: Sample median: Medn is the middle element (if n is odd), or the average of the two middle elements (if n is even), after the elements are sorted.   The sample mean and sample median of a dataset correspond to the expectation and median of a probability distribution, respectively.  Sample mean is more sensitive to outliers than sample median.

16.2 The amount of variability Sample variance: Sample standard deviation: sn , the square-root of sample variance.   When the center of a dataset is represented by its sample median, the variability is often indicated by median of absolute deviations, or MAD,  MAD less sensitive to outliers than sn.

16.3 Empirical quantiles, quartiles, IQR The pth empirical quantile, qn(p), or the 100p empirical percentile, is the number that is greater than (or equal to) a proportion p of the elements in the dataset, and less than (or equal to) a proportion 1−p of the elements.   A quantile is not necessarily in the dataset. The order statistics of a dataset x1, x2, . . . , xn consist of the same elements as in the original dataset, but ordered as x(1) ≤ x(2) ≤ · · · ≤ x(n).

16.3 Empirical quantiles, quartiles, IQR The ith order statistic is greater or equal to i elements in the dataset, and less or equal to n-i+1 elements, so it is the i/(n+1) quantile (the element itself is counted in both sides).   Example: 2, 5, 7, 8             When p = 0.3, for this dataset                 = 2 + (0.3*5 - 1) * (5 - 2) = 3.5

16.3 Empirical quantiles, quartiles, IQR   The qn(0.25) is called the lower quartile and qn(0.75) is called the upper quartile. The distance between the upper and lower quartiles is called the interquartile range, or IQR. Five-number summary of a dataset: minimum, lower (or first) quartile, median, upper (or third) quartile, maximum.

16.3 Empirical quantiles, quartiles, IQR

16.4 The box-and-whisker plot Box-and-whisker plot, briefly boxplot: visualizing the five-number summary.   The horizontal width of the box is irrelevant. The height of the box is precisely the IQR. The horizontal line inside the box corresponds to the sample median. Two whiskers are the min/max data elements within 1.5 IQR from the box. All observations beyond the whiskers are called outliers.

16.4 The box-and-whisker plot

16.4 The box-and-whisker plot Boxplots become useful if we want to compare several sets of data in a simple graphical display.