Exploratory Data Analysis

Presentation transcript:

Exploratory Data Analysis (Chapter 3.4)

Traditional Statistics. Data are organized by using a frequency distribution. The distribution is used to create various graphs: histogram, frequency polygon, ogive. The mean and standard deviation are computed to summarize the data. The purpose is to confirm various conjectures about the nature of the data.

Exploratory Data Analysis (EDA). The purpose is to examine data to find out what information can be discovered about them, such as the center and the spread. Data are organized using a stem and leaf plot. The measure of central tendency is the median, and the measure of variation is the interquartile range. Data are represented graphically using a boxplot.

Quartiles. Quartiles divide the distribution into four groups, separated by Q1, Q2, and Q3. Q1 is the same as the 25th percentile. Q2 is the same as the 50th percentile (the median). Q3 is the same as the 75th percentile. For example: 5, 6, 12, 13, 15, 18, 22, 50.
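As a minimal sketch, the quartiles of the example data can be computed as below. It assumes the median-of-halves convention (Q1 is the median of the values below the median position, Q3 the median of the values above it), which is a common textbook convention but is not stated on the slide. Other conventions, including the default of Python's statistics.quantiles, can give slightly different values. The function name quartiles is illustrative only.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above it; the median itself is skipped when n is odd
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([5, 6, 12, 13, 15, 18, 22, 50])
print(q1, q2, q3)  # 9.0 14.0 20.0
```

The sketch needs at least two data values, since each half must be non-empty.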

The five-number summary: the lowest value of the data set (minimum), Q1, the median, Q3, and the highest value of the data set (maximum).

Boxplot. A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median (Q2).

Procedure for constructing a boxplot: 1. Find the five-number summary for the data values. 2. Draw a horizontal axis with a scale that includes the maximum and the minimum data values. 3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median. 4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.

Number of Meteorites Found. The number of meteorites found in 10 states of the U.S. is 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot for the data.
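The first step of the exercise, finding the five-number summary, might be sketched as follows. The quartiles use the median-of-halves convention, which is an assumption (the slide does not say which quartile rule to use), and the function name five_number_summary is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the median-of-halves convention (an assumption).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])            # median of the lower half
    q3 = median(data[half + n % 2:])    # median of the upper half
    return data[0], q1, median(data), q3, data[-1]

meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```

With these values, the box spans 47 to 164, the median line sits at 83.5, and the two lines extend to 30 and 296.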

Information obtained from a boxplot. If the median is near the center of the box, the distribution is approximately symmetric. If the median falls to the left of the center of the box, the distribution is positively (right) skewed. If the median falls to the right of the center, the distribution is negatively (left) skewed. If the lines are about the same length, the distribution is approximately symmetric. If the right line is longer than the left line, the distribution is positively (right) skewed. If the left line is longer than the right line, the distribution is negatively (left) skewed.
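These rules of thumb can be applied directly to a five-number summary. The sketch below is one possible encoding, not a standard test. The function name describe_shape and the tolerance parameter (how close counts as "near the center" or "about the same length") are hypothetical choices, since the slide gives no numeric threshold. The usage line takes the meteorite data's five-number summary as (30, 47, 83.5, 164, 296), which assumes the median-of-halves quartile convention.

```python
def describe_shape(minimum, q1, med, q3, maximum, tolerance=0.1):
    """Apply the two boxplot rules of thumb.

    Returns (reading_from_median, reading_from_lines).
    `tolerance` is a hypothetical cutoff, not taken from the slides.
    """
    # Rule 1: position of the median relative to the center of the box.
    offset = med - (q1 + q3) / 2
    if abs(offset) <= tolerance * (q3 - q1):
        by_median = "approximately symmetric"
    elif offset < 0:
        by_median = "positively (right) skewed"
    else:
        by_median = "negatively (left) skewed"

    # Rule 2: compare the lengths of the left and right lines.
    left, right = q1 - minimum, maximum - q3
    if abs(right - left) <= tolerance * max(left, right):
        by_lines = "approximately symmetric"
    elif right > left:
        by_lines = "positively (right) skewed"
    else:
        by_lines = "negatively (left) skewed"

    return by_median, by_lines

print(describe_shape(30, 47, 83.5, 164, 296))
# ('positively (right) skewed', 'positively (right) skewed')
```

The two rules are reported separately because they can disagree on real data; here both point to a right-skewed distribution.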

Sodium Content of Cheese. A dietitian is interested in comparing the sodium content of real cheese with the sodium content of a cheese substitute. Compare the distributions using boxplots. Real Cheese Cheese Substitute 310 4520 45 40 270 180 250 290 220 240 90 130 260 340

Resistant Statistic. A resistant statistic is relatively less affected by outliers than a nonresistant statistic. The mean and standard deviation are nonresistant statistics. Sometimes, when a distribution is skewed or contains outliers, the median and interquartile range may summarize the data more accurately than the mean and standard deviation.
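A small illustration of resistance, reusing the quartile example data (5, 6, 12, 13, 15, 18, 22, 50): treating 50 as the unusually large value is an assumption made here for demonstration only. The sketch compares how far the mean and the median move when that value is dropped.

```python
from statistics import mean, median

data = [5, 6, 12, 13, 15, 18, 22, 50]
without_extreme = [x for x in data if x != 50]  # drop the largest value

# How much each statistic changes when the extreme value is removed.
mean_shift = mean(data) - mean(without_extreme)
median_shift = median(data) - median(without_extreme)

print(mean_shift)    # 4.625
print(median_shift)  # 1.0
```

The mean moves by 4.625 while the median moves by only 1, which is what "relatively less affected" means in practice.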

Correspondence between traditional and exploratory data analysis (traditional -> exploratory): frequency distribution -> stem and leaf plot; histogram -> boxplot; mean -> median; standard deviation -> interquartile range.

Try it! Applying the concepts 3-4 Pg. 174