Exploratory Data Analysis (EDA)


1 Exploratory Data Analysis (EDA)
Section 3-5 Exploratory Data Analysis (EDA)

2 EXPLORATORY DATA ANALYSIS
Exploratory data analysis (EDA) is the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.

3 OUTLIERS
An outlier is a value that is located very far away from almost all of the other values; it is also known as an extreme value. Outliers can have a dramatic effect on the mean, the standard deviation, and the scale of the histogram, so that the true nature of the distribution is obscured. To find outliers, examine a sorted list of the data and look for values that are far from most of the others.
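
The effect described above can be sketched in Python. The values are hypothetical (they do not come from the slides); 60 is planted as the extreme value so the change in the mean and the sample standard deviation is easy to see.

```python
import statistics

# Hypothetical values, not taken from the slides; 60 plays the role of the outlier.
values = [6, 60, 5, 8, 6, 7]

# Sorting places a value that is far from the rest at one end of the list.
sorted_values = sorted(values)
print(sorted_values)

# Compare the mean and the sample standard deviation with and without it.
without_outlier = sorted_values[:-1]
mean_with = statistics.mean(sorted_values)
mean_without = statistics.mean(without_outlier)
sd_with = statistics.stdev(sorted_values)
sd_without = statistics.stdev(without_outlier)

print(f"mean: {mean_without:.2f} without, {mean_with:.2f} with")
print(f"standard deviation: {sd_without:.2f} without, {sd_with:.2f} with")
```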

4 5-NUMBER SUMMARY
For a set of data, the 5-number summary consists of:
1. the minimum value;
2. the first quartile, Q1;
3. the median (or second quartile, Q2);
4. the third quartile, Q3; and
5. the maximum value.

5 EXAMPLE
Find the 5-number summary for the Bank of Providence waiting times.
Bank of Providence (multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 8.5 9.3 10.0
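
One way to work this example is sketched below in Python, using the waiting times exactly as listed in this transcript. The function name five_number_summary is illustrative, and the quartile convention (the "inclusive" method of the standard statistics module) is an assumption: textbooks and calculators use slightly different quartile rules, so Q1 and Q3 can differ a little from one tool to another.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Quartiles use the "inclusive" method, which is one of several
    conventions and is an assumption here, not the slides' stated rule.
    """
    ordered = sorted(data)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Bank of Providence waiting times as listed in the transcript.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]
summary = five_number_summary(providence)
print(summary)
```

Under this convention the quartiles for the nine listed values come out as Q1 = 5.8, median = 6.7, and Q3 = 8.5, with a minimum of 4.2 and a maximum of 10.0.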

6 BOXPLOTS (BOX-AND-WHISKER DIAGRAMS)
Boxplots are good for revealing:
1. center of the data
2. spread of the data
3. distribution of the data
4. presence of outliers
Boxplots are also excellent for comparing two or more data sets.

7 CONSTRUCTING A BOXPLOT
1. Find the 5-number summary.
2. Construct a scale with values that include the minimum and maximum data values.
3. Construct a box (rectangle) extending from Q1 to Q3, and draw a line in the box at the median value.
4. Draw lines extending outward from the box to the minimum and maximum data values.
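
The construction steps can be sketched as a small text-mode boxplot in Python. This is only an illustration: the function name ascii_boxplot, the 50-character width, and the "inclusive" quartile convention are all assumptions, and the data must contain at least two distinct values so that the scale has a nonzero length.

```python
import statistics

def ascii_boxplot(data, width=50):
    """Return a one-line text boxplot drawn on a scale from min to max."""
    ordered = sorted(data)
    # Find the 5-number summary (inclusive quartiles: an assumed convention).
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    low, high = ordered[0], ordered[-1]

    # Scale: map the interval [low, high] onto columns 0 .. width - 1.
    def column(x):
        return round((x - low) / (high - low) * (width - 1))

    line = [" "] * width
    # Whiskers: lines from the minimum to Q1 and from Q3 to the maximum.
    for i in range(column(low), column(q1)):
        line[i] = "-"
    for i in range(column(q3) + 1, column(high) + 1):
        line[i] = "-"
    # Box: from Q1 to Q3, with a mark at the median.
    for i in range(column(q1), column(q3) + 1):
        line[i] = "="
    line[column(q1)] = "["
    line[column(q3)] = "]"
    line[column(med)] = "|"
    return "".join(line)

# Bank of Providence waiting times as listed in the transcript.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]
plot = ascii_boxplot(providence)
print(plot)
```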

8 AN EXAMPLE OF A BOXPLOT
Bank of Providence (multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 8.5 9.3 10.0

9 DRAWING A BOXPLOT ON THE TI-83/84
1. Press STAT; select 1:Edit….
2. Enter your data values in L1. (Note: You could enter them in a different list.)
3. Press 2ND, Y= (for STATPLOT). Select 1:Plot1.
4. Turn the plot ON.
5. For Type, select the boxplot (middle one on second row).
6. For Xlist, put L1 by pressing 2ND, 1.
7. For Freq, enter the number 1.
8. Press ZOOM. Select 9:ZoomStat.

10 EXAMPLE
Use boxplots to compare the waiting times at Jefferson Valley Bank and the Bank of Providence. Interpret your results.
Jefferson Valley Bank (single waiting line): 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7
Bank of Providence (multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 8.5 9.3 10.0
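
A numerical companion to this comparison is sketched below in Python. It uses the Jefferson Valley values listed on this slide and the nine Bank of Providence values listed on the earlier slides; the helper name spread_summary and the "inclusive" quartile convention are assumptions made for illustration.

```python
import statistics

def spread_summary(data):
    """Return the median, range, and interquartile range (Q3 - Q1) of data."""
    ordered = sorted(data)
    # Inclusive quartiles are an assumed convention; other rules shift Q1 and Q3 slightly.
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"median": med, "range": ordered[-1] - ordered[0], "iqr": q3 - q1}

# Waiting times as listed in the transcript.
jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]

jv = spread_summary(jefferson)
bp = spread_summary(providence)
for name, result in (("Jefferson Valley", jv), ("Providence", bp)):
    print(f"{name}: median={result['median']:.2f}, "
          f"range={result['range']:.2f}, IQR={result['iqr']:.2f}")
```

For the listed values the two medians are close, while the range and the width of the box (Q3 minus Q1) are several times larger for the Bank of Providence, so its boxplot would be much wider on the same scale.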

11 BOXPLOTS AND DISTRIBUTIONS
[Figure panels: Bell-Shaped, Uniform, Skewed]

12 EXPLORING
Measures of Center: mean, median, and mode
Measures of Variation: standard deviation and range
Measures of Dispersion: minimum value, maximum value, and quartiles
Unusual Values: outliers
Distribution: histogram, stem-and-leaf plots, and boxplots
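
The measures of center and variation named above can be collected in one pass, as in the following Python sketch. The function name explore is illustrative, the standard deviation is the sample standard deviation, and the data are the Bank of Providence waiting times as listed earlier in this transcript.

```python
import statistics

def explore(data):
    """Return basic measures of center and variation for at least two values."""
    ordered = sorted(data)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # multimode returns every value when no value repeats.
        "modes": statistics.multimode(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "range": ordered[-1] - ordered[0],
    }

# Bank of Providence waiting times as listed in the transcript.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]
report = explore(providence)
for measure, value in report.items():
    print(measure, value)
```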

13 EXAMPLE
Explore the data below, which show the ages of most employees at the Vita Needle Company. (Based on data from “Where Retirement Became a Dirty Word” by Julie Flaherty, New York Times.)

