Methods for Describing Sets of Data


1 Methods for Describing Sets of Data
Chapter 2 Methods for Describing Sets of Data
Slides for Optional Sections
- Section 2.8 Methods for Detecting Outliers: Slides 31-34
- Section 2.9 Graphing Bivariate Relationships: Slide 35
- Section The Time Series Plot: Slide 36

2 Objectives
- Describe data using graphs
- Describe data using charts

3 Describing Qualitative Data
Qualitative data are nonnumeric in nature and are best described by using classes.
2 descriptive measures:
- Class frequency: the number of data points in a class
- Class relative frequency: $\text{class relative frequency} = \frac{\text{class frequency}}{\text{total number of data points in data set}}$
Class percentage: $\text{class percentage} = \text{class relative frequency} \times 100$
Add discussion of class percentage

4 Describing Qualitative Data – Displaying Descriptive Measures
Summary table columns: Class, Frequency, Class percentage (class relative frequency × 100)
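The measures above can be tallied directly from raw observations. The sketch below is a minimal illustration, assuming the data arrive as a plain Python list of category labels; the helper name `summary_table` and the blood-type data are hypothetical, chosen only for the example.

```python
from collections import Counter


def summary_table(observations):
    """Return {class: (frequency, relative frequency, percentage)}.

    Hypothetical helper for qualitative (categorical) data.
    """
    counts = Counter(observations)  # class frequency for each distinct label
    n = len(observations)           # total number of data points
    table = {}
    for cls, freq in counts.items():
        relative = freq / n         # class relative frequency
        table[cls] = (freq, relative, relative * 100)  # percentage = relative x 100
    return table


# Hypothetical data: blood types of 10 people (illustrative only).
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]
for cls, (freq, rel, pct) in summary_table(blood_types).items():
    print(f"{cls:>2}  {freq:2d}  {rel:.2f}  {pct:5.1f}%")
```

Because every observation lands in exactly one class, the relative frequencies sum to 1 and the percentages to 100.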

5 Describing Qualitative Data – Qualitative Data Displays
Bar Graph

6 Describing Qualitative Data – Qualitative Data Displays
Pie chart
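In a pie chart, each class's slice is proportional to its relative frequency, so its central angle is the relative frequency times 360 degrees. A minimal sketch of that arithmetic; the function name and the survey responses are hypothetical.

```python
from collections import Counter


def pie_slice_angles(observations):
    """Map each class to its slice angle in degrees: relative frequency x 360."""
    counts = Counter(observations)
    n = len(observations)
    return {cls: freq / n * 360 for cls, freq in counts.items()}


# Hypothetical survey responses (illustrative only).
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "yes"]
angles = pie_slice_angles(responses)
print(angles)  # {'yes': 225.0, 'no': 90.0, 'undecided': 45.0}
```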

7 Describing Qualitative Data – Qualitative Data Displays
Pareto Diagram
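A Pareto diagram is a bar graph with the classes arranged by frequency, tallest bar first. Many versions also overlay a cumulative percentage; the sketch below computes both, assuming that optional cumulative line is wanted. The helper name and the defect categories are hypothetical.

```python
from collections import Counter


def pareto_order(observations):
    """Return (class, frequency, cumulative percentage) rows, tallest bar first."""
    counts = Counter(observations)
    n = len(observations)
    rows = []
    cumulative = 0
    # Descending frequency; sorted() is stable, so ties keep first-seen order.
    for cls, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += freq
        rows.append((cls, freq, 100 * cumulative / n))
    return rows


# Hypothetical defect log (illustrative only).
defects = ["scratch", "dent", "scratch", "misaligned", "scratch",
           "dent", "scratch", "crack", "scratch", "dent"]
for row in pareto_order(defects):
    print(row)
```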

8 Graphical Methods for Describing Quantitative Data
The Data

9 Graphical Methods for Describing Quantitative Data
For describing, summarizing, and detecting patterns in such data, we can use three graphical methods:
- dot plots
- stem-and-leaf displays
- histograms

10 Graphical Methods for Describing Quantitative Data
Dot Plot

11 Graphical Methods for Describing Quantitative Data
Stem-and-Leaf Display
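One simple convention, assumed here for whole-number data, takes the last digit as the leaf and everything before it as the stem. The helper and the exam scores are hypothetical; other displays split stems or use a different digit for the leaf.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the lines of a stem-and-leaf display for non-negative integers.

    Stem = every digit except the last; leaf = the last digit.
    """
    leaves = defaultdict(list)
    for x in sorted(data):  # sorting first keeps each stem's leaves in order
        leaves[x // 10].append(x % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        line = f"{stem:>3} | " + " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(line.rstrip())
    return lines


# Hypothetical exam scores (illustrative only).
scores = [42, 57, 38, 45, 81, 49, 33, 52, 47, 70]
print("\n".join(stem_and_leaf(scores)))
```

Unlike a histogram, the display keeps every individual value recoverable while still showing the shape of the distribution.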

12 Graphical Methods for Describing Quantitative Data
Histogram

13 Graphical Methods for Describing Quantitative Data
More on Histograms: choosing the number of classes
Number of Observations in Data Set    Number of Classes
Less than 25                          5-6
25-50                                 7-14
More than 50                          15-20
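The rule of thumb for the number of classes, plus the counting step behind a histogram, can be sketched as follows. This assumes equal-width classes spanning the smallest to the largest value, with the maximum placed in the last class; the helper names and the 20 observations are hypothetical.

```python
def suggested_class_range(n):
    """Rule-of-thumb (min, max) number of histogram classes for n observations."""
    if n < 25:
        return (5, 6)
    if n <= 50:
        return (7, 14)
    return (15, 20)


def class_frequencies(data, k):
    """Count observations in k equal-width classes spanning min(data)..max(data).

    Classes are [lo, lo + w), [lo + w, lo + 2w), ...; the largest value is put
    in the last class so every observation is counted.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    if width == 0:
        raise ValueError("all observations are equal; cannot form classes")
    freqs = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)
        freqs[i] += 1
    return freqs


# Hypothetical data set of 20 observations.
data = [0, 7, 12, 15, 18, 21, 22, 24, 26, 27,
        29, 31, 33, 34, 38, 41, 44, 47, 49, 50]
low, high = suggested_class_range(len(data))
print(low, high)                     # 5 6
print(class_frequencies(data, low))  # [2, 3, 6, 4, 5]
```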

14 Summation Notation
Used to simplify summation instructions
Each observation in a data set is identified by a subscript: $x_1, x_2, x_3, x_4, x_5, \ldots, x_n$
Notation used to sum the above numbers together is
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$

15 Summation Notation
Data set of 1, 2, 3, 4
Are these the same?
$\sum_{i=1}^{4} x_i^2$ and $\left(\sum_{i=1}^{4} x_i\right)^2$
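To see why the placement of the exponent matters for the data set 1, 2, 3, 4, here is a short check comparing the sum of the squared observations with the square of the sum. Treating these two expressions as the pair being compared is an assumption; the variable names are illustrative.

```python
data = [1, 2, 3, 4]

sum_of_squares = sum(x ** 2 for x in data)  # sum of x_i^2 = 1 + 4 + 9 + 16
square_of_sum = sum(data) ** 2              # (sum of x_i)^2 = 10^2

print(sum_of_squares, square_of_sum)  # 30 100
```

The two differ, so squaring before summing and summing before squaring are not interchangeable.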

16 Numerical Measures of Central Tendency
Central Tendency: the tendency of data to center about certain numerical values
3 commonly used measures of central tendency:
- Mean
- Median
- Mode

17 Numerical Measures of Central Tendency
The Mean
Arithmetic average of the elements of the data set
Sample mean denoted by $\bar{x}$; population mean denoted by $\mu$
Calculated as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

18 Numerical Measures of Central Tendency
The Median
Middle number when observations are arranged in order
Median denoted by $m$
Identified as the $\left(\frac{n+1}{2}\right)$th observation if $n$ is odd, and the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations if $n$ is even

19 Numerical Measures of Central Tendency
The Mode
The most frequently occurring value in the data set
A data set can be multimodal, that is, have more than one mode
Data displayed in a histogram will have a modal class: the class with the largest frequency

20 Numerical Measures of Central Tendency
The data set
Mean
Median: the $\left(\frac{n+1}{2}\right)$th, or 5th, observation: 8
Mode: 8
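The three measures of central tendency can be computed as sketched below, using a hypothetical sample (not the data set from the slide). The median follows the positional rule (the $(n+1)/2$th ordered value when $n$ is odd, otherwise the mean of the two middle values); `median_by_position` is a hypothetical helper, while `statistics.mean` and `statistics.multimode` are from Python's standard library (3.8+).

```python
import statistics


def median_by_position(data):
    """Median via the positional rule.

    n odd: the (n + 1)/2-th ordered observation.
    n even: the mean of the (n/2)-th and (n/2 + 1)-th ordered observations.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # 1-based position -> 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


sample = [3, 7, 7, 2, 9, 4, 7]  # hypothetical data, not the slide's data set
print("mean:  ", statistics.mean(sample))
print("median:", median_by_position(sample))
print("mode:  ", statistics.multimode(sample))  # lists every mode if multimodal
```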

21 Numerical Measures of Variability
Variability: the spread of the data across possible values
3 commonly used measures of variability:
- Range
- Variance
- Standard deviation

22 Numerical Measures of Variability
The Range
Largest measurement minus the smallest measurement
Loses sensitivity when data sets are large, because it depends only on the two most extreme values
These 2 distributions have the same range. How much does the range tell you about the data variability?
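The range ignores everything between the two extremes. The sketch below builds two hypothetical data sets with the same range and uses the standard library's `statistics.stdev` to show that their spreads still differ.

```python
import statistics


def data_range(data):
    """Range: largest measurement minus smallest measurement."""
    return max(data) - min(data)


# Two hypothetical data sets with identical ranges but different spreads.
clustered = [0, 5, 5, 5, 5, 5, 10]   # most values sit at the center
spread_out = [0, 1, 2, 5, 8, 9, 10]  # values spread across the interval

print(data_range(clustered), data_range(spread_out))               # 10 10
print(statistics.stdev(clustered) < statistics.stdev(spread_out))  # True
```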

23 Numerical Measures of Variability
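The Sample Variance ($s^2$)
The sum of the squared deviations from the mean divided by $(n-1)$:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Expressed in squared units
Why square the deviations? The sum of the deviations from the mean is always zero, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so unsquared deviations would cancel out.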

24 Numerical Measures of Variability
The Sample Standard Deviation ($s$)
The positive square root of the sample variance: $s = \sqrt{s^2}$
Expressed in the original units of measurement
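A minimal sketch of the sample variance and standard deviation, written out step by step with a hypothetical data set. It also prints the sum of the raw deviations, which comes out to zero, illustrating why the deviations are squared. The helper names are hypothetical; the function assumes at least 2 observations.

```python
import math


def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1 (needs n >= 2)."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    return sum(d * d for d in deviations) / (n - 1)


def sample_std_dev(data):
    """s: positive square root of the sample variance, in the original units."""
    return math.sqrt(sample_variance(data))


values = [1, 3, 5, 7, 9]  # hypothetical measurements
mean = sum(values) / len(values)
print(sum(x - mean for x in values))  # raw deviations cancel: 0.0
print(sample_variance(values))        # 10.0
print(sample_std_dev(values))
```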

25 Numerical Measures of Variability
Samples and Populations: Notation
                      Sample    Population
Variance              $s^2$     $\sigma^2$
Standard Deviation    $s$       $\sigma$

26 Numerical Measures of Relative Standing
Descriptive measures of the relationship of a measurement to the rest of the data
Common measures:
- Percentile ranking
- z-score

27 Numerical Measures of Relative Standing
Percentile rankings make use of the pth percentile
The median is an example of a percentile: it is the 50th percentile, with 50% of observations lying above it and 50% lying below it
For any p, the pth percentile has p% of the measurements lying below it and (100-p)% above it
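Working in the other direction, a value's percentile ranking can be estimated by counting the share of observations that lie below it. This is a sketch under one simple convention (a strict "below" count); textbooks and software differ in how they treat ties and interpolation. The helper name and the exam scores are hypothetical.

```python
def percent_below(data, value):
    """Percentage of observations strictly below `value`."""
    return 100 * sum(1 for x in data if x < value) / len(data)


# Hypothetical exam scores (illustrative only).
scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96]
print(percent_below(scores, 81))  # 60.0, so 81 sits at about the 60th percentile
```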

28 Numerical Measures of Relative Standing
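z-score: the distance between a measurement x and the mean, expressed in standard units (standard deviations)
Sample z-score: $z = \frac{x - \bar{x}}{s}$; population z-score: $z = \frac{x - \mu}{\sigma}$
Use of standard units allows comparison across data sets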
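A short sketch of comparing measurements from different data sets via z-scores. The exam means and standard deviations are hypothetical, as is the helper name.

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, in standard deviations."""
    return (x - mean) / std_dev


# Hypothetical: 80 on exam A (mean 70, s = 5) vs 85 on exam B (mean 80, s = 10).
z_a = z_score(80, 70, 5)
z_b = z_score(85, 80, 10)
print(z_a, z_b)  # 2.0 0.5
```

Although 85 is the higher raw score, 80 on exam A is the stronger relative performance: it sits 2 standard deviations above its mean, versus 0.5 for exam B.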

29 Numerical Measures of Relative Standing
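More on z-scores
z-scores follow the empirical rule for mounded distributions: approximately 68% of measurements have z-scores between -1 and 1, approximately 95% between -2 and 2, and almost all (about 99.7%) between -3 and 3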

30 Methods for Detecting Outliers
Outlier: an observation that is unusually large or small relative to the data values being described
Causes:
- Invalid measurement
- Misclassified measurement
- A rare (chance) event
2 detection methods:
- Box plots
- z-scores

31 Methods for Detecting Outliers
Box plots are based on quartiles, values that divide the data set into 4 groups:
- Lower quartile $Q_L$: 25th percentile
- Middle quartile: median
- Upper quartile $Q_U$: 75th percentile
Interquartile range: $\text{IQR} = Q_U - Q_L$
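Quartiles and the IQR can be computed with the standard library's `statistics.quantiles` (Python 3.8+). This sketch relies on its default `'exclusive'` method, which corresponds to placing the pth quantile at position p(n + 1) in the ordered data and interpolating; other textbooks and packages use slightly different conventions, so small differences are expected. The helper name and data are hypothetical.

```python
import statistics


def quartiles_and_iqr(data):
    """Return (QL, median, QU, IQR) using statistics.quantiles' default method."""
    ql, median, qu = statistics.quantiles(data, n=4)
    return ql, median, qu, qu - ql


data = [2, 4, 4, 5, 6, 7, 8, 9, 11, 30]  # hypothetical, sorted for readability
print(quartiles_and_iqr(data))  # (4.0, 6.5, 9.5, 5.5)
```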

32 Methods for Detecting Outliers
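Box Plots
[Figure: box plot labeled with the hinges ($Q_L$ and $Q_U$), the median, the whiskers, and a potential outlier]
Not shown on the plot: the inner and outer fences, which determine potential outliers. Inner fences lie at $Q_L - 1.5(\text{IQR})$ and $Q_U + 1.5(\text{IQR})$; outer fences lie at $Q_L - 3(\text{IQR})$ and $Q_U + 3(\text{IQR})$.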

33 Methods for Detecting Outliers
Rules of thumb
Box plots:
- Measurements between the inner and outer fences are suspect outliers
- Measurements beyond the outer fences are highly suspect outliers
z-scores:
- Measurements with z-scores greater than 3 in absolute value in mounded distributions (greater than 2 in highly skewed distributions) are considered outliers
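Both rules of thumb are sketched below on one hypothetical data set. The box-plot rule assumes the usual fences, inner at 1.5 IQR and outer at 3 IQR beyond the quartiles, with quartiles from `statistics.quantiles`' default method. The z-score rule uses the sample mean and standard deviation. The helper names are hypothetical.

```python
import statistics


def classify_by_fences(data):
    """Return (suspect, highly_suspect) lists using inner/outer fences."""
    ql, _, qu = statistics.quantiles(data, n=4)
    iqr = qu - ql
    inner = (ql - 1.5 * iqr, qu + 1.5 * iqr)
    outer = (ql - 3 * iqr, qu + 3 * iqr)
    suspect, highly_suspect = [], []
    for x in data:
        if x < outer[0] or x > outer[1]:
            highly_suspect.append(x)   # beyond the outer fences
        elif x < inner[0] or x > inner[1]:
            suspect.append(x)          # between inner and outer fences
    return suspect, highly_suspect


def z_outliers(data, cutoff=3.0):
    """Observations whose sample z-score exceeds `cutoff` in absolute value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / s) > cutoff]


data = [1, 3, 4, 5, 6, 7, 8, 9, 10, 22, 40]  # hypothetical, right-skewed
print(classify_by_fences(data))     # ([22], [40])
print(z_outliers(data))             # [] with the mounded-data cutoff of 3
print(z_outliers(data, cutoff=2))   # [40] with the skewed-data cutoff of 2
```

Here 40 inflates the standard deviation enough that its z-score stays below 3. This is one reason the looser cutoff of 2 is suggested for highly skewed data.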

