Chapter 2 Describing Data.

Presentation transcript:

Chapter 2 Describing Data

Summarizing and Describing Data: Tables and Graphs; Numerical Measures

Classification of Variables: Discrete numerical variable; Continuous numerical variable; Categorical variable

Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.

Classification of Variables Continuous Numerical Variable A variable that produces a response that is the outcome of a measurement process.

Classification of Variables Categorical Variables Variables that produce responses that belong to groups (sometimes called “classes”) or categories.

Measurement Levels Nominal and Ordinal Levels of Measurement refer to data obtained from categorical questions. A nominal scale indicates assignments to groups or classes. Ordinal data indicate rank ordering of items.

Frequency Distributions A frequency distribution is a table used to organize data. The left column (called classes or groups) includes numerical intervals on a variable being studied. The right column is a list of the frequencies, or number of observations, for each class. Intervals are normally of equal size, must cover the range of the sample observations, and be non-overlapping.

Construction of a Frequency Distribution
Rule 1: Intervals (classes) must be inclusive and non-overlapping.
Rule 2: Determine k, the number of classes.
Rule 3: Intervals should be the same width, w; the width is determined by the following: $w = \dfrac{\text{largest observation} - \text{smallest observation}}{k}$
Both k and w should be rounded upward, possibly to the next largest integer.

Construction of a Frequency Distribution: Quick Guide to Number of Classes for a Frequency Distribution

Sample Size      Number of Classes
Fewer than 50    5 – 6 classes
50 to 100        6 – 8 classes
Over 100         8 – 10 classes
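The construction rules can be sketched in Python. This is only an illustration: the helper name `frequency_distribution` and the data values are hypothetical, and it assumes the usual width formula (largest observation minus smallest, divided by k, rounded upward). Classes include their lower limit and exclude their upper limit, except that the last class also accepts the largest observation.

```python
import math

def frequency_distribution(data, k):
    """Tally data into k equal-width, non-overlapping classes.

    Returns a list of (lower_limit, upper_limit, frequency) tuples.
    """
    low, high = min(data), max(data)
    # Rule 3: width = (largest - smallest) / k, rounded upward to an integer.
    # max(1, ...) guards against a zero width when all values are equal.
    w = max(1, math.ceil((high - low) / k))
    counts = [0] * k
    for x in data:
        # Clamp so the largest observation lands in the last class.
        index = min(int((x - low) // w), k - 1)
        counts[index] += 1
    return [(low + i * w, low + (i + 1) * w, counts[i]) for i in range(k)]

# Hypothetical sample of 12 observations; the quick guide suggests 5 to 6 classes.
data = [12, 15, 21, 22, 25, 28, 30, 31, 34, 37, 41, 44]
for lower, upper, frequency in frequency_distribution(data, k=5):
    print(f"{lower} less than {upper}: {frequency}")
```

Rounding the width upward guarantees that the k classes cover the whole range of the sample.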

Example of a Frequency Distribution
Table 2.2 A Frequency Distribution for the Suntan Lotion Example

Weights (in mL)      Number of Bottles
220 less than 225    1
225 less than 230    4
230 less than 235    29
235 less than 240    34
240 less than 245    26
245 less than 250    6

Cumulative Frequency Distributions A cumulative frequency distribution contains the number of observations whose values are less than the upper limit of each interval. It is constructed by adding the frequencies of all frequency distribution intervals up to and including the present interval.

Relative Cumulative Frequency Distributions A relative cumulative frequency distribution converts all cumulative frequencies to cumulative percentages.

Example of a Frequency Distribution
Table 2.3 A Cumulative Frequency Distribution for the Suntan Lotion Example

Weights (in mL)    Number of Bottles
less than 225      1
less than 230      5
less than 235      34
less than 240      68
less than 245      94
less than 250      100
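As a quick check, the cumulative counts in Table 2.3 can be rebuilt from the class frequencies in Table 2.2 by a running sum. The sketch below uses only the table values; the variable names are illustrative.

```python
from itertools import accumulate

# Class upper limits (mL) and frequencies taken from Table 2.2.
upper_limits = [225, 230, 235, 240, 245, 250]
frequencies = [1, 4, 29, 34, 26, 6]

# Cumulative frequency: add the frequencies up to and including each class.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: convert each cumulative count to a percentage.
total = cumulative[-1]
relative_cumulative = [100 * c / total for c in cumulative]

for limit, count, percent in zip(upper_limits, cumulative, relative_cumulative):
    print(f"less than {limit}: {count} bottles ({percent:.0f}%)")
```

Because the sample contains exactly 100 bottles, each cumulative count equals its cumulative percentage. The same percentages are the points an ogive connects.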

Histograms and Ogives A histogram is a bar graph that consists of vertical bars constructed on a horizontal line that is marked off with intervals for the variable being displayed. The intervals correspond to those in a frequency distribution table. The height of each bar is proportional to the number of observations in that interval.

Histograms and Ogives An ogive, sometimes called a cumulative line graph, is a line that connects points that are the cumulative percentage of observations below the upper limit of each class in a cumulative frequency distribution.

Histogram and Ogive for Example 2.1

Stem-and-Leaf Display A stem-and-leaf display is an exploratory data analysis graph that is an alternative to the histogram. Data are grouped according to their leading digits (called the stem) while listing the final digits (called leaves) separately for each member of a class. The leaves are displayed individually in ascending order after each of the stems.
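A minimal sketch of the grouping step, assuming non-negative two-digit integers so that the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by leading digits (stem) and final digit (leaf)."""
    display = defaultdict(list)
    # Sorting first puts the stems, and the leaves within each stem, in ascending order.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

# Hypothetical observations.
data = [23, 45, 31, 27, 38, 45, 52, 29, 36, 41]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch omits stems that have no observations; a printed display would normally list them with an empty row.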

Stem-and-Leaf Display Stem-and-Leaf Display for Gilotti’s Deli Example

Tables - Bar and Pie Charts - Frequency and Relative Frequency Distribution for Top Company Employers Example

Tables - Bar and Pie Charts - Figure 2.9 Bar Chart for Top Company Employers Example

Tables - Bar and Pie Charts - Figure 2.10 Pie Chart for Top Company Employers Example

Pareto Diagrams A Pareto diagram is a bar chart that displays the frequency of defect causes. The bar at the left indicates the most frequent cause and bars to the right indicate causes in decreasing frequency. A Pareto diagram is used to separate the “vital few” from the “trivial many.”
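The ordering behind a Pareto diagram can be sketched as a sort by frequency. The defect causes and counts below are invented for illustration, and the cumulative-percentage column is an addition that is often drawn alongside the bars rather than something the slide specifies.

```python
def pareto_table(defect_counts):
    """Return (cause, count, cumulative percent) rows, most frequent cause first."""
    total = sum(defect_counts.values())
    rows = []
    running = 0
    ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)
    for cause, count in ordered:
        running += count
        rows.append((cause, count, 100 * running / total))
    return rows

# Hypothetical defect tallies.
defects = {"scratches": 12, "dents": 45, "misalignment": 8, "paint": 30, "other": 5}
for cause, count, cumulative_percent in pareto_table(defects):
    print(f"{cause:<13} {count:>3} {cumulative_percent:>6.1f}%")
```

In this made-up example the first two causes account for 75% of all defects, which is the kind of "vital few" the diagram is meant to expose.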

Line Charts A line chart, also called a time plot, is a series of data plotted at various time intervals. Measuring time along the horizontal axis and the numerical quantity of interest along the vertical axis yields a point on the graph for each observation. Joining points adjacent in time by straight lines produces a time plot.

Line Charts

Parameters and Statistics A statistic is a descriptive measure computed from a sample of data. A parameter is a descriptive measure computed from an entire population of data.

Measures of Central Tendency - Arithmetic Mean - The arithmetic mean of a set of data is the sum of the data values divided by the number of observations.

If the data set is from a sample, then the sample mean, $\bar{x}$, is: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Population Mean If the data set is from a population, then the population mean, $\mu$, is: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

Measures of Central Tendency - Median - An ordered array is an arrangement of data in either ascending or descending order. Once the data are arranged in ascending order, the median is the value such that 50% of the observations are smaller and 50% of the observations are larger. If the sample size n is an odd number, the median, Xm, is the middle observation. If the sample size n is an even number, the median, Xm, is the average of the two middle observations. The median will be located in the 0.50(n+1)th ordered position.

Measures of Central Tendency - Mode - The mode, if one exists, is the most frequently occurring observation in the sample or population.
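The three measures of central tendency can be compared on a small data set. The sketch below locates the median through the 0.50(n+1)th ordered position described above and cross-checks it against Python's standard `statistics` module; the function name `median_by_position` and the data are hypothetical.

```python
import statistics

def median_by_position(data):
    """Median found via the 0.50(n + 1)th position of the ordered array."""
    ordered = sorted(data)
    n = len(ordered)
    position = 0.50 * (n + 1)   # 1-based ordered position
    lower = int(position)       # whole part of the position
    if position == lower:
        # Odd n: the position is a whole number, so take the middle observation.
        return ordered[lower - 1]
    # Even n: the position ends in .5, so average the two middle observations.
    return (ordered[lower - 1] + ordered[lower]) / 2

# Hypothetical sample of n = 10 observations.
data = [3, 5, 7, 7, 8, 10, 12, 12, 12, 14]

print("mean:  ", statistics.mean(data))
print("median:", median_by_position(data))
print("mode:  ", statistics.multimode(data))  # a list, since several modes may exist
```

Here the position is 5.5, so the median is the average of the 5th and 6th ordered observations (8 and 10).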

Shape of the Distribution The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the mean. In a symmetric distribution the mean and median are equal.

Shape of the Distribution A distribution is skewed if the observations are not symmetrically distributed above and below the mean. A positively skewed (or skewed to the right) distribution has a tail that extends to the right in the direction of positive values. A negatively skewed (or skewed to the left) distribution has a tail that extends to the left in the direction of negative values.

Shapes of the Distribution

Measures of Central Tendency - Geometric Mean - The Geometric Mean is the nth root of the product of n numbers: $\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$. The Geometric Mean is used to obtain mean growth over several periods given compounded growth from each period.
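To illustrate mean growth under compounding, the sketch below takes the geometric mean of per-period growth factors (1 plus the growth rate). The growth figures are hypothetical, and the result is cross-checked against `statistics.geometric_mean` from the standard library.

```python
import math
import statistics

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    n = len(values)
    return math.prod(values) ** (1 / n)

# Hypothetical growth of +10%, +20% and -5% over three periods,
# expressed as growth factors.
growth_factors = [1.10, 1.20, 0.95]

gm = geometric_mean(growth_factors)
print(f"mean growth per period: {gm - 1:.2%}")

# Compounding the mean factor over three periods reproduces the total growth.
print(math.isclose(gm ** 3, math.prod(growth_factors)))
```

The arithmetic mean of the three rates would overstate the compounded growth, which is why the geometric mean is the appropriate average here.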

Measures of Variability - The Range - The range in a set of data is the difference between the largest and smallest observations.

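Measures of Variability - Sample Variance - The sample variance, $s^2$, is the sum of the squared differences between each observation and the sample mean divided by the sample size minus 1: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Measures of Variability - Short-cut Formulas for Sample Variance - Short-cut formulas for the sample variance are: $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$ or $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$

Measures of Variability - Population Variance - The population variance, $\sigma^2$, is the sum of the squared differences between each observation and the population mean divided by the population size, N: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Measures of Variability - Sample Standard Deviation - The sample standard deviation, s, is the positive square root of the variance, and is defined as: $s = \sqrt{s^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Measures of Variability - Population Standard Deviation - The population standard deviation, $\sigma$, is $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$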
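The definitions above translate directly into code. This sketch, on hypothetical data, computes the sample variance from its definition and from a short-cut form (sum of squares minus the squared sum over n, all divided by n - 1), then contrasts it with the population variance. The function names are illustrative.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Short-cut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

def population_variance(data):
    """Sum of squared deviations from the population mean, divided by N."""
    size = len(data)
    mu = sum(data) / size
    return sum((x - mu) ** 2 for x in data) / size

# Hypothetical observations (mean 6, squared deviations sum to 28).
data = [4, 8, 6, 5, 3, 7, 9, 6]

print("sample variance:              ", sample_variance(data))
print("short-cut sample variance:    ", sample_variance_shortcut(data))
print("sample standard deviation:    ", math.sqrt(sample_variance(data)))
print("population variance:          ", population_variance(data))
print("population standard deviation:", math.sqrt(population_variance(data)))
```

The only difference between the sample and population versions is the divisor, n - 1 versus N, so the sample variance is always the larger of the two for the same data.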

The Empirical Rule (the 68%, 95%, or almost all rule) For a set of data with a mound-shaped histogram, the Empirical Rule is: approximately 68% of the observations are contained within a distance of one standard deviation around the mean, $\mu \pm 1\sigma$; approximately 95% of the observations are contained within a distance of two standard deviations around the mean, $\mu \pm 2\sigma$; almost all of the observations are contained within a distance of three standard deviations around the mean, $\mu \pm 3\sigma$.
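One way to see the Empirical Rule at work is to count observations within one, two and three standard deviations of the mean. The sketch below does this for simulated mound-shaped data; the data are generated from a normal distribution with a fixed seed purely for illustration, and the helper name `share_within` is hypothetical.

```python
import random
import statistics

def share_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

# Simulated, not real, data: 10,000 draws from a normal distribution.
rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(50, 10) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(data, k):.1%}")
```

For data that are strongly skewed rather than mound-shaped, these shares can differ noticeably from 68% and 95%.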

Coefficient of Variation The Coefficient of Variation, CV, is a measure of relative dispersion that expresses the standard deviation as a percentage of the mean (provided the mean is positive). The sample coefficient of variation is $CV = \dfrac{s}{\bar{x}} \times 100\%$. The population coefficient of variation is $CV = \dfrac{\sigma}{\mu} \times 100\%$.
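A short sketch of the sample coefficient of variation, using the standard `statistics` module. The function name and the data are hypothetical, and the positive-mean requirement is enforced explicitly.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    mean = statistics.mean(sample)
    if mean <= 0:
        raise ValueError("the coefficient of variation requires a positive mean")
    return statistics.stdev(sample) / mean * 100

# Hypothetical observations with mean 6 and sample standard deviation 2.
data = [4, 8, 6, 5, 3, 7, 9, 6]
print(f"CV = {coefficient_of_variation(data):.1f}%")
```

Because the CV is unit-free, it can be used to compare the spread of data sets measured on different scales.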

Percentiles and Quartiles Data must first be in ascending order. Percentiles separate large ordered data sets into 100ths. The Pth percentile is a number such that P percent of all the observations are at or below that number. Quartiles are descriptive measures that separate large ordered data sets into four quarters.

Percentiles and Quartiles The first quartile, Q1, is another name for the 25th percentile. The first quartile divides the ordered data such that 25% of the observations are at or below this value. Q1 is located in the 0.25(n+1)th position when the data are in ascending order. That is, $Q_1 = \text{the value in the } 0.25(n+1)\text{th ordered position}$

Percentiles and Quartiles The third quartile, Q3, is another name for the 75th percentile. The third quartile divides the ordered data such that 75% of the observations are at or below this value. Q3 is located in the 0.75(n+1)th position when the data are in ascending order. That is, $Q_3 = \text{the value in the } 0.75(n+1)\text{th ordered position}$

Interquartile Range The Interquartile Range (IQR) measures the spread in the middle 50% of the data; that is, the difference between the observations at the 75th and the 25th percentiles: $\text{IQR} = Q_3 - Q_1$

Five-Number Summary The Five-Number Summary refers to the five descriptive measures: minimum, first quartile, median, third quartile, and the maximum.
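The position rules for the quartiles and the median lead to a compact five-number summary. In the sketch below, a fractional position is resolved by linear interpolation between the two neighbouring observations; that interpolation step is an assumption, since the slides only give the positions. The function names and data are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    lower = int(position)           # whole part of the position
    fraction = position - lower     # fractional part, 0 for whole positions
    if fraction == 0:
        return ordered[lower - 1]
    # Assumed convention: interpolate linearly between adjacent observations.
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using (n + 1) positions."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for these positions")
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    median = value_at_position(ordered, 0.50 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical sample of n = 11, so Q1, median and Q3 sit at positions 3, 6 and 9.
data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
minimum, q1, median, q3, maximum = five_number_summary(data)
print("five-number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", q3 - q1)
```

With n = 11 the positions are whole numbers, so no interpolation is needed. For other sample sizes the interpolated values depend on the convention chosen, and software packages do not all agree.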

Box-and-Whisker Plots A Box-and-Whisker Plot is a graphical procedure that uses the Five-Number Summary. A Box-and-Whisker Plot consists of an inner box that shows the numbers which span the range from Q1 to Q3, and a line drawn through the box at the median. The “whiskers” are lines drawn from Q1 to the minimum value, and from Q3 to the maximum value.

Box-and-Whisker Plots (Excel)

Grouped Data Mean For a population of N observations the mean is $\mu = \dfrac{\sum_{i=1}^{k} f_i m_i}{N}$. For a sample of n observations, the mean is $\bar{x} = \dfrac{\sum_{i=1}^{k} f_i m_i}{n}$, where the data set contains observation values m1, m2, . . ., mk occurring with frequencies f1, f2, . . ., fk respectively.

Grouped Data Variance For a population of N observations the variance is $\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i (m_i - \mu)^2}{N}$. For a sample of n observations, the variance is $s^2 = \dfrac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1}$, where the data set contains observation values m1, m2, . . ., mk occurring with frequencies f1, f2, . . ., fk respectively.
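As an illustration, the grouped-data formulas can be applied to the suntan lotion frequencies in Table 2.2. Taking each class midpoint as the observation value for its class is an assumption made here for the sketch; the function name is hypothetical.

```python
def grouped_mean_and_variance(values, frequencies, sample=True):
    """Mean and variance for values m_i occurring with frequencies f_i."""
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(values, frequencies)) / n
    squared_deviations = sum(f * (m - mean) ** 2 for m, f in zip(values, frequencies))
    divisor = n - 1 if sample else n   # n - 1 for a sample, N for a population
    return mean, squared_deviations / divisor

# Class midpoints (assumed representative values) and frequencies from Table 2.2.
midpoints = [222.5, 227.5, 232.5, 237.5, 242.5, 247.5]
frequencies = [1, 4, 29, 34, 26, 6]

mean, variance = grouped_mean_and_variance(midpoints, frequencies)
print(f"grouped mean:     {mean:.1f} mL")
print(f"grouped variance: {variance:.2f}")
```

These are approximations to the mean and variance of the raw weights, since the individual observations inside each class are replaced by a single value.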

Key Words: Arithmetic Mean, Bar Chart, Box-and-Whisker Plot, Categorical Variable, Coefficient of Variation, Continuous Numerical Variable, Cumulative Frequency Distribution, Discrete Numerical Variable, Empirical Rule, First Quartile, Five-Number Summary, Frequency Distribution, Geometric Mean, Histogram, Interquartile Range (IQR), Line Chart (Time Plot), Measurement Levels, Median, Mode

Key Words (continued): Numerical Variables, Ogive, Outlier, Parameter, Pareto Diagram, Percentiles, Pie Chart, Qualitative Quantitative Variables, Quartiles, Range, Relative Cumulative Frequency Distribution, Short-cut Formula for $s^2$, Skewness, Standard Deviation, Statistic, Stem-and-Leaf Display, Third Quartile, Variance