Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1 Chapter 4 Numerical Methods for Describing Data.

Presentation transcript:


Slide 2: Describing the Center of a Data Set with the Arithmetic Mean

Slide 3: Describing the Center of a Data Set with the Arithmetic Mean. The population mean, denoted by µ, is the average of all x values in the entire population.
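The slide's own formulas are not part of this transcript. As a sketch in the usual notation, assuming the sample values are written $x_1, \dots, x_n$ and the population contains $N$ values (both symbols are assumptions here), the sample mean and population mean are:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n},
\qquad
\mu = \frac{\text{sum of all } x \text{ values in the population}}{N}
```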

Slide 4: Example calculations. During a two-week period, 10 houses were sold in Fancytown. The “average” or mean price for this sample of 10 houses in Fancytown is $295,000.

Slide 5: Describing the Center of a Data Set with the Median. The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then the sample median is the single middle value if n is odd, or the average of the two middle values if n is even.
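The ordering step can be sketched in a few lines of Python. This assumes the usual convention (single middle value for odd n, average of the two middle values for even n); the function name `sample_median` and the numbers in the usage note are illustrative, not taken from the slides.

```python
def sample_median(values):
    """Median by the order-then-pick-the-middle procedure (assumed convention)."""
    ordered = sorted(values)          # repeated values stay in the list
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `sample_median([3, 1, 2])` picks the middle value of the ordered list `[1, 2, 3]`, while `sample_median([4, 1, 3, 2])` averages the two middle values 2 and 3.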

Slides 6-7: Comparing the Sample Mean & Sample Median

Slide 8: The Trimmed Mean. A trimmed mean is computed by first ordering the data values from smallest to largest, deleting a selected number of values from each end of the ordered list, and finally computing the mean of the remaining values. The trimming percentage is the percentage of values deleted from each end of the ordered list.
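A minimal Python sketch of the trimmed mean as described above. The function name `trimmed_mean` is hypothetical, and rounding a fractional count of deleted values down is an assumption; the slide does not say how to handle trimming percentages that do not give a whole number.

```python
def trimmed_mean(values, trim_percent):
    """Mean after deleting trim_percent of the values from EACH end."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of values deleted from each end (assumption: round down).
    k = int(n * trim_percent / 100)
    if 2 * k >= n:
        raise ValueError("trimming percentage removes every value")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)
```

With ten made-up values such as `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, a 10% trimmed mean deletes one value from each end (1 and 100) and averages the remaining eight, which blunts the pull of the single large value.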

Slide 9: Example of Trimmed Mean

Slide 10: Categorical Data - Sample Proportion

Slide 11: Categorical Data - Sample Proportion. In the student data sample, consider the variable gender and treat being female as a success. Since 25 of the 79 students in the sample are female, the sample proportion of females is 25/79 ≈ 0.316.

Slide 12: Describing Variability. The simplest numerical measure of the variability of a numerical data set is the range, defined as the difference between the largest and smallest data values: range = maximum - minimum.

Slide 13: Describing Variability. The n deviations from the sample mean are the differences $x_1 - \bar{x},\ x_2 - \bar{x},\ \dots,\ x_n - \bar{x}$. Note: The sum of all of the deviations from the sample mean is equal to 0, except possibly for the effects of rounding. This means that the average deviation from the mean is always 0 and cannot be used as a measure of variability.
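The reason the deviations always sum to 0 follows directly from the definition of the sample mean. A short derivation, assuming the sample values are written $x_1, \dots, x_n$ with mean $\bar{x} = \sum x_i / n$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```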

Slide 14: Sample Variance. The sample variance, denoted $s^2$, is the sum of the squared deviations from the mean divided by $n - 1$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.

Slide 15: Sample Standard Deviation. The sample standard deviation, denoted $s$, is the positive square root of the sample variance. The population standard deviation is denoted by σ.
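The two definitions translate almost word for word into code. The sketch below uses hypothetical function names and only the Python standard library; it divides by n - 1, as the definition of the sample variance requires.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))
```

For the made-up sample `[1, 2, 3, 4, 5]` the mean is 3, the squared deviations sum to 10, and dividing by n - 1 = 4 gives a sample variance of 2.5.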

Slide 16: Example calculations. Ten McIntosh apples were randomly selected and weighed (in ounces).

Slide 17: Calculator Formula for $s^2$ and $s$. A little algebra can establish that the sum of the squared deviations is $\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$. A computational formula for the sample variance is then given by $s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$.
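The "little algebra" can be sketched as follows, assuming the computational formula takes its usual form with $\sum x^2 - (\sum x)^2 / n$ in the numerator. The key step is substituting $\sum x = n\bar{x}$:

```latex
\begin{aligned}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2
     \qquad (\text{since } \textstyle\sum x = n\bar{x}) \\
  &= \sum x^2 - n\bar{x}^2 \\
  &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}
\end{aligned}
```

Dividing the last line by $n - 1$ gives the sample variance without computing any individual deviation first.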

Slide 18: Calculations Revisited. The values for $s^2$ and $s$ are exactly the same as were obtained earlier.

Slide 19: Quartiles and the Interquartile Range. Lower quartile ($Q_1$) = median of the lower half of the data set. Upper quartile ($Q_3$) = median of the upper half of the data set. Note: If n is odd, the median is excluded from both the lower and upper halves of the data. The interquartile range (iqr), a resistant measure of variability, is given by iqr = upper quartile - lower quartile = $Q_3 - Q_1$.
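Quartile conventions differ between textbooks and software, so the sketch below follows the rule stated on this slide: each quartile is the median of one half of the ordered data, and the overall median is left out of both halves when n is odd. The function names are hypothetical, and library routines (which often interpolate) may give slightly different values.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (lower quartile, upper quartile) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median(lower), _median(upper)

def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, q3 = quartiles(values)
    return q3 - q1
```

For the made-up data 1 through 9, the median 5 is excluded, the halves are `[1, 2, 3, 4]` and `[6, 7, 8, 9]`, and the quartiles are 2.5 and 7.5.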

Slide 20: Skeletal Boxplot Example. Using the student work hours data we have: [Figure: skeletal boxplot of the student work hours data.]

Slide 21: Outliers. An observation is an outlier if it is more than 1.5 iqr away from the closest end of the box (less than the lower quartile minus 1.5 iqr, or more than the upper quartile plus 1.5 iqr). An outlier is extreme if it is more than 3 iqr from the closest end of the box, and it is mild otherwise.
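The outlier rule amounts to comparing each observation against two pairs of cutoffs. A minimal sketch, with a hypothetical function name and the quartiles passed in as arguments:

```python
def classify_observation(x, lower_quartile, upper_quartile):
    """Label x as 'extreme', 'mild', or 'not an outlier' using the iqr rule."""
    iqr = upper_quartile - lower_quartile
    # More than 3 iqr beyond the closest end of the box: extreme outlier.
    if x < lower_quartile - 3 * iqr or x > upper_quartile + 3 * iqr:
        return "extreme"
    # More than 1.5 iqr (but at most 3 iqr) beyond the box: mild outlier.
    if x < lower_quartile - 1.5 * iqr or x > upper_quartile + 1.5 * iqr:
        return "mild"
    return "not an outlier"
```

With the quartiles 19 and 22 from the student age data (iqr = 3), the cutoffs are 14.5 and 26.5 for mild outliers and 10 and 31 for extreme ones, so a hypothetical age of 27 would be a mild outlier and 32 an extreme one.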

Slide 22: Modified Boxplots. A modified boxplot represents mild outliers by shaded circles and extreme outliers by open circles. Whiskers extend on each end to the most extreme observations that are not outliers.

Slide 23: Modified Boxplot Example. Using the student work hours data (iqr = 6) we have:
Lower quartile - 1.5 iqr = 8 - 1.5(6) = -1
Upper quartile + 1.5 iqr = 14 + 1.5(6) = 23
Upper quartile + 3 iqr = 14 + 3(6) = 32
[Figure labels: smallest data value that isn’t an outlier; largest data value that isn’t an outlier; mild outlier.]

Slide 24: Modified Boxplot Example. Consider the ages of the 79 students from the classroom data set from the Chapter 3 slideshow. iqr = 22 - 19 = 3.
Lower quartile - 3 iqr = 10
Lower quartile - 1.5 iqr = 14.5
Upper quartile + 1.5 iqr = 26.5
Upper quartile + 3 iqr = 31
[Data listing labels: median; lower quartile; upper quartile; moderate outliers; extreme outliers.]

Slide 25: Modified Boxplot Example. Here is the modified boxplot for the student age data. [Figure labels: smallest data value that isn’t an outlier; largest data value that isn’t an outlier; mild outliers; extreme outliers.]

Slide 26: Modified Boxplot Example. Here is the same boxplot reproduced with a vertical orientation.

Slide 27: Comparative Boxplot Example. [Figure: boxplots of student weight by gender (females, males).] By plotting boxplots of two separate groups or subgroups together, we can compare their distributional behaviors. Notice that the distributions of female and male student weights have similar shapes, although the females are roughly 20 lbs lighter (as a group).

Slide 28: Interpreting Variability: Chebyshev’s Rule

Slide 29: Interpreting Variability: Chebyshev’s Rule. For specific values of k, Chebyshev’s Rule reads:
- At least 75% of the observations are within 2 standard deviations of the mean.
- At least 88.8% (roughly 89%) of the observations are within 3 standard deviations of the mean.
- At least 89.9% (roughly 90%) of the observations are within 3.16 standard deviations of the mean.
- At least 93.75% (roughly 94%) of the observations are within 4 standard deviations of the mean.
- At least 96% of the observations are within 5 standard deviations of the mean.
- At least 99% of the observations are within 10 standard deviations of the mean.
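The general statement of the rule is not part of this transcript. Its standard form says that, for any k ≥ 1, at least 100(1 - 1/k²)% of the observations lie within k standard deviations of the mean, which matches the percentages listed above. A small sketch with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("Chebyshev's Rule is stated for k >= 1")
    return 100 * (1 - 1 / k ** 2)

# Reproduce the k values quoted on the slide.
for k in (2, 3, 3.16, 4, 5, 10):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2f}%")
```

Note that k = 1 gives a bound of 0%, which is why the rule says nothing useful about the interval within one standard deviation of the mean.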

Slide 30: Example - Chebyshev’s Rule. Consider the student age data. [Data listing color-coded by distance from the mean: within 1, 2, 3, 4, and 5 standard deviations of the mean.]

Slide 31: Example - Chebyshev’s Rule. Summarizing the student age data:
Interval | Chebyshev’s | Actual
within 1 standard deviation of the mean | ≥ 0% | 72/79 = 91.1%
within 2 standard deviations of the mean | ≥ 75% | 75/79 = 94.9%
within 3 standard deviations of the mean | ≥ 88.8% | 76/79 = 96.2%
within 4 standard deviations of the mean | ≥ 93.75% | 77/79 = 97.5%
within 5 standard deviations of the mean | ≥ 96.0% | 79/79 = 100%
Notice that Chebyshev’s Rule gives very conservative lower bounds, which are not very close to the actual percentages.

Slide 32: Empirical Rule. If the histogram of values in a data set is reasonably symmetric and unimodal (specifically, is reasonably approximated by a normal curve), then:
1. Approximately 68% of the observations are within 1 standard deviation of the mean.
2. Approximately 95% of the observations are within 2 standard deviations of the mean.
3. Approximately 99.7% of the observations are within 3 standard deviations of the mean.
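Either rule can be checked against a data set by counting how many observations actually fall within k standard deviations of the mean. The sketch below uses a hypothetical function name and the sample standard deviation (divisor n - 1); it counts an observation lying exactly on the boundary as "within".

```python
import math

def percent_within(values, k):
    """Percentage of observations within k sample standard deviations of the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    count = sum(1 for x in values if abs(x - mean) <= k * s)
    return 100 * count / n
```

The result can then be set beside the 68%, 95%, and 99.7% figures (or beside a Chebyshev bound) to judge how well the rule describes the data.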

Slide 33: Z Scores. The z score tells how many standard deviations an observation is from the mean. A positive z score indicates the observation is above the mean, and a negative z score indicates the observation is below the mean.

Slide 34: Z Scores. Computing the z score is often referred to as standardization, and the z score is called a standardized score.
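The slide's formula is not in the transcript; the standard definition is z = (observation - mean) / standard deviation. A one-function sketch, with a hypothetical name and made-up numbers in the usage note:

```python
def z_score(observation, mean, std_dev):
    """Number of standard deviations the observation lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / std_dev
```

For instance, with an assumed mean of 70 and standard deviation of 10, an observation of 85 has a z score of 1.5 (above the mean) and an observation of 55 has a z score of -1.5 (below the mean).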

Slide 35: Example. A sample of GPAs of 38 statistics students appears below (sorted in increasing order).

Slide 36: Example. The following stem-and-leaf display indicates that the GPA data are reasonably symmetric and unimodal. Stem: units digit. Leaf: tenths digit.

Slide 37: Example

Slide 38: Example
Interval | Empirical Rule | Actual
within 1 standard deviation of the mean | ≈ 68% | 27/38 = 71%
within 2 standard deviations of the mean | ≈ 95% | 37/38 = 97%
within 3 standard deviations of the mean | ≈ 99.7% | 38/38 = 100%
Notice that the empirical rule gives reasonably good estimates for this example.