Introduction to Summary Statistics

Presentation transcript:

Introduction to Engineering Design, Unit 3 – Measurement and Statistics: Introduction to Summary Statistics

Statistics
The collection, evaluation, and interpretation of data. Statistical analysis of measurements can help verify the quality of a design or process.

Central tendency: the “center” of a distribution (mean, median, mode).
Variation: the spread of values around the center (range, standard deviation, interquartile range).
Distribution: a summary of the frequency of values (frequency tables, histograms, normal distribution).
The average value of a variable (like rainfall depth) is one type of statistic that indicates central tendency; it gives an indication of the center of the data. The median and mode are two other indications of central tendency. But often we need more detail on how much a quantity can vary. Statistical dispersion is the variability or spread of data. The range, standard deviation, and interquartile range are indications of dispersion. Even more detail about a variable can be shown by a frequency distribution, which summarizes how the data values are distributed throughout the range of values. Frequency distributions can be shown by frequency tables, histograms, box plots, or other summaries.

Mean (Central Tendency)
The mean is the sum of the values of a set of data divided by the number of values in that data set:
$\mu = \frac{\sum x_i}{N}$
where $\mu$ is the mean value, $x_i$ is an individual data value, $\sum x_i$ is the summation of all data values, and $N$ is the number of data values in the data set.
The mean is the most frequently used measure of central tendency. It is strongly influenced by outliers, which are very large or very small data values that do not seem to fit with the majority of the data.

Mean (Central Tendency)
Data Set: 3 7 12 17 21 21 23 27 32 36 44
Sum of the values = 243
Number of values = 11
$\mu = \frac{\sum x_i}{N} = \frac{243}{11} \approx 22.09$
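The mean calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides. The function name `mean` and the outlier value 400 are illustrative assumptions, added only to show how strongly a single extreme value pulls the mean.

```python
# Data set from the slide above.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]


def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


total = sum(data)
count = len(data)
mu = mean(data)
print(f"sum = {total}, N = {count}, mean = {mu:.2f}")
# sum = 243, N = 11, mean = 22.09

# Hypothetical outlier (not in the original data) to show its influence.
with_outlier = data + [400]
print(f"mean with outlier = {mean(with_outlier):.2f}")
# mean with outlier = 53.58
```

One extra value of 400 moves the mean from about 22 to about 54, even though eleven of the twelve values are unchanged.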

Mode (Central Tendency)
The mode is a measure of central tendency: the most frequently occurring value in a set of data. Its symbol is M.
Data Set: 27 17 12 7 21 44 23 3 36 32 21

Mode (Central Tendency)
The most frequently occurring value in a set of data is the mode.
Data Set: 3 7 12 17 21 21 23 27 32 36 44
There are two occurrences of 21. Every other data value has only one occurrence. Therefore, 21 is the mode.
Mode = M = 21

Mode (Central Tendency)
The most frequently occurring value in a set of data is the mode.
Bimodal data set: two numbers of equal frequency stand out.
Multimodal data set: more than two numbers of equal frequency stand out.

Mode (Central Tendency)
Determine the mode of 48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55. Mode = 63
Determine the mode of 48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55. Mode = 63 and 59 (bimodal)
Determine the mode of 48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55. Mode = 63, 59, and 48 (multimodal)
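The three mode exercises above can be reproduced with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.multimode` is available. The helper `describe_modes` and its labels are illustrative names, not part of the original slides.

```python
from statistics import multimode


def describe_modes(values):
    """Return the modes of the data and a label for how many there are."""
    modes = multimode(values)  # all most-frequent values, in first-seen order
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


set_a = [48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55]
set_b = [48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55]
set_c = [48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55]

for name, values in (("A", set_a), ("B", set_b), ("C", set_c)):
    modes, label = describe_modes(values)
    print(name, modes, label)
# A [63] unimodal
# B [63, 59] bimodal
# C [48, 63, 59] multimodal
```

Note that when every value occurs exactly once, `multimode` returns every value. This sketch would label such a data set multimodal, although it is usually described as having no mode.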

Median (Central Tendency)
The median is a measure of central tendency: the value that occurs in the middle of a set of data that has been arranged in numerical order. Its symbol is $\tilde{x}$, pronounced “x-tilde”. The median divides the data into two sets that contain an equal number of data values.

Median (Central Tendency)
The median is the value that occurs in the middle of a set of data that has been arranged in numerical order.
Data Set: 27 17 12 7 21 44 23 3 36 32 21

Median (Central Tendency)
For a data set that contains an odd number of values, the median is the single middle value of the ordered data.
Data Set: 3 7 12 17 21 21 23 27 32 36 44
The sixth of the eleven values is the middle one, so the median is 21.

Median (Central Tendency)
For a data set that contains an even number of values, the two middle values are averaged, and the result is the median.
Data Set: 3 7 12 17 21 21 23 27 31 32 36 44
The two middle values are 21 and 23, so the median is (21 + 23) / 2 = 22.
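The odd and even cases above can be sketched as one small Python function. The function name `median` and the variable names are illustrative. The standard library's `statistics.median` follows the same rule and could be used instead.

```python
def median(values):
    """Return the middle value of the sorted data.

    With an even number of values, the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Unsorted eleven-value data set from the slides (odd count).
odd_set = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]
# Twelve-value data set from the slides (even count).
even_set = [3, 7, 12, 17, 21, 21, 23, 27, 31, 32, 36, 44]

print(median(odd_set))   # 21
print(median(even_set))  # 22.0
```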

Range (Variation)
The range is a measure of data variation: the difference between the largest and smallest values that occur in a set of data. Its symbol is R.
Data Set: 3 7 12 17 21 21 23 27 32 36 44
Range = R = 44 – 3 = 41
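The range, together with the standard deviation and interquartile range named earlier as measures of variation, can be sketched in Python for the same data set. Two choices here are assumptions, since the slides do not specify them. The population standard deviation (`statistics.pstdev`) is used to match the μ and N notation. The quartiles use the default "exclusive" method of `statistics.quantiles`, and other quartile conventions can give slightly different values.

```python
from statistics import pstdev, quantiles

# Data set from the slide above.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Population standard deviation (assumed; the slides do not say which form).
sigma = pstdev(data)

# Quartiles via the library's default method; the IQR is Q3 minus Q1.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(f"range = {data_range}")  # range = 41
print(f"standard deviation = {sigma:.2f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The range depends only on the two extreme values, while the standard deviation and interquartile range reflect how the values in between are spread.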

Histogram (Distribution)
A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data. An engineer might use a histogram to show the variation of a dimension that exists among a group of parts that are intended to be identical.

Histogram (Distribution)
Large sets of data are often divided into a limited number of groups. These groups are called class intervals.
Class intervals: -16 to -6, -5 to 5, 6 to 16

Histogram (Distribution)
The number of data elements in each class interval is shown by the frequency, which is plotted along the y-axis of the graph.
[Figure: histogram with frequency on the y-axis and the class intervals -16 to -6, -5 to 5, and 6 to 16 on the x-axis]

Histogram (Distribution)
Example data: 1, 7, 15, 4, 8, 8, 5, 12, 10
Reordered: 1, 4, 5, 7, 8, 8, 10, 12, 15
Let’s create a histogram to represent this data set. For this example, we will break the data into ranges of 1 to 5, 6 to 10, and 11 to 15, so we will place labels on the x-axis to indicate these ranges. Let’s reorder the data to make it easier to divide into the ranges.
[Figure: empty histogram axes with frequency (1 to 4) on the y-axis and the ranges 1 to 5, 6 to 10, and 11 to 15 on the x-axis]

Histogram (Distribution)
Data: 1, 4, 5, 7, 8, 8, 10, 12, 15
The height of each bar in a histogram indicates the number of data elements, or frequency of occurrence, within each range. Looking at the data, you can see there are three data values in the range 1 to 5, four data values in the range 6 to 10, and two data values in the range 11 to 15. Note that you can always determine the number of data points in a data set from the histogram by adding the frequencies from all of the data ranges. In this case we add three, four, and two to get a total of nine data points.
[Figure: histogram with frequency (1 to 4) on the y-axis and bars for the ranges 1 to 5, 6 to 10, and 11 to 15]
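The counting behind this histogram can be sketched in Python. The function name `frequencies` and the text-based bars are illustrative choices. The class intervals are treated as inclusive at both ends, which suits this whole-number example.

```python
# Example data and class intervals from the slides.
data = [1, 7, 15, 4, 8, 8, 5, 12, 10]
class_intervals = [(1, 5), (6, 10), (11, 15)]  # inclusive lower and upper bounds


def frequencies(values, intervals):
    """Count how many values fall inside each inclusive class interval."""
    return [sum(1 for v in values if low <= v <= high) for low, high in intervals]


counts = frequencies(data, class_intervals)

# Draw a simple text histogram: one '#' per data value in the interval.
for (low, high), count in zip(class_intervals, counts):
    print(f"{low:>2} to {high:<2} | {'#' * count} ({count})")

# Adding the frequencies recovers the number of data points.
print("total data points:", sum(counts))  # total data points: 9
```

The bar heights come out as 3, 4, and 2, and their sum matches the nine values in the data set.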

Histogram (Distribution)
This histogram represents the side length of twenty-seven small wooden cubes. The minimum length measurement was 0.745 in. and the maximum length measurement was 0.760 in. Each data value between the minimum and maximum falls within one of the classes on the horizontal (x-) axis. Note that in this case, each value indicated on the horizontal axis actually represents a class interval, that is, a range of values. For instance, 0.745 represents the values 0.745 ≤ x < 0.746. Due to the precision of the measurement device, the data are recorded to the thousandth of an inch (e.g. 0.745) and will therefore correspond to one of the values shown on the axis. The bars are drawn to show the number of cubes that have a side length within each interval: the frequency of occurrence. For instance, of the twenty-seven cubes, two have a side length of 0.746 in. (and therefore fall within the 0.746 ≤ x < 0.747 class interval), and four have a measured length of 0.750 in. (and fall within the 0.750 ≤ x < 0.751 class interval).
[Figure: histogram of cube side lengths, with class intervals on the x-axis; minimum = 0.745 in., maximum = 0.760 in.]

Dot Plot (Distribution)
Data: 3 -1 -3 3 2 1 -1 -1 2 1 1 -1 -2 1 2 1 -2 -4
Another way to represent data is a dot plot. A dot plot is similar to a histogram in that it shows the frequency of occurrence of data values. To represent the data in the table, we place a dot for each data point directly above the data value, so there is one dot for each data point.
[Figure: dot plot on a horizontal axis from -6 to 6]

Dot Plot (Distribution)
Data: 3 -1 -3 3 2 1 -1 -1 2 1 1 -1 -2 1 2 1 -2 -4
Dot plots are easily changed to histograms by replacing the dots with bars of the appropriate height indicating the frequency.
[Figure: histogram with frequency (1 to 5) on the y-axis and data values from -6 to 6 on the x-axis]

Normal Distribution (Distribution)
“Is the data distribution normal?”
Translation: Is the histogram or dot plot bell-shaped?
Does the greatest frequency of the data values occur at about the mean value?
Does the curve decrease on both sides away from the mean?
Is the curve symmetric about the mean?

Normal Distribution (Distribution)
Bell-shaped curve
In this example the data values are fairly evenly distributed about the mean. Approximately half of the values that are not mean values are less than the mean, and approximately half are greater than the mean. The frequency of occurrence also decreases as the value of the data point moves farther away from the mean. The data appear to form a bell-shaped curve. This data set looks to be normally distributed.
[Figure: bell-shaped histogram with frequency on the y-axis and data elements from -6 to 6 on the x-axis]

Normal Distribution (Distribution)
Does the greatest frequency of the data values occur at about the mean value?
The highest frequency of values in this example occurs at zero, which is the mean value of the data set.
[Figure: histogram with the mean value marked; frequency on the y-axis and data elements from -6 to 6 on the x-axis]

Normal Distribution (Distribution)
Does the curve decrease on both sides away from the mean?
[Figure: histogram with the mean value marked; frequency on the y-axis and data elements from -6 to 6 on the x-axis]

Normal Distribution (Distribution)
Is the curve symmetric about the mean?
[Figure: histogram with the mean value marked; frequency on the y-axis and data elements from -6 to 6 on the x-axis]

What if things are not equal?
Although the normal distribution is the most common probability distribution in statistics and science, some quantities are not normally distributed. A visual analysis can help you decide whether a normal distribution is a good representation of your data (although mathematical tests are usually necessary). If the data are skewed, you should not assume a normal distribution. This histogram shows data that are skewed to the right; that is, there is a longer “tail” to the right.
[Figure: histogram interpretation, skewed right (non-normal)]
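A rough numerical companion to the visual check is the sample skewness coefficient, which is near zero for symmetric data and positive when the longer tail is on the right. The sketch below is an illustration only and not a normality test. Both data sets are hypothetical, and the function name `skewness` is an assumed name. It uses the common moment-based form, the third central moment divided by the second central moment raised to the power 3/2.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets, made up for illustration only.
symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]

print(f"symmetric:    {skewness(symmetric):.2f}")     # symmetric:    0.00
print(f"right-skewed: {skewness(right_skewed):.2f}")  # a positive value
```

A clearly positive or negative coefficient supports what the histogram shows, but a value near zero does not by itself establish that the data are normally distributed.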