2011 Summer ERIE/REU Program Descriptive Statistics Igor Jankovic Department of Civil, Structural, and Environmental Engineering University at Buffalo, State University of New York

Content
Statistics terminology
1. Population vs. sample
2. Descriptive statistics vs. inferential statistics
3. Data types
Presentation of qualitative data
1. Graphical method
2. Numerical method
Presentation of quantitative data
1. Graphical method
2. Numerical method
Outliers in a data set

Population vs. Sample
Population: an entire data set that is the target of our interest
Sample: a subset of data selected from a population
Example: Electrical engineers recognize that high neutral current in computer power systems is a potential problem. To determine the extent of the problem, a survey of the computer power system load currents at 146 US sites was taken (IEEE Transactions on Industry Applications, July/August 1990). The survey revealed that less than 10% of the sites had high neutral to full-load current ratios.
- Identify the population of interest (power load status at all US sites with computer power systems)
- Identify the sample (power load status at the 146 surveyed US sites with computer power systems)
- Use the sample information to make an inference about the population (less than 10% of the sites have high neutral to full-load current ratios)

Descriptive statistics vs. Inferential statistics
Two major applications of statistics:
- Summarizing, describing, and exploring data
- Using sample data to infer the nature of the population data set
In other words:
Descriptive statistics
- The branch of statistics devoted to the organization, summarization, and description of data sets
Inferential statistics
- The branch of statistics concerned with using sample data to make inferences about populations

Data Types
Quantitative data: data that represent the quantity or amount of something
Qualitative (categorical) data: data that have no quantitative interpretation
Example:
- Length (in centimeters), weight (in grams), DDT concentration (in ppm): quantitative data
- Location and species: qualitative data

Qualitative Data

Graphical method for describing qualitative data
For qualitative data, we define the categories in such a way that each observation can fall in one and only one category.
Example: Student distribution by year at college in EAS 308, shown as a Pareto diagram, a horizontal bar graph, and a pie chart.

Numerical method for describing qualitative data
For qualitative data, we define the categories in such a way that each observation can fall in one and only one category.
- Category frequency: for a given category, the number of observations that fall in that category
- Category relative frequency: for a given category, the proportion of the total number of observations that fall in that category
These are collected in a summary frequency table.
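As a minimal sketch of a summary frequency table, the snippet below computes the category frequency and category relative frequency for each category. The function name `frequency_table` and the list of class years are made up for illustration; they are not the EAS 308 data from the slides.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)   # category frequency
    total = len(observations)        # total number of observations
    return {cat: (freq, freq / total) for cat, freq in counts.items()}

# Hypothetical qualitative data: each observation falls in exactly one category.
years = ["Sophomore", "Junior", "Junior", "Senior",
         "Junior", "Senior", "Sophomore", "Junior"]

table = frequency_table(years)
for category, (freq, rel) in table.items():
    print(f"{category:<10} {freq:>3} {rel:6.3f}")
```

The relative frequencies always sum to 1, which is a quick check that every observation was assigned to exactly one category.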

Quantitative Data

Graphical method for describing quantitative data (1)
Dot plots
Steps:
1. Draw a horizontal scale that spans the range of the data
2. Place a dot over the appropriate value on the scale for each observation
3. If a data value repeats, stack the dots on top of each other
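The three steps above can be sketched as a plain-text dot plot. This is only an illustration: the helper `dot_plot` is hypothetical, it assumes integer-valued data, and the values are borrowed from the central-tendency example later in the lecture.

```python
from collections import Counter

def dot_plot(values):
    """Build a text dot plot for integer data; repeated values stack upward."""
    counts = Counter(values)
    scale = list(range(min(values), max(values) + 1))   # step 1: horizontal scale
    width = max(len(str(v)) for v in scale)             # column width for alignment
    rows = []
    # Steps 2 and 3: one row per stacking level, highest level first.
    for level in range(max(counts.values()), 0, -1):
        cells = [("*" if counts[v] >= level else " ").rjust(width) for v in scale]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).rjust(width) for v in scale))
    return "\n".join(rows)

plot = dot_plot([4, 5, 8, 1, 11, 6, 2, 8, 3, 7])
print(plot)
```

The value 8 occurs twice, so it is the only column with two stacked dots.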

Graphical method for describing quantitative data (2)
Histograms (the most popular and traditional method for describing quantitative data)
Steps:
1. Calculate the range of the data
2. Divide the range into 5-20 classes of equal width
3. For each class, count the number of observations that fall in the class (the class frequency)
4. Calculate each class relative frequency = (class frequency) / (total number of measurements)
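A small sketch of the four histogram steps follows. The function name `histogram_classes` is an assumption, as is the convention that each class includes its lower boundary and the last class also includes the largest value; the data are borrowed from the central-tendency example later in the lecture. It assumes the data are not all identical, so the class width is nonzero.

```python
def histogram_classes(data, num_classes):
    """Return (lower, upper, class frequency, class relative frequency) per class."""
    lo, hi = min(data), max(data)        # step 1: range of the data
    width = (hi - lo) / num_classes      # step 2: equal-width classes
    freqs = [0] * num_classes
    for y in data:                       # step 3: class frequencies
        k = min(int((y - lo) / width), num_classes - 1)  # largest value goes in last class
        freqs[k] += 1
    n = len(data)
    return [(lo + k * width, lo + (k + 1) * width, f, f / n)   # step 4
            for k, f in enumerate(freqs)]

data = [4, 5, 8, 1, 11, 6, 2, 8, 3, 7]
rows = histogram_classes(data, num_classes=5)
for lower, upper, freq, rel in rows:
    print(f"[{lower:4.1f}, {upper:4.1f})  freq={freq}  rel={rel:.2f}  " + "#" * freq)
```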

Graphical method for describing quantitative data (3)
Stem-and-Leaf Display
Steps:
1. Divide each observation in the data set into two parts, the stem and the leaf. For example, the stem and leaf of the CPU time 2.41 are 2 and 41, respectively.
2. List the stems in order in a column, starting with the smallest stem and ending with the largest.
3. Proceed through the data set, placing the leaf for each observation in the appropriate stem row.
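The steps above can be sketched as follows, using the same split as the slide's example (2.41 has stem 2 and leaf 41). The function name `stem_and_leaf` and the CPU times other than 2.41 are invented for illustration, and the code assumes non-negative values recorded to two decimal places.

```python
def stem_and_leaf(data):
    """Return the display as a list of 'stem | leaves' strings."""
    rows = {}
    for y in data:
        # Step 1: split into stem (integer part) and leaf (two decimal digits).
        stem, leaf = divmod(round(y * 100), 100)
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # Step 2: list every stem from smallest to largest.
    for stem in range(min(rows), max(rows) + 1):
        # Step 3: leaves appear in the order the observations were read.
        leaves = " ".join(f"{leaf:02d}" for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

cpu_times = [2.41, 1.15, 0.82, 1.47, 2.05, 0.19, 1.15]  # hypothetical values
display = stem_and_leaf(cpu_times)
print("\n".join(display))
```

Working in hundredths with `round(y * 100)` avoids floating-point surprises when separating the decimal digits.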

Numerical method for describing quantitative data
Measures of central tendency: help to locate the center of the relative frequency distribution
- Arithmetic mean (mean)
Suppose we have a set of n measurements, $y_1, y_2, y_3, \ldots, y_n$. The arithmetic mean is
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Generally, we use $\bar{y}$ to represent the sample mean and $\mu$ to represent the population mean.
- Median
The median is the middle number when the measurements are arranged in ascending (or descending) order. Writing $y_{(k)}$ for the k-th value in the ordered list,
$$m = \begin{cases} y_{((n+1)/2)}, & \text{if } n \text{ is odd} \\ \dfrac{y_{(n/2)} + y_{(n/2+1)}}{2}, & \text{if } n \text{ is even} \end{cases}$$
Generally, we use $m$ to represent the sample median and $\tau$ to represent the population median.

Numerical method for describing quantitative data
Measures of central tendency: help to locate the center of the relative frequency distribution
- Mode
The mode of a set of n measurements, $y_1, y_2, y_3, \ldots, y_n$, is the value of y that occurs with the greatest frequency.

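Numerical method for describing quantitative data
Measures of central tendency
Example: We have 10 sample measurements: 4, 5, 8, 1, 11, 6, 2, 8, 3, 7. Compute the mean, median, and mode.
Solution:
Ordered measurements: 1, 2, 3, 4, 5, 6, 7, 8, 8, 11
Mean = 55 / 10 = 5.5
Median = (5 + 6) / 2 = 5.5 (n is even, so average the 5th and 6th ordered values)
Mode = 8 (the only value that occurs twice)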
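The worked example can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are arbitrary.

```python
import statistics

measurements = [4, 5, 8, 1, 11, 6, 2, 8, 3, 7]

mean = statistics.mean(measurements)      # sum of the values divided by n
median = statistics.median(measurements)  # average of the two middle ordered values (n is even)
mode = statistics.mode(measurements)      # most frequent value

print("ordered:", sorted(measurements))
print("mean   =", mean)
print("median =", median)
print("mode   =", mode)
```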

Measures of central tendency: Geometric Mean (from Wikipedia)

Measures of central tendency: Harmonic Mean (from Wikipedia)
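The formulas shown on the two slides above were images and are not reproduced in this transcript. For reference, the standard definitions for positive measurements are: geometric mean = $(y_1 y_2 \cdots y_n)^{1/n}$ and harmonic mean = $n / \sum_{i=1}^{n} (1/y_i)$. The sketch below implements both definitions directly and compares them with the library versions; the function names are illustrative.

```python
import math
import statistics

def geometric_mean(data):
    """n-th root of the product of n positive values."""
    return math.prod(data) ** (1 / len(data))

def harmonic_mean(data):
    """n divided by the sum of reciprocals of n positive values."""
    return len(data) / sum(1 / y for y in data)

measurements = [4, 5, 8, 1, 11, 6, 2, 8, 3, 7]
am = statistics.mean(measurements)
gm = geometric_mean(measurements)
hm = harmonic_mean(measurements)
print(f"arithmetic = {am:.3f}, geometric = {gm:.3f}, harmonic = {hm:.3f}")

# The standard library offers the same quantities.
print(statistics.geometric_mean(measurements), statistics.harmonic_mean(measurements))
```

For positive data the three means are always ordered harmonic ≤ geometric ≤ arithmetic, with equality only when all values are equal.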

Numerical method for describing quantitative data
Measures of variation: help to describe the spread of the distribution
- Range
Range = largest measurement - smallest measurement
- Variance (of n measurements, $y_1, y_2, y_3, \ldots, y_n$)
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{n}$$

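Numerical method for describing quantitative data
Measures of variation: help to describe the spread of the distribution
- Standard deviation
Standard deviation of a sample:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
Standard deviation of a population:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \mu)^2}{n}}$$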
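As an illustration of the measures of variation, the sketch below computes the range, variance, and standard deviation for the ten measurements used in the central-tendency example. It relies on the standard `statistics` module, where `variance`/`stdev` divide by n - 1 (sample) and `pvariance`/`pstdev` divide by n (population).

```python
import statistics

measurements = [4, 5, 8, 1, 11, 6, 2, 8, 3, 7]

data_range = max(measurements) - min(measurements)

sample_var = statistics.variance(measurements)   # divides by n - 1
sample_sd = statistics.stdev(measurements)

pop_var = statistics.pvariance(measurements)     # divides by n
pop_sd = statistics.pstdev(measurements)

print("range               =", data_range)
print(f"sample variance     = {sample_var:.4f}, s     = {sample_sd:.4f}")
print(f"population variance = {pop_var:.4f}, sigma = {pop_sd:.4f}")
```

The sum of squared deviations from the mean 5.5 is 86.5, so the two variances differ only in the divisor (9 versus 10).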

Skewness: measure of shape Approximate formula (accurate for large “n”) Exact formula where s is the sample standard deviation.

Kurtosis: measure of “peakedness” Approximate formula (accurate for large “n”) where s is the sample standard deviation. Exact formula
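The skewness and kurtosis formulas on the two slides above were images and did not survive in this transcript, so the code below is an assumption rather than a reproduction. It uses forms that are commonly seen and that match the slides' description (an approximation for large n and a small-sample correction, both in terms of the sample standard deviation s): the moment-style averages of the standardized cubes and fourth powers, and the adjusted skewness coefficient used by many spreadsheet packages. Subtracting 3 from the kurtosis (so a normal distribution scores about 0) is also an assumed convention. All function names are illustrative.

```python
import statistics

def _standardized(data):
    """Return the values (y - ybar) / s, with s the sample standard deviation."""
    ybar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(y - ybar) / s for y in data]

def skewness_approx(data):
    """Large-n approximation: average of the standardized values cubed."""
    z = _standardized(data)
    return sum(v ** 3 for v in z) / len(z)

def skewness_adjusted(data):
    """Small-sample form (assumed): n / ((n-1)(n-2)) times the sum of cubes."""
    n = len(data)
    z = _standardized(data)
    return n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

def excess_kurtosis_approx(data):
    """Large-n approximation: average standardized fourth power, minus 3."""
    z = _standardized(data)
    return sum(v ** 4 for v in z) / len(z) - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]
print(skewness_approx(symmetric), skewness_adjusted(right_skewed))
print(excess_kurtosis_approx(symmetric))
```

A symmetric data set has skewness 0, and a long right tail gives a positive value.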

Numerical method for describing quantitative data
Measures of relative standing: describe the relative position of an observation within the data set
Two measures used to describe the relative standing of an observation are percentiles and z-scores.
Percentiles
The 100p-th percentile of a data set is a value of y located so that 100p% of the area under the relative frequency distribution for the data lies to the left of the 100p-th percentile and 100(1 - p)% of the area lies to its right [note: $0 \le p \le 1$].
- Lower quartile, $Q_L$, corresponding to the 25th percentile
- Midquartile, $m$, corresponding to the 50th percentile (the median)
- Upper quartile, $Q_U$, corresponding to the 75th percentile

Numerical method for describing quantitative data
Measures of relative standing: describe the relative position of an observation within the data set
Two measures used to describe the relative standing of an observation are percentiles and z-scores.
Z-scores
The z-score for a value y of a data set is the distance that y lies above or below the mean, measured in units of the standard deviation.
Sample z-score:
$$z = \frac{y - \bar{y}}{s}$$
Population z-score:
$$z = \frac{y - \mu}{\sigma}$$

Detecting Outliers
Definition of an outlier: An observation y that is unusually large or small relative to the other values in a data set is called an outlier.
Reasons for outliers in a data set:
1. The measurement is observed, recorded, or entered into the computer incorrectly
2. The measurement comes from a different population
3. The measurement is correct, but represents a rare (chance) event
Rule of thumb for detecting outliers: Observations with z-scores greater than 3 in absolute value are considered outliers.
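The z-score rule of thumb can be sketched as below, using the sample z-score (observation minus sample mean, divided by the sample standard deviation). The function names and the readings are hypothetical.

```python
import statistics

def z_scores(data):
    """Sample z-score of every observation: (y - ybar) / s."""
    ybar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(y - ybar) / s for y in data]

def z_score_outliers(data, threshold=3.0):
    """Observations whose z-score exceeds the threshold in absolute value."""
    return [y for y, z in zip(data, z_scores(data)) if abs(z) > threshold]

# Hypothetical readings: nineteen values near 10 and one suspicious value of 50.
readings = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11,
            10, 9, 11, 10, 10, 9, 11, 10, 10, 50]

outliers = z_score_outliers(readings)
print("outliers:", outliers)
```

One caveat with this rule: in a sample of size n, no sample z-score can exceed (n - 1)/sqrt(n) in absolute value, so with 10 or fewer observations the rule can never flag anything.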

Detecting Outliers
Box Plot Method
Interquartile range, IQR: $IQR = Q_U - Q_L$
Steps to construct a box plot:
1. Calculate the median $m$, the lower and upper quartiles $Q_L$ and $Q_U$, and the IQR for the y values in a data set.
2. Construct a box on the y-axis with $Q_L$ and $Q_U$ located at the lower corners. The base width will be equal to the IQR. Draw a vertical line inside the box to locate the median, $m$.
3. Construct two sets of limits on the box plot. Inner fences are located a distance of 1.5(IQR) below $Q_L$ and above $Q_U$; outer fences are located a distance of 3(IQR) below $Q_L$ and above $Q_U$.
4. Observations that fall between the inner and outer fences are called suspect outliers. Observations that fall outside the outer fences are called highly suspect outliers.
5. To further highlight extreme values, use whiskers.
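A sketch of the fence calculation follows, with inner fences 1.5 IQR and outer fences 3 IQR beyond the quartiles. The quartiles come from `statistics.quantiles` with its default method; textbooks and software use slightly different quartile conventions, so the exact fence values are an assumption of this sketch. Function names are illustrative, and the data are the ten measurements from the central-tendency example.

```python
import statistics

def box_plot_fences(data):
    """Quartiles, IQR, and the inner and outer fences of a box plot."""
    q_l, m, q_u = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
    iqr = q_u - q_l
    return {
        "Q_L": q_l, "m": m, "Q_U": q_u, "IQR": iqr,
        "inner": (q_l - 1.5 * iqr, q_u + 1.5 * iqr),
        "outer": (q_l - 3 * iqr, q_u + 3 * iqr),
    }

def classify(y, fences):
    """Label an observation using the inner and outer fences."""
    inner_lo, inner_hi = fences["inner"]
    outer_lo, outer_hi = fences["outer"]
    if y < outer_lo or y > outer_hi:
        return "highly suspect outlier"
    if y < inner_lo or y > inner_hi:
        return "suspect outlier"
    return "not flagged"

data = [4, 5, 8, 1, 11, 6, 2, 8, 3, 7]
fences = box_plot_fences(data)
print(fences)
for y in (5, 16, 30):   # hypothetical new observations to classify
    print(y, "->", classify(y, fences))
```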

Empirical Rule
If a data set has an approximately mound-shaped distribution, then the following rules of thumb may be used to describe the data set:
- Approximately 68% of the measurements will lie within the interval $\bar{y} \pm s$ for samples
- Approximately 95% of the measurements will lie within the interval $\bar{y} \pm 2s$ for samples
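A quick simulation can illustrate the 68% and 95% figures. The sketch below draws a mound-shaped sample from a normal distribution (the mean of 50, standard deviation of 10, sample size, and seed are arbitrary choices for illustration) and counts the fraction of measurements within one and two sample standard deviations of the sample mean.

```python
import random
import statistics

rng = random.Random(0)                                # fixed seed for repeatability
sample = [rng.gauss(50, 10) for _ in range(10_000)]   # simulated mound-shaped data

ybar = statistics.mean(sample)
s = statistics.stdev(sample)

def fraction_within(data, center, half_width):
    """Fraction of observations in [center - half_width, center + half_width]."""
    return sum(abs(y - center) <= half_width for y in data) / len(data)

within_1s = fraction_within(sample, ybar, s)
within_2s = fraction_within(sample, ybar, 2 * s)
print(f"within ybar +/- s : {within_1s:.3f}")
print(f"within ybar +/- 2s: {within_2s:.3f}")
```

For strongly skewed or multi-peaked data these fractions can differ noticeably, which is why the rule is stated only for mound-shaped distributions.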

Summary
In this lecture, we have learned:
Some important statistics terminology
1. Population vs. sample
2. Descriptive statistics vs. inferential statistics
3. Data types
How to deal with qualitative data
1. Graphical method (bar graph, pie chart, Pareto diagram)
2. Numerical method
How to deal with quantitative data
1. Graphical method (dot plot, histogram, stem-and-leaf display)
2. Numerical method
How to detect outliers in a data set
Empirical Rule