Application of Statistical Techniques to Interpretation of Water Monitoring Data Eric Smith, Golde Holtzman, and Carl Zipper.

Outline
I. Water quality data: program design (CEZ, 15 min)
II. Characteristics of water-quality data (CEZ, 15 min)
III. Describing water quality (GIH, 30 min)
IV. Data analysis for making decisions
  A. Compliance with numerical standards (EPS, 45 min)
  Dinner Break
  B. Locational / temporal comparisons (cause and effect) (EPS, 45 min)
  C. Detection of water-quality trends (GIH, 60 min)

III. Describing water quality (GIH, 30 min)
Rivers and streams are an essential component of the biosphere
Rivers are alive
Life is characterized by variation
Statistics is the science of variation
Statistical Thinking / Statistical Perspective
  Thinking in terms of variation
  Thinking in terms of distribution

The present problem is multivariate: WATER QUALITY as a function of TIME, under the influence of covariates like FLOW, at multiple LOCATIONS

[Figure: WQ variable versus time. x-axis: time in years; y-axis: water variable]

Bear Creek below Town of Wise STP

[Figure: Univariate WQ variable. Axes: time, water quality]

[Figure: Univariate WQ variable. Axis: water quality]

Univariate Perspective, Real Data (pH below STP)

The three most important pieces of information in a sample:
Central Location
– Mean, Median, Mode
Dispersion
– Range, Standard Deviation, Inter-Quartile Range
Shape
– Symmetry, skewness, kurtosis
– No mode, unimodal, bimodal, multimodal

Central Location: Sample Mean
(Sum of all observations) / (sample size)
Center of gravity of the distribution
Depends on each observation, therefore sensitive to outliers

Central Location: Sample Median
Center of the ordered array, i.e., the (0.5)(n + 1)th observation in the ordered array.
If sample size n is odd, then the median is the middle value in the ordered array.
Example A: 1, 1, 0, 2, 3
Order: 0, 1, 1, 2, 3
n = 5, odd, (0.5)(n + 1) = 3
Median = 1
If sample size n is even, then the median is the average of the two middle values in the ordered array.
Example B: 1, 1, 0, 2, 3, 6
Order: 0, 1, 1, 2, 3, 6
n = 6, even, (0.5)(n + 1) = 3.5
Median = (1 + 2)/2 = 1.5
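The location rule above can be sketched in a few lines of Python. This is only an illustration of the (0.5)(n + 1) rule applied to Examples A and B; the function name `sample_median` is made up for this sketch and is not part of the presentation.

```python
def sample_median(values):
    """Median as the (0.5)(n + 1)th observation of the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    location = 0.5 * (n + 1)      # 1-based position in the ordered array
    lower = int(location)         # rank at or just below the location
    if location == lower:         # n odd: a single middle value
        return ordered[lower - 1]
    # n even: average the two middle values
    return (ordered[lower - 1] + ordered[lower]) / 2


median_a = sample_median([1, 1, 0, 2, 3])       # Example A, n = 5
median_b = sample_median([1, 1, 0, 2, 3, 6])    # Example B, n = 6
print(median_a, median_b)
```

For Example A the location is 3, so the third ordered value is returned; for Example B the location is 3.5, so the third and fourth ordered values are averaged.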

Central Location: Sample Median
Center of the ordered array
Depends on the magnitude of the central observations only, therefore NOT sensitive to outliers

Central Location: Mean vs. Median
Mean is influenced by outliers
Median is robust against (resistant to) outliers
Mean moves toward outliers
Median almost always represents the bulk of the observations
Comparison of mean and median tells us about outliers
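A small numerical illustration of these points, as a sketch only: it reuses the five values of Example A from the median slide and appends one hypothetical outlier (the value 100 is invented for this illustration and does not come from the monitoring data).

```python
from statistics import mean, median

clean = [0, 1, 1, 2, 3]          # Example A from the median slide
with_outlier = clean + [100]     # hypothetical outlier, for illustration only

mean_clean, median_clean = mean(clean), median(clean)
mean_out, median_out = mean(with_outlier), median(with_outlier)

print("without outlier: mean", mean_clean, "median", median_clean)
print("with outlier:    mean", round(mean_out, 2), "median", median_out)
```

The mean is pulled strongly toward the outlier while the median barely moves, so a large gap between the two is a hint that outliers (or a long tail) are present.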

Dispersion
Range
Standard Deviation
Inter-Quartile Range

Dispersion: Range
Maximum - Minimum
Easy to calculate
Easy to interpret
Depends on sample size (biased)
Therefore not good for statistical inference

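Dispersion: Standard Deviation
SD = \sqrt{ \frac{ \sum_{i=1}^{n} (x_i - \bar{x})^2 }{ n - 1 } }
where \bar{x} is the sample mean and n is the sample size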
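The formulas on this slide did not survive in the transcript, so the following sketch assumes the usual sample standard deviation with an n - 1 denominator. The function name `sample_sd` and the choice of Example A as data are illustrative assumptions; the result is cross-checked against `statistics.stdev` from the Python standard library.

```python
import math
from statistics import stdev


def sample_sd(values):
    """Sample SD: square root of summed squared deviations over (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n                                  # sample mean
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [1, 1, 0, 2, 3]            # Example A from the median slide
sd = sample_sd(data)
print(round(sd, 4), round(stdev(data), 4))   # the two values should agree
```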

Dispersion: Properties of SD
SD ≥ 0 for all data
SD = 0 if and only if all observations are the same (no variation)
Familiar intervals for a normal distribution:
– 68% expected within 1 SD,
– 95% expected within 2 SD,
– 99.7% expected within 3 SD,
– Exact (to rounding) for a normal distribution, ballpark for any distribution
For any distribution, nearly all observations lie within 3 SD of the mean (at least 8/9, about 89%, by Chebyshev's inequality)
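The familiar intervals can be checked numerically. The sketch below uses the standard identity that, for a normal distribution, the probability of falling within k SD of the mean is erf(k / sqrt(2)); the Chebyshev bound 1 - 1/k² is included for comparison as the guarantee that holds for any distribution with a finite SD. The function names are illustrative.

```python
import math


def normal_coverage(k):
    """P(|X - mean| <= k * SD) when X is normally distributed."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k):
    """Minimum fraction within k SD of the mean for any distribution."""
    return 1 - 1 / k ** 2


coverage = {k: normal_coverage(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: normal {p:.4f}")
print(f"within 3 SD: any distribution at least {chebyshev_lower_bound(3):.4f}")
```

The normal values round to 68%, 95%, and 99.7%.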

Interpretation of SD
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6

Quartiles, Percentiles, Quantiles, Five Number Summary, Boxplot
Maximum   4th quartile   100th percentile   1.00 quantile
          3rd quartile    75th percentile   0.75 quantile
Median    2nd quartile    50th percentile   0.50 quantile
          1st quartile    25th percentile   0.25 quantile
Minimum   0th quartile     0th percentile   0.00 quantile

Quartiles (undergrad classes)
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Rank   Value
10      5.1    Maximum
 9      3.9
 8      3.8    3rd Quartile
 7      3.8
 6      2.3
               Median, 2nd Quartile = (2.2 + 2.3)/2 = 2.25
 5      2.2
 4      0
 3      0      1st Quartile
 2     -0.4
 1     -3.1    Minimum
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

5-Number Summary and Boxplot (undergrad perspective)
[Boxplot labeled: Min, Q1, Q2, Q3, Max]

Terminology Warning: Quartiles, a.k.a. Percentiles, a.k.a. Quantiles
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00
Quartiles                    Percentiles          Quantiles
Q4 = 4th quartile = Max    = 100th percentile   = Q1.00 = 1.00 quantile
Q3 = 3rd quartile          =  75th percentile   = Q0.75 = 0.75 quantile
Q2 = 2nd quartile = Med    =  50th percentile   = Q0.50 = 0.50 quantile
Q1 = 1st quartile          =  25th percentile   = Q0.25 = 0.25 quantile
Q0 = 0th quartile = Min    =   0th percentile   = Q0.00 = 0.00 quantile

Terminology Warning: But Percentiles and Quantiles are more general
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00
Quartiles                    Percentiles          Quantiles
Q4 = 4th quartile = Max    = 100th percentile   = Q1.00  = 1.00 quantile
                              95th percentile   = Q0.95  = 0.95 quantile
Q3 = 3rd quartile          =  75th percentile   = Q0.75  = 0.75 quantile
                              60th percentile   = Q0.60  = 0.60 quantile
Q2 = 2nd quartile = Med    =  50th percentile   = Q0.50  = 0.50 quantile
                              34th percentile   = Q0.34  = 0.34 quantile
Q1 = 1st quartile          =  25th percentile   = Q0.25  = 0.25 quantile
                             2.5th percentile   = Q0.025 = 0.025 quantile
Q0 = 0th quartile = Min    =   0th percentile   = Q0.00  = 0.00 quantile

Quantile Location and Quantiles
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Quantile ranks of the quartiles: 0.75 = 3/4, 0.50 = 2/4, 0.25 = 1/4
Minimum = -3.1, Maximum = 5.1

Quantile Location and Quantiles by weighted averages (graduate classes)
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Example: Find the 20th percentile of the sample above.
Step 1: q = 0.20, n = 10, L = q(n + 1) = 0.20(10 + 1) = 2.2, indicating the 2.2th observation in the ordered array.
Step 2: Therefore the 0.20 quantile is a weighted average of the 2nd and 3rd observations in the ordered array, which are a = -0.4 and b = 0, and the weight is w = 0.2.
Q = a + w(b - a) = -0.4 + 0.2(0 - (-0.4)) = -0.4 + 0.08 = -0.32

Quantile Location and Quantiles by weighted averages (graduate classes)
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Step 2: a = -0.4, b = 0, w = 0.2
Q = a + w(b - a) = -0.4 + 0.2(0 - (-0.4)) = -0.4 + 0.2(0.4) = -0.4 + 0.08 = -0.32
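The two steps can be sketched as a short Python function. Two assumptions are made here and should be read as such: the sample is taken to contain the negative values -3.1 and -0.4 (the minus signs appear to have been lost in the transcript, and the worked result of -0.32 only follows with them), and locations falling outside ranks 1 to n are clamped to the minimum or maximum. The function name `weighted_quantile` is hypothetical.

```python
import math


def weighted_quantile(values, q):
    """Quantile at location L = q(n + 1), interpolating between neighbors."""
    ordered = sorted(values)
    n = len(ordered)
    location = q * (n + 1)                    # Step 1: quantile location L
    location = min(max(location, 1), n)       # assumption: clamp to ranks 1..n
    lower_rank = math.floor(location)
    w = location - lower_rank                 # weight on the upper neighbor
    a = ordered[lower_rank - 1]               # observation at rank floor(L)
    if w == 0:
        return a
    b = ordered[lower_rank]                   # observation at the next rank
    return a + w * (b - a)                    # Step 2: Q = a + w(b - a)


# Sample with signs restored (assumed), n = 10
sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
p20 = weighted_quantile(sample, 0.20)
five_number = [weighted_quantile(sample, q) for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(round(p20, 3), [round(v, 3) for v in five_number])
```

Under these assumptions the 20th percentile comes out at -0.32, matching the worked example.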

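Quantile Location and Quantiles
Example: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Ordered array (rank 1 to 10): -3.1, -0.4, 0, 0, 2.2, 2.3, 3.8, 3.8, 3.9, 5.1
Quantile rank, q   Quantile Location, L    Quantile, Q                               Common Name
1.00               n = 10                  5.1                                       Maximum
0.75               0.75(10 + 1) = 8.25     3.8 + 0.25(3.9 - 3.8) = 3.825             3rd Quartile
0.50               0.50(10 + 1) = 5.5      2.2 + 0.50(2.3 - 2.2) = 2.25              Median, or 2nd Quartile
0.25               0.25(10 + 1) = 2.75     -0.4 + 0.75[0 - (-0.4)] = -0.1            1st Quartile
0.00               1                       -3.1                                      Minimum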

5-Number Summary and Boxplot using weighted averages for quantiles
[Boxplot labeled: Min, Q1, Q2, Q3, Max]
Note the slightly different results from using weighted averages.

Dispersion: IQR
Inter-Quartile Range = (3rd Quartile) - (1st Quartile)
Robust against outliers

Interpretation of IQR
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6, IQR = 0.54
For a Normal distribution, Median ± 2 IQR includes 99.3%
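The 99.3% figure can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for a standard normal distribution, reading the interval as the median plus or minus 2 IQR; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()               # mean 0, SD 1
q1 = standard_normal.inv_cdf(0.25)           # 1st quartile in SD units
q3 = standard_normal.inv_cdf(0.75)           # 3rd quartile in SD units
iqr_in_sd_units = q3 - q1                    # IQR of a normal, in SD units

half_width = 2 * iqr_in_sd_units             # median +/- 2 IQR
coverage = standard_normal.cdf(half_width) - standard_normal.cdf(-half_width)
print(round(iqr_in_sd_units, 3), round(coverage, 3))
```

For a normal distribution the IQR is about 1.35 SD. The pH sample on this slide has IQR/SD = 0.54/0.41, roughly 1.32, which is close to that value.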

Shape: Symmetry and Skewness
Symmetry means bilateral symmetry, skewness = 0
  Mean = Median (approximately)
Positive Skewness (asymmetric tail in positive direction)
  Mean > Median
Negative Skewness (asymmetric tail in negative direction)
  Mean < Median
Comparison of mean and median tells us about shape
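A brief sketch of the mean-versus-median comparison on small data sets. The slides do not give a skewness formula, so the moment coefficient m3 / m2^1.5 used below is an assumption chosen for illustration, as are the function name and the three data sets (the right-skewed one reuses Example B from the median slide).

```python
from statistics import mean, median


def moment_skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (assumed definition)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - x_bar) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


right_skewed = [0, 1, 1, 2, 3, 6]             # Example B: tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, round(moment_skewness(data), 3), round(mean(data), 3), median(data))
```

In these examples the sign of the skewness matches the sign of (mean - median), which is the comparison the slide recommends.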

Bear Creek below Town of Wise STP

Outlier Box Plot
[Figure labels: Outliers; Whisker; Median; 75th %-tile = 3rd Quartile; 25th %-tile = 1st Quartile; IQR]
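The slide does not state how outliers are separated from the whiskers. The sketch below assumes the common convention of fences at 1.5 IQR beyond the quartiles, with whiskers drawn to the most extreme observations inside the fences. The quartiles come from `statistics.quantiles` (Python 3.8 or later), whose default "exclusive" method uses the q(n + 1) location rule. The data are the ten-value sample (signs assumed) plus one hypothetical value of 12.0 added purely to produce an outlier.

```python
from statistics import quantiles

# Ten-value sample (signs assumed) plus one hypothetical extreme value
data = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, 12.0]

q1, q2, q3 = quantiles(data, n=4)            # default method is "exclusive"
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
whisker_low, whisker_high = min(inside), max(inside)

print("quartiles:", q1, q2, q3, "IQR:", iqr)
print("whiskers:", whisker_low, whisker_high, "outliers:", outliers)
```

Other fence multipliers or quartile rules would move individual points in or out of the outlier set, so the result depends on these stated choices.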

Wise, VA, below STP: pH; TKN (mg/l)

Wise, VA, below STP: DO (% saturation); BOD (mg/l)

Wise, VA, below STP: Total Phosphorus (mg/l); Fecal Coliforms