Application of Statistical Techniques to Interpretation of Water Monitoring Data Eric Smith, Golde Holtzman, and Carl Zipper.

2 Application of Statistical Techniques to Interpretation of Water Monitoring Data Eric Smith, Golde Holtzman, and Carl Zipper

3 Outline
I. Water quality data: program design (CEZ, 15 min)
II. Characteristics of water-quality data (CEZ, 15 min)
III. Describing water quality (GIH, 30 min)
IV. Data analysis for making decisions
A. Compliance with numerical standards (EPS, 45 min)
Dinner Break
B. Locational / temporal comparisons (cause and effect) (EPS, 45 min)
C. Detection of water-quality trends (GIH, 60 min)

4 III. Describing water quality (GIH, 30 min)
Rivers and streams are an essential component of the biosphere
Rivers are alive
Life is characterized by variation
Statistics is the science of variation
Statistical Thinking / Statistical Perspective
– Thinking in terms of variation
– Thinking in terms of distribution

5 The present problem is multivariate WATER QUALITY as a function of TIME, under the influence of co-variates like FLOW, at multiple LOCATIONS

6 WQ variable versus time [Figure: water variable plotted against time in years]

7 Bear Creek below Town of Wise STP

8-14 Univariate WQ Variable [Figure sequence: water quality plotted against time]

15-19 Univariate WQ Variable [Figure sequence: distribution of water quality, time axis removed]

20 Univariate Perspective, Real Data (pH below STP)

21 The three most important pieces of information in a sample:
Central Location – Mean, Median, Mode
Dispersion – Range, Standard Deviation, Inter-Quartile Range
Shape – Symmetry, skewness, kurtosis

22-27 Central Location: Sample Mean
(Sum of all observations) / (sample size)
Center of gravity of the distribution
Depends on each observation, therefore sensitive to outliers
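The "center of gravity" description can be checked numerically. The following is a minimal sketch, assuming a small set of hypothetical pH readings (they are illustrative only and are not taken from the Bear Creek data): deviations from the mean balance to zero, and changing one observation by some amount moves the mean by that amount divided by the sample size.

```python
def sample_mean(values):
    """Sum of all observations divided by the sample size."""
    return sum(values) / len(values)

# Hypothetical pH readings, for illustration only.
ph = [7.4, 7.5, 7.6, 7.6, 7.8]
mean_ph = sample_mean(ph)

# Center of gravity: deviations above and below the mean cancel out.
balance = sum(x - mean_ph for x in ph)

# Every observation carries weight 1/n, so shifting one reading by
# delta shifts the mean by delta / n.
delta = 3.0
shifted = ph[:-1] + [ph[-1] + delta]
mean_shift = sample_mean(shifted) - mean_ph

print(f"mean = {mean_ph:.2f}, balance = {balance:.1e}, mean shift = {mean_shift:.2f}")
```

Because each observation has the same weight, a single extreme reading pulls the mean in proportion to how extreme it is.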

28 Central Location: Sample Median
Center of the ordered array, i.e., the (½)(n + 1)th observation in the ordered array.
If sample size n is odd, then the median is the middle value in the ordered array.
Example A: 1, 1, 0, 2, 3
Order: 0, 1, 1, 2, 3
n = 5, odd, (½)(n + 1) = 3
Median = 1
If sample size n is even, then the median is the average of the two middle values in the ordered array.
Example B: 1, 1, 0, 2, 3, 6
Order: 0, 1, 1, 2, 3, 6
n = 6, even, (½)(n + 1) = 3.5
Median = (1 + 2)/2 = 1.5
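The odd and even cases of the median rule can be sketched in a few lines. The function name `sample_median` is an illustrative choice, not something from the slides; the two inputs are Examples A and B.

```python
def sample_median(values):
    """Median as the (n + 1) / 2 location in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the location (n + 1) / 2 is a whole number (1-based).
        location = (n + 1) // 2
        return ordered[location - 1]
    # Even n: the location falls halfway between two ranks,
    # so average the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

median_a = sample_median([1, 1, 0, 2, 3])      # Example A
median_b = sample_median([1, 1, 0, 2, 3, 6])   # Example B
print(median_a, median_b)
```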

29-34 Central Location: Sample Median
Center of the ordered array
Depends on the magnitude of the central observations only, therefore NOT sensitive to outliers

35 Central Location: Mean vs. Median
Mean is influenced by outliers
Median is robust against (resistant to) outliers
Mean moves toward outliers
Median almost always represents the bulk of the observations
Comparison of mean and median tells us about outliers
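A short comparison makes the contrast concrete. This sketch uses Python's standard `statistics` module and hypothetical pH readings (illustrative values, not the monitoring data), adding one aberrant reading and recording how far each measure moves.

```python
import statistics

# Hypothetical pH readings, for illustration only.
base = [7.4, 7.5, 7.6, 7.6, 7.8]
contaminated = base + [11.0]   # one aberrant reading added

shift_mean = statistics.mean(contaminated) - statistics.mean(base)
shift_median = statistics.median(contaminated) - statistics.median(base)

print(f"mean moved by   {shift_mean:.2f}")
print(f"median moved by {shift_median:.2f}")
```

In this sample the mean moves toward the outlier by more than half a pH unit while the median stays with the bulk of the readings, so a gap between the two is a quick flag for outliers.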

36 Dispersion: Range, Standard Deviation, Inter-Quartile Range

37 Dispersion: Range
Maximum - Minimum
Easy to calculate
Easy to interpret
Tends to increase with sample size (biased), therefore not good for statistical inference
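The dependence of the range on sample size can be seen in a small simulation. This is a sketch under stated assumptions: samples are drawn from a normal distribution with mean 7.6 and SD 0.4 (values chosen only to resemble a pH series), and the function names, repeat count, and seed are arbitrary illustrative choices.

```python
import random

def sample_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)

def average_range(n, repeats=2000, seed=1):
    """Average range over many simulated samples of size n."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(repeats):
        sample = [rng.gauss(7.6, 0.4) for _ in range(n)]
        total += sample_range(sample)
    return total / repeats

avg_range_10 = average_range(10)
avg_range_100 = average_range(100)
print(f"n = 10:  average range {avg_range_10:.2f}")
print(f"n = 100: average range {avg_range_100:.2f}")
```

The underlying distribution is the same in both cases, yet larger samples show a wider range simply because they have more chances to capture extreme values.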

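38 Dispersion: Standard Deviation
$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}}$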

39 Dispersion: Properties of SD
SD ≥ 0 for all data
SD = 0 if and only if all observations are the same (no variation)
For a normal distribution,
– about 68% expected within 1 SD,
– about 95% expected within 2 SD,
– about 99.7% expected within 3 SD
For any distribution, nearly all observations (at least 8/9, by Chebyshev's inequality) lie within 3 SD
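These properties can be verified with a short sketch. It assumes the usual sample SD with an n - 1 divisor, uses hypothetical readings, and relies on `statistics.NormalDist` from the Python standard library for the normal percentages; the distribution-free figures come from Chebyshev's inequality, 1 - 1/k².

```python
import math
from statistics import NormalDist

def sample_sd(values):
    """Sample standard deviation with an n - 1 divisor (assumed form)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Hypothetical readings, for illustration only.
sd_constant = sample_sd([7.5, 7.5, 7.5, 7.5])        # no variation
sd_varied = sample_sd([7.4, 7.5, 7.6, 7.6, 7.8])

# Fraction of a normal distribution within k SD of the mean.
z = NormalDist()
normal_coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

# Chebyshev lower bound that holds for any distribution.
chebyshev_bound = {k: 1 - 1 / k ** 2 for k in (2, 3)}

print(sd_constant, round(sd_varied, 3))
print({k: round(v, 4) for k, v in normal_coverage.items()})
print({k: round(v, 4) for k, v in chebyshev_bound.items()})
```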

40 Interpretation of SD
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6

41 Quantiles, Five Number Summary, Boxplot
Maximum   4th quartile   100th percentile   1.00 quantile
          3rd quartile   75th percentile    0.75 quantile
Median    2nd quartile   50th percentile    0.50 quantile
          1st quartile   25th percentile    0.25 quantile
Minimum   0th quartile   0th percentile     0.00 quantile

42 Quantile Location and Quantiles
Quantile rank: 0.75 = 3/4, 0.50 = 2/4, 0.25 = 1/4
Example: 0, 3.1, 0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3; n = 10
[Table: Value and Rank columns, entries not recovered]
Minimum = 0
Maximum = 5.1
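The quantile computation can be sketched for the ten example values. The slide's own location formula did not survive in the transcript, so this sketch assumes the location p(n + 1), which matches the (½)(n + 1) rule given earlier for the median, and it assumes linear interpolation when the location falls between two ranks. The function name `quantile` is illustrative.

```python
def quantile(values, p):
    """Quantile at location p * (n + 1), interpolating between ranks.

    Both the location formula and the interpolation are assumptions;
    other conventions give slightly different quartiles.
    """
    ordered = sorted(values)
    n = len(ordered)
    location = p * (n + 1)                 # 1-based position in ordered array
    location = min(max(location, 1), n)    # clamp to the observed ranks
    lower = int(location)                  # rank at or below the location
    fraction = location - lower
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [0, 3.1, 0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
q1, q2, q3 = (quantile(data, p) for p in (0.25, 0.50, 0.75))
five_number = (quantile(data, 0), q1, q2, q3, quantile(data, 1))
print(five_number)
```

With n = 10 the quartile locations are 2.75, 5.5, and 8.25, so each quartile falls between two ordered values rather than on one of them.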

43 5-Number Summary and Boxplot: Min, Q1, Q2, Q3, Max

44 Dispersion: IQR
Inter-Quartile Range = (3rd Quartile) - (1st Quartile)
Robust against outliers

45 Interpretation of IQR
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6, IQR = 0.54
For a Normal distribution, Median ± 2 IQR includes 99.3%
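The 99.3% figure can be reproduced from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the last line asks what IQR an exactly normal distribution with SD = 0.41 would have, for comparison with the reported 0.54.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

# For a normal distribution the IQR is a fixed multiple of the SD.
iqr_in_sd = z.inv_cdf(0.75) - z.inv_cdf(0.25)

# Fraction of the distribution within median +/- 2 IQR.
coverage = z.cdf(2 * iqr_in_sd) - z.cdf(-2 * iqr_in_sd)

# IQR implied by SD = 0.41 if the data were exactly normal.
implied_iqr = iqr_in_sd * 0.41

print(f"IQR = {iqr_in_sd:.3f} SD")
print(f"coverage of median +/- 2 IQR = {coverage:.3f}")
print(f"implied IQR for SD 0.41 = {implied_iqr:.2f}")
```

The implied IQR of about 0.55 is close to the observed 0.54, which is what one would expect if the pH data are roughly normal.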

46-48 Shape: Symmetry and Skewness
Symmetry means bilateral symmetry, skewness = 0
– Mean = Median (approximately)
Positive Skewness (asymmetric tail in positive direction)
– Mean > Median
Negative Skewness (asymmetric tail in negative direction)
– Mean < Median
Comparison of mean and median tells us about shape
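The link between skewness and the mean-median comparison can be illustrated with three small made-up samples. The slides do not give a skewness formula, so this sketch assumes the moment coefficient (mean cubed deviation divided by the cubed population SD); the sample values and the function name are illustrative.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "positive skew": right_tail,
    "negative skew": [-x for x in right_tail],
}

# For each sample: (skewness, mean, median)
summary = {
    name: (skewness(v), statistics.mean(v), statistics.median(v))
    for name, v in samples.items()
}
for name, (skew, mean, median) in summary.items():
    print(f"{name:14s} skewness = {skew:+.2f}  mean = {mean}  median = {median}")
```

The mean-median ordering is a typical pattern rather than a guarantee, but in these samples it follows the sign of the skewness exactly.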

49 Bear Creek below Town of Wise STP

50 Outlier Box Plot [Figure labels: Outliers; Whisker; 75th %-tile = 3rd Quartile; Median; 25th %-tile = 1st Quartile; IQR]
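The elements of an outlier box plot can be computed directly. The transcript does not say how the slide defines an outlier, so this sketch assumes the common convention of fences at 1.5 IQR beyond the quartiles, with whiskers running to the most extreme observations inside the fences. The readings are hypothetical, and the quartiles come from `statistics.quantiles` with its default method.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Quartiles, whisker ends, and outliers (k = 1.5 is an assumed convention)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical pH readings with one suspicious value.
readings = [7.1, 7.3, 7.4, 7.5, 7.6, 7.6, 7.7, 7.8, 7.9, 9.6]
box = boxplot_summary(readings)
print(box)
```

Because the fences are built from quartiles, a single extreme reading is flagged without stretching the box or the whiskers.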

51 Wise, VA, below STP [Figures: pH; TKN (mg/l)]

52 Wise, VA, below STP [Figures: DO (% saturation); BOD (mg/l)]

53 Wise, VA, below STP [Figures: Total Phosphorus (mg/l); Fecal Coliforms]

