1 Data Analysis 101

2 Overview
- Basic Statistics
- Reporting variability and error
- Summarizing Data
- Comparison to Standards
- Key questions to ask of data
- Summarizing QA/QC Standards

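3 Dissolved Oxygen data (mg/L)
Site | 23/4/2012 | 16/5/2012 | 15/6/2012 | 14/7/2012 | 15/8/2012 | 15/9/2012
1 | 10.2 | 9.8 | 9.2 | 10.0 | 9.3 | 10.3
2 | 9.6 | 8.2 | 10.3 | 10.1 | 9.9 | 9.8
3 | 8.3 | 7.1 | 8.2 | 6.3 | 6.3 | 5.6
4 | 7.5 | 7.2 | 7.3 | 6.0 | 6.0 | 2.1
5 | 8.0 | 7.2 | 6.1 | 6.3 | 3.4 | 6.2
6 | 6.1 | 7.6 | 6.9 | 8.6 | 8.4 | 8.9

Site | 14/4/2013 | 16/5/2013 | 14/6/2013 | 16/7/2013 | 15/8/2013 | 21/9/2013
1 | 9.2 | 10.2 | 8.3 | 7.8 | 10.3 | 10.1
2 | 9.5 | 6.9 | 9.3 | 10.3 | 10.6 | 9.2
3 | 8.2 | 6.3 | 8.2 | 8.3 | 7.2 | 6.2
4 | 7.2 | 7.3 | 6.1 | 7.0 | 2.3 | 6.5
5 | 6.3 | 8.0 | 8.6 | 3.0 | 6.7 | 7.6
6 | 9.2 | 10.3 | 7.6 | 9.3 | 7.3 | 10.3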

6 Average (Arithmetic Mean)
Site | Average DO (mg/L)
1 | 9.6
2 | 9.5
3 | 7.2
4 | 6.0
5 | 6.5
6 | 8.4
Very high and low numbers can distort results. Is the Site 4 value of 6.0 mg/L representative of the data set?
Site | 23/4/2012 | 16/5/2012 | 15/6/2012 | 14/7/2012 | 15/8/2012 | 15/9/2012
4 | 7.5 | 7.2 | 7.3 | 6.0 | 6.0 | 2.1
Site | 14/4/2013 | 16/5/2013 | 14/6/2013 | 16/7/2013 | 15/8/2013 | 21/9/2013
4 | 7.2 | 7.3 | 6.1 | 7.0 | 2.3 | 6.5

8 Median
Central value in a set of values ranked from lowest to highest.
2, 5, 6, 6, 10, 11, 12, 13, 120
Median = 10
Average = 20.6
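If you keep your data in Python rather than a spreadsheet, the standard-library `statistics` module gives the same two numbers. A minimal sketch using the nine example values above (185 / 9 ≈ 20.56 for the average):

```python
import statistics

# The nine example values from this slide, already ranked low to high
values = [2, 5, 6, 6, 10, 11, 12, 13, 120]

median = statistics.median(values)  # middle (5th) value of the ranked list
mean = statistics.mean(values)      # sum / count, pulled upward by the 120

print(f"Median = {median}")     # Median = 10
print(f"Average = {mean:.1f}")  # Average = 20.6
```

The single extreme value (120) moves the average far above most of the data, while the median stays at the middle of the ranked list.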

10 DO (mg/L)
Site | Average | Median
1 | 9.6 | 9.9
2 | 9.5 | 9.7
3 | 7.2 | 7.2
4 | 6.0 | 6.8
5 | 6.5 | 6.5
6 | 8.4 | 8.5
The Site 4 median value is more representative of the data set than the average.
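The same comparison for Site 4, as a minimal Python sketch. It assumes the twelve Site 4 readings from the 2012 and 2013 tables, with the 15/8/2012 reading taken as 6.0 mg/L (the value that reproduces the median of 6.75 mg/L given in the quartile table).

```python
import statistics

# Site 4 dissolved oxygen (mg/L): 2012 results, then 2013 results
site4 = [7.5, 7.2, 7.3, 6.0, 6.0, 2.1,
         7.2, 7.3, 6.1, 7.0, 2.3, 6.5]

mean = statistics.mean(site4)
median = statistics.median(site4)  # even count: average of the 6th and 7th ranked values

print(f"Average = {mean:.2f} mg/L")    # Average = 6.04 mg/L
print(f"Median  = {median:.2f} mg/L")  # Median  = 6.75 mg/L
# The two very low readings (2.1 and 2.3 mg/L) pull the average down;
# the median depends only on the middle of the ranked list.
```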

11 Range, Maximum & Minimum
- Range is the difference between the maximum and minimum values in a data set.
- The larger the range, the greater the variability.
- Maximum and minimum values are also important:
  - DO standards are expressed as the minimum concentration needed to support fish.
  - Bacteria guidelines are expressed as the maximum levels that pose an acceptable risk to public health.

13 DO (mg/L)
Site | Min | Max | Range
1 | 7.8 | 10.3 | 2.5
2 | 6.9 | 10.6 | 3.7
3 | 5.6 | 8.3 | 2.7
4 | 2.1 | 7.5 | 5.4
5 | 3.0 | 8.6 | 5.6
6 | 6.1 | 10.3 | 4.2
Sites 4 and 5 have the greatest range.
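A minimal Python sketch of the min/max/range summary for all six sites. The dictionary layout and variable names are our own; the values are the 2012 and 2013 DO results, with Sites 3 and 4 each read as having a repeated reading (6.3 and 6.0 mg/L) on 14/7 and 15/8/2012, which is consistent with the averages, medians and quartiles reported in these slides.

```python
# Dissolved oxygen (mg/L) per site, 2012 and 2013 combined, in sampling order
do_mg_l = {
    1: [10.2, 9.8, 9.2, 10.0, 9.3, 10.3, 9.2, 10.2, 8.3, 7.8, 10.3, 10.1],
    2: [9.6, 8.2, 10.3, 10.1, 9.9, 9.8, 9.5, 6.9, 9.3, 10.3, 10.6, 9.2],
    3: [8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2],
    4: [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5],
    5: [8.0, 7.2, 6.1, 6.3, 3.4, 6.2, 6.3, 8.0, 8.6, 3.0, 6.7, 7.6],
    6: [6.1, 7.6, 6.9, 8.6, 8.4, 8.9, 9.2, 10.3, 7.6, 9.3, 7.3, 10.3],
}

ranges = {}
for site, values in do_mg_l.items():
    lo, hi = min(values), max(values)
    ranges[site] = hi - lo  # range = maximum - minimum
    print(f"Site {site}: min {lo:.1f}, max {hi:.1f}, range {hi - lo:.1f}")

# Two sites with the largest range, largest first
widest = sorted(ranges, key=ranges.get, reverse=True)[:2]
print("Greatest range:", widest)  # Greatest range: [5, 4]
```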

14 Quartiles and Interquartile Range
- Quartiles: the 3 values below which lie 25%, 50% and 75% of the values in a set of numbers, respectively
- Median = 50th percentile (the second quartile)
- Half of your data values occur between the 25th and 75th percentiles
- The difference between the 25th and 75th percentiles is the interquartile (IQ) range

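20 DO (mg/L) - Site 3
As collected (2012, then 2013): 8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2
Ranked (highest to lowest): 8.3, 8.3, 8.2, 8.2, 8.2, 7.2, 7.1, 6.3, 6.3, 6.3, 6.2, 5.6
75th percentile: 8.2
50th percentile (median): 7.15
25th percentile: 6.3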

22 Quartiles and Interquartile Range
DO (mg/L)
Site | 25th | Median | 75th | IQ Range
1 | 9.20 | 9.90 | 10.20 | 1.00
2 | 9.28 | 9.70 | 10.15 | 0.88
3 | 6.30 | 7.15 | 8.20 | 1.90
4 | 6.00 | 6.75 | 7.23 | 1.23
5 | 6.18 | 6.50 | 7.70 | 1.53
6 | 7.53 | 8.50 | 9.23 | 1.70
Which sample site has the greatest variability in data? Which has the least?
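To reproduce a quartile table like this one in Python, `statistics.quantiles` (Python 3.8+) with `method="inclusive"` interpolates linearly between ranked values; to our understanding this is the same rule as Excel's QUARTILE.INC, and it matches the table above for these data. The default, `method="exclusive"`, gives slightly different quartiles. A sketch, using the same assumed DO values as the range example; printed values may differ from the table in the last digit where a result falls exactly on a rounding half.

```python
import statistics

# Dissolved oxygen (mg/L) per site, 2012 and 2013 combined
do_mg_l = {
    1: [10.2, 9.8, 9.2, 10.0, 9.3, 10.3, 9.2, 10.2, 8.3, 7.8, 10.3, 10.1],
    2: [9.6, 8.2, 10.3, 10.1, 9.9, 9.8, 9.5, 6.9, 9.3, 10.3, 10.6, 9.2],
    3: [8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2],
    4: [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5],
    5: [8.0, 7.2, 6.1, 6.3, 3.4, 6.2, 6.3, 8.0, 8.6, 3.0, 6.7, 7.6],
    6: [6.1, 7.6, 6.9, 8.6, 8.4, 8.9, 9.2, 10.3, 7.6, 9.3, 7.3, 10.3],
}

summary = {}
for site, values in do_mg_l.items():
    # "inclusive": positions (n - 1) * p in the ranked data, linear interpolation
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary[site] = (q1, median, q3, q3 - q1)
    print(f"Site {site}: 25th {q1:.2f}, median {median:.2f}, "
          f"75th {q3:.2f}, IQ range {q3 - q1:.2f}")

most = max(summary, key=lambda s: summary[s][3])
least = min(summary, key=lambda s: summary[s][3])
print(f"Greatest IQ range: Site {most}; least: Site {least}")
```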

24 Geometric Mean
- Like the median, the geometric mean reduces the influence of very high and very low numbers in a data set.
- GeoMean = $\sqrt{2 \times 8} = 4$
- GeoMean = $\sqrt[3]{2 \times 4 \times 8} = 4$
- Use when data cover several orders of magnitude (guideline: the largest value should be at least 3x the smallest)
- Spreadsheets: replace “0” values with “1” (the logarithm of 0 is undefined)
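Spreadsheets usually compute the geometric mean as the exponential of the average logarithm, which equals the nth root of the product but avoids huge intermediate products. A minimal Python sketch; `geomean` and its `zero_replacement` parameter are our own illustrative names, with the default of 1 following the spreadsheet tip above. (Python 3.8+ also provides `statistics.geometric_mean`; the explicit version simply makes the zero handling visible.)

```python
import math


def geomean(values, zero_replacement=1):
    """Geometric mean computed as exp(mean of natural logs).

    Zeros are swapped for `zero_replacement` before taking logs, because
    log(0) is undefined; negative values raise ValueError from math.log.
    """
    logs = [math.log(v if v != 0 else zero_replacement) for v in values]
    return math.exp(sum(logs) / len(logs))


print(geomean([2, 8]))     # approximately 4 (square root of 2 x 8)
print(geomean([2, 4, 8]))  # approximately 4 (cube root of 2 x 4 x 8)
```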

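25 E. coli (MPN)
Site | 23/4/2012 | 16/5/2012 | 15/6/2012 | 14/7/2012 | 15/8/2012
1 | 2 | 280 | 100 | 38 | 21
2 | 15 | 1420 | 21 | 39 | 74
3 | 100 | 2250 | 12 | 34 | 50
4 | 80 | 1000 | 100 | 57 | 146
5 | 30 | 260 | 100 | 100 | 630
6 | 10 | 1460 | 7 | 43 | 30

Site | 14/4/2013 | 16/5/2013 | 14/6/2013 | 16/7/2013 | 15/8/2013
1 | 170 | 340 | 6 | 20 | 162
2 | 119 | 490 | 12 | 120 | 50
3 | 273 | 190 | 17 | 20 | 60
4 | 202 | 630 | 63 | 160 | 12
5 | 76 | 770 | 163 | 310 | 468
6 | 19 | 40 | 150 | 4 | 16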

27 E. coli summary
Site | Geomean | Average
1 | 47 | 114
2 | 75 | 236
3 | 74 | 301
4 | 126 | 245
5 | 192 | 291
6 | 32 | 178
In every case, the geomean is lower than the average. This is especially true for Site 6, where the average is more than five times the geomean.
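A sketch that recomputes both summary columns from the E. coli results. Each per-site list holds the five 2012 results followed by the five 2013 results; for Site 5 we assume two consecutive 100 MPN results in 2012, which is the reading that reproduces the summary table. Within rounding, the printed values should match the table, and the last column shows how far a few high results pull the average above the geomean.

```python
import math
import statistics

# E. coli (MPN) per site: five 2012 results, then five 2013 results
ecoli = {
    1: [2, 280, 100, 38, 21, 170, 340, 6, 20, 162],
    2: [15, 1420, 21, 39, 74, 119, 490, 12, 120, 50],
    3: [100, 2250, 12, 34, 50, 273, 190, 17, 20, 60],
    4: [80, 1000, 100, 57, 146, 202, 630, 63, 160, 12],
    5: [30, 260, 100, 100, 630, 76, 770, 163, 310, 468],
    6: [10, 1460, 7, 43, 30, 19, 40, 150, 4, 16],
}


def geomean(values):
    """Geometric mean as exp(mean of natural logs); all values must be > 0."""
    return math.exp(statistics.fmean(math.log(v) for v in values))


for site, counts in ecoli.items():
    gm = geomean(counts)
    avg = statistics.fmean(counts)
    print(f"Site {site}: geomean {gm:.0f}, average {avg:.0f}, "
          f"average/geomean {avg / gm:.1f}")
```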

31 Sites 3, 4 & 6: a single high result skews the average upward.
Site 3 had the highest average; Site 5 had the highest geomean.
Different analysis = different result!

32 Suggested Statistical Summaries
- Tend to be useful for comparisons between sites, or between months, seasons, or years for the same site
- Present a “representative” or “typical” value and information on how the data are spread

33 Suggested Statistical Summaries
Indicator | Summary
Temperature (water or air) | Seasonal average; seasonal median; maximum; range; quartiles
Dissolved Oxygen (mg/L) | Seasonal median; minimum; quartiles
Dissolved Oxygen (% saturation) | Seasonal average*; seasonal median; quartiles
Water clarity | Seasonal average; seasonal median; maximum and minimum; range; quartiles

34 Suggested Statistical Summaries
Indicator | Summary
Bacteria (E. coli) | Geometric mean; quartiles
Turbidity | Median; quartiles
Nutrients (e.g. NO3, PO4) | Median; quartiles
Specific Conductivity or Salinity | Median; quartiles
pH | Median or average*; quartiles; minimum

35 Statistical Summaries
Factors to bear in mind:
- Temp and DO: use seasonal medians and quartiles, since these parameters vary naturally with the seasons
- In general, use the median instead of the average
- You should have at least 5 data points to calculate averages, geometric means, medians and quartiles.

36 A good table has…
- Readable, logical data placement
- Clear column and row headings
- A title at the top
- Reporting units included

Smith River Median Orthophosphate Results for 2013 (mg/L)
Sites | Median
1 | 0.02
2 | 0.02
3 | 0.12
4 | 0.12
5 | 0.11
6 | 0.04

37 A good graph has…
- A clear title
- Simple, clear labels on axes
- A scale that reveals trends
- A legend that explains the elements on the graph
- Clearly shown reporting units
- A story that is apparent from the graph
- Information that allows the reader to get the point, e.g. levels of concern
- The minimum number of elements to tell the story (avoid clutter)

40 [Figure: graph marked "Threshold of concern"]

41 [Figure: graph with "Upstream" and "Downstream" labels]

43 A line graph implies a connection between each point on the line and a trend up or down between sites. This may not be appropriate in all cases.

53 Creating Box and Whisker Plots
- Proprietary 3rd-party graphing software (e.g. Grapher)
- Some statistics packages
- Not standard with MS Excel
- Excel instructions at: http://peltiertech.com/WordPress/excel-box-and-whisker-diagrams-box-plots/

54 Reporting Variability: Sample Standard Deviation
$SD = \sqrt{\dfrac{\sum (x - \text{mean})^2}{n - 1}}$
SD = sample standard deviation; x = individual sample value; mean = arithmetic mean of all values; n = number of sample values
A measure of the amount of variability within a data set.
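The formula translates directly into code. This minimal Python sketch (the `sample_sd` name is our own) implements it step by step and checks the result against the standard library's `statistics.stdev`, which uses the same n - 1 denominator. The data are the twelve Site 1 DO readings from 2012 and 2013.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Site 1 dissolved oxygen (mg/L), 2012 and 2013
site1 = [10.2, 9.8, 9.2, 10.0, 9.3, 10.3, 9.2, 10.2, 8.3, 7.8, 10.3, 10.1]

sd = sample_sd(site1)
print(f"SD = {sd:.2f} mg/L")
# The standard library uses the same (n - 1) definition
print(math.isclose(sd, statistics.stdev(site1)))  # True
```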

57 Estimating Precision: Standard Error
$SE = \dfrac{SD}{\sqrt{n}}$
SE = standard error; SD = sample standard deviation; n = sample size
Quantifies the certainty with which the mean computed from a random sample estimates the true mean of the population from which the sample was drawn.
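A short Python sketch computing the standard error of the mean for Sites 1 and 4, using the same DO readings as the earlier examples (with the Site 4 15/8/2012 value read as 6.0 mg/L). The larger spread at Site 4 gives a larger SE, i.e. a less certain estimate of that site's true mean.

```python
import math
import statistics

# Dissolved oxygen (mg/L), 2012 and 2013 combined
site1 = [10.2, 9.8, 9.2, 10.0, 9.3, 10.3, 9.2, 10.2, 8.3, 7.8, 10.3, 10.1]
site4 = [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5]

se_by_site = {}
for name, values in (("Site 1", site1), ("Site 4", site4)):
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)         # SE = SD / sqrt(n)
    se_by_site[name] = se
    print(f"{name}: mean {mean:.2f} mg/L, SE {se:.2f} mg/L (n = {n})")
```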

58 Estimating Precision: Coefficient of Variation
$CV = \dfrac{SD}{\text{sample mean}} \times 100$
Because the CV expresses the SD as a percentage of the mean, it does not depend on the units or overall scale of the values. This allows comparison of different studies and different sampling designs.
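A minimal sketch of the CV in Python (`cv_percent` is our own helper name), applied to the Site 1 and Site 4 DO readings used above. The last lines illustrate the point about units: converting the same readings from mg/L to µg/L leaves the CV unchanged.

```python
import math
import statistics


def cv_percent(values):
    """Coefficient of variation: sample SD divided by the mean, times 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


site1 = [10.2, 9.8, 9.2, 10.0, 9.3, 10.3, 9.2, 10.2, 8.3, 7.8, 10.3, 10.1]
site4 = [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5]

print(f"Site 1 CV: {cv_percent(site1):.1f}%")
print(f"Site 4 CV: {cv_percent(site4):.1f}%")

# Rescaling the units (mg/L -> micrograms/L) leaves the CV unchanged
site1_ug_l = [x * 1000 for x in site1]
print(math.isclose(cv_percent(site1), cv_percent(site1_ug_l)))  # True
```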

61 You have your data, but what does it mean? Do your values show a problem or not? It helps to have a point of reference.

62
Variable | Guideline | Units | Water Quality Objective | Notes | Reference
DO | 6.5 to 9.5 | mg/L | Freshwater aquatic life | Cold water biota | CCME 2002
pH | 6.5 to 9.0 | | Freshwater aquatic life | | CCME 2002
Temp. | <20; <24 | °C | Stress to salmonids (<20); mortality to salmonids (<24) | | MacMillan et al. 2005
Total P | 0.03; 0.02 to 0.07 | mg/L | Protection from eutrophication | | OMEE; Mackie 2004; Dodds and Welch 2000
Total N | 0.25 to 3.0 | mg/L | Protection from eutrophication | | Dodds and Welch 2000
E. coli | <200 | cfu/100 mL | Human recreational contact | Geomean of 5 samples taken within 30 days | Health Canada 2012
TSS | 25 | mg/L | Clear flow, short term | Max increase from background | CCME 2002

64 Other sources of WQ reference values
- www.lakes.chebucto.org/lakecomp.html (reference and historical values for NS lakes)
- http://novascotia.ca/nse/surface.water/automatedqualitymonitoringdata.asp (automated data collection: NS surface water quality monitoring network)

66 Questions to ask of your data
- Which sites consistently did not meet the WQO? By how much?
- Were there sampling dates on which most or all of the sites did not meet the criteria?
- Do levels increase or decrease in a consistent manner up- or downstream?
- If monitoring a pollution source, are results different above and below it?
- Does a change in one indicator coincide with changes in another? (e.g. DO and temperature)
[Figure: graph with axis label "Dates, 1995"]
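The first question can be screened with a few lines of code once a numeric objective is chosen. This sketch assumes, for illustration only, that the 6.5 mg/L lower end of the DO guideline in the reference table is applied as a minimum; `DO_MIN_MG_L` is a hypothetical name, and the DO values are the 2012 and 2013 results used in the earlier examples.

```python
# ASSUMPTION (for illustration): 6.5 mg/L, the lower end of the DO guideline
# in the reference table, is treated as a minimum objective.
DO_MIN_MG_L = 6.5

# Dissolved oxygen (mg/L) per site, 2012 and 2013 combined, in sampling order
do_mg_l = {
    1: [10.2, 9.8, 9.2, 10.0, 9.3, 10.3, 9.2, 10.2, 8.3, 7.8, 10.3, 10.1],
    2: [9.6, 8.2, 10.3, 10.1, 9.9, 9.8, 9.5, 6.9, 9.3, 10.3, 10.6, 9.2],
    3: [8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2],
    4: [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5],
    5: [8.0, 7.2, 6.1, 6.3, 3.4, 6.2, 6.3, 8.0, 8.6, 3.0, 6.7, 7.6],
    6: [6.1, 7.6, 6.9, 8.6, 8.4, 8.9, 9.2, 10.3, 7.6, 9.3, 7.3, 10.3],
}

below_counts = {}
for site, values in do_mg_l.items():
    below = [v for v in values if v < DO_MIN_MG_L]
    below_counts[site] = len(below)
    if below:
        worst = DO_MIN_MG_L - min(below)  # largest shortfall below the objective
        print(f"Site {site}: {len(below)} of {len(values)} samples below "
              f"{DO_MIN_MG_L} mg/L (worst by {worst:.1f} mg/L)")
    else:
        print(f"Site {site}: all {len(values)} samples at or above {DO_MIN_MG_L} mg/L")
```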

67 Human alterations or Natural conditions??

68
- Might natural up/downstream changes in the river account for results? (benthic invertebrate drift, turbidity)
- Does weather influence results? (heavy rain, elevated temperature)
- Do problem levels coincide with rising flow? (consider dam releases or flow management)
- Does the presence of specific sources explain results? (WWTP, failing septic systems)

69 Human alterations or Natural conditions?? (cont'd)
- Do changes in one indicator appear to explain changes in another? (low DO/high temperature)
- Do visual observations explain results? (strange pipes, eroding banks, dry-weather seeps, etc.)
- If monitoring the impact of a pollution source, could multiple point sources be confusing results?

70 More questions to keep in the back of your mind
- Could flaws in field/lab techniques explain results? (sample contamination/sampling error)
- For episodic discharges, did sampling coincide with the discharge?
- Were analytical methods sensitive enough to detect levels of concern?
- Time of day of sampling (diurnal DO cycling)

71 Summarizing QA/QC Results

72 You need to prove the:
- Precision
- Accuracy
- Representativeness
- Comparability
- Completeness
of your data and conclusions

75 DO, pH & temperature collected here once per year
Is this sampling representative of environmental conditions in this lake?

78 Volunteer DO results from training day (same time and place)
Volunteer | DO (mg/L)
Tom | 8.9
Jon | 6.8
Jill | 9.0
Geoff | 8.8
Are results comparable between volunteers, at different times and at different locations?

80 Monitoring Plan: Sample DO, pH, Spec. Cond. & Turbidity within 12 hours of >15 mm precipitation events on the Sackville River between April and October
Monitoring Results: Samples collected at 4 of 9 precipitation events
Are results complete?

81 Collect Replicates to Evaluate Precision
Samples collected by the same individual at the same location and time
Observation | DO (mg/L)
1 | 9.8
2 | 9.9
3 | 10.1
4 | 10.1
Mean | 9.98
Standard Deviation | 0.15
Coefficient of Variation | 1.50
Set a threshold for the maximum coefficient of variation?
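A Python sketch of the replicate check. The fourth observation is taken as 10.1 mg/L, the value consistent with the reported mean, SD and CV, and `MAX_CV_PERCENT = 5.0` is a purely hypothetical threshold; each program would set its own.

```python
import statistics

# Four replicate DO readings (mg/L): same person, same location, same time
replicates = [9.8, 9.9, 10.1, 10.1]

# Hypothetical program threshold, not taken from the slides
MAX_CV_PERCENT = 5.0

mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)  # sample SD (n - 1)
cv = sd / mean * 100               # coefficient of variation, %

print(f"Mean {mean:.3f} mg/L, SD {sd:.2f} mg/L, CV {cv:.2f}%")
if cv <= MAX_CV_PERCENT:
    print("Replicate precision is within the threshold")
else:
    print("Replicate precision exceeds the threshold; check technique")
```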

83 Collect Replicates to Evaluate Accuracy
Single sample split and tested by volunteer and program coordinator using the same method.
Site | Date | DO Volunteer (mg/L) | DO QA/QC (mg/L) | Difference | % Difference
10 | 13/6/2013 | 8 | 9 | 1 | 11.1
20 | 22/8/2013 | 11.3 | 11.4 | 0.1 | 0.9
30 | 23/8/2013 | 7.4 | 8.4 | 1 | 11.9
40 | 15/9/2013 | 8.8 | 9.1 | 0.3 | 3.3
50 | 16/9/2013 | 7.2 | 9.1 | 1.9 | 20.9
60 | 1/10/2013 | 10.4 | 8.4 | -2 | -23.8
Which volunteer(s) need retraining on analysis technique? Set a threshold for maximum percent difference?
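A sketch of the split-sample comparison in Python, with site labels 10 to 60 as read from the table. The table's percent differences are consistent with (QA/QC - volunteer) / QA/QC x 100, so that convention is used here; some programs instead divide by the mean of the two results (relative percent difference). The 15% limit in `MAX_PCT_DIFF` is hypothetical, chosen only to show how a threshold would flag results.

```python
# Split-sample DO results (mg/L): (site, date, volunteer, QA/QC)
splits = [
    (10, "13/6/2013", 8.0, 9.0),
    (20, "22/8/2013", 11.3, 11.4),
    (30, "23/8/2013", 7.4, 8.4),
    (40, "15/9/2013", 8.8, 9.1),
    (50, "16/9/2013", 7.2, 9.1),
    (60, "1/10/2013", 10.4, 8.4),
]

MAX_PCT_DIFF = 15.0  # hypothetical acceptance limit, for illustration only

flagged = []
for site, date, volunteer, qaqc in splits:
    diff = qaqc - volunteer       # same sign convention as the table
    pct_diff = diff / qaqc * 100  # difference as a percentage of the QA/QC result
    needs_review = abs(pct_diff) > MAX_PCT_DIFF
    if needs_review:
        flagged.append(site)
    print(f"Site {site} ({date}): {diff:+.1f} mg/L, {pct_diff:+.1f}%"
          + ("  <- review" if needs_review else ""))

print("Sites to review:", flagged)
```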
