OUTLINE OF QUALITY CONTROL DOCUMENT Introduction Why is quality control needed? Information to accompany data Automatic checks “Scientific” quality.

OUTLINE OF QUALITY CONTROL DOCUMENT
- Introduction
- Why is quality control needed?
- Information to accompany data
- Automatic checks
- “Scientific” quality control
  - CTD (temperature and salinity)
  - Current meter data (including ADCP)
  - Wave data
  - Sea level
  - Biological data, etc.
- Quality flags
- Documentation
Quality Control Standards for SEADATANET

Data quality control has the following objective:
“To ensure the data consistency within a single data set and within a collection of data sets and to ensure that the quality and errors of the data are apparent to the user who has sufficient information to assess its suitability for a task.” (IOC/CEC Manual, 1993)
Quality control, if done well, brings a number of key advantages:
- Maintaining standards
- Consistency
- Reliability

For all types of data, information is required about:
- Where the data were collected: location (preferably as latitude and longitude) and depth/height
- When the data were collected (date and time in UTC or a clearly specified local time zone)
- How the data were collected (e.g. sampling methods, instrument types, analytical techniques)
- How the data are referenced (e.g. station numbers, cast numbers)
- Who collected the data, including name and institution of the data originator(s) and the principal investigator
- What has been done to the data (e.g. details of processing and calibrations applied, algorithms used to compute derived parameters)
- Comments for other users of the data (e.g. problems encountered and comments on data quality)

Data Collection Details: example 1, Biological Net Tow (Plankton)
- Project, ship, cruise identifier
- Country, organisation
- Date, time, latitude and longitude (for start and end if sampling via a net tow)
- Sounding, maximum and minimum pressure or depth of tow
- Description of operational procedures such as tow orientation (vertical, horizontal or oblique), methods of position fixing (e.g. DGPS, GPS, etc.)
- Weather conditions (including sun and wind)
- Gear type (e.g. net mesh size, net mouth size, single or multi-net, etc.)
- Sample preservation method (e.g. pickling, frozen, etc.)
- Sample analysis/processing or data collection procedures (e.g. filtered size ranges, sub-sampling, etc.)
- Any additional information of use to secondary users which may have affected the data or have a bearing on its subsequent use

Data Collection Details: example 2, Shipboard ADCP
- Project, ship, cruise identifier
- Country, organisation
- Details of the instrument and sensors (e.g. manufacturer, instrument type, model number, serial number and any modifications carried out, number of transducers)
- Description of operational procedures including sampling interval (time between ensembles), pings per ensemble, bin size, number of bins, bottom tracking on/off, pitch and roll on/off, percentage good level, method of position fix (e.g. GPS, DGPS), automated data rejection (e.g. fish rejection algorithms), etc.
- Frequency (kHz), band type (broad, narrow)
- Date and time of the start and end of the profiles for each data file
- Any additional information of use to secondary users which may have affected the data or have a bearing on its subsequent use

Parameter Details
- Parameters measured (refer to the BODC Parameter Usage Vocabulary if necessary for help with parameter definitions)
Data Processing Details
- Originator's data format
- Description of calibrations
- Description of any data processing that has occurred (manufacturer's and in-house)

AUTOMATIC QUALITY CONTROL CHECKS

Basic automatic checks for all data types
- Date and time of an observation must be valid
  - Year: 4 digits
  - Month: between 1 and 12
  - Day: in the range expected for the month
  - Hour: between 0 and 23
  - Minute: between 0 and 59
- Latitude and longitude must be valid
  - Latitude in range -90 to 90
  - Longitude in range -180 to 180
- Position must not be on land
  - Observation latitude and longitude located in the ocean
  - For example, use 5-minute bathymetry (e.g. ETOPO5)
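The date, time and position checks listed on this slide could be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the land check is left out because it needs a bathymetry grid such as ETOPO5 that is not part of this example.

```python
from datetime import datetime


def check_date_time(year, month, day, hour, minute):
    """Return True if the fields form a valid date and time.

    The year must have four digits; datetime() rejects months outside
    1-12, days not valid for the month, hours outside 0-23 and minutes
    outside 0-59.
    """
    if not 1000 <= year <= 9999:
        return False
    try:
        datetime(year, month, day, hour, minute)
    except ValueError:
        return False
    return True


def check_position(latitude, longitude):
    """Return True if latitude is in [-90, 90] and longitude in [-180, 180]."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

Delegating the day-of-month rule to `datetime` avoids hand-coding month lengths and leap years.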

Further automatic checks
- Impossible speed: tests for acceptable speed between stations
- Spike: tests salinity and temperature data for large differences between adjacent values (other parameters also)
- Gradient: tests for gradients between vertically adjacent salinity and temperature measurements that are too steep
- Density inversion: tests where the calculated density at a higher pressure in a profile is less than the calculated density at an adjacent lower pressure
- Pressure increasing: tests that pressures in the profile are monotonically increasing
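Three of the profile checks above can be illustrated with a short sketch. The slide does not give formulas, so the spike statistic used here (the form commonly used in Argo and GTSPP style tests) is an assumption, as are the function names and the `threshold` and `tolerance` parameters. Densities are taken as already computed inputs.

```python
def spike_flags(values, threshold):
    """Flag interior points that differ sharply from both neighbours.

    Assumed test statistic: |v2 - (v3 + v1)/2| - |(v3 - v1)/2| > threshold.
    The first and last points cannot be tested and are left unflagged.
    """
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        statistic = abs(v2 - (v3 + v1) / 2.0) - abs((v3 - v1) / 2.0)
        flags[i] = statistic > threshold
    return flags


def pressure_increasing(pressures):
    """Return True if pressures increase strictly down the profile."""
    return all(b > a for a, b in zip(pressures, pressures[1:]))


def density_inversions(pressures, densities, tolerance=0.0):
    """Return indices of levels whose density is less than the density of
    the adjacent shallower level by more than `tolerance`."""
    bad = []
    for i in range(len(densities) - 1):
        deeper = pressures[i + 1] > pressures[i]
        if deeper and densities[i + 1] < densities[i] - tolerance:
            bad.append(i + 1)
    return bad
```

Subtracting the background gradient term in the spike statistic keeps a steep but smooth thermocline from being flagged as a spike.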

Further automatic checks
- Global range: tests that observed temperature and salinity values are within the expected extremes encountered in the oceans
- Regional range: tests that observed temperature and salinity values are within the expected extremes encountered in particular regions
- Deepest pressure: tests that the profile does not contain pressures higher than the highest value expected
- Check for duplicates: cruises or stations within a cruise, using a space-time radius (e.g. for duplicate cruises: 1 mile, 15 min, or 1 day if time is unknown)
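A range check and a space-time duplicate check could look like the sketch below. The numeric limits are illustrative assumptions and do not come from the slides; the 1.852 km default assumes that "1 mile" means one nautical mile. All names are hypothetical.

```python
import math
from datetime import datetime, timedelta

# Illustrative global limits (assumed for this example, not from the slides).
GLOBAL_RANGES = {"temperature": (-2.5, 40.0), "salinity": (2.0, 41.0)}


def in_global_range(parameter, value):
    """Return True if value lies inside the assumed global range."""
    low, high = GLOBAL_RANGES[parameter]
    return low <= value <= high


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * 6371.0 * math.asin(math.sqrt(a))


def is_duplicate(a, b, max_km=1.852, max_dt=timedelta(minutes=15)):
    """a and b are (latitude, longitude, datetime) tuples.

    Two stations are treated as possible duplicates when they fall
    inside both the distance radius and the time window.
    """
    close = haversine_km(a[0], a[1], b[0], b[1]) <= max_km
    same_time = abs(a[2] - b[2]) <= max_dt
    return close and same_time
```

When the time of day is unknown, the same function can be called with `max_dt=timedelta(days=1)`, matching the 1 day radius mentioned on the slide.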

SCIENTIFIC QUALITY CONTROL CHECKS

Visual inspection of data
- Pressure/depth series (e.g. CTD)
- Property-property plot
- Time series (e.g. current meter, sea level)
- Scatter plot (e.g. current meter)
- Map covering the locations of series
Ensure that data are free from instrument-generated spikes, gaps, spurious data at the start and end of the record, and other irregularities
Apply quality flags
- Quality flags do not change the data
Visual inspection can be subjective and depends on experience

- Spike detection
  - Can require visual inspection to back up the automatic check
- Range check
- Comparison with pre-existing climatological statistics
- Comparison of data collected on the same cruise and in the same area

Figure: example of a current meter time series with a possible rotor problem. Panels: current speed (m s^-1), north velocity component (m s^-1), east velocity component (m s^-1), current direction (°); temperature (°C) not shown.

Figure (above): example of a ‘good’ scatter plot.
Figure (below): example of a record with suspect directions.

Common Problems Associated with Current Meters (1)
- Rotor turns, but a breakdown of the magnetic coupling between the rotor and the follower or reed switch means that rotations are not registered.
- Rotor not turning due to fouling with weed or similar. This results in a sudden drop in speed to zero or near zero.
- Directions not being resolved. This could result from a stiff meter suspension or a meter being fouled by its mooring wire.
- Compass sticking. This may occur if the meter is inclined too far from the horizontal plane and can be a problem in fast tidal streams when in-line instruments are used. This is commonly known as ‘mooring knockdown’. It is seen in the data as a frequent recurrence of a single direction value or a narrow range of directions.
- Worn compass. This causes some directions to become repetitive.

Common Problems Associated with Current Meters (2)
- Non-linearity of compass. This is usually picked up from the scatter plot of u and v velocity components.
- Sticking encoder pins. This causes spikes in all parameters and often shows up as the value of the pin(s) appearing in the listing (e.g. 0, 256, 512, 768 or 1023).
- Underrated power supply. This often shows in the compass channel first because of the extra current drain during clamping.
- Electronic failure (e.g. dry joints, broken circuitry). This does not always produce a total loss of data, however.
- Poor quality recording tape. This is indicated by the appearance of suspect data at regular intervals in all parameters.
- Sensor drift. This is a slow change in the response of the sensor.
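Two of the symptoms described for current meters lend themselves to simple screening: pin values appearing in the raw listing, and a single direction (or narrow range of directions) recurring too often. The sketch below is an assumption about how such screening might be done, not a documented procedure; the function names and the 5 degree bin width are hypothetical, and any threshold on the returned fractions is left to the analyst.

```python
from collections import Counter

# Pin values quoted on the slide for sticking encoder pins.
PIN_VALUES = {0, 256, 512, 768, 1023}


def pin_value_fraction(raw_counts):
    """Fraction of raw encoder counts equal to one of the pin values."""
    if not raw_counts:
        return 0.0
    hits = sum(1 for count in raw_counts if count in PIN_VALUES)
    return hits / len(raw_counts)


def dominant_direction(directions, bin_width=5.0):
    """Return (bin_start_degrees, fraction) for the most common direction bin.

    A very high fraction may point to a sticking or worn compass, but it
    can also be genuine (e.g. a strongly rectilinear flow).
    """
    if not directions:
        raise ValueError("directions must not be empty")
    bins = Counter(int((d % 360.0) // bin_width) for d in directions)
    index, count = bins.most_common(1)[0]
    return index * bin_width, count / len(directions)
```

These functions only raise candidates for visual inspection; they do not decide whether a record is bad.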

Figure: scatter plot of wave height against (zero up-crossing or crest) period. Wave heights (maximum in pink; significant in blue) are OK for range, show a basically normal distribution, and have acceptable steepness (all < 5%).

1-Dimensional and Directional Wave Spectra
- Check slope of energy density spectrum: should follow a set slope due to transfer of energy from lower to higher frequencies (?)
- Check that energy in the spectrum at frequencies below 0.04 Hz is not more than 5% of the total spectral energy
- Check that energy in the spectrum at frequencies above 0.6 Hz is not more than 5% of the total spectral energy
- Check mean direction at high frequencies, which should correspond to the wind direction (assuming coincident meteorological data)
- For 1D spectra, calculate the zeroth spectral moment from the spectral variance densities and check that it corresponds to the given value
- For 1D spectra, calculate Te as the first negative spectral moment divided by the zeroth spectral moment (m-1/m0) and check that it correlates with (peak or zero up-crossing) period
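The moment and energy-fraction checks for a 1D spectrum can be sketched with trapezoidal integration. The names are hypothetical, and the band fraction only counts frequency intervals lying wholly inside the band, which is an approximation chosen for brevity. Frequencies are assumed to be positive and ascending. The energy period is computed here as Te = m(-1)/m(0), the usual definition.

```python
def spectral_moment(freqs, densities, order):
    """Trapezoidal estimate of the integral of f**order * S(f) df."""
    total = 0.0
    for i in range(len(freqs) - 1):
        y0 = densities[i] * freqs[i] ** order
        y1 = densities[i + 1] * freqs[i + 1] ** order
        total += 0.5 * (y0 + y1) * (freqs[i + 1] - freqs[i])
    return total


def band_fraction(freqs, densities, f_low, f_high):
    """Fraction of total energy in intervals lying within [f_low, f_high]."""
    total = spectral_moment(freqs, densities, 0)
    if total <= 0.0:
        raise ValueError("spectrum has no energy")
    band = 0.0
    for i in range(len(freqs) - 1):
        if f_low <= freqs[i] and freqs[i + 1] <= f_high:
            band += 0.5 * (densities[i] + densities[i + 1]) * (freqs[i + 1] - freqs[i])
    return band / total


def energy_period(freqs, densities):
    """Energy period Te = m(-1) / m(0), in seconds when freqs are in Hz."""
    return spectral_moment(freqs, densities, -1) / spectral_moment(freqs, densities, 0)
```

A spectrum would pass the two energy checks when `band_fraction(f, s, 0.0, 0.04) <= 0.05` and `band_fraction(f, s, 0.6, float("inf")) <= 0.05`.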

Sea Level Data
- Harmonic analysis: generate predictions
- Calculate residuals
- Spikes
- Constant values
- Clock malfunctions
- Gap filling
- Reference changes
- Calculation of statistics
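Two of the sea level steps are simple enough to sketch: forming residuals from observations and tidal predictions, and finding runs of constant values. The harmonic analysis that produces the predictions is not shown; the predictions are assumed to be available already. Function names and the `min_length` parameter are hypothetical.

```python
def residuals(observed, predicted):
    """Residual = observed sea level minus predicted tide, point by point."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


def constant_runs(values, min_length):
    """Return (start, end) index pairs, end exclusive, for every run of
    identical consecutive values at least min_length long."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs
```

Long constant runs in a tide gauge record often indicate a stuck sensor or float, so the runs returned here are candidates for closer inspection.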

Biological data quality control
COPEPOD: A Global Plankton Database (2005)
- Plankton data are variable by nature, influenced by numerous physical and biological events.
- Unlike temperature or salinity values, there is no tight range of typical values that one can use to easily qualify or disqualify these data.
- Plankton values are greatly affected by the size of the net mesh and the depth of tow.
- Very basic value range and statistical techniques are used to look for anomalous or non-representative data.
- The variety of original units does not allow for easy inter-comparison of the data:
  - A Common Baseunit Value (CBV) was calculated
  - A Biological Grouping Code (BGC) identifies the plankton taxa’s membership in up to four groupings

Biological data: range checks
- CBV and BGC are used together to perform broad, taxonomic group-based value range checks
- A single range (for the entire world ocean) was used for the major and minor taxonomic groups
- Future work will divide these ranges into smaller taxonomic sub-groups and individual oceanographic basins or regions, allowing for tighter range checks
- Value ranges are very general and encompass the effects of:
  - Different mesh sizes
  - Day versus night sampling
  - Presence of smaller life stages (“number of adults” vs. “number of adults + juveniles”)
- Ranges will be adjusted as new data and better techniques are added to the database
- New ranges, as well as ranges for additional plankton sub-groups, will be available online

Biological data: statistical checks
- Used to search for questionable values
- Not used to automatically flag values
- For each BGC group, the mean and standard deviation are calculated from all observations present in the database
- Individual observations more than 5 standard deviations from the mean are investigated on a case-by-case basis
- Natural variability may account for many “outliers”, but the method helped identify extreme values caused by misinterpreted units or typographic errors
  - In many cases, values were off by a factor of 1000
  - These are readily detected by these simple statistical checks
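The 5 standard deviation screen described above might be sketched like this. The data layout (a dictionary of lists keyed by group code), the function name, and the choice of the sample standard deviation are assumptions made for the example. The function only lists candidates for manual review and does not flag anything.

```python
from statistics import mean, stdev


def questionable_values(values_by_group, n_sigma=5.0):
    """Return {group: [indices]} of values more than n_sigma standard
    deviations from their group mean. Groups with fewer than two values
    or zero spread are skipped."""
    result = {}
    for group, values in values_by_group.items():
        if len(values) < 2:
            continue
        centre = mean(values)
        spread = stdev(values)
        if spread == 0:
            continue
        indices = [i for i, v in enumerate(values)
                   if abs(v - centre) > n_sigma * spread]
        if indices:
            result[group] = indices
    return result
```

Because the suspect value itself inflates the standard deviation, a group needs a few dozen observations before any single value can exceed 5 standard deviations, so very small groups will never produce candidates.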

SeaDataNet quality control flags
Flag  Short description
0     No quality control
1     The value appears to be correct
2     The value appears to be probably good
3     The value appears probably bad
4     The value appears erroneous
5     The value has been changed
6     Below detection limit
7     In excess of quoted value
8     Interpolated value
9     Missing value
A     Incomplete information
Based on IGOSS/UOT/GTSPP and Argo quality flags
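One way to carry these flags in software is as a lookup table held alongside the data, so that flags annotate values without changing them. The sketch below uses the descriptions from the table; the variable and function names, and the default choice of which flags count as acceptable, are assumptions.

```python
SDN_FLAGS = {
    "0": "No quality control",
    "1": "The value appears to be correct",
    "2": "The value appears to be probably good",
    "3": "The value appears probably bad",
    "4": "The value appears erroneous",
    "5": "The value has been changed",
    "6": "Below detection limit",
    "7": "In excess of quoted value",
    "8": "Interpolated value",
    "9": "Missing value",
    "A": "Incomplete information",
}


def select_values(values, flags, accepted=("1", "2")):
    """Return the values whose flag is in `accepted`.

    The input lists are not modified: flags travel with the data and the
    user decides which flags are acceptable for the task at hand.
    """
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    for flag in flags:
        if flag not in SDN_FLAGS:
            raise ValueError("unknown flag: %r" % (flag,))
    return [v for v, f in zip(values, flags) if f in accepted]
```

Flags are kept as strings because the scheme includes the non-numeric flag `A`.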

Data Documentation
- Comprehensive documentation to accompany the data
- All data sets need to be fully documented to ensure they can be used in the future without ambiguity or uncertainty
- Compiled using:
  - information supplied by the data originator (e.g. data reports, comments on data quality)
  - any further information gained during QC
- Includes:
  - instrument details, mooring details, data quality, calibration and processing carried out by the data originator
  - data centre processing and quality control

REFERENCES
- NODC procedures (e.g. France, Greece, Italy, Norway, Spain, Sweden, UK)
- EU MEDAR-MEDATLAS procedures and SCOOP software
- EU SIMORC project (Met-ocean data QC)
- EU ESEAS (sea level) and IOC GLOSS documents
- Manual of Quality Control Procedures for Validation of Oceanographic Data, UNESCO, IOC Manuals and Guides No. 26, 1993
- GTSPP QC (IOC Manuals and Guides No. 22)
- Argo Quality Control Manual (Real Time and Delayed Mode)
- GOSUD Real-time quality control
- IODE’s OceanTeacher
- ICES WG Marine Data Management Data Type Guidelines
- JPOTS Manual, 1991
- WOCE manuals
- JGOFS Protocols
- World Ocean Database Quality Control documentation
- TOGA/COARE Handbook of Quality Control Procedures for Surface Meteorology Data
- BODC-WOCE Sea Level Data Assembly Centre Quality Assessment
- AODC Quality Control Cookbook for XBT Data
- Chapman, A. D. Principles and Methods of Data Cleaning - Primary Species and Species-Occurrence Data, version 1.0.
- Chapman, A. D. Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen.
- ‘Ocean biodiversity informatics’: a new era in marine biology research and management (Mark J. Costello, Edward Vanden Berghe)
- QARTOD (Quality Assurance of Real-Time Oceanographic Data)