Laura Russell Programmer VertNet Buenos Aires (Argentina) 30 September 2011 Training course on biodiversity data publishing and.


Laura Russell Programmer VertNet Buenos Aires (Argentina) 30 September 2011 Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition Introduction to fitness-for-use

Overview
1. Value of data
2. Defining fitness for use
3. Fitness for use in biological occurrence data
   - Metadata
   - Taxonomic data
   - Spatial data
   - Collector and collection data
   - Descriptive data

Are we living in the "data century"?

Value of data
Are we living in the "data century"? The amount of available data is increasing exponentially.
- The GBIF community is part of this movement!
These data have the potential to dramatically increase our knowledge and capabilities.

Data and Politics

Data and Advertising
Advertising agencies are important consumers of data and statistics.

Data and Maps

2010 OpenStreetMap response to Haiti earthquake Before...

2010 OpenStreetMap response to Haiti earthquake...and a few days later

Climate change & "crop wild relatives"
[Figure: crop wild relatives (343 species, data from GBIF) combined with global climate change models; panels: current richness, future predicted richness, predicted change]

Turning data to understanding Oceans of data...

...rivers of information...

...streams of knowledge...

... and drops of understanding.

Uses of biodiversity data
Taxonomic research, species distribution modelling and prediction, invasive species, habitat loss, species interrelations,...
But also...
Conservation planning, water resources management, antivenoms, ecotourism, history of science, hunting and fishing, data repatriation, nature photography and film-making,...

Fitness for use
Definition: "The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data." (Chrisman, 1991)

Fitness for use in action
- Does species 'A' occur in Tasmania?
- Does species 'A' occur in National Park 'X'?
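The two questions above demand different spatial resolution from the same record. A georeference with a large uncertainty may answer the state-level question but not the park-level one. The sketch below makes that concrete under simplifying assumptions: coordinates are in a projected system in metres, and each region is reduced to a hypothetical axis-aligned rectangle. `PointRadius`, `Rect`, `occurs_in` and all coordinates are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PointRadius:
    """A georeferenced record: a point plus its uncertainty radius."""
    x: float              # easting in a projected system, metres (assumed)
    y: float              # northing in the same system, metres (assumed)
    uncertainty_m: float  # radius of the uncertainty circle, metres


@dataclass
class Rect:
    """A hypothetical region simplified to an axis-aligned rectangle (metres)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def occurs_in(record: PointRadius, region: Rect) -> str:
    """Answer 'does this record fall inside the region?' with yes/no/undetermined."""
    r = record.uncertainty_m
    # Whole uncertainty circle inside the rectangle: the record answers "yes".
    if (region.xmin <= record.x - r and record.x + r <= region.xmax
            and region.ymin <= record.y - r and record.y + r <= region.ymax):
        return "yes"
    # Distance from the point to the nearest part of the rectangle.
    dx = max(region.xmin - record.x, 0.0, record.x - region.xmax)
    dy = max(region.ymin - record.y, 0.0, record.y - region.ymax)
    if (dx * dx + dy * dy) ** 0.5 > r:
        return "no"  # the circle does not reach the region at all
    return "undetermined"  # the circle straddles the boundary


# The same record, asked two questions of different spatial scale.
large_region = Rect(0, 0, 300_000, 300_000)
small_park = Rect(140_000, 140_000, 160_000, 160_000)
record = PointRadius(150_000, 150_000, uncertainty_m=20_000)
print(occurs_in(record, large_region))  # yes
print(occurs_in(record, small_park))    # undetermined
```

The record is identical in both calls. Only the question changes, and with it the record's fitness for use.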

Loss of data quality can occur at every step:
- At collection time
- During digitization
- During documentation
- During storage/archiving
- During analysis and manipulation
- At the time of presentation
- Through the uses to which the data are put

Data quality information chain
Assign responsibility for the quality of data as close to data creation as possible.

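Quality Assurance and Quality Control
Both are judgments of quality. Quality Control judges quality against internal standards, processes and procedures; Quality Assurance judges it against standards external to the process.
Both should be done when data quality is a concern!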

It's important for organizations to have:
- A vision for providing good-quality data
- A policy to implement that vision
- A strategy for implementation
Considerations:
- Don't reinvent the wheel; use standards
- Look for inefficiencies (in data collection and QC procedures) and reduce duplication
- Share data, information and tools
- Look beyond immediate use
- Take user needs into account
- Invest in good documentation and metadata

Data responsibility is shared between...
Collectors: primary responsibility
- Label information is correct, as accurate as possible and readable
- Collection methodologies are fully documented
- Notes are clear and unambiguous
Errors introduced at this stage are difficult (or impossible) to correct later.

Data responsibility is shared between...
Curator/custodian: long-term responsibility
- Quality of data transcription into the database
- Validation checks are carried out routinely and documented
- Data are stored and archived
- Earlier versions are systematically stored
- Respect is ensured (privacy, intellectual property, copyright, sensitivities of indigenous owners,...)
- Good documentation is provided (including known errors)
- User feedback about data quality is taken into account
Custodians are responsible not only for maintenance but also for looking after the data for use by future generations.

Data responsibility is shared between...
Users
- Provide feedback to custodians on errors and omissions in data and documentation, and on setting future priorities
- User responsibility: determine the fitness of the data for their use, and do not use the data in inappropriate ways

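Accuracy and precision
- Accuracy = correctness (closeness to the true value)
- Precision:
  o Statistical = repeatability ("repetition")
  o Numerical = number of digits recorded
[Figure: three targets illustrating low accuracy/high precision, high accuracy/low precision, and high accuracy/high precision]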
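The distinction between accuracy and the two kinds of precision can be made measurable. The minimal sketch below assumes you have repeated measurements of a value whose true value is known, such as a reference point. It reports bias (mean minus true value) as an accuracy measure and the sample standard deviation as statistical precision. Numerical precision is read from the number of decimal places in the value as written. Function names are illustrative, not from any standard tool.

```python
import statistics


def accuracy_and_precision(measurements: list[float], true_value: float) -> tuple[float, float]:
    """Summarise repeated measurements of a known reference value.

    bias   = mean - true_value: near zero means high accuracy.
    spread = sample standard deviation: small means high statistical precision.
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)  # needs at least two measurements
    return bias, spread


def decimal_places(value_as_recorded: str) -> int:
    """Numerical precision: digits after the decimal point, as the value was written."""
    return len(value_as_recorded.strip().partition(".")[2])


# Tightly clustered but far from the true value: high precision, low accuracy.
bias, spread = accuracy_and_precision([12.0, 12.1, 11.9], true_value=10.0)
```

A high digit count says nothing about accuracy. A coordinate typed as `-34.6037` claims more numerical precision than `-34.6`, whether or not either value is correct.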

Errors & uncertainty
Errors: include both imprecision and inaccuracy
- Random or systematic
- Don't ignore them: measure, calculate, record and document them
Uncertainty
- Always present (the difficulty is to understand, record and describe it)
- Says more about the observer than about the data!
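Random and systematic errors behave differently when readings are repeated. Averaging cancels random error but leaves a systematic offset untouched. The sketch below models each reading as true value plus constant offset plus its own random error. The offset and the error values are hypothetical and chosen so the arithmetic is exact.

```python
import statistics


def simulate_readings(true_value: float, systematic_offset: float,
                      random_errors: list[float]) -> list[float]:
    """Each reading = true value + constant (systematic) offset + its own random error."""
    return [true_value + systematic_offset + e for e in random_errors]


readings = simulate_readings(100.0, 5.0, [2.0, -2.0, 1.0, -1.0, 0.5, -0.5])
mean = statistics.mean(readings)
print(readings[0] - 100.0)  # 7.0: a single reading carries both kinds of error
print(mean - 100.0)         # 5.0: averaging removed the random part, not the offset
```

This is why a systematic error, such as a wrong datum applied to every record, has to be documented and corrected explicitly rather than averaged away.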

Fitness-for-use and metadata
Metadata: "data about data(set)": content, accessibility, completeness,...
- Dataset-level or record-level
- Document errors
- Document data validation and cleaning/error correction
The data must be documented with sufficiently detailed metadata to enable their use by third parties without reference to the originator of the data.

Taxonomic data
Names are often the first point of entry to biodiversity databases, so errors in names risk propagating. Possible errors:
- Wrong identification
- Wrong format
- Spelling errors
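Spelling errors in names can often be caught by fuzzy comparison against a reference list. The sketch below uses Python's standard `difflib.get_close_matches`. The three-name checklist and the 0.85 cutoff are hypothetical: a real check would use an authoritative name list chosen by the custodian, and a tuned threshold. A close match is only a suggestion for a person to review, never an automatic correction.

```python
import difflib
from typing import Optional

# Hypothetical reference checklist for illustration only.
CHECKLIST = ["Puma concolor", "Panthera onca", "Leopardus pardalis"]


def suggest_name(name: str, checklist: list[str] = CHECKLIST,
                 cutoff: float = 0.85) -> Optional[str]:
    """Return the checklist name closest to `name`, or None if nothing is close enough."""
    normalized = " ".join(name.split())  # collapse stray or doubled whitespace
    matches = difflib.get_close_matches(normalized, checklist, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_name("Puma concolour"))  # Puma concolor
print(suggest_name("Quercus robur"))   # None: not in this checklist
```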

Taxonomic data
Taxonomic data consists of:
- Name
- Nomenclatural status
- Reference
- Determination
- Quality fields

Taxonomic data
Error checking:
- Missing values
- Incorrect values
- Non-atomic values
- Domain schizophrenia (fields used for purposes they were not designed for)
- Duplicates
- Inconsistent data
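Several of the checks in this list can be automated at record level. The sketch below covers a few of them: missing values, non-atomic values (for example a full binomial typed into the genus field), a "?" qualifier stuffed into a name field as a symptom of domain schizophrenia, and duplicate catalogue numbers. The field names are hypothetical. Incorrect values and inconsistent data need reference lists and are not covered.

```python
from collections import Counter

# Hypothetical field names for a simple occurrence table.
REQUIRED_FIELDS = ("catalog_number", "genus", "specific_epithet")
SINGLE_WORD_FIELDS = ("genus", "specific_epithet")


def check_taxonomic_records(records: list[dict]) -> list[tuple[str, str]]:
    """Return (catalog_number, problem) pairs for a few basic error classes."""
    problems = []
    ids = Counter(r.get("catalog_number") for r in records if r.get("catalog_number"))
    for r in records:
        cat = r.get("catalog_number") or "<none>"
        for field in REQUIRED_FIELDS:
            if not (r.get(field) or "").strip():
                problems.append((cat, f"missing value: {field}"))
        for field in SINGLE_WORD_FIELDS:
            value = (r.get(field) or "").strip()
            if " " in value:
                # e.g. a full binomial typed into the genus field
                problems.append((cat, f"non-atomic value in {field}: {value!r}"))
            if "?" in value:
                # uncertainty stuffed into a name field: one field, two domains
                problems.append((cat, f"qualifier in {field}: {value!r}"))
        if ids[cat] > 1:
            problems.append((cat, "duplicate catalog_number"))
    return problems


records = [
    {"catalog_number": "A1", "genus": "Puma", "specific_epithet": "concolor"},
    {"catalog_number": "A2", "genus": "Puma concolor", "specific_epithet": ""},
    {"catalog_number": "A3", "genus": "Panthera", "specific_epithet": "onca?"},
    {"catalog_number": "A3", "genus": "Panthera", "specific_epithet": "onca"},
]
for problem in check_taxonomic_records(records):
    print(problem)
```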

Spatial data
Spatial data are one of the most crucial aspects in determining the fitness for use of biodiversity data, e.g. for:
- species distribution modelling
- reserve selection
- environmental planning and management

Spatial data
What is it? Point records as lat/lon? In practice a record represents an area, expressed as:
- Point/radius
- Bounding box
- Polyline
- Grid reference
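The point/radius and bounding-box representations are related. The sketch below approximates the latitude/longitude box that encloses a point-radius georeference. It assumes a spherical Earth with mean radius 6,371,008.8 m and ignores the datum and ellipsoid. It is not valid near the poles or where the box would cross the 180th meridian, so treat it as a rough illustration rather than a georeferencing tool.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumed)
M_PER_DEG_LAT = math.pi * EARTH_RADIUS_M / 180.0


def point_radius_to_bbox(lat: float, lon: float,
                         radius_m: float) -> tuple[float, float, float, float]:
    """Approximate (min_lat, min_lon, max_lat, max_lon) enclosing a point-radius.

    A degree of longitude shrinks with cos(latitude), so the box is wider in
    degrees of longitude than of latitude away from the equator.
    """
    dlat = radius_m / M_PER_DEG_LAT
    dlon = radius_m / (M_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


print(point_radius_to_bbox(-34.6, -58.4, 1_000.0))
```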

Example of grid-based data (checklists)

Spatial data definitions
- Georeference: the code that records a position on the surface of the earth according to a spatial reference system (SRS). It's often a latitude/longitude pair. Synonym: coordinates.
- Georeferencing: the process of assigning geographic coordinates to a record. Synonym: geocoding.

Geodetic datum

Things to know about GPS
- GPS positions are computed by trilateration (often loosely called triangulation) from satellite signals; a minimum of 4 satellites is needed.
- Since the satellites' positions in space and the signal timing are known, the position on earth can be calculated.
- Historically, the number of receivable satellites was not always sufficient.
- Prior to May 2000, selective availability limited most devices to an accuracy of 100 m or worse. Now accuracy is generally about 10 m in open areas with 4 satellites.
- Averaging gives better precision (some devices do this automatically).
- Differential GPS, WAAS, LAAS and real-time differential GPS are techniques that use base stations at well-known positions to apply corrections; precision can reach 1 cm.
- GPS heights are relative to the reference ellipsoid in use, not to mean sea level.
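The last point matters when recording elevation. A GPS gives ellipsoidal height h. Height above the geoid, which approximates mean sea level, is usually estimated as H ≈ h − N, where N is the geoid undulation at that location taken from a geoid model. The minimal sketch below just applies that relation. The input values are hypothetical, and whether a particular receiver displays h or an already-corrected value depends on the device.

```python
def height_above_geoid(ellipsoidal_height_m: float, geoid_undulation_m: float) -> float:
    """Convert a GPS (ellipsoidal) height h to an approximate height above the geoid.

    Uses H ~= h - N, where N is the geoid height above the ellipsoid at the same
    location, read from a geoid model consistent with the datum in use.
    """
    return ellipsoidal_height_m - geoid_undulation_m


# Hypothetical values: receiver reports h = 45.0 m where the model gives N = 17.0 m.
print(height_above_geoid(45.0, 17.0))  # 28.0
```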

Spatial data: common errors
- Latitude/longitude inversion
- Zero values (one or both coordinates)
- No recorded datum
- Wrong choice of SRS
- False sense of precision / conversion issues
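Some of these errors can be flagged automatically per record. The sketch below takes coordinates as the text originally recorded, so that decimal places can be inspected. It flags:

- an inversion, when latitude is out of range but longitude would be a valid latitude;
- zero values;
- a missing datum;
- suspiciously many decimal places, which often come from converting degrees/minutes/seconds.

The 5-decimal threshold (roughly 1 m of latitude) is an assumption. Inversions that stay inside valid ranges, and a wrong choice of SRS, cannot be caught this way; they need external checks, such as comparison with the recorded country.

```python
from typing import Optional


def _decimals(text: str) -> int:
    """Number of digits after the decimal point in a coordinate as written."""
    return len(text.strip().partition(".")[2])


def spatial_flags(lat_text: str, lon_text: str, datum: Optional[str],
                  max_decimals: int = 5) -> list[str]:
    """Flag some common georeference problems for one record (a partial check)."""
    flags = []
    lat, lon = float(lat_text), float(lon_text)
    if abs(lat) > 90 and abs(lon) <= 90:
        flags.append("possible lat/lon inversion")
    elif abs(lat) > 90 or abs(lon) > 180:
        flags.append("coordinate out of range")
    if lat == 0 or lon == 0:
        flags.append("zero coordinate (often a placeholder for 'unknown')")
    if not datum or not datum.strip():
        flags.append("no datum recorded")
    if max(_decimals(lat_text), _decimals(lon_text)) > max_decimals:
        flags.append("suspicious precision (possibly a converted value)")
    return flags


print(spatial_flags("-122.3", "47.6", None))
```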

Original GBIF data about USA

Collector and collection data
- Collector
- Date of collection
- Additional information: habitat, soil, weather conditions...
Importance varies with the type of data collected:
- Static collection for a museum: collector name and number, date, habitat, capture method...
- Observational data: as above, plus length of observation, area of observation, time of day, activity, sex of the observed animal...
- Survey data: as above, plus survey method and size (grid), frequency, and whether vouchers are collected (with collection numbers)

Collector and collection data
- Accuracy: of collector names, dates,...
- Consistency: use of a standard terminology in fields such as habitat, soils, associated species...
- Completeness: rarely achieved for fields such as habitat, flowering... This makes studying habitat from collections alone difficult.
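Completeness can be measured before deciding whether a dataset is fit for, say, a habitat study. The minimal sketch below reports, for each field, the fraction of records holding a non-empty value. The field names and records are hypothetical.

```python
def field_completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records with a non-empty value in each field."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result


records = [
    {"collector": "J. Smith", "event_date": "1998-03-02", "habitat": ""},
    {"collector": "J. Smith", "event_date": "", "habitat": None},
    {"collector": "A. Perez", "event_date": "2001-11-20"},
    {"collector": "", "event_date": "2003-05-01", "habitat": "cloud forest"},
]
print(field_completeness(records, ["collector", "event_date", "habitat"]))
# {'collector': 0.75, 'event_date': 0.75, 'habitat': 0.25}
```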

Descriptive data
- Morphological, physiological, phenological,...
- Increasing use
- Quality and accuracy are variable: data may be unobservable (historic), impractical to observe (too costly), or perceived rather than real (abundance, colour,...)
- In many cases, stored at the taxon level rather than the specimen level
- Completeness: generally not possible at the specimen level (e.g. fruit vs flower characteristics)
- Consistency: inconsistent representation of the same attribute:
  o FLOWER_COLOUR = Carmine
  o FLOWER_COLOUR = crimson
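One common way to address inconsistent representations like the FLOWER_COLOUR example is a controlled vocabulary that maps raw terms to agreed values. Unrecognised terms are kept for review rather than guessed. The mapping below is hypothetical. Deciding that "carmine" and "crimson" both become "red" is a judgement for the custodian, and it discards detail, so the verbatim value should be kept alongside.

```python
# Hypothetical controlled vocabulary; the grouping is a curatorial decision.
FLOWER_COLOUR_VOCAB = {
    "carmine": "red",
    "crimson": "red",
    "scarlet": "red",
    "red": "red",
}


def normalise_colour(raw: str) -> tuple[str, bool]:
    """Return (value, recognised). Unrecognised terms are returned unchanged for review."""
    key = raw.strip().lower()
    if key in FLOWER_COLOUR_VOCAB:
        return FLOWER_COLOUR_VOCAB[key], True
    return raw, False


print(normalise_colour("Carmine"))  # ('red', True)
print(normalise_colour("mauve"))    # ('mauve', False)
```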

Credits
Based on Arthur Chapman's documents, mainly the presentation "Principles of Data Quality".
Crop wild relatives: Andy Jarvis (1), Samy Gaiji (2), Julian Ramirez (1) and Emmanuel Zapata (1). (1) The International Center for Tropical Agriculture (CIAT); (2) The Global Biodiversity Information Facility Secretariat (GBIF).
Accuracy VS precision slide:
Beach picture: Lali Masrieta. River: Johan J. Ingles-Le Nobel. Stream: bterrycompton.
Reference: Chapman, A.D. and J. Wieczorek (eds). Guide to Best Practices for Georeferencing. Copenhagen: Global Biodiversity Information Facility. Available online. French edition: Chapman, A.D. and J. Wieczorek (eds). Principes de la bonne pratique sur le géoréférencement, version 1.0. Translated by C. Chenin. Copenhagen: Global Biodiversity Information Facility, 95 pp. Available online.
