Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition Methods to Improve Fitness-For-Use of Biodiversity Data


Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition Methods to Improve Fitness-For-Use of Biodiversity Data Meherzad Romer Senior Data Manager NatureServe Canada September 30, 2011

Overview Best practices – Taxonomic data – Spatial data – Sensitive data

Best Practices Taxonomy

Taxonomic data Identification certainty Address by database design – Verification level flag, name, date – Take care with terms such as “aff.”, “cf.”, “s.lat.” etc. – If the identification does not come from taxonomic expertise, keep that information: taxonomic keys, DNA, taxonomic revisions

Taxonomic data Identification certainty Measures in data entry: – Use of checklists – Use of authority files Error checking – Generally requires taxonomic expertise – Environmental/geographical outliers can help prioritize which records to check

Taxonomic data Spelling issues – scientific names Database design – Atomize data (genus, species, author, certainty…) Use authority files – Global lists (Catalogue of Life, Species 2000,…) – Regional – Taxonomic (FishBase, etc.) Duplicate entries – A specific interface is needed to suggest possible duplicates and flag them when importing secondary sources.

Taxonomic data Infraspecific rank Database design: atomize fields, build scientific name later:
Genus        Species      Infra Rank   Infra Value
Stipiturus   malachurus   subsp.       parimeda
Avoid ambiguous names. Allows checks on the infra rank.

Taxonomic data Infraspecific rank Data entry – Use a pick-list – Allows a limited number of values Error checking – Not much to do if the database is properly designed
Abbreviation   Rank
subsp.         Subspecies
var.           Variety
subvar.        Subvariety
f.             Form/forma
subf.          Subform
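The atomized fields and the rank pick-list can be combined at the point where the scientific name is built. The following is a minimal Python sketch under stated assumptions: the function name `build_scientific_name` and the dictionary `INFRA_RANKS` are hypothetical, and the pick-list contains only the ranks listed above.

```python
# Hypothetical pick-list: only the infraspecific ranks named on the slide.
INFRA_RANKS = {
    "subsp.": "subspecies",
    "var.": "variety",
    "subvar.": "subvariety",
    "f.": "form",
    "subf.": "subform",
}


def build_scientific_name(genus, species, infra_rank=None, infra_value=None):
    """Assemble a name from atomized fields, rejecting ranks outside the pick-list."""
    parts = [genus.strip().capitalize(), species.strip().lower()]
    if infra_rank or infra_value:
        # Rank and value only make sense together.
        if not (infra_rank and infra_value):
            raise ValueError("infra rank and infra value must be given together")
        rank = infra_rank.strip().lower()
        if rank not in INFRA_RANKS:
            raise ValueError(f"unknown infraspecific rank: {infra_rank!r}")
        parts += [rank, infra_value.strip().lower()]
    return " ".join(parts)


print(build_scientific_name("Stipiturus", "malachurus", "subsp.", "parimeda"))
```

Because the rank is checked against a closed list, a free-text variant such as "ssp" is rejected at entry time instead of surfacing later as a spelling problem.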

Taxonomic data Cultivars and hybrids Very complex cases to handle; database design should target the specific data Cultivars are subject to their own code of nomenclature Include a field that states whether the record is a cultivar or hybrid (to allow further extraction/specific checks)

Taxonomic data Unpublished names – What to avoid Make clear it’s an unpublished name – avoid binomials that look like published names Avoid names such as “Verticordia sp.,” “Verticordia sp.2”

Taxonomic data Unpublished names – What to do Use the format "<Genus> sp. <informal name> (<voucher>)", for example Prostanthera sp. Somersbey (B.J. Conn 4024) Advantages – Clear to users that it is NOT a published name – Avoids confusion between species/institutions – When the taxon is properly described, it can be used as a standard synonym – Little danger of confusion outside of scientific publications

Taxonomic data Spelling issues – Common names Almost impossible to standardize because: – The same taxon can have many common names – Conversely, the same common name may be applied to multiple taxa Don’t standardize common names, but document them as extensively as possible, for example in a table with the fields:
Name   Language   Region   Source   Comment

Taxonomic data Author names Don’t always include them! They are only necessary when the same name has been given to several different taxa If you choose to include them, use a separate field Take care with the difference between animals and plants – Animal names include years: Emydura signata Ahl, 1932 – Plant names don’t: Melaleuca nervosa (Lindley)

Taxonomic data Author names – error checking For plants, abbreviations of authors’ names follow a standard; we can check against this Checks against authority files Soundex-like techniques If authors are used, all published names should have an author
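As an illustration of a Soundex-like check, the sketch below implements the classic American Soundex code in Python and uses it to spot author names that sound alike but are spelled differently. It is only one possible phonetic technique, not necessarily the one the presenter used; the function name `soundex` is an assumption, and non-ASCII letters are simply ignored here, which is a limitation for many real author names.

```python
# Build the letter-to-digit table of classic American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no ASCII letters)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


# Two spellings of the same author share a code and can be flagged for review.
print(soundex("Lindley"), soundex("Lindly"))
```

Matching codes only suggest a possible misspelling; each pair still needs a human decision before anything is corrected.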

Taxonomic data Collector’s names Extensive lists of collector’s names have been published for some areas. Format should be standardized. The HISPID standard recommends the following: “Primary collector’s family name (surname) followed by comma and space (, ) then initials (all in uppercase and each separated by fullstops). All initials and first letter of the collector’s family name in uppercase. For example, Chambers, P.F.”
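The quoted format is regular enough to check mechanically. The following Python sketch, given under stated assumptions, formats a collector name in that style and tests whether an existing string already conforms. The names `format_collector`, `is_standard_collector` and `COLLECTOR_PATTERN` are hypothetical, and the pattern only covers unaccented family names.

```python
import re

# Assumed pattern for "Family, I.I.": capitalized family name, comma, space,
# then one or more uppercase initials each followed by a full stop.
COLLECTOR_PATTERN = re.compile(r"^[A-Z][A-Za-z'\- ]*, (?:[A-Z]\.)+$")


def format_collector(family_name, given_names):
    """Format a collector as 'Family, I.I.' from a family name and given names."""
    family = family_name.strip()
    given = given_names.split()
    if not family or not given:
        raise ValueError("family name and at least one given name are required")
    initials = "".join(f"{part[0].upper()}." for part in given)
    return f"{family[0].upper()}{family[1:]}, {initials}"


def is_standard_collector(text):
    """Return True if the string already follows the 'Family, I.I.' layout."""
    return bool(COLLECTOR_PATTERN.match(text))


print(format_collector("chambers", "p f"))
print(is_standard_collector("P.F. Chambers"))
```

Records that fail the check can be listed for manual review instead of being rewritten automatically.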

Taxonomic data Collector’s names – Error checking If the format is standardized, it’s easy to “sort by collectors” and look for slight variations (extreme care should be taken before renaming, though). We can match collector name and date of collection with data from historians: ship itineraries, description of scientific expeditions… Both databases can be improved as inconsistencies and errors are detected.
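Looking for slight variations can be partly automated. The sketch below uses Python's standard `difflib.SequenceMatcher` to list pairs of collector names that are nearly identical; the function name `similar_collectors` and the 0.85 threshold are assumptions chosen for illustration, and the output is meant as a review list, not as a basis for automatic renaming.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_collectors(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for distinct names that look alike."""
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


collectors = ["Chambers, P.F.", "Chambers, P.E.", "Conn, B.J.", "Chambers, P.F."]
for a, b, ratio in similar_collectors(collectors):
    print(f"check: {a!r} vs {b!r} (similarity {ratio})")
```

Comparing every pair is quadratic in the number of distinct names, so for large collections it helps to compare only names that share a first letter or a phonetic code.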

Best Practices Spatial Data

Spatial data Database design – 1/2 We should ensure that there are fields to properly cater for information often wrongly placed in the locality field. Example: "Eurasia: throughout Europe to northernmost extremity of Scandinavia, except Iberian Peninsula, central Italy, and Adriatic basin; Aegean Sea basin in Matriza and from Struma to Aliakmon drainages; Aral Sea basin; Siberia in rivers draining the Arctic Ocean eastward to Kolyma. Widely introduced. Several countries report adverse ecological impact after introduction." (Perca fluviatilis distribution, from FishBase)

Spatial data Database design – 2/2 Fields to provide: coordinates in decimal degrees; geodetic datum; accuracy reported by the device; spatial uncertainty, preferably in meters; "Nearest named place", "Distance" and "Direction" (+ Locality). All together will help geocoding and data cleaning. Geo-referencing method, for example: use of differential GPS; GPS corrupted by Selective Availability (before May 2000); a map reference at 1: and obtained by triangulation; a map reference using dead reckoning; obtained automatically using geo-referencing software.
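Storing coordinates in decimal form usually means converting from degrees, minutes and seconds at data entry. A minimal Python sketch of that conversion follows; the function name `dms_to_decimal` and its parameters are assumptions, and the datum and uncertainty still have to be recorded in their own fields.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"hemisphere must be N, S, E or W, got {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    limit = 90 if hemisphere in ("N", "S") else 180
    if value > limit:
        raise ValueError(f"{value} is out of range for hemisphere {hemisphere}")
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


print(dms_to_decimal(33, 30, 0, "S"))
```

Validating ranges during conversion catches swapped latitude and longitude values early, for example a latitude of 151 degrees.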

Spatial data Error checking on existing data Checking against the rest of the record: locality, country name Checking against external data in a database: is the record consistent with the collector's visited places? Checking against external data using GIS: point-in-polygon test (does this record fall on land or at sea?) Checking for geographic outliers for a species Checking for environmental outliers for a species
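The point-in-polygon test is normally run in a GIS against real country or coastline layers, but the underlying idea can be sketched in a few lines of Python. The example below uses the ray-casting method on plain longitude/latitude pairs; the function name `point_in_polygon` and the square test polygon are hypothetical, and treating coordinates as planar is an assumption that only holds for small areas away from the poles and the antimeridian.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical "land" polygon as (lon, lat) vertices.
land = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, land))
print(point_in_polygon(15, 5, land))
```

A terrestrial record that falls outside every land polygon is a candidate for review; it may be a sign error or swapped coordinates and not necessarily a wrong locality.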

Spatial data Locality - Collecting in the field The most specific locality descriptions use an unambiguous, small, easily discoverable, persistent reference feature and orthogonal offsets from the center of that feature. "2.1 km N and 0.5 km E of North Head Lighthouse off Sydney Heads"
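A locality given as orthogonal offsets from a reference feature can be turned into approximate coordinates. The Python sketch below does this with a flat-earth approximation that is only reasonable for offsets of a few kilometres; the function name `offset_point`, the constant of about 111.32 km per degree of latitude, and the reference coordinates in the example are all assumptions for illustration (they are placeholders, not the surveyed position of the lighthouse).

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude (assumption)


def offset_point(lat, lon, north_km=0.0, east_km=0.0):
    """Shift a point by orthogonal offsets, using a small-distance approximation."""
    cos_lat = math.cos(math.radians(lat))
    if cos_lat < 1e-6:
        raise ValueError("approximation is not usable at the poles")
    new_lat = lat + north_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    new_lon = lon + east_km / (KM_PER_DEGREE_LAT * cos_lat)
    return new_lat, new_lon


# "2.1 km N and 0.5 km E" of a placeholder reference point.
print(offset_point(-33.8, 151.3, north_km=2.1, east_km=0.5))
```

The result inherits the uncertainty of the reference feature and of the offsets, so the spatial uncertainty field should be filled in alongside the computed coordinates.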

Best Practices Sensitive Data

Dealing with Sensitive Data Why generalize? Protect threatened species, economically important species and reduce impact on wild populations Preclude sabotage, collection by unscrupulous and commercial collectors, over-exploitation, control bio-prospecting... Protect third-party data held by the institution Allow for publication of research results and to maintain competitive advantages Fear of the user making inappropriate use of the data Respect wishes of the private property owners

Dealing with Sensitive Data General Considerations Key issue is often a social one There are regional aspects to sensitivity Some will never release sensitive data Documentation is essential

Dealing with Sensitive Data How to generalize data Spatial data: o use of a geographic grid o 3 levels of generalization recommended by Chapman & Wieczorek (2006): 0.1 degrees (11-16 km) degrees ( km) degrees ( m) o In extreme cases, do not release Non-spatial data: o should be replaced by appropriate wording o Do not restrict data on collection
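Generalizing to a geographic grid can be sketched as snapping each coordinate to the corner of its grid cell and recording what was done. The Python example below is one possible approach, not a prescribed method: the names `snap_to_grid` and `generalize_point`, the choice of the south-west cell corner, and the wording of the note are assumptions. `Decimal` is used so that values already on the grid (such as 0.3 with a 0.1 degree cell) are not shifted by floating-point error.

```python
from decimal import Decimal, ROUND_FLOOR


def snap_to_grid(value, cell_size="0.1"):
    """Snap a coordinate down to the nearest multiple of cell_size (in degrees)."""
    cell = Decimal(cell_size)
    cells = (Decimal(str(value)) / cell).to_integral_value(rounding=ROUND_FLOOR)
    return float(cells * cell)


def generalize_point(lat, lon, cell_size="0.1"):
    """Return generalized coordinates together with a note documenting the change."""
    return {
        "latitude": snap_to_grid(lat, cell_size),
        "longitude": snap_to_grid(lon, cell_size),
        "generalization": f"coordinates snapped to a {cell_size} degree grid",
    }


print(generalize_point(-33.8568, 151.2093))
```

Keeping the note with the record tells users that the data has been modified and how, so they can decide whether to use it as is or ask for more detail.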

Dealing with Sensitive Data Fields to generalize Locality and georeferencing information Other fields (taxonomic information, observer's name, habitat information, hosts, traditional uses,...)

Dealing with Sensitive Data Documentation is essential Document what has been done to generalize the data, and the reasons, to allow the user to: know that the data has been modified and how; know that there is more information that may be obtained; decide whether to ignore those data, include them as is, or seek further information

Credits Based on Arthur Chapman's documents, mainly the presentation "Principles of Data Quality". Reference: Chapman, A.D. and J. Wieczorek (eds) (2006). Guide to Best Practices for Georeferencing. Copenhagen: Global Biodiversity Information Facility. Available online from

Thank you. Questions?
