Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition: Methods to Improve Fitness-For-Use of Biodiversity Data


1 Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition
Methods to Improve Fitness-For-Use of Biodiversity Data
Meherzad Romer (mromer@natureserve.ca), Senior Data Manager, NatureServe Canada (www.natureserve-canada.ca)
September 30, 2011

2 Overview
Best practices:
– Taxonomic data
– Spatial data
– Sensitive data

3 Best Practices: Taxonomy

4 Taxonomic data: Identification certainty
Address by database design:
– Verification level flag, verifier name, verification date
– Take care with qualifiers such as “aff.”, “cf.”, “s. lat.”, etc.
– If the identification was not made by a taxonomic expert, record how it was made: taxonomic keys, DNA, taxonomic revisions
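As a sketch of handling such qualifiers carefully, the Python function below splits “aff.”, “cf.”, “s. lat.” and “s. str.” out of a free-text name so that the qualifier can be stored in its own field and the remaining name checked against a checklist. The qualifier pattern (including “s. str.”, which the slide does not mention) and the function name `split_qualifier` are hypothetical choices for illustration, not part of any standard.

```python
import re

# Hypothetical set of identification qualifiers; extend it to match your own data.
QUALIFIER_RE = re.compile(r"\b(aff\.|cf\.|s\.\s?lat\.?|s\.\s?str\.?)(?=\s|$)")


def split_qualifier(raw_name):
    """Split an identification qualifier out of a name string.

    Returns (clean_name, qualifier), where qualifier is None if none was found.
    """
    match = QUALIFIER_RE.search(raw_name)
    if match is None:
        return " ".join(raw_name.split()), None
    clean = raw_name[:match.start()] + raw_name[match.end():]
    # Collapse the double space left behind by removing the qualifier.
    return " ".join(clean.split()), match.group(1)


for name in ["Eucalyptus cf. globulus", "Acacia aff. dealbata", "Carex flacca s.lat"]:
    print(name, "->", split_qualifier(name))
```

The qualifier is kept verbatim; normalizing its spelling (for example “s.lat” to “s. lat.”) would be a separate, deliberate step.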

5 Taxonomic data: Identification certainty
Measures at data entry:
– Use checklists
– Use an authority file
Error checking:
– Generally requires taxonomic expertise
– Environmental/geographical outliers can help prioritize which records to check
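One way to use geographic outliers to prioritize expert review is sketched below, assuming the records of one species are available as (id, latitude, longitude) tuples. It flags records whose great-circle distance from the species' median location is far above the typical spread (median distance + k × MAD, the median absolute deviation). The threshold `k`, the floor `min_mad_km` and the function names are hypothetical choices; flagged records are candidates for checking, not confirmed errors.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def flag_geographic_outliers(records, k=5.0, min_mad_km=1.0):
    """Return ids of records unusually far from the species' median location.

    records: list of (record_id, lat, lon) tuples for ONE species.
    A record is flagged when its distance to the median point exceeds
    median distance + k * MAD (median absolute deviation of the distances).
    """
    if len(records) < 3:
        return []  # too few points to say anything useful
    lat0 = median(lat for _, lat, _ in records)
    lon0 = median(lon for _, _, lon in records)
    dists = [haversine_km(lat, lon, lat0, lon0) for _, lat, lon in records]
    med = median(dists)
    # The floor stops tightly clustered data from flagging every small deviation.
    mad = max(median(abs(d - med) for d in dists), min_mad_km)
    return [rid for (rid, _, _), d in zip(records, dists) if d > med + k * mad]
```

Taking the median of longitudes breaks down for species whose range crosses the antimeridian, and a species with several disjunct populations needs a different approach; environmental outliers would be checked the same way on environmental values instead of coordinates.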

6 Taxonomic data: Spelling issues – scientific names
Database design:
– Atomize data (genus, species, author, certainty…)
Use authority files:
– Global lists (Catalogue of Life, Species 2000,…)
– Regional lists
– Taxonomic lists (FishBase, etc.)
Duplicate entries:
– A specific interface is needed to suggest possible duplicates and flag them when importing secondary sources.
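A minimal sketch of authority-file matching with Python's standard `difflib` module: an exact hit is accepted, otherwise close spellings are suggested for a person to confirm. The `AUTHORITY` list is a hypothetical excerpt and the 0.85 cutoff is an arbitrary starting point; the same kind of suggestion can drive the duplicate-flagging interface when importing secondary sources.

```python
import difflib

# Hypothetical excerpt of an authority file; in practice, load it from a
# regional checklist or an export of a global name list.
AUTHORITY = ["Perca fluviatilis", "Perca flavescens", "Melaleuca nervosa", "Stipiturus malachurus"]


def suggest_names(name, authority=AUTHORITY, n=3, cutoff=0.85):
    """Return [name] if it is in the authority file, else up to n close spellings."""
    cleaned = " ".join(name.split())
    if cleaned in authority:
        return [cleaned]
    # Ratio-based fuzzy matching from the standard library; candidates are
    # suggestions for a human to confirm, never automatic corrections.
    return difflib.get_close_matches(cleaned, authority, n=n, cutoff=cutoff)


print(suggest_names("Perca fluviatillis"))
```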

7 Taxonomic data: Infraspecific rank
Database design: atomize fields and build the scientific name later:
Genus        Species      Infra rank   Infra value
Stipiturus   malachurus   subsp.       parimeda
– Avoids ambiguous names
– Allows checks on the infra rank
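Building the name from the atomized fields can be as simple as the sketch below (the function name is hypothetical). Because rank and value are separate fields, an inconsistent record, such as a rank without a value, can be rejected instead of silently producing an ambiguous string.

```python
def build_scientific_name(genus, species, infra_rank=None, infra_value=None):
    """Assemble a full name from atomized fields (genus, species, infra rank/value)."""
    parts = [genus.strip(), species.strip()]
    if infra_rank and infra_value:
        parts += [infra_rank.strip(), infra_value.strip()]
    elif infra_rank or infra_value:
        # A rank without a value (or vice versa) is exactly the kind of
        # inconsistency that atomized fields let us catch.
        raise ValueError("infra_rank and infra_value must be supplied together")
    return " ".join(parts)


print(build_scientific_name("Stipiturus", "malachurus", "subsp.", "parimeda"))
```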

8 Taxonomic data: Infraspecific rank
Data entry:
– Use a pick-list
– Allows only a limited set of values
Error checking:
– Not much to do if the database is properly designed
Abbreviation   Rank
subsp.         Subspecies
var.           Variety
subvar.        Subvariety
f.             Form (forma)
subf.          Subform
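The pick-list maps naturally onto an enumeration. In this sketch, `InfraRank` holds the allowed values from the table, and `ALIASES` is a hypothetical mapping of variants found in legacy data (such as “ssp.”) onto them; any other value is rejected with a message listing the allowed ranks.

```python
from enum import Enum


class InfraRank(Enum):
    """Pick-list of infraspecific ranks taken from the table."""
    SUBSPECIES = "subsp."
    VARIETY = "var."
    SUBVARIETY = "subvar."
    FORM = "f."
    SUBFORM = "subf."


# Hypothetical aliases seen in legacy data, mapped onto the pick-list values.
ALIASES = {"ssp.": "subsp.", "subspecies": "subsp.", "variety": "var.", "forma": "f."}


def parse_rank(value):
    """Return the InfraRank for a raw value, or raise ValueError if not allowed."""
    key = value.strip().lower()
    key = ALIASES.get(key, key)
    try:
        return InfraRank(key)
    except ValueError:
        allowed = ", ".join(rank.value for rank in InfraRank)
        raise ValueError(f"{value!r} is not an allowed rank ({allowed})") from None
```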

9 Taxonomic data: Cultivars and hybrids
– Very complex cases to handle; the database design should target the specific data
– Cultivars are governed by their own code of nomenclature (the International Code of Nomenclature for Cultivated Plants)
– Include a field that states whether the record is a cultivar or hybrid (to allow further extraction and specific checks)
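To help populate the cultivar/hybrid field on existing data, a heuristic like the one below can flag candidate records for review. It assumes hybrids are written with the multiplication sign “×” (or a lone “x” typed in its place) and cultivar epithets in single quotes, with “cv.” appearing in older data. The function name and patterns are hypothetical; this is a screening aid, not a name parser.

```python
import re


def classify_name(name):
    """Heuristically flag hybrids and cultivars in a free-text name."""
    # Hybrid sign "×", or a lone "x" used in its place, e.g. "Mentha aquatica x spicata".
    is_hybrid = "×" in name or re.search(r"(?:^|\s)x\s", name) is not None
    # Cultivar epithets are written in single quotes; "cv." appears in older data.
    is_cultivar = (re.search(r"'[^']+'", name) is not None
                   or re.search(r"\bcv\.\s", name) is not None)
    return {"hybrid": is_hybrid, "cultivar": is_cultivar}


print(classify_name("Camellia japonica 'Alba Plena'"))
```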

10 Taxonomic data: Unpublished names – what to avoid
– Make it clear that a name is unpublished; avoid binomials that look like published names
– Avoid names such as “Verticordia sp.” and “Verticordia sp. 2”

11 Taxonomic data: Unpublished names – what to do
Use the format "Genus sp. Locality (Collector Number)", for example: Prostanthera sp. Somersbey (B.J. Conn 4024)
Advantages:
– Clear to users that it is NOT a published name
– Avoids confusion between species and across institutions
– When the taxon is properly described, the phrase name can be used as a standard synonym
– Little danger of confusion outside of scientific publications
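Both the recommended format and the pattern to avoid can be expressed as regular expressions. This sketch assumes phrase names follow exactly “Genus sp. Locality (Collector Number)”; the patterns and function names are illustrative and would need loosening for real data (for example, unusual voucher number formats).

```python
import re

# "Genus sp. Locality (Collector Number)", as recommended on the slide.
PHRASE_NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) sp\. (?P<locality>.+?) "
    r"\((?P<collector>[^()]+?) (?P<number>[\w/-]+)\)$"
)
# Ambiguous informal names such as "Verticordia sp." or "Verticordia sp.2".
INFORMAL_RE = re.compile(r"^[A-Z][a-z]+ sp\.?\s*\d*,?$")


def format_phrase_name(genus, locality, collector, number):
    """Build a phrase name in the recommended format."""
    return f"{genus} sp. {locality} ({collector} {number})"


def check_unpublished_name(name):
    """Classify a name as 'phrase name', 'ambiguous' or 'other'."""
    if PHRASE_NAME_RE.match(name):
        return "phrase name"
    if INFORMAL_RE.match(name):
        return "ambiguous"
    return "other"


print(check_unpublished_name(format_phrase_name("Prostanthera", "Somersbey", "B.J. Conn", "4024")))
```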

12 Taxonomic data: Spelling issues – common names
Almost impossible to standardize because:
– The same taxon can have many common names
– Conversely, the same common name may be applied to multiple taxa
Don’t standardize common names, but document them as extensively as possible, with fields such as:
Name   Language   Region   Source   Comment
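A sketch of that common-name table as a Python data class with the columns listed above. Indexing it in both directions shows why standardization fails: one taxon maps to several names, and one name (“perch”) maps to several taxa. The rows are illustrative, the class and index names are hypothetical, and the Source field is deliberately left empty rather than invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonName:
    """One row of the common-name table: Name, Language, Region, Source, Comment."""
    taxon: str  # accepted scientific name the common name is attached to
    name: str
    language: str
    region: str = ""
    source: str = ""
    comment: str = ""


ROWS = [
    CommonName("Perca fluviatilis", "European perch", "en"),
    CommonName("Perca fluviatilis", "redfin perch", "en", region="Australia"),
    CommonName("Perca fluviatilis", "perch", "en"),
    CommonName("Perca flavescens", "yellow perch", "en"),
    CommonName("Perca flavescens", "perch", "en"),
]

by_taxon = defaultdict(list)  # one taxon -> many common names
by_name = defaultdict(set)    # one common name -> possibly many taxa
for row in ROWS:
    by_taxon[row.taxon].append(row)
    by_name[row.name.lower()].add(row.taxon)

print(sorted(by_name["perch"]))  # an ambiguous common name
```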

13 Taxonomic data: Author names
– Don’t always include them! They are only necessary when the same name has been given to several different taxa
– If you choose to include them, use a separate field
– Note the difference between animals and plants:
   – Animal names include the year: Amydura signata Ahl, 1932
   – Plant names don’t: Melaleuca nervosa (Lindley)

14 Taxonomic data: Author names – error checking
– For plants, author names are abbreviated according to a standard, so they can be checked against it
– Check against authority files
– Use Soundex-like techniques to catch spelling variants
– If authors are used, every published name should have an author
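Soundex is one of the simplest Soundex-like techniques: it reduces a name to its first letter plus three digits, so spelling variants often share a code. Below is a sketch of standard American Soundex; folding accented letters to ASCII first is an assumption of this sketch. Codes only suggest candidates for review, since unrelated names can collide and related ones can differ.

```python
import unicodedata

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a name (e.g. 'R163')."""
    # Fold accents (é -> e) and keep only ASCII letters.
    folded = unicodedata.normalize("NFKD", name).lower()
    letters = [ch for ch in folded if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # vowels reset prev, so a repeated code after a vowel is kept
    return code.ljust(4, "0")


print(soundex("Lindley"), soundex("Lindl."))
```

In this example the full author name “Lindley” and the abbreviation “Lindl.” receive the same code, so a mismatch between them would be surfaced for review rather than missed.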

15 Taxonomic data: Collectors’ names
Extensive lists of collectors’ names have been published for some areas. The format should be standardized. The HISPID standard recommends the following: “Primary collector’s family name (surname) followed by comma and space (, ) then initials (all in uppercase and each separated by fullstops). All initials and first letter of the collector’s family name in uppercase. For example, Chambers, P.F.”
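The HISPID rule quoted above can be approximated with a formatter and a regular expression. This is a hypothetical sketch, not an official HISPID validator: the function names are invented, and family names with particles or unusual characters may need a looser pattern.

```python
import re

# Family name starting with an uppercase letter, then ", " and uppercase
# initials each followed by a full stop, e.g. "Chambers, P.F." (HISPID style).
HISPID_COLLECTOR_RE = re.compile(r"^[A-Z][A-Za-z'-]*(?: [A-Za-z'-]+)*, (?:[A-Z]\.)+$")


def format_collector(family_name, given_names):
    """Build a HISPID-style collector string from a family name and given names."""
    family = family_name.strip()
    initials = "".join(f"{part[0].upper()}." for part in given_names.replace(".", " ").split())
    return f"{family[0].upper()}{family[1:]}, {initials}"


def is_hispid_collector(value):
    """True if value looks like 'Surname, I.I.' as the HISPID rule describes."""
    return HISPID_COLLECTOR_RE.match(value) is not None


print(format_collector("chambers", "Peter F."))
```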

16 Taxonomic data: Collectors’ names – error checking
– If the format is standardized, it’s easy to sort by collector and look for slight variations (though extreme care should be taken before renaming).
– Collector name and collection date can be matched against data from historians: ship itineraries, descriptions of scientific expeditions…
– Both databases can be improved as inconsistencies and errors are detected.
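Once names are standardized, the “sort and look for slight variations” step can be partly automated. The sketch below compares every pair of distinct names with `difflib.SequenceMatcher` and reports pairs above a similarity threshold (0.85 here, an arbitrary assumption; the function name is hypothetical). In keeping with the caution above, it only reports pairs for a person to review and never renames anything.

```python
import difflib


def flag_collector_variants(names, threshold=0.85):
    """Return pairs of distinct collector names that look like spelling variants."""
    unique = sorted(set(names))
    pairs = []
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b))
    return pairs


print(flag_collector_variants(["Chambers, P.F.", "Chamber, P.F.", "Conn, B.J.", "Chambers, P.F."]))
```

The pairwise loop is quadratic, which is acceptable for a few thousand collector names; larger lists would first be grouped, for example by Soundex code of the family name.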

17 Best Practices: Spatial Data

18 Spatial data: Database design (1/2)
Ensure there are fields to properly hold information that is often wrongly placed in the locality field, such as this distribution description of Perca fluviatilis (from FishBase):
“Eurasia: throughout Europe to northernmost extremity of Scandinavia, except Iberian Peninsula, central Italy, and Adriatic basin; Aegean Sea basin in Maritza and from Struma to Aliakmon drainages; Aral Sea basin; Siberia in rivers draining the Arctic Ocean eastward to Kolyma. Widely introduced. Several countries report adverse ecological impact after introduction.”

19 Spatial data: Database design (2/2)
– Coordinates in decimal degrees
– Geodetic datum
– Accuracy reported by the device
– Spatial uncertainty, preferably in meters
– "Nearest named place", "Distance" and "Direction" (+ "Locality"); together these help geocoding and data cleaning
– Georeferencing method, for example:
   – Differential GPS
   – GPS degraded by Selective Availability (before May 2000)
   – A map reference at 1:100 000 obtained by triangulation
   – A map reference obtained by dead reckoning
   – Obtained automatically using georeferencing software
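Converting degrees, minutes and seconds to decimal degrees is a common first step when populating the coordinate fields. A minimal sketch (the function name is hypothetical) that also rejects out-of-range values:

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus hemisphere to signed decimal degrees.

    The geodetic datum is not changed by this conversion, so it must still be
    recorded in its own field.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    limit = 90 if hemisphere in ("N", "S") else 180
    if value > limit:
        raise ValueError(f"value exceeds {limit} degrees")
    # Southern and western hemispheres are negative in decimal degrees.
    return round(-value if hemisphere in ("S", "W") else value, 6)


print(dms_to_decimal(33, 51, 35.9, "S"))
```

Rounding to six decimals (roughly 0.1 m) avoids spurious float digits; it does not make the coordinate that precise, which is why spatial uncertainty is recorded separately.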

20 Spatial data: Error checking on existing data
– Check against the rest of the record: locality, country name
– Check against external data in a database: is the record consistent with the places the collector visited?
– Check against external data using GIS: a point-in-polygon test, e.g. does the record fall on land or at sea?
– Check for geographic outliers for a species
– Check for environmental outliers for a species
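The point-in-polygon test can be sketched with the classic ray-casting (even-odd) algorithm. This version assumes planar longitude/latitude coordinates and a single polygon without holes; the function name is hypothetical, and in practice a GIS library and an authoritative land or country layer would be used.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting (even-odd) test: is (lon, lat) inside the polygon?

    polygon: list of (lon, lat) vertices; the closing edge is implied.
    Coordinates are treated as planar, so this ignores the antimeridian.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does edge (j -> i) straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude where the edge crosses that line; count crossings to the east.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


land = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical "land" polygon
print(point_in_polygon(5, 5, land), point_in_polygon(15, 5, land))
```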

21 Spatial data: Locality – collecting in the field
The most specific locality descriptions use an unambiguous, small, easily discoverable, persistent reference feature and orthogonal offsets from the center of that feature, for example: "2.1 km N and 0.5 km E of North Head Lighthouse off Sydney Heads"
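When a locality follows this “offsets from a feature” pattern, a rough georeference can be computed from the feature's coordinates. The sketch below parses the pattern and applies the offsets with a flat-Earth approximation (about 111.32 km per degree of latitude), which is reasonable only for short distances. The pattern, constant and function names are assumptions, and the reference coordinates in any example must come from a gazetteer.

```python
import math
import re

KM_PER_DEG_LAT = 111.32  # approximate; varies slightly with latitude

OFFSET_RE = re.compile(
    r"^(?P<d1>\d+(?:\.\d+)?)\s*km\s*(?P<dir1>[NS])\s+and\s+"
    r"(?P<d2>\d+(?:\.\d+)?)\s*km\s*(?P<dir2>[EW])\s+of\s+(?P<feature>.+)$"
)


def parse_offset_locality(text):
    """Return (north_km, east_km, feature) or None if the pattern does not match."""
    m = OFFSET_RE.match(text.strip())
    if m is None:
        return None
    north = float(m["d1"]) * (1 if m["dir1"] == "N" else -1)
    east = float(m["d2"]) * (1 if m["dir2"] == "E" else -1)
    return north, east, m["feature"]


def apply_offset(lat, lon, north_km, east_km):
    """Shift a point by orthogonal offsets (flat-Earth approximation)."""
    new_lat = lat + north_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of latitude.
    new_lon = lon + east_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon


print(parse_offset_locality("2.1 km N and 0.5 km E of North Head Lighthouse off Sydney Heads"))
```

The result would still need a spatial uncertainty combining the feature's extent, the precision of the distances and the method used.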

22 Best Practices: Sensitive Data

23 Dealing with Sensitive Data: Why generalize?
– Protect threatened species and economically important species, and reduce impact on wild populations
– Preclude sabotage, collection by unscrupulous and commercial collectors, and overexploitation; control bioprospecting...
– Protect third-party data held by the institution
– Allow for publication of research results and maintain competitive advantages
– Fear that users will make inappropriate use of the data
– Respect the wishes of private property owners

24 Dealing with Sensitive Data: General considerations
– The key issue is often a social one
– Sensitivity has regional aspects
– Some will never release sensitive data
– Documentation is essential

25 Dealing with Sensitive Data: How to generalize data
Spatial data:
o Use a geographic grid
o Three levels of generalization recommended by Chapman & Wieczorek (2006): 0.1 degrees (11–16 km), 0.01 degrees (1.1–1.6 km), 0.001 degrees (112–157 m)
o In extreme cases, do not release
Non-spatial data:
o Should be replaced by appropriate wording
o Do not restrict data at the time of collection
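Grid-based generalization can be sketched by snapping each point to the centre of the grid cell that contains it, at one of the three levels above. Decimal arithmetic is used to avoid floating-point errors at cell boundaries; the function name and the choice of the cell centre (rather than, say, a corner) are assumptions of this sketch, and the sample coordinates are arbitrary.

```python
import math
from decimal import Decimal


def generalize(lat, lon, cell_deg="0.1"):
    """Replace a point by the centre of the grid cell that contains it.

    cell_deg is passed as a string so Decimal arithmetic avoids floating-point
    surprises at cell boundaries (e.g. 0.3 / 0.1 evaluating to 2.9999...).
    """
    size = Decimal(cell_deg)

    def snap(value):
        d = Decimal(str(value))
        corner = math.floor(d / size) * size  # south-west corner of the cell
        return float(corner + size / 2)

    return snap(lat), snap(lon)


for level in ("0.1", "0.01", "0.001"):
    print(level, generalize(-33.8567, 151.2153, level))
```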

26 Dealing with Sensitive Data: Fields to generalize
– Locality and georeferencing information
– Other fields (taxonomic information, observer’s name, habitat information, hosts, traditional uses,...)

27 Dealing with Sensitive Data: Documentation is essential
Document what has been done to generalize the data, and why, so that users:
– know the data have been modified, and how
– know that more information may be obtainable
– can decide whether to ignore those data, include them as is, or seek further information
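Darwin Core has terms for this kind of documentation: `dataGeneralizations` describes what was done to the data, and `informationWithheld` notes what is missing and may be available on request. The sketch below generalizes a record stored as a dictionary keyed by Darwin Core terms and fills both. The replacement wording, the default list of withheld fields and the function names are hypothetical.

```python
import math
from decimal import Decimal


def _snap(value, cell):
    """Move a coordinate to the centre of its grid cell (Decimal avoids boundary errors)."""
    d = Decimal(str(value))
    return float(math.floor(d / cell) * cell + cell / 2)


def generalize_record(record, cell_deg="0.1", withhold=("locality", "recordedBy")):
    """Return a generalized copy of a Darwin Core-style record, documenting the changes."""
    cell = Decimal(cell_deg)
    out = dict(record)  # the full-precision original stays untouched
    out["decimalLatitude"] = _snap(record["decimalLatitude"], cell)
    out["decimalLongitude"] = _snap(record["decimalLongitude"], cell)
    withheld = [field for field in withhold if out.get(field)]
    for field in withheld:
        out[field] = "[withheld]"
    out["dataGeneralizations"] = (
        f"Coordinates generalized to the centre of a {cell_deg} degree grid cell"
    )
    if withheld:
        out["informationWithheld"] = (
            f"{', '.join(withheld)} withheld; contact the data provider for access"
        )
    return out


sample = {
    "scientificName": "Genus species",  # placeholder, not a real sensitive taxon
    "decimalLatitude": -33.8567,
    "decimalLongitude": 151.2153,
    "locality": "Sample locality",
    "recordedBy": "Chambers, P.F.",
}
print(generalize_record(sample))
```

Keeping the original record and publishing only the generalized copy matches the advice that data should not be restricted at collection time.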

28 Credits
Based on Arthur Chapman's documents, mainly the presentation "Principles of Data Quality".
Reference: Chapman, A.D. and J. Wieczorek (eds). 2006. Guide to Best Practices for Georeferencing. Copenhagen: Global Biodiversity Information Facility. Available online from http://www2.gbif.org/BioGeomancerGuide.pdf

29 Thank you. Questions?
