TDWG- Lisbon Oct 2003
Data Cleaning Tools and Methodologies
Arthur D. Chapman, Australia / Brazil
Centro de Referência em Informação Ambiental

Background: ERIN/CRIA, speciesLink, FAPESP/Biota.

Species Data: Museum/Herbarium, Observation, Survey.

Data Error: Names, Geocode, Altitude, Collectors, Dates.

Adding Data to the Database. Software: Biota, BRAHMS, Specify, BioLink, EGaz, etc.

On-line Tools: BioGeomancer, CRIA-localidade. Guidelines: MANIS, HISPID, Data Cleaning and Validation.

Data quality - fitness for use

Recording Accuracy and Error. Additional accuracy fields: preferably in meters (Point-Radius). Documenting validation tests: who, what, how.
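
One way to hold an accuracy field and a validation log alongside a geocode is sketched below. This is only an illustration under assumptions: the class names, field names, and sample values are hypothetical and are not taken from the presentation or from any particular database package.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationTest:
    who: str   # person or process that ran the test
    what: str  # what was tested
    how: str   # method used


@dataclass
class GeocodedRecord:
    record_id: str
    latitude: float
    longitude: float
    uncertainty_m: float  # point-radius uncertainty, in meters
    validation_log: List[ValidationTest] = field(default_factory=list)

    def log_test(self, who: str, what: str, how: str) -> None:
        # Append one documented test to this record's history.
        self.validation_log.append(ValidationTest(who, what, how))


# Hypothetical example record.
rec = GeocodedRecord("specimen-001", -17.39, -66.16, 5000.0)
rec.log_test("curator", "coordinates vs. province", "point-in-polygon check")
```

Keeping the log on the record itself means a later user can judge fitness for use without having to rerun the tests.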

Methods for geocode validation: Internal Database Checks; Outliers in Geographic Space - GIS; Outliers in Environmental Space - Models; Statistical outliers.

Internal Database Checks. Internal inconsistencies. Checking one field against another: text location vs geocode. Checking one database against another: gazetteers, DEM, collectors.
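
A field-against-field check of this kind might look like the following minimal sketch. It assumes each record is a dictionary with `country`, `latitude`, and `longitude` keys, and it uses a tiny hypothetical gazetteer of deliberately rough bounding boxes; the box values are approximate and for illustration only.

```python
# Hypothetical, deliberately rough bounding boxes:
# (min_lat, max_lat, min_lon, max_lon). Approximate values for illustration only.
COUNTRY_BOXES = {
    "Bolivia": (-23.0, -9.0, -70.0, -57.0),
    "Australia": (-44.0, -10.0, 112.0, 154.0),
}


def internal_checks(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    lat, lon = record["latitude"], record["longitude"]
    # Internal inconsistencies: values that cannot be valid coordinates.
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude out of range")
    # One field against another: stated country vs. geocode.
    box = COUNTRY_BOXES.get(record["country"])
    if box is None:
        problems.append("country not in gazetteer")
    elif not (box[0] <= lat <= box[1] and box[2] <= lon <= box[3]):
        problems.append("geocode outside stated country")
    return problems
```

A dropped minus sign on a latitude, for example, leaves the value in range but places the point outside the stated country, so only the second check catches it.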

Geographic outliers - GIS: Country, State, named district, etc.
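
The underlying GIS operation is a point-in-polygon test. The sketch below uses the standard ray-casting method on plain coordinate lists; it is not the implementation of any GIS package, and the two square "districts" are hypothetical stand-ins for real boundary polygons.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical districts, as (longitude, latitude) vertex lists.
DISTRICTS = {
    "West": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "East": [(10, 0), (20, 0), (20, 10), (10, 10)],
}


def check_district(lon, lat, stated):
    """Compare the district named on the record with where the point falls."""
    found = [name for name, poly in DISTRICTS.items()
             if point_in_polygon(lon, lat, poly)]
    if not found:
        return "outside all polygons"
    if stated not in found:
        return "mismatch: falls in " + found[0]
    return "ok"
```

The two failure messages correspond to the two situations a coordinate check has to separate: a point that falls in no polygon at all, and a point that falls in a polygon other than the one named on the label.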

Geographic outliers - GIS

Geographic Outliers - GIS. Collectors: location vs date.
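
A collector's records can be ordered by date and checked for implausible jumps between consecutive localities. The sketch below assumes records for a single collector as `(record_id, date, latitude, longitude)` tuples; the 100 km per day limit is a hypothetical default, not a figure from the presentation, and would need tuning to the period and mode of travel.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def implausible_moves(records, max_km_per_day=100.0):
    """Flag consecutive record pairs that imply travel faster than the limit.

    records: (record_id, date, latitude, longitude) tuples for ONE collector.
    max_km_per_day: hypothetical threshold, chosen only for illustration.
    """
    ordered = sorted(records, key=lambda r: r[1])
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        days = max((cur[1] - prev[1]).days, 1)  # same-day records count as one day
        km = haversine_km(prev[2], prev[3], cur[2], cur[3])
        if km / days > max_km_per_day:
            flagged.append((prev[0], cur[0]))
    return flagged
```

A flagged pair does not say which of the two records is wrong; it only marks where the itinerary breaks and needs checking against the label or field notes.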

Environmental Outliers: Cumulative Frequency Curves.
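
A cumulative frequency curve for one climate parameter can be built directly from the sorted values, as in this minimal sketch. The `largest_gap` helper is an added assumption: it reports the widest step between neighbouring values, which is one simple way to point at a record sitting apart from the rest of the curve.

```python
def cumulative_frequency(values):
    """Return (value, cumulative fraction) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def largest_gap(values):
    """Return the pair of neighbouring sorted values with the widest gap."""
    ordered = sorted(values)
    pairs = list(zip(ordered, ordered[1:]))
    return max(pairs, key=lambda p: p[1] - p[0])
```

On a plotted curve, an isolated record shows up as a long flat step at one end; the gap pair identifies the values on either side of that step.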

Acacia orites - 19 records - 9 Temperature parameters. Reverse Jack-knife.

Outliers in climate space: T = 0.95(√n) + 0.2, where ‘n’ is the number of records.
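
The threshold formula can be evaluated as below. Only the formula for T comes from the slide; the statistic that is compared against T is not given here, so `exceeds_threshold` simply takes precomputed values as input, and both function names are hypothetical.

```python
import math


def critical_value(n):
    """T = 0.95 * sqrt(n) + 0.2, where n is the number of records."""
    return 0.95 * math.sqrt(n) + 0.2


def exceeds_threshold(statistics_per_record, n):
    """Return indices whose precomputed statistic exceeds T.

    statistics_per_record: values of the outlier statistic, computed elsewhere
    (its definition is not part of this sketch).
    """
    t = critical_value(n)
    return [i for i, c in enumerate(statistics_per_record) if c > t]
```

For a species with 19 records the threshold is about 4.34, and because T grows with the square root of n, larger samples need a proportionally larger statistic before a record is flagged.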

FloraMap: CIAT (Colombia); PCA; Cluster Analysis; $US100; Modelling; 10-minute grids.

Principal Components Analysis - FloraMap. Image from FloraMap (Jones and Gladkov 2001) showing use of Principal Components Analysis to identify an outlier in Rauvolfia littoralis specimen data. A. Principal Components Analysis. B. Specimen record. C. Mapped specimen. D. Climate profile.

Cluster Analysis - FloraMap. Image from FloraMap (Jones and Gladkov 2001) showing use of Cluster Analysis to identify an outlier in Rauvolfia littoralis specimen data. A. Cluster Analysis. B. Principal Components Analysis. C. Mapped specimen. D. Climate profile. E. Specimen record.

Diva-GIS: Free; Simple GIS; Modelling (BIOCLIM/Domain); Data Cleaning Tools.

Diva-GIS - Coordinate Check. Using Diva-GIS to check coordinates by comparing a file of point specimen records (red) against a polygon of Bolivian provinces. Input dialogue box is shown at A, where it can be seen that “STATE” in the point file has been set to the equivalent “DEPARTMENT” in the polygon file (Hijmans et al. 2003).

Points outside Polygon - Diva-GIS. Results from Diva-GIS (Hijmans et al. 2003) showing point records that fall outside all polygons in the Bolivian provinces polygon file. The highlighted record shows the linking between the results dialogue box and the mapped record.

Mismatched Provinces - Diva-GIS. Results from Diva-GIS (Hijmans et al. 2003) showing point records that do not match set relationships between the specimen point file and the polygon of Bolivian provinces. The highlighted record is one where the geocoding on the specimen record causes it to fall in the wrong province.

Assign Coordinates - Diva-GIS. Results from Diva-GIS (Hijmans et al. 2003) showing point records with geocodes automatically assigned. A. Unambiguous geocodes found by the program and assigned. B. Ambiguous geocodes identified. C. Appropriate geocodes not found.

Multiple possibilities - Diva-GIS. Results from Diva-GIS (Hijmans et al. 2003) showing alternate geocodes for a record where use of the Gazetteer has produced a number of credible alternatives.

Cumulative Frequency Curves - Diva-GIS. Results from Diva-GIS (Hijmans et al. 2003) showing the use of the Cumulative Frequency curve from BIOCLIM to identify possible geocoding errors in Rauvolfia littoralis. A1 and A2 show possible outliers in climate space, B1 and B2 the corresponding mapped records. The blue lines represent the 97.5 percentile.

Bioclimatic Envelope - Diva-GIS. Results from Diva-GIS (Hijmans et al. 2003) showing the use of the Bioclimatic Envelope from BIOCLIM to identify outliers in climate space. In this case the percentile cut-off is set at 95. Red points on the envelope correspond with red points on the map; green points in the envelope correspond with yellow points on the map.
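
The idea of a percentile envelope can be sketched in a few lines. This is not how Diva-GIS or BIOCLIM computes it; it is a simplified illustration that assumes nearest-rank percentiles, whole-number percentile limits, and hypothetical variable names, and flags any record with at least one climate value outside the limits.

```python
def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile; pct is a whole number from 1 to 100."""
    k = max(1, (pct * len(sorted_vals) + 99) // 100)  # integer ceiling
    return sorted_vals[k - 1]


def outside_envelope(records, low_pct=5, high_pct=95):
    """Return indices of records with any variable outside the envelope.

    records: list of dicts mapping a climate variable name to its value.
    low_pct, high_pct: hypothetical envelope limits, as whole percentiles.
    """
    variables = list(records[0].keys())
    limits = {}
    for var in variables:
        vals = sorted(r[var] for r in records)
        limits[var] = (nearest_rank(vals, low_pct), nearest_rank(vals, high_pct))
    flagged = []
    for i, r in enumerate(records):
        if any(r[v] < limits[v][0] or r[v] > limits[v][1] for v in variables):
            flagged.append(i)
    return flagged


# Hypothetical data: 20 records, one with an unusually high rainfall value.
records = [{"temp": 20 + i, "rain": 1000} for i in range(20)]
records[7]["rain"] = 3000
flagged = outside_envelope(records)
```

With small samples the nearest-rank limits coincide with the minimum and maximum, so nothing can be flagged; an envelope test of this kind needs enough records for the percentiles to be meaningful.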

ANUCLIM: $AUD1000 (with data files); Modelling (BIOCLIM / ESOCLIM); Cumulative Frequency Curves; Parameter Extremes.

Cumulative Frequency - ANUCLIM. Log file of Eucalyptus fastigata from ANUCLIM Version 5.1 (Houlder et al. 2002) showing the species accumulation curve with an identified outlier (labelled “bad”). Information from the “bad” record is displayed at the top of the log file (from Houlder et al. 2000).

Parameter extremes - ANUCLIM. Log file of Eucalyptus fastigata from ANUCLIM Version 5.1 (Houlder et al. 2002) showing the parameter extremes (top) and associated species accumulation curve (bottom) (from Houlder et al. 2000).

Statistical Tests: Outliers in Latitude; Outliers in Altitude; Outliers in collectors range/day or week, especially 17th, 18th and 19th Century collections.
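
The slide does not name a specific statistical test. As one possible example, the sketch below flags values of a single field (latitude or altitude) using the modified z-score based on the median absolute deviation, with the commonly used constant 0.6745 and cut-off 3.5; the choice of this test and the function name are assumptions.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return []  # no spread to scale by, so nothing can be scored
    return [i for i, d in enumerate(deviations)
            if 0.6745 * d / mad > threshold]
```

A median-based score is used here because a single gross error, such as a latitude with the wrong sign, inflates the mean and standard deviation enough to hide itself in an ordinary z-score.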

Thank You… Questions?