June 2012 Spatial Data Cleaning Species Occurrence Data Arthur D. Chapman.


1 June 2012 Spatial Data Cleaning Species Occurrence Data Arthur D. Chapman

2 June 2012 Methods for Validating Georeferences
Internal Database Checks
–Logical inconsistencies within the database
–Checking one field against another (text location vs geocode or District/State)
External Database Checks
–Checking one database against another (Gazetteers, DEM, Collectors)
Outliers in Geographic Space - GIS
Outliers in Environmental Space - Models
Statistical outliers
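An internal database check of the kind listed above, comparing one field against another, might look roughly like the following sketch. The record layout, the `STATE_BOUNDS` lookup and its bounding-box values are all assumptions made for illustration; a real check would take state boundaries from a gazetteer or polygon layer.

```python
# Hypothetical lookup: state name -> (min_lon, min_lat, max_lon, max_lat).
# The numbers are rough illustrative values, not authoritative boundaries.
STATE_BOUNDS = {
    "Tasmania": (143.5, -44.0, 149.0, -39.0),
}


def internal_checks(record, bounds=STATE_BOUNDS):
    """Return a list of problems found by checking fields against each other."""
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        return ["missing coordinates"]

    problems = []
    # Logical inconsistency: values that cannot be a latitude/longitude.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")
    # A (0, 0) geocode usually means "unknown" was stored as zero.
    if lat == 0 and lon == 0:
        problems.append("zero coordinates")
    # Field-against-field: does the geocode agree with the stated state?
    box = bounds.get(record.get("state"))
    if box is not None:
        min_lon, min_lat, max_lon, max_lat = box
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            problems.append("coordinates fall outside stated state")
    return problems


# A latitude with the wrong sign is caught by the state comparison.
print(internal_checks({"state": "Tasmania", "lat": 42.88, "lon": 147.33}))
```

A bounding box is only a coarse screen: it catches sign errors and swapped coordinates cheaply, but records near a border still need a true polygon test.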

3 June 2012 Error Error is inescapable and it should be recognised as a fundamental dimension of data. Chrisman 1991 Bolax gummifera, Argentina

4 June 2012 Geographic outliers - GIS Country, State, named district, etc. Gazetteer of Brazilian localities

5 June 2012 How do we find the suspect records? Canis lupus locations – extracted from GBIF 2006 Data from FMNH, KU, PSM, UAM, MSB, Humboldt Univ. Some errors are easy to find! But what does this say about the others?

6 June 2012 Geographic Outliers - GIS Collectors – location vs date
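One way to read "location vs date" is as an itinerary check: sort a collector's records by date and ask whether each move between successive records is physically plausible. The sketch below takes that reading as an assumption. The function names, the `(date, lat, lon)` tuple layout and the 300 km/day limit are hypothetical choices, not taken from the slides.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def flag_itinerary(records, max_km_per_day=300.0):
    """records: list of (date, lat, lon) tuples for ONE collector.

    Returns the original indexes of records reached by an implausibly
    fast move from the previous record in date order.
    """
    ordered = sorted(enumerate(records), key=lambda item: item[1][0])
    flagged = []
    for (_, prev), (idx, cur) in zip(ordered, ordered[1:]):
        days = max((cur[0] - prev[0]).days, 1)  # same-day records count as one day
        km = haversine_km(prev[1], prev[2], cur[1], cur[2])
        if km / days > max_km_per_day:
            flagged.append(idx)
    return flagged


trip = [
    (date(2012, 6, 1), -35.0, 149.0),
    (date(2012, 6, 2), -35.1, 149.1),
    (date(2012, 6, 3), 35.1, 149.1),  # latitude sign dropped
]
print(flag_itinerary(trip))
```

Because the check compares neighbours, a single bad record in the middle of a trip usually flags two records: the bad one and the good one that follows it. The flags are prompts for a human to inspect, not verdicts.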

7 June 2012 Environmental Outliers Cumulative Frequency Curves

8 June 2012 Using Climate to Identify Outliers Reverse Jack-knife Acacia dealbata, Australia Acacia orites - 19 records - 9 Temperature parameters NB. Because the value of ‘C’ relates to its nearest point, successive values may be very small, so we ensure that if ‘x[i]’ is an outlier, then all points beyond it are outliers too (even if they are clustered)

9 June 2012 Concept of “Outlierness” “Outlierness” is the degree to which a record is an outlier. Outlierness = c[i] / T; a value > 1 marks the record as an outlier, a value < 1 does not. T = (0.95 × √n + 0.2) × (Range / 50), where ‘n’ is the number of records
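The threshold T and the ratio c[i]/T can be sketched in a few lines. The slides do not define c[i], so the definition used here is an assumption: the gap to the neighbouring value on the side of the mean, weighted by the record's distance from the mean and divided by the standard deviation. It is modelled on the reverse jackknife described in the Chapman (2005) report cited on the reference slide and should be checked against that report before use. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def outlierness(values):
    """Return c[i]/T for each value, in input order; > 1 suggests an outlier."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three records")
    order = sorted(range(n), key=lambda i: values[i])
    x = [values[i] for i in order]
    mu, sd = mean(x), stdev(x)
    rng = x[-1] - x[0]
    if rng == 0:
        return [0.0] * n  # all values identical: nothing can be an outlier

    t = (0.95 * sqrt(n) + 0.2) * (rng / 50)  # threshold T from the slide

    ratio = []
    for k, v in enumerate(x):
        if v < mu:
            y = (x[k + 1] - v) * (mu - v)  # ASSUMED form of c[i], low side
        else:
            y = (v - x[k - 1]) * (v - mu) if k > 0 else 0.0  # ASSUMED, high side
        ratio.append((y / sd) / t)

    # Once a point is an outlier, every point beyond it is one too,
    # even if the points beyond are tightly clustered (small gaps).
    first_high = next(k for k, v in enumerate(x) if v >= mu)
    for walk in (range(first_high, n), range(first_high - 1, -1, -1)):
        carry = 0.0
        for k in walk:
            if carry > 1:
                ratio[k] = max(ratio[k], carry)
            if ratio[k] > 1:
                carry = max(carry, ratio[k])

    result = [0.0] * n
    for k, i in enumerate(order):
        result[i] = ratio[k]
    return result


# Hypothetical annual mean temperatures (deg C) with one suspect record.
temps = [14.1, 14.3, 14.6, 14.8, 15.0, 15.1, 15.3, 15.5, 15.8, 16.0, 21.5]
scores = outlierness(temps)
print([i for i, s in enumerate(scores) if s > 1])
```

The two outward walks implement the note from the previous slide: a clustered pair of bad records would otherwise hide each other, because the gap between them is small.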

10 June 2012 FloraMap CIAT (Colombia) PCA Cluster Analysis $US100 Modelling 10-minute grids Nothofagus antarctica, Argentina

11 June 2012 Principal Components Analysis - FloraMap Image from FloraMap (Jones and Gladkov 2001) showing use of Principal Components Analysis to identify an outlier in Rauvolfia littoralis specimen data. A. Principal Components Analysis B. Specimen record. C. Mapped specimen. D. Climate profile

12 June 2012 Cluster Analysis - FloraMap Image from FloraMap (Jones and Gladkov 2001) showing use of Cluster Analysis to identify an outlier in Rauvolfia littoralis specimen data. A. Cluster Analysis B. Principal Components Analysis. C. Mapped specimen. D. Climate profile. E. Specimen record

13 June 2012 Diva-GIS Free Simple GIS Modelling (BIOCLIM/Domain) Data Cleaning Tools Brown Algae, Argentina

14 June 2012 Diva-GIS – Coordinate Check Using Diva-GIS to check coordinates by comparing a file of point specimen records (red) against a polygon of Bolivian provinces. Input dialogue box is shown at A, where it can be seen that “STATE” in the point file has been set to the equivalent “DEPARTMENT” in the polygon file.

15 June 2012 Points outside Polygon – Diva GIS Results from Diva-GIS showing point records that fall outside all polygons in the Bolivian provinces polygon file. The highlighted record shows the linking between the results dialogue box and the mapped record

16 June 2012 Mismatched Provinces – Diva GIS Results from Diva-GIS showing point records that do not match set relationships between the specimen point file and the polygon of Bolivian provinces. The highlighted record is one where the geocoding on the specimen record causes it to fall in the wrong province
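The two Diva-GIS results above (points outside all polygons, and points whose stated province differs from the polygon they fall in) come down to a point-in-polygon test plus an attribute comparison. The sketch below is not Diva-GIS code. It uses a plain ray-casting test, simple rings without holes, and hypothetical field names (`state`, `lon`, `lat`) and toy square "provinces".

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices, no holes."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def check_record(record, polygons):
    """polygons: {province name: ring}. Returns a short status string."""
    for name, ring in polygons.items():
        if point_in_polygon(record["lon"], record["lat"], ring):
            if name == record["state"]:
                return "ok"
            return f"mismatch: falls in {name}"
    return "outside all polygons"


# Toy polygons standing in for a province layer.
provinces = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(check_record({"state": "A", "lon": 1.5, "lat": 0.5}, provinces))
```

Points lying exactly on a shared border are assigned by the strict inequalities in the test, so records very close to a boundary deserve a tolerance or manual review rather than an automatic "mismatch".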

17 June 2012 Cumulative Frequency Curves - Diva-GIS Results from Diva-GIS showing the use of the Cumulative Frequency curve from BIOCLIM to identify possible geocoding errors in Rauvolfia littoralis. A1 and A2 show possible outliers in climate space, B1 and B2 the corresponding mapped records. The blue lines represent the 97.5 percentile

18 June 2012 Bioclimatic Envelope – Diva GIS Results from Diva-GIS showing the use of the Bioclimatic Envelope from BIOCLIM to identify outliers in climate space. In this case the percentile cut off is set at 95. Red points on the envelope correspond with red points on the map, green points in the envelope correspond with yellow points on the map
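A percentile cut-off of this kind can be sketched without any GIS software. The example assumes the cut-off of 95 means the central 95% of each climate variable, that is, the 2.5 and 97.5 percentiles; that reading, the linear-interpolation percentile and the record layout are assumptions, and Diva-GIS may compute its limits differently.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list (0 <= p <= 100)."""
    if not sorted_vals:
        raise ValueError("no values")
    pos = (len(sorted_vals) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def outside_envelope(records, lower=2.5, upper=97.5):
    """records: list of dicts mapping climate variable -> value.

    Returns {record index: [variables on which it lies outside the limits]}.
    """
    flagged = {}
    for var in records[0]:
        vals = sorted(r[var] for r in records)
        lo, hi = percentile(vals, lower), percentile(vals, upper)
        for i, r in enumerate(records):
            if r[var] < lo or r[var] > hi:
                flagged.setdefault(i, []).append(var)
    return flagged


# Hypothetical data: 101 records with a single variable taking values 0..100.
records = [{"temp": float(t)} for t in range(101)]
print(sorted(outside_envelope(records)))
```

A percentile envelope always places some records in the tails, roughly 5% here by construction, so it ranks candidates for inspection rather than proving any record wrong. This is one reason the deck pairs it with gap-based methods such as the reverse jack-knife.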

19 June 2012 Reverse Jack-knife – Diva-GIS Stuff from Diva-GIS

20 June 2012 ANUCLIM $AUD1000 (with data files) Modelling (BIOCLIM / ESOCLIM) Cumulative Frequency Curves Parameter Extremes Gable Island, Tierra del Fuego, Argentina

21 June 2012 Cumulative Frequency - ANUCLIM Log file of Eucalyptus fastigata from ANUCLIM Version 5.1 (Houlder et al. 2002) showing the species accumulation curve with an identified outlier (labelled “bad”). Information from the “bad” record is displayed at the top of the log file (from Houlder et al. 2000).

22 June 2012 Parameter extremes - ANUCLIM Log file of Eucalyptus fastigata from ANUCLIM Version 5.1 (Houlder et al. 2002) showing the parameter extremes (top) and associated species accumulation curve (bottom) (from Houlder et al. 2000)

23 June 2012 spOutlier - CRIA

24 June 2012 CRIA Data Cleaning http://splink.cria.org.br/dc

25 June 2012 CRIA Data Cleaning

26 June 2012 CRIA Data Cleaning

27 June 2012 CRIA Data Cleaning

28 June 2012 The Atlas of Living Australia is using Reverse Jack-knifing to identify suspect records ALA Data Cleaning

29 June 2012 GBIF and Outlierness No longer operating Outlierness Values

30 June 2012 Errors in data In general, error must not be treated as a potentially embarrassing inconvenience, because error provides a critical component in judging fitness for use. Chrisman, 1991 Misodendrum sp., Argentina

31 June 2012 Reference Chapman, A.D. (2005). Principles and Methods of Data Cleaning – Primary Species Occurrence Data. Report for the Global Biodiversity Information Facility 2005. 75pp. Copenhagen: GBIF http://www.gbif.org/orc/?doc_id=1262

