INCOFISH WP3 - Campinas, April 2006 WEB Tools and Data Cleaning Alexandre Marino Centro de Referência em Informação Ambiental, CrIA.


1 INCOFISH WP3 - Campinas, April 2006 WEB Tools and Data Cleaning Alexandre Marino marino@cria.org.br Centro de Referência em Informação Ambiental, CrIA

2 WEB Tools and Data Cleaning. These tools were developed within the scope of the speciesLink project, so in some cases they depend completely on the architecture, the local database and the libraries developed by CRIA. Data Cleaning started as an idea without a very clear direction and became a very particular system.

3 The speciesLink project was funded by FAPESP (the São Paulo state research funding agency) from October 2001 to October 2005.

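4 Different data sources, software and systems [Diagram: five collections (Col 1 to Col 5), each with its own program and search interface, running different operating systems and database or collection software, including Win2000, Brahms, Linux, MySQL, Win98, Access, biota, FreeBSD and PostgreSQL; further sources are marked "?".]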

5 Protocol and Content Schema: the DiGIR protocol (Distributed Generic Information Retrieval), which has the potential to be globally accepted; the DiGIR software (Java Portal and PHP Provider), developed collaboratively; and DarwinCore v.2, which covers the basic content elements (taxonomic identification, and the location and date of the collecting event).
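The three basic content elements can be illustrated with a minimal sketch of a record type and a completeness check. The field names (scientific_name, latitude, longitude, collected_on) and the function missing_basic_elements are hypothetical and are not the DarwinCore v.2 element names; they only group the three elements listed on the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class OccurrenceRecord:
    # Hypothetical field names, not the official DarwinCore v.2 elements.
    scientific_name: Optional[str]   # taxonomic identification
    latitude: Optional[float]        # location
    longitude: Optional[float]       # location
    collected_on: Optional[date]     # date of the collecting event


def missing_basic_elements(rec: OccurrenceRecord) -> List[str]:
    """List which of the three basic content elements a record lacks."""
    missing = []
    if not rec.scientific_name:
        missing.append("taxonomic identification")
    if rec.latitude is None or rec.longitude is None:
        missing.append("location")
    if rec.collected_on is None:
        missing.append("date of collecting event")
    return missing
```

A complete record yields an empty list; a record with no coordinates reports only "location".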

6 System’s Architecture [Diagram: the speciesLink site (presentation layer) sits on top of a DiGIR Portal (Java); a Perl component also appears. Collection A, which has fast and stable connectivity, runs a PHP Provider over its collection management system (SQL). Collections B and C, which have slow or unstable connectivity, use a SOAP client to send data from their collection management systems (SQL) and data repositories to a SOAP server on a mirror server, where the data are stored in Postgres and served by a PHP Provider.]
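The key decision in this architecture is where the portal fetches each collection's data from. A minimal sketch, assuming a simple per-collection flag for connectivity: Collection, endpoint_for and both URLs are hypothetical illustrations, not part of the DiGIR software.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical endpoint of the PHP Provider on the mirror server.
MIRROR_PROVIDER_URL = "http://mirror.example.org/digir"


@dataclass
class Collection:
    name: str
    stable_connection: bool
    provider_url: Optional[str] = None  # the collection's own PHP Provider, if any


def endpoint_for(c: Collection) -> str:
    """Return the provider the portal should query for this collection."""
    if c.stable_connection and c.provider_url:
        # Fast, stable link: query the collection's own provider directly.
        return c.provider_url
    # Slow or unstable link: data were pushed via SOAP to the mirror,
    # so the portal queries the mirror's provider instead.
    return MIRROR_PROVIDER_URL
```

The design keeps the portal unaware of connectivity problems: unreliable collections are served from a copy rather than being queried live.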

7 speciesLink network (March 2006): ~40 connected collections and ~940,000 online records. JBRJ

8 WEB Tools: geoLoc, spOutlier, infoXY, conversor, speciesMapper, data cleaning

9 About geoLoc  to assist biological collections in geo-referencing their data  the database includes approximately 110 thousand names of Brazilian localities, obtained from:  Brazilian Institute of National Statistics and Geography (IBGE)  GEOnet Names Server (GNS)  speciesLink/Fapesp  algorithm based on concepts in the Egaz program (Shattuck 1997) capable of calculating a coordinate for a distance and direction Tools

10 Example input: 26, Noroeste-NW (northwest), Campinas, São Paulo
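Calculating a coordinate from a distance and direction can be sketched with the standard great-circle destination formula on a spherical Earth. This is a swapped-in textbook technique, not necessarily what Egaz or geoLoc implement. The assumptions are a mean Earth radius of 6371 km, distances in kilometres, and an approximate Campinas reference point of (-22.9056, -47.0608); the names destination and BEARINGS are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption

# Compass directions as bearings in degrees clockwise from north.
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def destination(lat, lon, distance_km, bearing_deg):
    """Point reached from (lat, lon) after distance_km along bearing_deg."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalise longitude to [-180, 180).
    return math.degrees(phi2), (math.degrees(lam2) + 540) % 360 - 180


# 26 km northwest of an approximate Campinas reference point.
lat, lon = destination(-22.9056, -47.0608, 26, BEARINGS["NW"])
```

The result lies about 0.17 degrees north and 0.18 degrees west of the reference point; for distances of a few tens of kilometres the spherical model is usually adequate for georeferencing collection localities.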

11 Tools About spOutlier  to assist biological collections in identifying possible suspect points in existing records  uses techniques modified from Chapman 1999 to detect outliers in latitude, longitude and altitude  allows users to indicate their data set as either terrestrial or marine  useful to biologists around the world who wish to identify possible errors in their data

12 Sample input (id, longitude, latitude, altitude):
1, -63.25, -4.916666667, 795
2, -67.05, -10.96666667, 805
3, -68.0125, -12.66666667, 809
4, -68.75, -13.60111111, 815
5, -68.9102, -13.83333, 810
6, -72.3666, -14.36611111, 790
7, -78.3166, -14.38916667, 801
8, -72.137, -11.8647, 700
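To show what per-variable outlier detection looks like on these records, here is a minimal sketch using a median absolute deviation (MAD) robust z-score with the common 3.5 cut-off. This is a substitute technique, not Chapman's (1999) method or spOutlier's code. It assumes the columns are id, longitude, latitude and altitude; robust_outliers and flag_records are hypothetical names.

```python
from statistics import median

# Sample records from the slide, read as (id, longitude, latitude, altitude).
RECORDS = [
    ("1", -63.25, -4.916666667, 795),
    ("2", -67.05, -10.96666667, 805),
    ("3", -68.0125, -12.66666667, 809),
    ("4", -68.75, -13.60111111, 815),
    ("5", -68.9102, -13.83333, 810),
    ("6", -72.3666, -14.36611111, 790),
    ("7", -78.3166, -14.38916667, 801),
    ("8", -72.137, -11.8647, 700),
]


def robust_outliers(values, threshold=3.5):
    """Indices whose MAD-based robust z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so no meaningful score
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def flag_records(records):
    """Map each variable to the ids of records suspected on that variable."""
    ids = [r[0] for r in records]
    flags = {}
    for col, name in ((1, "longitude"), (2, "latitude"), (3, "altitude")):
        flags[name] = [ids[i] for i in robust_outliers([r[col] for r in records])]
    return flags
```

On these eight records, the sketch flags record 1 on latitude and record 8 on altitude and flags nothing on longitude. The median-based score is used because a single extreme value inflates an ordinary standard deviation and can hide itself.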

13 marine

14 Sample input (id, longitude, latitude):
1, -63.25, -4.91667
2, 34.3239, 67.9836
aus, 150.0417, -34.9081
3, -68.0125, -12.6667
4, -22.0400, 63.9514
id_teste, -45, -22
6, -75.3667, -14.3661
7, 71.37, -19.37
eua, -80.8011, 26.0506
9, -120.7642, 58.7217
10, 26.0089, -29.5197
11, -95.3781, 16.7639

15 Input/Output formats: degrees, minutes, seconds; decimal degrees; UTM. Datums: WGS84 (World), SAD69 (Brazil), Córrego Alegre (SP). Examples, decimal degrees in and degrees, minutes, seconds out: -3.5800, 52.0633 gives 03d34'47"W, 52d3'47"N; 34.3239, 67.9836 gives 34d19'23"E, 67d59'0"N; -45, -22 gives 44d59'58"W, 21d59'58"S.
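The plain format conversion between decimal degrees and degrees, minutes, seconds can be sketched as follows. This is a minimal sketch only: it rounds to the nearest whole second and applies no datum transformation, so it is not conversor's implementation. The output format loosely follows the slide's "d ' \"" notation, and the names dd_to_dms and dms_to_dd are hypothetical.

```python
def dd_to_dms(value: float, is_latitude: bool) -> str:
    """Decimal degrees to a degrees/minutes/seconds string.

    Seconds are rounded to the nearest whole second; no datum
    shift (e.g. SAD69 to WGS84) is applied.
    """
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Work in whole seconds so rounding carries into minutes and degrees.
    total_seconds = round(abs(value) * 3600)
    degrees, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{degrees}d{minutes}'{seconds}\"{hemisphere}"


def dms_to_dd(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Degrees/minutes/seconds plus hemisphere letter to signed decimal degrees."""
    sign = -1 if hemisphere.upper() in ("S", "W") else 1
    return sign * (degrees + minutes / 60 + seconds / 3600)
```

Some outputs on the slide differ from this sketch (for example 44d59'58"W for -45). Reproducing them would require the tool's own rounding and any datum transformation between WGS84, SAD69 and Córrego Alegre, which would be a separate step.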

16 Plot georeferenced points on a map. Available layers: World; South and Central America; Brazil; São Paulo State. Sample input (longitude latitude): -95.6 -39.5166; -70.2833 -4.2; -70.033333 -4.35; -69.914889 0.274694; -69.7333 -4.2333; -69.6661 -3.908333...
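Plotting points like these on a raster map layer comes down to parsing longitude/latitude pairs and projecting them to pixel positions. A minimal sketch, assuming whitespace-separated pairs and a simple equirectangular (plate carrée) projection over a bounding box: parse_pairs and to_pixel are hypothetical names, and speciesMapper's actual projection and layer extents are not given in the source.

```python
from typing import List, Tuple


def parse_pairs(text: str) -> List[Tuple[float, float]]:
    """Parse 'lon lat lon lat ...' into (lon, lat) tuples."""
    values = iter(float(tok) for tok in text.split())
    return list(zip(values, values))  # consume two numbers per pair


def to_pixel(lon: float, lat: float, width: int, height: int,
             bbox=(-180.0, -90.0, 180.0, 90.0)) -> Tuple[float, float]:
    """Equirectangular projection of (lon, lat) onto a width x height image.

    bbox is (min_lon, min_lat, max_lon, max_lat) of the map layer;
    the default covers the whole world.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    x = (lon - min_lon) / (max_lon - min_lon) * width
    y = (max_lat - lat) / (max_lat - min_lat) * height  # image y grows downward
    return x, y


points = parse_pairs("-95.6 -39.5166 -70.2833 -4.2")
pixels = [to_pixel(lon, lat, 720, 360) for lon, lat in points]
```

A regional layer such as Brazil would use the same function with a tighter bounding box.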

17 Trachurus trachurus, Pteroscion peli, Gaidropsarus biscayensis

18 Using Data [Diagram components: PostgreSQL; spOutlier and geoLoc exposed as a SOAP Web service; jobs (job1, job2); maps produced with PostGIS.]
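Calling tools exposed as a SOAP Web service means sending an XML envelope whose Body names the operation and its parameters. This is a minimal sketch of building such a SOAP 1.1 request with Python's standard library. The operation name detectOutliers, its points parameter and the service namespace urn:example:speciesLink are hypothetical, since the source does not give the service's interface. Only the envelope namespace is the real SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:speciesLink"                   # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)


def build_request(operation: str, params: dict) -> bytes:
    """Serialize a SOAP 1.1 request for `operation` with string parameters."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_request("detectOutliers", {"points": "1,-63.25,-4.916666667,795"})
```

The resulting bytes would be sent as the body of an HTTP POST to the service endpoint. Job handling (job1, job2) and the map output are outside this sketch.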

19 Tools About Data Cleaning  Aim at helping curators in identifying possible errors and to standardize data  Records are not modified  The system just presents "suspect" records

20 How Data Cleaning Works [Diagram: national collections (Col 1, Col 2, Col 3, ..., Col n) and international collections (Col 1, Col 2, ...) feed a local PostgreSQL database; a Perl process detects suspect records and stores them in tables of suspect records (dc_tax, dc_geo); chart.pm (Perl) and the speciesLink Portal (Java) present the results on the Web.]
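One kind of check such a system might run is flagging scientific names that appear in several spellings differing only in capitalization or spacing. These are candidates for standardization. The check below is a minimal sketch under that assumption; normalize and name_variant_suspects are hypothetical and do not describe CRIA's actual rules. Like the system described above, it only reports suspects and never alters the input records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Collapse whitespace and lowercase, for comparison only."""
    return " ".join(name.split()).lower()


def name_variant_suspects(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group spellings by normalized form; report groups with >1 variant.

    The input is only read, so the original records stay unmodified.
    """
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


names = ["Trachurus trachurus", "Trachurus  trachurus",
         "trachurus trachurus", "Pteroscion peli"]
suspects = name_variant_suspects(names)
```

Here the three Trachurus trachurus spellings are reported as one suspect group, while Pteroscion peli, which has a single spelling, is not reported.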

21 Online demonstration

22 Thank you! marino@cria.org.br Obrigado!

