INCOFISH WP3 - Campinas, April 2006 WEB Tools and Data Cleaning Alexandre Marino Centro de Referência em Informação Ambiental, CrIA.



WEB Tools and Data Cleaning
These tools were developed within the scope of the speciesLink project, so in some cases they depend completely on the architecture, the local database, and the libraries developed by CRIA. Data Cleaning started as an idea without a very clear direction and became a very specific system.

The speciesLink project was funded by FAPESP (the São Paulo State research funding agency) from October 2001 to October 2005.

Different data sources, software and systems
- Col 1: Win2000, Brahms
- Col 2: Linux, MySQL
- Col 3: Win98, Access
- Col 4: Win98, Biota
- Col 5: FreeBSD, PostgreSQL
- Search interface program: ? ? ? ? ?

Protocol and Content Schema
- DiGIR protocol (Distributed Generic Information Retrieval): potential to be globally accepted
- DiGIR software (Java Portal & PHP Provider): collaborative development
- DarwinCore v.2: covers the basic content elements (taxonomic identification, location and date of collecting event)

System's Architecture
- Presentation layer: speciesLink site; DiGIR Portal (Java); Perl
- Fast and stable connectivity (Collection A): Collection Management System -> SQL -> PHP Provider -> Data
- Slow or unstable connectivity (Collections B and C): Collection Management System -> SQL -> Data Repository -> SOAP client -> Data
- Mirror Server: SOAP Server -> SQL (Postgres) -> PHP Provider

~40 connected collections ~ on-line records March/2006 JBRJ speciesLink network

WEB Tools
- geoLoc
- spOutlier
- infoXY
- conversor
- speciesMapper
- data cleaning

Tools: About geoLoc
- assists biological collections in georeferencing their data
- the database includes approximately 110 thousand names of Brazilian localities, obtained from:
  - Brazilian Institute of Geography and Statistics (IBGE)
  - GEOnet Names Server (GNS)
  - speciesLink/Fapesp
- algorithm based on concepts from the Egaz program (Shattuck 1997), capable of calculating a coordinate from a distance and direction

Example input: distance 26, direction Noroeste-NW (northwest), locality Campinas, state São Paulo
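The "coordinate from a distance and direction" step can be illustrated with a minimal sketch. This is not the geoLoc or Egaz code: it uses the standard great-circle destination formula on a spherical Earth, and the function name, the compass table, the assumption that distances are in kilometres, and the approximate Campinas coordinates are all hypothetical choices made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

# Hypothetical lookup from compass abbreviations to bearings in degrees.
COMPASS_BEARINGS = {
    "N": 0.0, "NE": 45.0, "E": 90.0, "SE": 135.0,
    "S": 180.0, "SW": 225.0, "W": 270.0, "NW": 315.0,
}

def offset_coordinate(lat_deg, lon_deg, bearing_deg, distance_km):
    """Return (lat, lon) reached by travelling distance_km along bearing_deg."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalise longitude to the range [-180, 180).
    lon2 = (lon2 + 3 * math.pi) % (2 * math.pi) - math.pi
    return math.degrees(lat2), math.degrees(lon2)

# Assumed, approximate coordinates for Campinas (not taken from the slides).
CAMPINAS = (-22.9056, -47.0608)

# 26 km northwest of Campinas, assuming the distance unit is kilometres.
print(offset_coordinate(*CAMPINAS, COMPASS_BEARINGS["NW"], 26.0))
```

A spherical model is enough for locality-level georeferencing; an ellipsoidal method would be needed only if metre-level accuracy mattered.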

Tools: About spOutlier
- assists biological collections in identifying suspect points in existing records
- uses techniques modified from Chapman 1999 to detect outliers in latitude, longitude and altitude
- allows users to indicate whether their data set is terrestrial or marine
- useful to biologists around the world who wish to identify possible errors in their data

1, , , 795
2, , , 805
3, , , 809
4, , , 815
5, , , 810
6, , , 790
7, , , 801
8, , , 700
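As a rough illustration of outlier flagging on the altitude column above, here is a minimal sketch. It deliberately swaps in a simpler, well-known technique (the modified z-score based on the median absolute deviation) and is not the modified Chapman 1999 method that spOutlier uses; the function name and the 3.5 threshold are assumptions made for the example.

```python
from statistics import median

def flag_outliers(values_by_id, threshold=3.5):
    """Return ids whose value has a modified z-score above the threshold.

    Stand-in technique (median absolute deviation), not spOutlier's method.
    """
    values = list(values_by_id.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    return [
        record_id
        for record_id, value in values_by_id.items()
        if 0.6745 * abs(value - center) / mad > threshold
    ]

# Altitudes from the sample records (record id -> altitude).
altitudes = {1: 795, 2: 805, 3: 809, 4: 815, 5: 810, 6: 790, 7: 801, 8: 700}
print(flag_outliers(altitudes))  # -> [8]
```

Only record 8 (altitude 700) is flagged; as in spOutlier, the result is a list of suspects for a curator to review, not an automatic correction.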

marine

1, , , , aus, , , , , , id_teste, -45, -22 6, , , 71.37, eua, , , , , , , ,

Input/Output:
- degrees, min, sec
- decimal degrees
- UTM
DATUM:
- WGS84 (World)
- SAD69 (Brazil)
- Córrego Alegre (SP)
, , , d34'47"W, 52d3'47"N
34d19'23"E, 67d59'0"N
44d59'58"W, 21d59'58"S
degrees, min, s
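The conversion from degrees, minutes and seconds to decimal degrees can be sketched as follows. This is an illustrative parser for strings shaped like the samples on the slide (for example 44d59'58"W), not the converter's actual code; the function name and the accepted pattern are assumptions, and datum transformation (WGS84, SAD69, Córrego Alegre) and UTM are not covered.

```python
import re

# Pattern for strings such as 34d19'23"E: degrees, minutes, seconds, hemisphere.
_DMS_PATTERN = re.compile(r"""^\s*(\d+)d(\d+)'(\d+(?:\.\d+)?)"\s*([NSEW])\s*$""")

def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in "SW" else value

for sample in ['44d59\'58"W', '21d59\'58"S', '34d19\'23"E', '67d59\'0"N']:
    print(sample, "->", round(dms_to_decimal(sample), 6))
```

The sign convention (negative for S and W) matches the decimal values such as -45, -22 that appear elsewhere in the slides.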

Plot georeferenced points on a map.
Available layers:
- World
- South and Central America
- Brazil
- São Paulo State

Trachurus trachurus Pteroscion pele Gaidropsarus biscayensis

Using Data
- PostgreSQL
- spOutlier
- geoLoc
- SOAP Web service: job1, job2
- Maps: PostGIS

Tools: About Data Cleaning
- aims to help curators identify possible errors and standardize data
- records are not modified
- the system only presents "suspect" records

How Data Cleaning Works
- National collections: Col 1, Col 2, Col 3, ... Col n
- International collections: Col 1, Col 2, ...
- Local Database (PostgreSQL): dc_tax, dc_geo
- Detect Suspect Records (Perl)
- Tables of Suspect Records
- chart.pm (Perl)
- Web
- speciesLink Portal (Java)
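The principle that records are never modified and suspects are only reported can be sketched as below. This is a hypothetical illustration, not CRIA's Perl implementation: the names dc_tax and dc_geo come from the diagram, but treating them as taxonomic and geographic suspect tables is an assumption, and the Record fields and the three checks are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)  # frozen: a check cannot alter a record by accident
class Record:
    record_id: str
    scientific_name: str
    latitude: Optional[float]
    longitude: Optional[float]

def find_suspects(records):
    """Return suspect entries per table; the input records are left untouched."""
    suspects = {"dc_tax": [], "dc_geo": []}
    for rec in records:
        # Hypothetical taxonomic check: the name field must not be blank.
        if not rec.scientific_name.strip():
            suspects["dc_tax"].append((rec.record_id, "empty scientific name"))
        # Hypothetical geographic checks: coordinates must be in valid ranges.
        if rec.latitude is not None and not -90 <= rec.latitude <= 90:
            suspects["dc_geo"].append((rec.record_id, "latitude out of range"))
        if rec.longitude is not None and not -180 <= rec.longitude <= 180:
            suspects["dc_geo"].append((rec.record_id, "longitude out of range"))
    return suspects

records = [
    Record("a1", "Trachurus trachurus", -22.0, -45.0),
    Record("a2", "", -22.0, -45.0),
    Record("a3", "Pteroscion pele", -122.0, -45.0),
]
print(find_suspects(records))
```

Keeping the suspect lists separate from the source records mirrors the slide's point: the curator decides what to correct in the collection's own database.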

Demonstration on-line

Thank you! Obrigado!