WDC-MARE – World Data Center for Marine Environmental Sciences Data portal based on Open Archives Initiative Protocols and Apache Lucene Uwe Schindler,


1 WDC-MARE – World Data Center for Marine Environmental Sciences
Data portal based on Open Archives Initiative Protocols and Apache Lucene
Uwe Schindler, uschindler@wdc-mare.org
Michael Diepenbroek, mdiepenbroek@wdc-mare.org
MARUM, University of Bremen, Germany
EGU 2006, Vienna, 2006-04-03

2 Data Portals
WDC-MARE, with its information system PANGAEA, provides data portals for several EU and international projects: CARBOOCEAN, EUR-OCEANS, IODP.
Problem: not all data are stored centrally, so the datasets offered in a portal must be consolidated from different sources.

3 Example: CARBOOCEAN data portal
Data stays at the data providers.
Metadata is harvested by the portal.
Search queries are handled by the centralized catalogue.
The scientist gets a link to the data at the provider.

4 Open Archives Protocol
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed by the Open Archives Initiative.
Google uses it during web crawling (Google Scholar).
Almost all digital libraries support it (the best-known ones: arXiv and the CERN Document Server).
It is very simple to implement (XML over HTTP).
Repository software for databases or file-system metadata providers is widely available.
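To illustrate the "XML over HTTP" point, here is a minimal sketch of the harvesting side, not the portal's actual code. It builds a ListRecords request URL and reads record headers and the resumption token from a response. The base URL, the sample response and the function names are made up for illustration; the verb, argument names and XML namespace follow the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL (function name is hypothetical)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb goes with it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec  # harvest only one set of the repository
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, datestamp), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.findall("oai:ListRecords/oai:record/oai:header", OAI_NS):
        records.append((
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        ))
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          namespaces=OAI_NS)
    return records, (token or None)


# Invented sample response, reduced to the parts read above
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2006-03-01</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch the first URL, parse the response, and repeat with the returned token until no token comes back.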

5 Current OAI-PMH software
1. Limited to Dublin Core metadata (libraries).
2. Limited full-text search functionality, because relational databases are used in the background.
3. No geographic retrieval (a consequence of the Dublin Core limitation).
4. The end-user interface is part of the software, which limits its usability inside a CMS.

6 Requirements for portal software
1. Open for any XML metadata format.
2. Any mapping to document fields should be done by XPath.
3. Possibility to map incompatible XML schemas during harvesting by XSL.
4. No relational database, only a full-text search engine that contains everything needed for operation.
5. Range queries for specific fields (date/time or numeric).
6. Web service interface for the end-user software that is accessible from any language (Java/JSP, PHP, Perl, ...).
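Requirement 5 can be sketched without a relational database. The following is an assumption-laden illustration, not the portal's implementation: if a search engine compares field values as plain strings, dates and numbers must be encoded so that string order equals value order, and a range query then becomes a slice of the sorted terms. The class and helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def encode_date(iso_date):
    """'2006-04-03' -> '20060403'; sorts as text in chronological order."""
    return iso_date.replace("-", "")


def encode_int(value, width=10):
    """Zero-pad a non-negative integer so text order equals numeric order."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return f"{value:0{width}d}"


class RangeField:
    """Sorted terms of one field, each pointing to a document id."""

    def __init__(self):
        self._terms = []  # kept in ascending order
        self._docs = []   # document id at the same position as its term

    def add(self, term, doc_id):
        position = bisect_right(self._terms, term)
        self._terms.insert(position, term)
        self._docs.insert(position, doc_id)

    def range(self, lower, upper):
        """Return ids of documents with lower <= term <= upper (inclusive)."""
        start = bisect_left(self._terms, lower)
        stop = bisect_right(self._terms, upper)
        return self._docs[start:stop]
```

Results come back in term order, so a date range query also yields the documents in chronological order.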

7 MetadataPortal Java Package (architecture diagram; labels recovered from the slide)
Sources: OAI-PMH repositories; XML files.
Harvesters: OAI-Harvester (OAI protocol in HTTP, optionally for a specific set); Filesystem-Harvester (filesystem directory, FTP, ...).
Index: Lucene; Virtual Index; XSL.
Servers: Mini PanHTTP Server, Jetty HTTP Server, Tomcat, Apache Axis.
Clients: Portal 1 (Webserver, PHP); Portal 2 (Webserver, JSP).
Stored: xmldata (same format everywhere, XSL before indexing), identifier, lastModified, sets.
Searchable:
field1: "/oai_dc:dc/dc:author"
field2: "/oai_dc:dc/dc:title"
field3: "java:org.test.LatLon.parse(/oai_dc:dc/dc:coverage)" *
default: "."
*) xmlns:java="http://xml.apache.org/xalan/java"
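The searchable-field configuration on this slide can be imitated in a few lines. This is a sketch under stated assumptions rather than the MetadataPortal code: it uses the limited XPath subset of Python's standard library, so paths are written relative to the record's root element instead of as absolute paths. The `parse_lat_lon` helper is a hypothetical stand-in for the slide's `LatLon.parse` extension function, and the sample record is invented. The sample uses `dc:creator`, the standard Dublin Core element, for the author field.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_lat_lon(text):
    """Hypothetical converter: 'lat,lon' -> (lat, lon) as floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


# field name -> (path relative to the record root, optional converter)
FIELD_MAPPING = {
    "author": ("dc:creator", None),
    "title": ("dc:title", None),
    "position": ("dc:coverage", parse_lat_lon),
    "default": (".", None),  # "." selects the root, i.e. all text of the record
}


def extract_fields(xml_text, mapping=FIELD_MAPPING):
    """Apply each configured path to one record and collect field values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name, (path, convert) in mapping.items():
        values = []
        for element in root.findall(path, NS):
            # Concatenate all nested text and collapse runs of whitespace
            text = " ".join("".join(element.itertext()).split())
            if text:
                values.append(convert(text) if convert else text)
        fields[name] = values
    return fields


# Invented sample record in oai_dc format
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Surface water pCO2 measurements</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:coverage>53.5,8.6</dc:coverage>
</oai_dc:dc>"""
```

Keeping the mapping as data means a new metadata format only needs a new table of paths, which is the point of requirement 2 on the previous slide.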

8 CARBOOCEAN Data Portal
Metadata standard harvested for search: DIF v9.4.
Searchable fields: bounding box, date/time, parameters, authors, investigators, title.
Data centers:
1. World Data Center for Marine Environmental Sciences (WDC-MARE), University of Bremen and Alfred Wegener Institute in Bremerhaven, Germany.
2. Carbon Dioxide Information Analysis Center (CDIAC), Environmental Sciences Division at Oak Ridge National Laboratory, USA.
3. French National Oceanographic Data Centre, SISMER (Systèmes d'Informations Scientifiques pour la Mer) at the Ifremer in Brest, France.
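A bounding-box search can be reduced to range comparisons on four numbers. The slides do not say how the portal evaluates it, so the following is only one plausible sketch. It assumes boxes given as south/north/west/east in decimal degrees that do not cross the 180° meridian; the dataset names and coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    south: float
    north: float
    west: float
    east: float

    def intersects(self, other):
        """True if the two boxes overlap or touch in latitude and longitude."""
        return (
            self.south <= other.north
            and other.south <= self.north
            and self.west <= other.east
            and other.west <= self.east
        )


def search(datasets, query_box):
    """Return names of datasets whose box intersects the query box."""
    return [name for name, box in datasets.items()
            if box.intersects(query_box)]


# Invented catalogue entries for illustration
DATASETS = {
    "north-atlantic": BoundingBox(south=40, north=65, west=-60, east=-5),
    "southern-ocean": BoundingBox(south=-75, north=-50, west=-180, east=180),
}
```

Each of the four comparisons is a one-sided range condition, so a search engine with numeric range queries could answer the same question from its index.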

9 Thank you!

