1
Distributed Open Archives
Dr. Heinrich Stamerjohanns
Institute for Science Networking at the University of Oldenburg
2
Goals of PhysDoc
- dissemination of articles (objects)
- low-barrier interoperability framework
3
Approaches (I)
unstructured data
- self-archiving (on web servers)
- crawl and harvest articles with a harvester
- make them searchable through a unified search interface
- try to extract metadata and/or enrich the unstructured data with metadata
- approaches taken by Harvest and mnoGoSearch
4
PhysDoc together with Harvest
[Diagram components: documents; Gatherer; Summarizer; filesystem with SOIF records; search interface; WWW client]
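In the Harvest setup, the Summarizer's output is stored as SOIF (Summary Object Interchange Format) records. A minimal reader is sketched below, assuming the usual SOIF layout of a `@TYPE { URL` line, attribute lines of the form `Name{size}:<TAB>value`, and a closing `}`. The function name `parse_soif` is hypothetical, and the sketch treats the size as a character count, which matches Harvest's byte count only for ASCII text.

```python
import re

# Assumed SOIF layout: "@TYPE { URL", then "Name{size}:<TAB>value" lines, then "}"
HEADER = re.compile(r"@(\S+)\s*\{\s*(\S+)[ \t]*\n")
ATTR = re.compile(r"([^{}\n]+)\{(\d+)\}:\t")


def parse_soif(text):
    """Parse SOIF text into a list of (template_type, url, attributes) tuples.

    The size in braces says how many characters the value spans, so values may
    contain newlines. (Harvest counts bytes; this sketch assumes ASCII text.)
    """
    records, pos = [], 0
    while True:
        head = HEADER.search(text, pos)
        if head is None:
            return records
        template_type, url = head.groups()
        pos = head.end()
        attrs = {}
        # Read attributes until the next line is not "Name{size}:<TAB>"
        while (m := ATTR.match(text, pos)) is not None:
            name, size = m.group(1), int(m.group(2))
            start = m.end()
            attrs[name] = text[start:start + size]
            pos = start + size + 1  # skip the newline that ends the value
        records.append((template_type, url, attrs))
        closing = text.find("}", pos)
        pos = len(text) if closing < 0 else closing + 1
```

Because each value is cut by its declared size rather than by the next newline, multi-line fields such as abstracts survive intact.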
5
Approaches (II)
structured data in a homogeneous environment
- store data in relational databases
- replicate data with a proprietary protocol (either synchronous or asynchronous)
- or use a distributed database
- but: the same definitions and the same data layout everywhere
6
Approaches (III)
structured data in a heterogeneous environment
- collect data from different databases through their web search interfaces (meta-search engines)
- a successful implementation has been done: MetaPhys
- but: relies heavily on the layout of the presented data
- a lot of adjusting needs to be done
7
Approaches (IV)
structured data in a heterogeneous environment should:
- be in a machine-readable format, not one intended for human readers
- use strict formats that can be validated
- support various content models (metadata formats)
- use existing technologies
- be easy to implement
- be easy to adopt
8
Low-barrier framework
- transport protocol: HTTP
- data format: XML
- metadata format:
  - interoperability: at least Dublin Core
  - extensibility: communities can use the metadata format that fits their needs
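To make the framework concrete: a record in the `oai_dc` format is plain XML carrying unqualified Dublin Core elements, so any standard XML library can read it. The sketch below uses Python's standard library; the sample record and the helper name `parse_dc` are made up for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record in the oai_dc format (unqualified Dublin Core)
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Article</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:subject>physics</dc:subject>
</oai_dc:dc>"""


def parse_dc(xml_text):
    """Return a dict mapping each Dublin Core element name to its list of values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # ElementTree spells namespaced tags as "{namespace-uri}localname"
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag[len(DC_NS) + 2:]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields
```

Every Dublin Core element is optional and repeatable, which is why each name maps to a list of values rather than a single string.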
9
Open Archives Initiative
OAI defines such a protocol: OAI-PMH
- not intended to replace more complete interoperability protocols such as Z39.50
- distinguishes between two classes of participants:
  - data providers expose metadata about their content
  - service providers harvest metadata from data providers using OAI-PMH and offer value-added services, such as searching through the collected data
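On the service-provider side, harvesting boils down to repeated HTTP GET requests with a `verb` parameter. A minimal sketch of a `ListRecords` loop follows: the first request names a `metadataPrefix`, and each follow-up request sends only the `resumptionToken` returned by the data provider, which is an exclusive argument in OAI-PMH. The function names and the example base URL are hypothetical, and `fetch` is injected so the loop can run without a network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request; a resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield <record> elements, following resumptionTokens until the list is complete.

    `fetch` is any callable that takes a URL and returns the response body
    (e.g. a wrapper around urllib.request.urlopen).
    """
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, metadata_prefix, token)))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token_el = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty resumptionToken marks the end of the list
        if token_el is None or not (token_el.text or "").strip():
            break
        token = token_el.text.strip()
```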
10
PhysDoc as OAI Data Provider
- OAI-PMH v2.0 has been implemented by us: phpoai2
- written in PHP
- open source (GNU license)
- supports various SQL databases through PEAR (PHP Extension and Application Repository)
- supports on-the-fly XML output compression, which greatly reduces bandwidth needs
- easily configurable and adaptable to different metadata standards
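phpoai2 itself is written in PHP; the Python sketch below only illustrates the general idea of on-the-fly compression, not phpoai2's code. The response is gzip-compressed only when the harvester's `Accept-Encoding` header allows it. The function name `encode_response` is hypothetical, and q-values in the header are ignored for brevity.

```python
import gzip


def encode_response(xml_text, accept_encoding=""):
    """Gzip the XML body only when the client's Accept-Encoding header lists gzip.

    Simplification: q-values such as "gzip;q=0" are not interpreted.
    """
    body = xml_text.encode("utf-8")
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers
```

Harvesting responses consist of many records with identical markup, so they compress very well, which is where the bandwidth saving comes from.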
11
PhysDoc as OAI Data Provider
[Diagram components: documents; Gatherer; Summarizer; filesystem with SOIF records; Mapper, quality function and Normalizer (offline); metadata container as SQL database; OAI gateway with DC and MARC Mapper (XML generated on the fly)]
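The gateway's Mapper turns rows of the metadata container into XML at request time. A minimal sketch, assuming a hypothetical `records` table with `title`, `author` and `pub_date` columns; SQLite stands in for the SQL database here, and the column-to-element mapping is illustrative only. A MARC mapper would follow the same pattern with a different target format.

```python
import sqlite3
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from metadata-container columns to Dublin Core elements
COLUMN_TO_DC = {"title": "title", "author": "creator", "pub_date": "date"}


def row_to_oai_dc(row):
    """Map one metadata-container row (a dict) to an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for column, element in COLUMN_TO_DC.items():
        value = row.get(column)
        if value:  # empty columns produce no element
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


def export_records(conn):
    """Read rows from the container and map each one to oai_dc on the fly."""
    conn.row_factory = sqlite3.Row
    for row in conn.execute("SELECT title, author, pub_date FROM records"):
        yield row_to_oai_dc(dict(row))
```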
12
PhysDoc together with ???
Using a metadata container yields many advantages:
- consistency checks of the data
- quality assurance
- static HTML export
- export in any desired metadata format besides DC
- prepared for exchange protocols other than OAI
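Once all metadata sits in one container, a consistency check can run over every record before export. The rules in the sketch below (which fields are required, what a valid date looks like) are hypothetical examples, not PhysDoc's actual quality function.

```python
import re

# Hypothetical rules: fields every record must carry, and an ISO 8601 date shape
REQUIRED = ("title", "author", "pub_date")
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def check_record(record):
    """Return a list of problems found in one metadata record (empty if consistent)."""
    problems = [f"missing {field}" for field in REQUIRED if not record.get(field)]
    date = record.get("pub_date")
    if date and not ISO_DATE.match(date):
        problems.append(f"bad date {date!r}")
    return problems
```

Records that fail the check can be held back or routed to manual review instead of being exposed to harvesters.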
13
OAD: PhysDoc as Service Provider
- PhysDoc will offer services to the physics community through Open Archives
- articles are collected through OAI from various OAI data providers
- other publishers are, and will be, incorporated through proprietary interfaces
- these interfaces do not depend on the layout of the offered data
14
OAD: PhysDoc as Service Provider
[Diagram components: OAI data providers; Scheduler; XML parser; Mapper; Normalizer; metadata container as SQL DB; WWW search interface]
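Records arriving from different data providers rarely agree on formatting, so a Normalizer sits between the parser and the metadata container. The sketch below shows two hypothetical normalization steps, collapsing whitespace and rewriting a few common date spellings to ISO 8601; the accepted spellings are assumptions, and no range checks on day or month are made.

```python
import re


def normalize_whitespace(value):
    """Collapse runs of whitespace (e.g. line breaks inside harvested titles)."""
    return " ".join(value.split())


def normalize_date(value):
    """Map a few common date spellings to ISO 8601; return None if unrecognised.

    No range check is made on day or month values.
    """
    value = value.strip()
    if re.fullmatch(r"\d{4}(-\d{2}){0,2}", value):
        return value  # already YYYY, YYYY-MM or YYYY-MM-DD
    m = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", value)  # DD.MM.YYYY
    if m:
        day, month, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    m = re.fullmatch(r"(\d{4})/(\d{1,2})/(\d{1,2})", value)  # YYYY/MM/DD
    if m:
        year, month, day = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return None


def normalize_record(record):
    """Return a copy of a harvested record with whitespace and dates normalized."""
    out = {key: normalize_whitespace(val) for key, val in record.items()}
    if "date" in out:
        out["date"] = normalize_date(out["date"])
    return out
```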
15
OAD: PhysDoc as Service Provider
- uses the expat library to parse XML
- currently supports only OAI-PMH 1.1
- cannot easily be adapted by other sites
- support for OAI-PMH 2.0 is in progress
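expat is an event-driven (streaming) parser: instead of building a document tree, it calls handlers for each start tag, end tag and run of character data, which keeps memory use flat on large harvests. Python exposes the same library as `xml.parsers.expat`; the sketch below uses it to collect the record identifiers from an OAI-PMH 2.0 response. The function name is hypothetical and this is not the PhysDoc code.

```python
import xml.parsers.expat

OAI = "http://www.openarchives.org/OAI/2.0/"


def identifiers_with_expat(xml_bytes):
    """Collect <identifier> values from record headers using expat's event callbacks."""
    # With a namespace separator, element names arrive as "namespace-uri localname"
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    found, buffer = [], []
    state = {"in_header": False, "in_id": False}

    def start(name, attrs):
        if name == OAI + " header":
            state["in_header"] = True
        elif state["in_header"] and name == OAI + " identifier":
            state["in_id"] = True
            buffer.clear()

    def end(name):
        if name == OAI + " identifier" and state["in_id"]:
            found.append("".join(buffer).strip())
            state["in_id"] = False
        elif name == OAI + " header":
            state["in_header"] = False

    def chars(data):
        # Character data may arrive in several chunks, so buffer it
        if state["in_id"]:
            buffer.append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_bytes, True)
    return found
```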
16
Technical Details
- local development
- implementation also written in PHP
- the scheduler is based on the database
- the expat library is used as the XML parser for OAI and for the proprietary interfaces
- the database is again MySQL, with tricks (full-text extensions)
- cannot easily be adapted by other sites
- support for OAI-PMH 2.0 is in progress
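A database-backed scheduler can be as simple as one table of data providers with a harvest interval and the time of the last successful harvest. The sketch below assumes such a hypothetical `providers` table (times in seconds since the epoch) and uses SQLite in place of MySQL; it is an illustration, not the PhysDoc scheduler.

```python
import sqlite3
import time

# Hypothetical schema: one row per OAI data provider
SCHEMA = """CREATE TABLE IF NOT EXISTS providers (
    base_url TEXT PRIMARY KEY,
    interval_s INTEGER NOT NULL,
    last_harvest REAL
)"""


def due_providers(conn, now=None):
    """Return base URLs of data providers whose harvest interval has elapsed."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT base_url FROM providers "
        "WHERE last_harvest IS NULL OR last_harvest + interval_s <= ? "
        "ORDER BY base_url",
        (now,),
    )
    return [url for (url,) in rows]


def mark_harvested(conn, base_url, now=None):
    """Record a successful harvest so the provider is not scheduled again too early."""
    now = time.time() if now is None else now
    conn.execute("UPDATE providers SET last_harvest = ? WHERE base_url = ?", (now, base_url))
    conn.commit()
```

Keeping the schedule in the database means the harvester itself can stay stateless and simply be started periodically.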
17
Technical Details
- successfully implemented and tested against the local data provider
- added another data provider within five minutes
- normalization is again necessary (might raise further technical, textual and legal problems)
- [but there are still problems: vagueness in the protocol definition, 503 flow control… bad choice, because it depends on layout]
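For flow control, OAI-PMH lets a busy data provider answer with `503 Service Unavailable` and a `Retry-After` header, and a harvester is expected to wait and try again. A minimal client-side sketch follows; the function name, retry limit and default wait are assumptions, and only the delta-seconds form of `Retry-After` is handled (an HTTP-date value falls back to the default). `sleep` and `opener` are parameters so the retry logic can be tested without waiting or networking.

```python
import time
import urllib.error
import urllib.request


def fetch_with_flow_control(url, max_retries=3, default_wait=60,
                            sleep=time.sleep, opener=urllib.request.urlopen):
    """Fetch an OAI-PMH URL, honouring 503 Service Unavailable + Retry-After."""
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Other errors, or too many 503s in a row, are passed to the caller
            if err.code != 503 or attempt == max_retries:
                raise
            retry_after = err.headers.get("Retry-After", "") if err.headers else ""
            # Retry-After may also be an HTTP date; this sketch only parses seconds
            wait = int(retry_after) if retry_after.isdigit() else default_wait
            sleep(wait)
```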
18
Thank you
- OAI at the Institute for Science Networking, Oldenburg: http://physnet.uni-oldenburg.de/oai/
- stamer@uni-oldenburg.de
19
OAD: Distributed Open Archives
joint project by Virginia Tech and the University of Oldenburg
Aims:
- set up a prototype service based on Open Archives, focused on physics
- design and implement prototypes that run the OAI Protocol for Metadata Harvesting (PMH)
- enable the establishment and scalable interoperation of hundreds of Open Archives