Interoperability Among Scholarly Repositories: Enabling Workflows Across Distributed Information Carl Lagoze Information Science Cornell University, USA.

Similar presentations
A centre of expertise in digital information management UKOLN is supported by: British Academy e-Resources Policy Review: UKOLN Report.

UKOLN is supported by: Put functionality Augmenting interoperability across scholarly repositories 20/21 April 2006 Rachel Heery, UKOLN, University of.
UKOLN is supported by: JISC Information Environment update Repositories and Preservation Programme meeting, October 24-25, 2006 Rachel Heery UKOLN
Digital Repositories: interoperability & common services Closing Remarks Dr Liz Lyon, UKOLN, University of Bath, UK
DuraSpace: Digital Information All Ways, Always Pretoria, South Africa May 14 th, 2009.
Institutional Repository for CDU What’s in your bottom drawer? Ruth Quinn, Director Library and Information Access Charles Darwin University.
Object Re-Use and Exchange Mellon Retreat, Nassau Inn, Princeton, NJ, March Herbert Van de Sompel, Carl Lagoze The OAI Object Re-Use & Exchange.
Planning for Flexible Integration via Service-Oriented Architecture (SOA) APSR Forum – The Well-Integrated Repository Sydney, Australia February 2006 Sandy.
UKOLN is supported by: OAI-ORE a perspective on compound information objects ( Defining Image Access.
Introduction and Overview “the grid” – a proposed distributed computing infrastructure for advanced science and engineering. Purpose: grid concept is motivated.
1 MPEG-21 : Goals and Achievements Ian Burnett, Rik Van de Walle, Keith Hill, Jan Bormans and Fernando Pereira IEEE Multimedia, October-November 2003.
UKOLN is supported by: A non-technical introduction to: OAI-ORE ( Defining Image Access project meeting.
UKOLN is supported by: OAI-ORE : Object Reuse and Exchange an introduction ( UKOLN staff seminar UKOLN,
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
Integrating Repositories into a New Model of Scholarly Communication Dr Andrew Treloar Director, Information Management and Strategic Planning, Monash.
OAI Standards for Sheet Music Meeting March 28-29, 2002 Basic OAI Principals How They Apply to Sheet Music Presenter: Curtis Fornadley, Senior Programmer/Analyst.
The Open Archives Initiative Simeon Warner (Cornell University) Open Archives seminar “Facilitating Free and Efficient Scientific.
Institutional Repositories Tools for scholarship Mary Westell University of Calgary AMTEC Conference May 26, 2005.
Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University.
Teaching Metadata and Networked Information Organization & Retrieval The UNT SLIS Experience William E. Moen School of Library and Information Sciences.
Geoff Payne ARROW Project Manager 1 April Genesis Monash University information management perspective Desire to integrate initiatives such as electronic.
Using IESR Ann Apps MIMAS, The University of Manchester, UK.
Supporting further and higher education The UK FAIR Programme: OAI in context Chris Awre OAI3, CERN, February 2004.
MPEG-21 : Overview MUMT 611 Doug Van Nort. Introduction Rather than audiovisual content, purpose is set of standards to deliver multimedia in secure environment.
Perspectives on scholarly communication Herbert Van de Sompel Los Alamos National Laboratory – Research Library Universitaire Stichting – Brussels - October.
Dec 9-11, 2003ICADL Challenges in Building Federation Services over Harvested Metadata Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad.
Digital/Open Access repositories Paul Sheehan Director of Library Services DCU HEAnet National Networking Conference Athlone 11 th November 2005.
Fedora Content Models for the National Science Digital Library Data Repository Fedora User’s Group Meeting Copenhagen, September 28, 2005 Carl Lagoze Cornell.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
OAI-PMH for Resource Harvesting Tutorial OAI4, October 20 th 2005, CERN, Geneva, Switzerland OAI-PMH for Resource Harvesting Herbert Van de Sompel Digital.
Linking research & learning technologies through standards 1 Lyle Winton lylejw AT unimelb.edu.au.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Networked Information Resources SPARC, E-prints & Open Access initiatives.
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
Van de Sompel, Herbert Los Alamos National Laboratory – Research Library OAI-PMH for Resource Harvesting.
Modularization and Interoperability: Dublin Core and the Warwick Framework Sandra D. Payette Digital Library Research Group Cornell University November.
Digital Commons & Open Access Repositories Johanna Bristow, Strategic Marketing Manager APBSLG Libraries: September 2006.
10/07/2008 Semantic Web Technologies & Higher Education.
The OAI: overview and historical context OAI Open Meeting – Washington DC – January 23 rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University --
1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
ARROW Institutional Repositories for Managing e-Theses Presentation to ETD September 2005 Geoff Payne, ARROW Project Manager.
JISC/NSF PI Meeting, June Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.
Oct 12-14, 2003NSDL Challenges in Building Federation Services over Harvested Metadata Kurt Maly, Michael Nelson, Mohammad Zubair Digital Library.
Publishing & Citing Research Data Arun Prakash. Agenda  Introduction  Why is Data publishing important ?  Ongoing Work  Role of Semantics.
DSpace - Digital Library Software
April 14, 2005MIT Libraries Visiting Committee Libraries Strategic Plan Theme III Work to shape the future MacKenzie Smith Associate Director for Technology.
Open Archives Initiative Gail McMillan Digital Library and Archives, Virginia Tech Society for Scholarly Publishing: June 1, 2000.
Research Library, Los Alamos National Laboratory RESEARCH OAI4 - Geneva, Switzerland Digital Library Research & Prototyping Team OAI-PMH and.
OAI-PMH for Resource Harvesting Tutorial OAI4, October 20 th 2005, CERN, Geneva, Switzerland The American Physical Society Project: Standards-based Mirroring.
Herbert Van de Sompel Research Library, Los Alamos National Laboratory OAI4, October , CERN, Geneva, Switzerland RESEARCH LIBRARY Lessons in.
Carl Lagoze Digital Library Service Registry Workshop Services in a Scholarly Communication Framework.
IESR, A Registry of Collections and Services: Using the DCMI Collection Description Profile in Practice Ann Apps MIMAS, The University of Manchester, UK.
Herbert van de sompel Frye Leadership Institute Emory University, June 11th 2002 Herbert Van de Sompel Los Alamos National Laboratory – Research Library.
Harokopio University of Athens – Department of Informatics and Telematics HAROKOPIOUNIVERSITY A Distributed Architecture for Building Federated Digital.
The JISC Information Environment Service Registry (IESR) Ann Apps Mimas, The University of Manchester, UK.
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
Networked Information Resources Federated search, link server, e-books.
The Multi-Faceted Use of the OAI-PMH in the LANL Repository Written By: Henry, Xiaoming,Patrick Henry, Xiaoming,Patrick and Herbert. Presented By: Shashi.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
Packaging Specification Package Ingest Service
An Overview of Data-PASS Shared Catalog
Flexible Extensible Digital Object Repository Architecture
Jenn Riley Metadata Librarian Digital Library Program
An Architecture for Complex Objects and their Relationships
VI-SEEM Data Repository
Outline Pursue Interoperability: Digital Libraries
NSDL Data Repository (NDR)
Presentation transcript:

Interoperability Among Scholarly Repositories: Enabling Workflows Across Distributed Information
Carl Lagoze, Information Science, Cornell University, USA
Herbert Van de Sompel, Research Library, Los Alamos National Laboratory, USA

Acknowledgments
This talk is based on the following work:
o NSF-funded Pathways project (IIS )
  - Cornell University (PIs: Carl Lagoze, Sandy Payette, Simeon Warner)
  - LANL Digital Library Research & Prototyping Team (PI: Herbert Van de Sompel)
o The LANL aDORe repository effort
o The PhD thesis of Jeroen Bekaert (advisor: Herbert Van de Sompel) on protocol-based interfaces for Open Archival Information Systems (OAIS)

References
o "Rethinking Scholarly Communication", D-Lib, September 2004
o "Interoperability for Distributed Scholarly Workflows", D-Lib, October 2006
o "Pathways: Augmenting Interoperability for Scholarly Repositories", Journal of Digital Libraries (forthcoming)

Some Background
The digital transition of scholarly communication has been one of form rather than nature
Try to build a scholarly communication system that is more natively digital, i.e., one that uses the capabilities of digital, networked technologies:
o Collaboration
o Immediacy
o Reuse
o Dynamic
Exploit advances in institutional repositories and the interest in open access
Frame scholarly communication as a workflow among distributed information units
Provide a framework for new advanced services:
o Visualization
o Usage analysis
o …

Interoperability in a Heterogeneous World
Diversity of (repository) technology:
o DSpace
o Fedora
o aDORe
o EPrints
o Greenstone
Define an interoperability layer in which information can be:
o modeled
o shared
o transferred
o reused

Some Meta-Observations on Interoperability
Scholarly communication is a long-term endeavor:
o It depends on the stability and integrity of participants
o It needs abstract definitions of models and interfaces that can be instantiated with various technologies as time goes by
Identification is particularly important. It must be:
o Scalable
o Agnostic about existing identification schemes
o Granular, reflecting object decomposition and repository origination
Value chains do not require transfer of all digital object content:
o The content that needs to be transferred depends on the nature of the value chain

Augmenting interoperability across Repositories
[Figure: DSpace, Fedora, aDORe, EPrints, arXiv and Nature, each with individual data models and services, beneath a shared data model and services]

Scholarly communication as a cross-repository value chain

Motivation 1: Richer cross-Repository services
Distributed Repositories provide source materials for cross-Repository overlay services, such as discovery services
The manner in which those materials are exposed must allow for the seamless emergence of rich and meaningful services

Richer cross-Repository services: Scenario
Scenario 1: Chemical search engine
A search engine monitors scholarly repositories but is only interested in making searchable the machine-readable chemical structures contained in Digital Objects available from those repositories. This constitutes re-use of (part of) the Digital Objects by a service overlaid upon the monitored repositories. And, of course, a chemical compound discovered via the search engine can be cited in a new paper, i.e., the value chain does not stop here.

Motivation 2: Scholarly communication workflow
Distributed Repositories at the basis of a digital scholarly communication system
Scholarly communication as a global workflow (value chain) across those Repositories
Digital Objects from Repositories are the subject of the workflow; they are used and re-used in many contexts.

Scholarly communication workflow: Scenarios
Scenario 2: Citation
An author writes a paper (to be Put into her institutional repository) that cites 10 papers available from other repositories. A citation to a paper is a type of re-use of the cited paper in a new context. And, of course, the new paper can be cited too, i.e., the value chain does not stop here.

Adding Value to Fundamental Units Paul Ginsparg

Scholarly communication workflow: Scenarios
Scenario 3: Overlay journal
The editor of an overlay journal selects papers from 3 different repositories for inclusion in the next issue of the journal. Each of those articles is re-used in a new context, with value being added. And the overlay journal can be mirrored for preservation purposes, i.e., the value chain does not stop here.

Scholarly communication workflow: Scenarios
Scenario 4: eScience
A researcher uses datasets from 2 different dataset repositories, performs operations on them, creates a publication that contains a resulting new dataset and an accompanying paper, and deposits this publication in her institutional repository. This constitutes re-use of the original datasets, and value is added through the creation of the new publication. And, of course, the new dataset can be re-used too, i.e., the value chain does not stop here.

Building Block I: Repositories
Networked system that provides services pertaining to a managed collection of digital objects
Examples: institutional repositories, online journals, dataset stores, learning objects, etc.

Aim: Digital Object use and re-use
We must leverage the value of the materials that become available in these distributed Repositories.
Think of these Repositories as active nodes in a global environment, not as passive local nodes:
o These Repositories are about facilitating the use and re-use of materials in many contexts
o These Repositories are the starting point of value chains

Building Block II: Digital Objects
Abstract units of scholarly communication
Compound aggregations consisting of:
o Multiple media types
o Linkage to services
Have a persistent identifier
Can be recursive: digital objects within digital objects
Instantiated in various implementations
Cf. the Kahn/Wilensky model

Terminology
Digital Object: A data structure whose principal components are digital data and key-metadata. Digital data can be a Datastream or a Digital Object, i.e., a Digital Object may have one or more other Digital Objects as nested components. Key-metadata must include an identifier for the Digital Object.
Datastream: An ordered sequence of bytes.
Data Model: An abstraction for Digital Objects such that each Digital Object can be seen as an instance of the class defined by a Data Model. Example Data Models include the Pathways Core model, the MPEG-21 Digital Item Declaration model, etc.
Surrogate: A serialization of a Digital Object according to a Data Model.
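The Digital Object and Datastream definitions describe a recursive data structure. The Python sketch below shows one way to model it; the class names, the `count_datastreams` helper and the `urn:example:` identifiers are illustrative assumptions, not part of any Pathways specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Datastream:
    """An ordered sequence of bytes (held in memory here for illustration)."""
    data: bytes


@dataclass
class DigitalObject:
    """Digital data plus key-metadata; key-metadata must include an identifier."""
    identifier: str
    metadata: dict[str, str] = field(default_factory=dict)
    # Each component is either a Datastream or a nested Digital Object.
    components: list[Union[Datastream, DigitalObject]] = field(default_factory=list)


def count_datastreams(obj: DigitalObject) -> int:
    """Hypothetical helper: count Datastreams at every level of nesting."""
    total = 0
    for part in obj.components:
        if isinstance(part, Datastream):
            total += 1
        else:
            total += count_datastreams(part)  # recurse into nested objects
    return total


# Usage: a paper whose components are a PDF Datastream and a nested figure object.
figure = DigitalObject("urn:example:fig1", components=[Datastream(b"<png bytes>")])
paper = DigitalObject("urn:example:paper1", {"title": "Example"},
                      [Datastream(b"%PDF-1.4"), figure])
```

The recursion in `components` is what lets a single Surrogate describe objects within objects.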

Terminology
Repository: a networked system that provides services pertaining to a collection of Digital Objects.
Obtain interface: a Repository interface that supports the request of services pertaining to individual Digital Objects (including their component Datastreams).
Harvest interface: a Repository interface that exposes Surrogates for incremental collecting/harvesting.
Put interface: a Repository interface that supports submission of one or more Surrogates into the Repository, thereby facilitating the addition of Digital Objects to the collection of the Repository.
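The three interfaces can be read as a small abstract API. The following is a minimal sketch, assuming an in-memory store and ISO 8601 datestamps for incremental harvesting; the method names, the `Surrogate` fields and the `InMemoryRepository` class are hypothetical and do not reproduce the actual protocols (OpenURL, OAI-PMH) used behind these interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Surrogate:
    identifier: str  # identifier of the Digital Object described
    datestamp: str   # ISO 8601 date of last change, e.g. "2006-06-01"
    body: str        # serialized description; format left open here


class Repository(Protocol):
    """Hypothetical method signatures for Obtain, Harvest and Put."""

    def obtain(self, identifier: str) -> Surrogate: ...
    def harvest(self, since: str) -> list[Surrogate]: ...
    def put(self, surrogates: list[Surrogate]) -> None: ...


class InMemoryRepository:
    """Toy Repository that structurally satisfies the Protocol above."""

    def __init__(self) -> None:
        self._store: dict[str, Surrogate] = {}

    def obtain(self, identifier: str) -> Surrogate:
        # Obtain: request the Surrogate for one Digital Object.
        return self._store[identifier]

    def harvest(self, since: str) -> list[Surrogate]:
        # Harvest: Surrogates changed on or after `since`. ISO dates in the
        # same format sort correctly as strings, so string comparison works.
        changed = [s for s in self._store.values() if s.datestamp >= since]
        return sorted(changed, key=lambda s: s.datestamp)

    def put(self, surrogates: list[Surrogate]) -> None:
        # Put: submitting Surrogates adds Digital Objects to the collection.
        for s in surrogates:
            self._store[s.identifier] = s


repo: Repository = InMemoryRepository()
repo.put([Surrogate("urn:example:a", "2006-03-01", "<a/>"),
          Surrogate("urn:example:b", "2006-06-01", "<b/>")])
recent = [s.identifier for s in repo.harvest("2006-05-01")]
```

Typing the repository as a `Protocol` keeps the interface abstract, in the spirit of the talk's call for definitions that can be instantiated on different technologies over time.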

Augmenting interoperability across Repositories
[Figure: DSpace, Fedora, aDORe, EPrints, arXiv and Nature, each with individual data models and services, exposing common Obtain, Harvest and Put interfaces]

Common Data Model
Provides a common abstraction for describing digital objects regardless of their (repository- or service-)specific implementation
A common denominator:
o Does not completely cover implementation-specific features
o Features conform to the requirements of the interoperability fabric (e.g., identity, workflow support, etc.)

Model Core Requirements
o Recursion for n levels of information containment
o Identity independent of specific schemes
o Lineage relationships among objects: evidence of workflow for evidential citation
o Semantics associated with entities: to facilitate service mapping
o Link to concrete representation
o Assertion of persistence levels

Data Model

Recursion

Entities
Entity: represents a Digital Object and is used to attach properties to contained elements
hasEntity: expresses containment/recursion

Identity

Two levels of identity
o hasIdentifier: the traditional identifier(s) of the Digital Object (e.g., DOI)
o providerInfo: repository-centric, fine-granularity identification (provider, preferredIdentifier, versionKey); supports service requests at the granularity of the repository
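The two levels of identity can be illustrated with the same article held by two providers. This is a sketch under assumptions: the field names mirror hasIdentifier and providerInfo (provider, preferredIdentifier, versionKey) from the slide, while the DOI, the preferred identifiers, the version keys and the `same_work` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderInfo:
    provider: str              # identifier of the providing repository
    preferred_identifier: str  # that repository's preferred id for the object
    version_key: str           # distinguishes versions held by that repository


@dataclass(frozen=True)
class Entity:
    has_identifier: tuple[str, ...]  # traditional identifiers, e.g. a DOI
    provider_info: ProviderInfo      # repository-centric identity


def same_work(a: Entity, b: Entity) -> bool:
    """Hypothetical check: do two entities share a traditional identifier?"""
    return bool(set(a.has_identifier) & set(b.has_identifier))


doi = "info:doi/10.9999/example"  # hypothetical DOI
in_arxiv = Entity((doi,), ProviderInfo("info:sid/arXiv.org", "cs.DL/example", "v2"))
in_overlay = Entity((doi,), ProviderInfo("info:sid/overlay.org", "issue1/item3", "1"))
```

hasIdentifier says both entities denote the same work, while the distinct providerInfo values tell a service which repository, and which version, to address with a request.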

Lineage Relationships

Lineage
Provides the basis for evidential citation
Co-exists with and complements bibliographic citation
hasLineage: its value is the providerInfo of the object from which this object derives
Basis of value chains
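Because each hasLineage value is a providerInfo, a service can walk lineage links back through a value chain. The sketch below assumes the service already holds the relevant entities in a dictionary keyed by providerInfo; the `ancestry` function, the mirror provider `info:sid/mirror.example` and all identifiers are hypothetical. It traces a mirrored overlay-journal item back to its arXiv source, as in Scenario 3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderInfo:
    provider: str
    preferred_identifier: str
    version_key: str


@dataclass(frozen=True)
class Entity:
    provider_info: ProviderInfo
    has_lineage: tuple[ProviderInfo, ...] = ()  # objects this one derives from


def ancestry(start: Entity, known: dict[ProviderInfo, Entity]) -> list[ProviderInfo]:
    """Follow hasLineage links breadth-first, listing each ancestor once."""
    seen: list[ProviderInfo] = []
    queue = list(start.has_lineage)
    while queue:
        info = queue.pop(0)
        if info in seen:
            continue
        seen.append(info)
        parent = known.get(info)
        if parent is not None:  # ancestors unknown to this service end the trail
            queue.extend(parent.has_lineage)
    return seen


arxiv = Entity(ProviderInfo("info:sid/arXiv.org", "cs.DL/example", "v2"))
overlay = Entity(ProviderInfo("info:sid/overlay.org", "issue1/item3", "1"),
                 (arxiv.provider_info,))
mirror = Entity(ProviderInfo("info:sid/mirror.example", "issue1/item3", "1"),
                (overlay.provider_info,))
known = {e.provider_info: e for e in (arxiv, overlay, mirror)}
chain = ancestry(mirror, known)
```

Here `chain` holds the overlay item's providerInfo followed by the arXiv original's, which is the evidential trail that complements a bibliographic citation.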

Basis for a Network of Linked Digital Objects

Semantics

Concrete Representation

Persistence Guarantees

Augmenting interoperability across Repositories: Pathways Core Surrogates (currently XML/RDF)
A Surrogate is available for every Digital Object
A Surrogate is a representation of the Digital Object according to the Pathways Core data model
The representation is uniform across repositories; it is not tied to identifier type, content type, or application domain
The Surrogate is what is used in the value chains; it is used at the Obtain, Harvest and Put interfaces
o It expresses properties and access points for the Digital Object (see later)
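To make "Surrogates expressed in XML/RDF" concrete, here is a minimal serialization sketch using Python's standard `xml.etree.ElementTree`. Only the RDF namespace is real; the `pw` namespace URI is a placeholder, and `hasDatastream` is a hypothetical property standing in for the model's link to concrete representation. Entity and hasIdentifier follow the names on the slides, but this is not the actual Pathways Core schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PW = "http://example.org/pathways-core#"  # placeholder, not the real namespace
ET.register_namespace("rdf", RDF)
ET.register_namespace("pw", PW)


def surrogate_xml(entity_uri: str, identifiers: list[str],
                  datastream_urls: list[str]) -> str:
    """Serialize a minimal, hypothetical Surrogate as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    entity = ET.SubElement(root, f"{{{PW}}}Entity", {f"{{{RDF}}}about": entity_uri})
    for ident in identifiers:
        ET.SubElement(entity, f"{{{PW}}}hasIdentifier").text = ident
    for url in datastream_urls:
        # Link to the bytes rather than embedding them (by-reference access).
        ET.SubElement(entity, f"{{{PW}}}hasDatastream", {f"{{{RDF}}}resource": url})
    return ET.tostring(root, encoding="unicode")


example = surrogate_xml("urn:example:obj1",
                        ["info:doi/10.9999/example"],          # hypothetical DOI
                        ["http://repo.example/obj1/paper.pdf"])  # hypothetical URL
```

The Surrogate stays small and uniform whatever the content type, because it carries identifiers and access points rather than the content itself.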

Augmenting interoperability across Repositories: Pathways Core Surrogates (currently XML/RDF)
The Surrogates provide by-reference access to the constituent Datastreams of Digital Objects
o Full asset transfer is only required for certain applications
Avoid IP issues at the level of the interoperability framework
o The idea is that the Surrogate itself is not encumbered by IP issues: attach, by definition, a liberal Creative Commons license to Surrogates
o Allow Surrogates to flow freely, independent of the business models of the underlying content
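By-reference access means a consumer dereferences only what it needs. The sketch below mirrors the chemical search engine of Scenario 1: from a Surrogate's Datastream references it fetches only the chemical structure. The `DatastreamRef` type, the URLs and the injected `fetch` function (a stand-in for an HTTP GET) are assumptions for illustration; `chemical/x-mdl-molfile` is used as an example media type.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DatastreamRef:
    media_type: str  # e.g. "application/pdf"
    url: str         # by-reference access point; the Surrogate carries no bytes


def fetch_selected(refs: list[DatastreamRef], wanted: set[str],
                   fetch: Callable[[str], bytes]) -> dict[str, bytes]:
    """Dereference only the Datastreams whose media type is wanted."""
    return {r.url: fetch(r.url) for r in refs if r.media_type in wanted}


# Stand-in for an HTTP GET that records which URLs were actually requested.
requested: list[str] = []


def fake_fetch(url: str) -> bytes:
    requested.append(url)
    return b"..."


refs = [DatastreamRef("application/pdf", "http://repo.example/obj1/paper.pdf"),
        DatastreamRef("chemical/x-mdl-molfile", "http://repo.example/obj1/structure.mol")]
structures = fetch_selected(refs, {"chemical/x-mdl-molfile"}, fake_fetch)
```

The PDF is never requested, so full asset transfer, and any IP questions attached to it, arise only when a service actually needs the content.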

[Figure: example Surrogates with identifiers info:doi/… and info:arxiv/cs.DL/…, from providers info:sid/overlay.org and info:sid/arXiv.org]

Augmenting interoperability across Repositories
Obtain interface: a Repository interface that supports the request of services pertaining to individual Digital Objects (including their component Datastreams). The core service is the request of a Surrogate for a Digital Object.
Harvest interface: a Repository interface that exposes Surrogates for incremental collecting/harvesting.
Put interface: a Repository interface that supports submission of one or more Surrogates into the Repository, thereby facilitating the addition of Digital Objects to the collection of the Repository.

The Surrogate is at the core of the value chain
[Figure: Surrogates (id) retrieved via Obtain are recombined, with value added, and submitted via Put; lineage is recorded via providerInfo]

[Figure: a service interacting with Repo1 (Obtain 1, Harvest 1, Put 1) and Repo2 (Obtain 2, Harvest 2, Put 2)]

[Figure: a Service Registry records the interfaces exposed by Repo1 and Repo2]
provider | Obtain   | Harvest   | Put
Repo1    | Obtain 1 | Harvest 1 | Put 1
Repo2    | Obtain 2 | Harvest 2 | Put 2
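A Service Registry lets a service find where each provider's Obtain, Harvest and Put interfaces live. The following is a minimal lookup sketch; the registry contents, provider identifiers and endpoint URLs are hypothetical, and real registries would carry far richer descriptions. The second provider deliberately lacks a Put interface, since not every repository need expose all three.

```python
from __future__ import annotations

# Hypothetical registry contents: provider id -> interface name -> endpoint URL.
REGISTRY: dict[str, dict[str, str]] = {
    "info:sid/repo1.example": {
        "obtain": "http://repo1.example/openurl",
        "harvest": "http://repo1.example/oai",
        "put": "http://repo1.example/put",
    },
    "info:sid/repo2.example": {  # this provider registers no Put interface
        "obtain": "http://repo2.example/openurl",
        "harvest": "http://repo2.example/oai",
    },
}


def endpoint(provider: str, interface: str) -> str | None:
    """Return the endpoint a provider registered for an interface, if any."""
    return REGISTRY.get(provider, {}).get(interface)


def providers_with(interface: str) -> list[str]:
    """List, in sorted order, the providers that registered an interface."""
    return sorted(p for p, ifaces in REGISTRY.items() if interface in ifaces)
```

Keyed by provider, the registry pairs naturally with the provider field of providerInfo: a service holding a Surrogate can look up where to send its next Obtain request.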

Meeting in NYC, April
Supported by Microsoft, the Mellon Foundation, the Coalition for Networked Information, the Digital Library Federation, and JISC
Representatives from institutional Repository projects, scholarly content Repositories, Registry projects, and various projects that touch on interoperability
See the meeting website for Agenda, Participants, Topics & Goals, Terminology, Presentations, and Prototype demonstration
Report available since the beginning of August 2006
It is very likely that an international interoperability effort will be started towards the end of 2006

Demonstration
Overlay journal scenario combined with search engine scenario
Surrogates compliant with the Pathways Core Data Model, expressed in RDF/XML
Obtain interfaces (OpenURL application) at:
o an aDORe repository
o arXiv
o a DSpace repository
o a Fedora repository
Harvest interfaces (OAI-PMH) at:
o an aDORe repository
o arXiv
o a Fedora repository
Put interface at a Fedora repository
MS Live Clipboard functionality in the user interfaces of arXiv, Fedora, and the overlay search engine
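The Harvest interfaces in the demonstration used OAI-PMH. As a sketch, the function below builds OAI-PMH 2.0 ListRecords request URLs for an incremental harvest, including resumption, where `resumptionToken` must be the only argument besides `verb`. The base URL and the `pathways` metadataPrefix are hypothetical; the values actually used in the demonstration are not given here.

```python
from __future__ import annotations

from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str | None = None,
                     since: str | None = None,
                     resumption_token: str | None = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: no other arguments allowed.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix is not None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since is not None:
            params["from"] = since  # incremental: records changed since this date
    else:
        raise ValueError("metadataPrefix is required unless resuming a harvest")
    return f"{base_url}?{urlencode(params)}"


first_page = list_records_url("http://repo.example/oai", "pathways", "2006-06-01")
```

A harvester would issue `first_page`, then keep requesting with each resumptionToken returned until the response carries none.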

Demonstration Acknowledgments:
o Carl Lagoze, Sandy Payette, Simeon Warner, Chris Wilper at Cornell University
o Rob Tansley at HP
o Luda Balakireva, Xiaoming Liu, Herbert Van de Sompel, Zhiwu Xie at the Los Alamos National Laboratory

Demonstration
[Figure: Surrogate (id) retrieved via Obtain, copied with Live Clipboard Copy, pasted with Live Clipboard Paste, and submitted via Put]

Questions, Comments, Flames