Breaking down the walls: Moving libraries from collectors to portals
Carl Lagoze, Cornell University

Similar presentations
A brief overview of the Open Archives Initiative Steve Hitchcock Open Citation Project (OpCit) Southampton University Prepared for Z39.50/OAI/OpenURL plenary.
DuraSpace: Digital Information All Ways, Always Pretoria, South Africa May 14th, 2009.
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
Y.T. a brief history of the OAI 0 Source: Herbert van de Sompel.
ILRT / Harmony / JISC Synthesis Dan Brickley, ILRT. University of Bristol Harmony and Synthesis.
1 Building the NSDL William Y. Arms Cornell University Thinking aloud about the NSDL.
Object Re-Use and Exchange Mellon Retreat, Nassau Inn, Princeton, NJ, March Herbert Van de Sompel, Carl Lagoze The OAI Object Re-Use & Exchange.
Planning for Flexible Integration via Service-Oriented Architecture (SOA) APSR Forum – The Well-Integrated Repository Sydney, Australia February 2006 Sandy.
Project Prism Virtual Remote Control: Preservation Risk Management for Web Resources Nancy Y. McGovern, ECURE 2002.
Building Reliable Distributed Information Spaces Carl Lagoze CS /22/2002.
UKOLN is supported by: A non-technical introduction to: OAI-ORE ( Defining Image Access project meeting.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
Cornell CS 502 Metadata for the Web From Discovery to Description CS 502 – Carl Lagoze – Cornell University.
OAI Standards for Sheet Music Meeting March 28-29, 2002 Basic OAI Principles How They Apply to Sheet Music Presenter: Curtis Fornadley, Senior Programmer/Analyst.
The Open Archives Initiative Simeon Warner (Cornell University) Open Archives seminar “Facilitating Free and Efficient Scientific.
Corporation For National Research Initiatives NSF SMETE Library Building the SMETE Library: Getting Started William Y. Arms.
1 An introduction to the NSDL William Y. Arms Cornell University.
Some URLs JODI Paper – Harmony project –
The Open Archives Initiative Simeon Warner Cornell University, Ithaca, NY, USA CREPUQ 2002, Montréal, Canada 14:00, 24 October 2002.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Introduction to the OAI Metadata Harvesting Protocol Hussein Suleman, Digital Library Research Laboratory Virginia Tech.
1 The NSDL: A Case Study in Interoperability William Y. Arms Cornell University.
Metadata Repositories for Interoperable/Shareable Metadata.
Herbert Van de Sompel Workshop on OAI and peer review journals in Europe Geneva, Switzerland – March 22nd to 24th 2001 Herbert Van de Sompel Cornell University.
Preserving Digital Collections for Future Scholarship Oya Y. Rieger Cornell University
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
Metadata Modularization Concepts and Tools Carl Lagoze CS
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Internet2 Middleware Initiative. Discussion Outline: What is Middleware, why is it important, why is it hard; What are the major components of middleware.
The OAI: overview and historical context OAI Open Meeting – Washington DC – January 23rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University --
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting T.B. Rajashekar National Centre for Science Information (NCSI) Indian Institute of Science,
1 A Very Large Digital Library Technology Demonstration William Y. Arms Cornell University.
Digital Library Interoperability Architecture CS 502 – Carl Lagoze – Cornell University.
Kurt Maly Department of Computer Science Old Dominion University Norfolk, Virginia 23529, USA Digital Libraries, OAI and Free Software.
Joint Information Systems Committee Supporting Higher and Further Education Rachel Bruce Programme Manager, JISC Executive Collection.
Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.
1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
1 The ABC Metadata Ontology and Model Carl Lagoze, Cornell University Jane Hunter, DSTC.
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
Integrating Access to Digital Content Sarah Shreeves University of Illinois at Urbana-Champaign Visual Resources Association 23rd Annual Conference Miami.
1 The NSDL Program Stephen Griffin National Science Foundation.
Search Interoperability, OAI, and Metadata Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond Grainger Engineering Library April.
Introduction to the Semantic Web and Linked Data
The OAI: technical overview OAI Open Meeting – Washington DC – January 23rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University -- Computer Science.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives' FP INFRASTRUCTURES-71 DIMMI Project a DIgital MultiMedia Infrastructure.
Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – Carl Lagoze – Cornell University.
Metadata for the Web Beyond Dublin Core? CS 431 – March 9, 2005 Carl Lagoze – Cornell University Acknowledgements to Liz Liddy and Geri Gay.
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
Carl Lagoze Digital Library Service Registry Workshop Services in a Scholarly Communication Framework.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
Digital Preservation Initiatives in the United States A Summary Deanna B. Marcum.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Developing our Metadata: Technical Considerations & Approach Ray Plante NIST 4/14/16 NMI Registry Workshop BIPM, Paris 1 …don’t worry ;-) or How we concentrate.
1 CS 430: Information Discovery Lecture 13 Case Study: the NSDL.
Vision... “… a network of learning environments and resources for Science, Mathematics, Engineering and Technology education, will ultimately meet the.
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Metadata for the Web From Discovery to Description
An Architecture for Complex Objects and their Relationships
Outline Pursue Interoperability: Digital Libraries
OAI and Metadata Harvesting
NSDL Data Repository (NDR)
Open Archive Initiative
Presentation transcript:

The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library's Web site, this approach would make available the ever-increasing body of research materials distributed across the Internet. The Library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation.
(LC21: Digital Strategy for the Library of Congress, page 5)

Towards a Virtual Control Zone
Some of the most fundamental aspects of library operations entail the existence of a border, across which objects of information are transferred and maintained. Such a parameter, demarcating a single, distributed digital library (the "control zone"), needs to be created and managed by the academic library community at the earliest opportunity.
(Ross Atkinson, Library Quarterly, 1996)

Why distributed collections?
- Scale of the Web
- Prevalence of new publishing models and agents
- Increasing complexity of licensing and access management
- Dynamic nature of content

Towards Hybrid Portals
- Traditional portal (e.g., Yahoo!)
  - linkage without responsibility
- Hybrid portal
  - assertion of (some semblance of) a curatorial role over linked objects

New models have cultural/organizational ramifications…
- Performance and ranking metrics: "bigger is better"
- Levels of confidence
- Trust

…that can be assisted by new technical foundations
- Digital object architectures that enable aggregating and customizing content for local access and management
- Metadata frameworks that model changes of objects and their management over time
- OAI Harvesting Protocol for exchange of structured information
- Preservation models that enable non-cooperative and cooperative offsite monitoring

Digital Object Architectures: aggregating & localizing distributed content
Acknowledgements:
- Naomi Dushay
- Sandy Payette
- Thornton Staples (U. Va.)
- Ross Wayland (U. Va.)

From Mediators to Value-Added Surrogates
- Wiederhold: mediators between raw data and end-user applications for integration and transformation
- Paepcke: mediators as foundation for digital library interoperability
- Payette and Lagoze: mediators (V-A surrogates) to aggregate and create a localized service layer for distributed resources

FEDORA Digital Object Model

Establishing a Virtual Control Zone

V-A Surrogate Applications
- Access management
  - shared responsibility among trusted partners
- Enhanced and customized functionality
  - examples: reference linking, format translation, special needs
- Preservation
  - monitoring "significant" events and acting on them

[Figure: one digital object presented through two context brokers]
DigitalObject A, structural characteristics: RealAudio video, PowerPoint presentation, SMIL synchronization metadata
- Context Broker A (via tools): View Slides; View Video; View synchronized presentation using applet
- Context Broker B (via tools): Get Transcript of Audio; Search for keyword; Get Slides translated to French

[Figure: request flow through a context broker; tools shown: Tool 8, Tool 93, Tool 5, Tool 27]
1. User requests DigitalObject from Context Broker
2. Get DigitalObject structural metadata
3. Find/retrieve tools
4. DigitalObject presented to user with contextually bound behaviors
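The brokered flow can be sketched in Python. This is a minimal illustration, not FEDORA's actual interface. The classes `DigitalObject`, `Tool` and `ContextBroker` are hypothetical. So is the rule that a tool applies when the object contains every content type the tool requires. The object and behaviors reuse the example from the previous slide.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical model, not the FEDORA API: an object declares only its
# structural characteristics; behaviors come from tools known to a broker.
@dataclass
class DigitalObject:
    pid: str
    structure: frozenset[str]

@dataclass
class Tool:
    behaviors: list[str]
    requires: frozenset[str]  # content types that must be present in the object

@dataclass
class ContextBroker:
    repository: dict[str, DigitalObject]
    tools: list[Tool] = field(default_factory=list)

    def request(self, pid: str) -> list[str]:
        obj = self.repository[pid]                      # 1. user requests the object
        structure = obj.structure                       # 2. read structural metadata
        usable = [t for t in self.tools
                  if t.requires <= structure]           # 3. find/retrieve matching tools
        return [b for t in usable for b in t.behaviors] # 4. contextually bound behaviors

repo = {"A": DigitalObject("A", frozenset({"realaudio", "powerpoint", "smil"}))}

broker_a = ContextBroker(repo, [
    Tool(["View Slides"], frozenset({"powerpoint"})),
    Tool(["View Video"], frozenset({"realaudio"})),
    Tool(["View synchronized presentation using applet"],
         frozenset({"realaudio", "powerpoint", "smil"})),
])
broker_b = ContextBroker(repo, [
    Tool(["Get Transcript of Audio", "Search for keyword"], frozenset({"realaudio"})),
    Tool(["Get Slides translated to French"], frozenset({"powerpoint"})),
])

print(broker_a.request("A"))
print(broker_b.request("A"))
```

The stored object is identical in both cases. Only the broker that mediates the request changes which behaviors the user sees.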

Where we are now…
- Ongoing FEDORA reference prototype
  - policy enforcement research
  - content mediation
- Proposed joint deployment with University of Virginia
  - open source, scalable implementation of the FEDORA architecture
  - testing and deployment with a number of research library partners

Event-Aware Metadata Frameworks: describing changes over time
Acknowledgements:
- Dan Brickley (ILRT, Bristol)
- Martin Doerr (FORTH, Crete)
- Jane Hunter (DSTC, Brisbane)

Distributed Content: The Metadata Challenge
- From fixed, contained physical artifacts to fluid, distributed digital objects
- Need for a basis of trust and authenticity in the network environment
- Decentralization and specialization of resource description, and the need for mapping formalisms

Multi-entity nature of object description
[Figure: entities involved in describing one object: Photographer, Camera type, Software, Computer artist]

Attribute/Value approaches to metadata…
- "Hamlet has a creator Shakespeare": Hamlet = subject, has = implied verb, creator = metadata noun, Shakespeare = literal
- "Playwright" acts as a metadata adjective: "The playwright of Hamlet was Shakespeare"
- Record R1: dc:title "Hamlet"; dc:creator.playwright "Shakespeare"

…run into problems for richer descriptions…
- "The playwright of Hamlet was Shakespeare, who was born in Stratford"
- Record R1: dc:creator.playwright "Shakespeare"; dc:creator.birthplace "Stratford"
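The following Python sketch makes the problem concrete. The dictionary layout of the record and the `dumb_down` helper are illustrative assumptions. The helper follows the Dublin Core convention that a consumer should be able to ignore a qualifier and still use the value. Here that convention breaks, because the birthplace is filed under the play's creator.

```python
from __future__ import annotations

# Hypothetical flat record for resource R1, qualifiers written as
# "dc:element.qualifier". Every value hangs off R1, i.e. the play.
record_r1 = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

def dumb_down(record: dict[str, str]) -> dict[str, list[str]]:
    """Drop qualifiers, as a consumer that only knows unqualified elements would."""
    simple: dict[str, list[str]] = {}
    for key, value in record.items():
        element = key.split(".", 1)[0]
        simple.setdefault(element, []).append(value)
    return simple

print(dumb_down(record_r1))
# {'dc:title': ['Hamlet'], 'dc:creator': ['Shakespeare', 'Stratford']}
```

A simple consumer would end up listing "Stratford" as a creator of Hamlet.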

…because of their failure to model entity distinctions
- R1 title "Hamlet"
- R1 creator R2
- R2 name "Shakespeare"
- R2 birthplace "Stratford"
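The same description can be written as (subject, predicate, object) triples, sketched here in Python. The in-memory set and the helper names are hypothetical and do not belong to an RDF library. Once the creator is its own node, R2, a query can follow the path from a play to its creator to that creator's birthplace. The play itself has no birthplace.

```python
from __future__ import annotations

# Hypothetical triple store: (subject, predicate, object). The creator is
# a separate entity, R2, so "birthplace" describes the person, not the play.
TRIPLES = {
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
}

def objects(subject: str, predicate: str) -> set[str]:
    """All objects of triples matching (subject, predicate, ?)."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def creator_birthplaces(resource: str) -> set[str]:
    """Follow resource -> creator -> birthplace across two entities."""
    return {place
            for person in objects(resource, "creator")
            for place in objects(person, "birthplace")}

print(creator_birthplaces("R1"))    # {'Stratford'}
print(objects("R1", "birthplace"))  # set(): the play has no birthplace
```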

ABC/Harmony event-aware metadata model
- Recognizing inherent lifecycle aspects of description (esp. of digital content)
- Modeling incorporates time (events and situations) as first-class objects
  - supplies clear attachment points for agents, roles, occurrent properties
- Resource description as a "story-telling" activity

Resource-centric Metadata
Title: Anna Karenina
Author: Leo Tolstoy
Illustrator: Orest Vereisky
Translator: Margaret Wettlin
Date Created: 1877
Date Translated: 1978
Description: Adultery & Depression
Birthplace: Moscow
Birthdate: 1828
?

[Figure: event-aware graph of the same Anna Karenina description. Node labels: "creation", "translation", "author", "translator", "illustrator", "Leo Tolstoy", "Margaret Wettlin", "Orest Vereisky", "Anna Karenina", "Tragic adultery and the search for meaningful love", "Russian", "English", "1877", "1978", "Moscow", "1828"]
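Here is a minimal Python sketch of the event-aware idea. It rests on assumptions read from the figure:

- the labels group into a creation event (1877, Russian, author Leo Tolstoy) and a translation event (1978, English, translator Margaret Wettlin, illustrator Orest Vereisky);
- Moscow and 1828 belong to Leo Tolstoy, as the resource-centric table lists them.

The class names are hypothetical, not ABC vocabulary.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    facts: dict[str, str] = field(default_factory=dict)  # properties of the agent itself

@dataclass
class Event:
    kind: str                # e.g. "creation", "translation"
    date: str
    language: str
    roles: dict[str, Agent]  # role played in this event -> agent

tolstoy = Agent("Leo Tolstoy", {"birthplace": "Moscow", "birthdate": "1828"})
events = [
    Event("translation", "1978", "English",
          {"translator": Agent("Margaret Wettlin"),
           "illustrator": Agent("Orest Vereisky")}),
    Event("creation", "1877", "Russian", {"author": tolstoy}),
]

def tell_story(events: list[Event]) -> list[str]:
    """Render the description as a time-ordered sequence of events."""
    lines = []
    for e in sorted(events, key=lambda ev: ev.date):
        who = ", ".join(f"{role}: {agent.name}" for role, agent in e.roles.items())
        lines.append(f"{e.date} {e.kind} ({e.language}); {who}")
    return lines

for line in tell_story(events):
    print(line)
# 1877 creation (Russian); author: Leo Tolstoy
# 1978 translation (English); translator: Margaret Wettlin, illustrator: Orest Vereisky
```

In this model, "Date Created" and "Date Translated" become the dates of two events, and the birthplace and birthdate belong to an agent instead of the book.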

Queries over descriptive graphs
List details of events where Lagoze is a participating agent:
SELECT ?title, ?type, ?time, ?place, ?name
FROM
WHERE (web::type ?event abc::Event)
      (abc::context ?event ?context)
      …..
AND ?name ~ lagoze
USING web FOR
Rudolf Squish –
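The query's intent can be approximated in Python over an in-memory list of triples. This sketch does not implement Squish. Several details are assumptions: the predicate names (`type`, `agent`, `name`, and so on), the toy data, and reading `~` as a case-insensitive substring match.

```python
# Hypothetical toy graph; predicate names are illustrative, not ABC terms.
TRIPLES = [
    ("ev1", "type", "Event"), ("ev1", "title", "Talk A"),
    ("ev1", "eventType", "presentation"), ("ev1", "time", "T1"),
    ("ev1", "place", "Room 1"), ("ev1", "agent", "p1"),
    ("p1", "name", "Carl Lagoze"),
    ("ev2", "type", "Event"), ("ev2", "title", "Talk B"),
    ("ev2", "eventType", "meeting"), ("ev2", "time", "T2"),
    ("ev2", "place", "Room 2"), ("ev2", "agent", "p2"),
    ("p2", "name", "Jane Doe"),
]

def value(subject, predicate):
    """First object for (subject, predicate), or None."""
    return next((o for s, p, o in TRIPLES if s == subject and p == predicate), None)

def events_with_agent(pattern):
    """Rows of (title, type, time, place, name) for events whose agent's name
    contains `pattern`, ignoring case (our assumed reading of `~`)."""
    rows = []
    events = [s for s, p, o in TRIPLES if p == "type" and o == "Event"]
    for event in events:
        for s, p, agent in TRIPLES:
            if s == event and p == "agent":
                name = value(agent, "name") or ""
                if pattern.lower() in name.lower():
                    rows.append((value(event, "title"), value(event, "eventType"),
                                 value(event, "time"), value(event, "place"), name))
    return rows

print(events_with_agent("lagoze"))
# [('Talk A', 'presentation', 'T1', 'Room 1', 'Carl Lagoze')]
```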

Where we are now
- Stabilization of model
- Collaboration with museum/CIDOC community for joint modeling principles
- Plans
  - RDF API for model elements
  - UI for metadata creation
  - query engine testing

Open Archives Initiative: facilitating exchange of structured information
Acknowledgements:
- Herbert Van de Sompel
- OAI Steering and Technical Committees

Open Archives Initiative
Testing the hypotheses:
- exposing metadata in various forms will facilitate the creation of value-added services
- the key to deployable DL infrastructure is low entry cost
- individual communities can/will customize common infrastructure

Where we've come from
- Late 1999 Santa Fe UPS meeting: increase impact of eprint initiatives through federation
- Santa Fe Convention: metadata harvesting among eprint archives
- Increasing interest outside the eprint community
  - research libraries
  - museums
  - publishers

Progress over the past year
- OAI workshops at US and EC DL conferences
- Organizational stability
  - executive committee and steering committee
- September 2000 technical meeting
  - reframe and rethink technical solutions for a broader domain
- Extensive testing and refinement of technical infrastructure

Technical Infrastructure: key technical features
- "Deploy now" technology: 80/20 rule
- Two-party model: providers and consumers
- Simple HTTP encoding
- XML schema for some degree of protocol conformance
- Extensibility
  - multiple item-level metadata formats
  - collection-level metadata
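This Python sketch shows the "simple HTTP encoding" using only the standard library. It follows the later OAI-PMH 2.0 conventions; the 1.0 protocol described in this talk used different XML namespaces. The base URL, identifier and repository name are placeholders.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def request_url(base_url, verb, **args):
    """Encode an OAI-PMH request as an HTTP GET URL: the verb plus its arguments."""
    return f"{base_url}?{urlencode({'verb': verb, **args})}"

# Placeholder Identify response; a real one comes back from the data provider.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-01-01T00:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

def repository_name(xml_text):
    """Pull the repository name out of an Identify response."""
    root = ET.fromstring(xml_text)
    return root.findtext("oai:Identify/oai:repositoryName", namespaces=OAI_NS)

print(request_url("http://example.org/oai", "GetRecord",
                  identifier="oai:example.org:1", metadataPrefix="oai_dc"))
print(repository_name(SAMPLE_IDENTIFY))
```

Every request is an ordinary URL and every response is XML that can be checked against a schema. This is why the entry cost for data providers stays low.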

OAI protocol requests
- Supporting protocol requests: Identify, ListMetadataFormats, ListSets
- Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord
[Figure: a harvester (service provider) issues these requests to a repository (data provider)]
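The harvester side can be sketched as a loop over `ListRecords` that follows OAI-PMH 2.0 `resumptionToken` flow control; 1.0 handled these details differently. The `fetch` callback stands in for an HTTP GET so the example runs offline. The two canned response pages are made up. The sketch collects only record identifiers and ignores metadata and deleted records.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(fetch, metadata_prefix="oai_dc"):
    """Collect record identifiers via ListRecords, following resumptionTokens.
    `fetch` maps a dict of request arguments to the XML response text."""
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iterfind(".//oai:record/oai:header", NS):
            identifiers.append(header.findtext("oai:identifier", namespaces=NS))
        token = root.findtext(".//oai:resumptionToken", namespaces=NS)
        if not token:  # missing or empty token: the list is complete
            return identifiers
        # In OAI-PMH 2.0 resumptionToken is exclusive: only the verb accompanies it.
        args = {"verb": "ListRecords", "resumptionToken": token}

# Made-up two-page response, keyed by resumptionToken (None = first request).
PAGES = {
    None: ("<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'><ListRecords>"
           "<record><header><identifier>oai:example.org:1</identifier></header></record>"
           "<resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>"),
    "page2": ("<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'><ListRecords>"
              "<record><header><identifier>oai:example.org:2</identifier></header></record>"
              "<resumptionToken/></ListRecords></OAI-PMH>"),
}

def fake_fetch(args):
    return PAGES[args.get("resumptionToken")]

print(harvest(fake_fetch))  # ['oai:example.org:1', 'oai:example.org:2']
```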

Where we are now
- "Stable" 1.0 protocol specification
- Hopefully, self-documenting infrastructure: 27 registered data providers
- Increasing number of tools available
- Research initiatives
  - NSF-funded NSDL
  - EC-funded Cyclades
  - Andrew W. Mellon service proposals
  - EC-funded community building

Where do we go from here?
- Controlling the stampede
- Maintaining the organizational model: lean and mean while encouraging community-specific exploitation
- Encouraging testing, especially through deployment and service development
- Encouraging metadata diversification: this isn't just about Dublin Core!!!
  - preservation
  - document access
  - authentication

OAI & Metadata Research
- Dictionary of metadata terms (Tom Baker)
- Mandating usage rules has only limited effectiveness
- Compiling usage of those terms is vital to machine understanding and interoperability
  - provide context heuristics for search engine and indexer processing
- Large-scale deployment of OAI and web crawling enables (partial) automation of usage compilation (e.g., data mining of term usage)
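Here is a minimal Python sketch of the "data mining of term usage" bullet. It counts how often each Dublin Core element appears across harvested `oai_dc` records. It also counts how many records use each element at all. The two sample records are made up, and `term_usage` is a hypothetical helper.

```python
from collections import Counter
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def term_usage(records):
    """Return (total uses, number of records using) per Dublin Core element."""
    uses, records_using = Counter(), Counter()
    for xml_text in records:
        root = ET.fromstring(xml_text)
        names = [el.tag.split("}", 1)[1] for el in root.iter()
                 if el.tag.startswith("{" + DC + "}")]
        uses.update(names)
        records_using.update(set(names))
    return uses, records_using

# Made-up harvested records in the oai_dc container format.
RECORDS = [
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>A</dc:title><dc:creator>X</dc:creator><dc:creator>Y</dc:creator>'
    '</oai_dc:dc>',
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>B</dc:title><dc:date>2001</dc:date>'
    '</oai_dc:dc>',
]

uses, records_using = term_usage(RECORDS)
print(uses)
print(records_using)
```

Keeping both counts separates elements that are repeated heavily in a few records from elements that are widely adopted.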

Preservation Models: monitoring threats to distributed content
Acknowledgements:
- Bill Arms
- Peter Botticelli (CUL)
- Anne Kenney (CUL)

Preservation & Remote Control
Organizational issues:
- "assured preservation" may not be possible without direct custodial control
- what are the levels of acceptability, and for which types of resources?
Technical issues:
- what are the technologies for remote control at the various levels of assurance deemed acceptable by the library?
- what is the probability of a reasonable level of preservation in the context of such technologies?

Cost vs. Functionality

Leveraging Current Work
- Event-based metadata
- Metadata harvesting
- Longevity and threats to digital resources
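The following Python sketch shows one way these threads could combine. Copies of remote resources are fingerprinted at each harvest or crawl. Differences between two snapshots are recorded as events that an event-based metadata model could store. This is an illustrative assumption, not a description of the Level 0 or Level 1 experiments. The `ChangeEvent` record and its kinds are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeEvent:
    url: str
    kind: str         # "appeared", "disappeared" or "modified"
    observed_at: str

def fingerprint(content: bytes) -> str:
    """SHA-256 digest of a fetched copy."""
    return hashlib.sha256(content).hexdigest()

def compare_snapshots(previous, current, observed_at):
    """Compare two URL -> fingerprint maps (a URL is absent if it could not
    be fetched) and return one change event per difference."""
    events = []
    for url in sorted(previous.keys() | current.keys()):
        before, after = previous.get(url), current.get(url)
        if before is None:
            events.append(ChangeEvent(url, "appeared", observed_at))
        elif after is None:
            events.append(ChangeEvent(url, "disappeared", observed_at))
        elif before != after:
            events.append(ChangeEvent(url, "modified", observed_at))
    return events

previous = {"http://example.org/a": fingerprint(b"v1"),
            "http://example.org/b": fingerprint(b"x")}
current = {"http://example.org/a": fingerprint(b"v2"),
           "http://example.org/c": fingerprint(b"y")}
for event in compare_snapshots(previous, current, "T2"):
    print(event)
```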

Level 0 Experiment

Level 1 Experiment

One of Six Core Integration Demonstration Projects for the NSDL

How big might the NSDL be?
The NSDL aims to be comprehensive: all branches of science, all levels of education, very broadly defined.
Five-year targets:
- 1,000,000 different users
- 10,000,000 digital objects
- 100,000 independent sites
Requires:
- low-cost, scalable technology
- automated collection building and maintenance

Levels of Interoperability: Metadata Harvesting
- Agreements on a simple protocol and metadata standard(s)
- Example: metadata harvesting protocol of the Open Archives Initiative (MHP)
- Moderate-quality services
- Low cost of entry for participating sites
- Moderately large numbers of loosely collaborating sites
- Promising but still an emerging approach

Levels of Interoperability: Gathering
- Robots gather collections automatically with no participation from individual sites
- Examples: Web search services (e.g., Google); CiteSeer (a.k.a. ResearchIndex)
- Restricted but useful services
- Zero cost of entry for gathered sites
- Very large numbers of independent sites
- Only suitable for open access collections
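A gathering robot needs no cooperation from the site, but a well-behaved robot still honors robots.txt. The following minimal Python sketch uses only the standard library (`html.parser`, `urllib.robotparser`). The agent name `ExampleBot`, the sample page and the robots.txt lines are placeholders. The sketch only extracts permitted same-site links and fetches nothing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href=...> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def gather_links(page_url, html, robots_lines, agent="ExampleBot"):
    """Same-site links from `html` that the site's robots.txt allows `agent` to fetch."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    return [link for link in parser.links
            if urlparse(link).netloc == host and robots.can_fetch(agent, link)]

page = ('<a href="/papers/1.pdf">1</a><a href="/private/x">x</a>'
        '<a href="https://other.example/y">y</a>')
print(gather_links("http://example.org/index.html", page,
                   ["User-agent: *", "Disallow: /private/"]))
# ['http://example.org/papers/1.pdf']
```

The sketch drops links to other hosts because a robots.txt file governs only its own site. Those links would be checked against their own site's file before being crawled.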