NSDL: OAI and a large-scale digital library

Presentation transcript:

NSDL: OAI and a large-scale digital library Carl Lagoze, Cornell University NSDL Director of Technology lagoze@cs.cornell.edu

What is the NSDL?
- NSF program to move science, math, engineering education in the US to digital age: http://www.ehr.nsf.gov/ehr/due/programs/nsdl/
- Over 80 independent grants exploring NSDL goals: http://comm.nsdlib.org
- Focused effort to develop and model infrastructure for science education on the web: http://cinews.comm.nsdlib.org/cgi-bin/wiki.pl
- A production digital library: http://www.nsdl.org

Short History of the NSDL
- 1996: Vision articulated by NSF's Division of Undergraduate Education
- 1997: National Research Council workshop
- 1998: Preliminary grants through Digital Libraries Initiative 2
- 1998: SMETE-Lib workshop
- 1999: NSDL Solicitation
- 2000: 6 Core Integration demonstration projects + 23 others funded
- 2001: 1 large Core Integration System project funded
- More than 80 independent projects funded
- Core Integration funding fixed until 2006

NSF Grant Structure (http://www.nsf.gov/pubs/2002/nsf02054/nsf02054)
- Collections: Develop and maintain content
- Services: For users, collection providers, core integration
- Targeted research
- Core Integration: Organizational, economic, technical; $US5M of $US25M total budget

NSDL CI Technical Organization
- A collaborative project:
  - University Corporation for Atmospheric Research - Dave Fulker
  - Cornell University - William Arms
  - Columbia University - Kate Wittenberg
- With additional partners: Eastern Michigan University, Syracuse University, U Mass-Amherst, UC-Santa Barbara, UC-San Diego (Supercomputer Center)
- Director of Technology - Carl Lagoze

Building service and knowledge layers over a variety of resources for a variety of users browsing searching annotating curriculum building filtering quality rating Open Access Web Publishers NSF-funded Collections

How Big might the NSDL be?
All branches of science, all levels of education, very broadly defined.
Five year targets:
- 1,000,000 different users
- 10,000,000 digital objects
- 10,000 to 100,000 independent sites

Core Integration Philosophy It is possible to build a very large digital library with a small staff. But ... Every aspect of the library must be planned with scalability in mind. Some compromises will be made.

Perspective on the Budget

Resources for Core Integration
- Budget: $4-6 million
- Staff: 25-30
- Management: Diffuse
How can a small team, without direct management control, create a very large-scale digital library?

NSDL technical mantras
- Aggregation rather than collection
- Core integration team will not manage any collections
- Spectrum of interoperability
- Accommodate diversity of participation models
- Open interfaces and standards permitting plug in of array of value-added services
- One library many portals
- Accommodate multiple quality and selection metrics
- Tailor presentation of content and nature of services to audience needs
- Open toolkit of software and services for library building

Spectrum of interoperability

Level      | Agreements                                                             | Example
Federation | Strict use of standards (syntax, semantic, and business)               | AACR, MARC, Z39.50
Harvesting | Digital libraries expose metadata; simple protocol and registry        | Open Archives metadata harvesting
Gathering  | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines

Translating to first release goals
- This is a big task that no one has done before!
- Work on the priorities
- Focus on one point on spectrum of interoperability: Metadata harvesting
- Incorporate NSF funded collections and selected other collections
- Leverage existing (or at least emerging) technologies and protocols: OAI, uPortal, Shibboleth, SDLIP, InQuery
- Provide reliable base level services: Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence
- Plant some seeds for the future: Machine-assisted metadata generation, Automated collection aggregation, Web gathering strategies

Metadata Repository
- Central storage of all metadata about all resources in the NSDL
- Defines the extent of NSDL collection
- Metadata includes collections, items, annotations, etc.
- MR main functions: Aggregation, Normalization, Redistribution
- Ingest of metadata by various means: Harvesting, manual, automatic, cross-walking
- Open access to MR contents for service builders via OAI-PMH
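
The last point, open access to MR contents via OAI-PMH, can be illustrated with a minimal harvesting sketch. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL, the sample response, and the function names are assumptions made up for illustration and do not describe the actual NSDL endpoint.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it is sent alone with the verb, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Invented sample response, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2003-01-16</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Plate tectonics animation</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = rec.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    # An empty or missing token means the list is complete.
    return records, (token or None)
```

A service builder would loop: request the first URL, parse the response, and keep requesting with the returned token until it comes back empty.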

Metadata Strategy
- Collect and redistribute any native (XML) metadata format
- Provide crosswalks to Dublin Core from eight standard formats: Dublin Core, DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD
- Concentrate on collection-level metadata
- Use automatic generation to augment item-level metadata
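
A crosswalk to Dublin Core can be pictured as a field-mapping table applied to each native record. The sketch below is only an illustration: the native field names in `CROSSWALK` are hypothetical and are not taken from any of the formats listed above; the fifteen target element names are those of simple Dublin Core.

```python
# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical native field names mapped to Dublin Core elements.
CROSSWALK = {
    "general.title": "title",
    "general.description": "description",
    "general.keyword": "subject",
    "lifecycle.author": "creator",
    "technical.location": "identifier",
}


def to_dublin_core(native_record, crosswalk=CROSSWALK):
    """Map a native record (dict) to Dublin Core.

    Returns (dc_record, unmapped_fields). Every Dublin Core element is
    repeatable, so values are collected into lists.
    """
    dc = {}
    unmapped = []
    for field_name, value in native_record.items():
        target = crosswalk.get(field_name)
        if target is None:
            # Keep track of what the crosswalk could not express.
            unmapped.append(field_name)
            continue
        values = value if isinstance(value, list) else [value]
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            dc.setdefault(target, []).extend(cleaned)
    return dc, unmapped
```

Reporting the unmapped fields alongside the result makes the loss of detail visible, which matters when the native format is richer than basic Dublin Core.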

Importing metadata into the MR
Collections -> Harvest -> Staging area -> Cleanup and crosswalks -> Database load -> Metadata Repository
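
The import stages named on this slide can be sketched as a small pipeline. This is a toy under stated assumptions, not the NSDL implementation: an in-memory SQLite database stands in for the Metadata Repository, and the table layout, the `cleanup` rules, and the function names are all hypothetical.

```python
import sqlite3


def cleanup(record):
    """Collapse runs of whitespace and drop empty fields."""
    return {k: " ".join(v.split()) for k, v in record.items() if v and v.strip()}


def ingest(harvested, conn, crosswalk=lambda rec: rec):
    """Run harvested records through staging, cleanup, crosswalk and load.

    Returns the number of records written to the database.
    """
    staging = list(harvested)  # staging area: untouched copies of the harvest
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(identifier TEXT PRIMARY KEY, title TEXT, collection TEXT)"
    )
    loaded = 0
    for raw in staging:
        rec = crosswalk(cleanup(raw))
        if "identifier" not in rec:
            continue  # cannot be addressed later, so it is rejected
        # A re-harvested record replaces the earlier copy.
        conn.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
            (rec["identifier"], rec.get("title"), rec.get("collection")),
        )
        loaded += 1
    conn.commit()
    return loaded
```

Keeping the staging copy separate from the cleaned record means a bad cleanup rule can be fixed and re-run without harvesting the collection again.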

Exporting metadata from the MR

Simple Metadata-Based Services: The recognition of common elements among a set of core Library services (initially Exhibits, News, Annotation, Equivalence, and My Site) led the NSDL Team to create a model for the development and implementation of services that could be based on simple extensions to standard Metadata Records. Services that fit this model are known as Simple Metadata-Based Services, or SiMBaS.

SiMBaS Characteristics
- Services provide metadata records for harvesting by MR
- Metadata records may include typed relationship links to each other or to pre-existing Metadata Records in the MR.
- Example relationship links: Collections->items. Annotation metadata record->item-level metadata record.
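
One way to picture metadata records with typed relationship links is a record type that carries a list of (relation, target) pairs. The class name, the relation names `hasItem` and `annotates`, and the identifiers below are hypothetical; the source does not specify how SiMBaS links are encoded.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    identifier: str
    kind: str  # e.g. "collection", "item", "annotation"
    fields: dict = field(default_factory=dict)
    # Typed links: (relation_type, target_identifier) pairs.
    links: list = field(default_factory=list)


def linked_from(records, target_id, relation):
    """Identifiers of records that point at target_id with the given relation."""
    return [r.identifier for r in records if (relation, target_id) in r.links]


item = MetadataRecord("item-1", "item", {"title": "Plate tectonics animation"})
collection = MetadataRecord("coll-1", "collection", links=[("hasItem", "item-1")])
note = MetadataRecord(
    "ann-1", "annotation",
    {"description": "Works well for introductory classes"},
    links=[("annotates", "item-1")],
)
records = [item, collection, note]
```

Because an annotation is itself an ordinary record, a service can contribute it through the same harvesting path as any other metadata, and `linked_from(records, "item-1", "annotates")` finds it from the item side.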

Searching
- What to Index?
  - When possible, full text indexing is excellent, but full text indexing is not possible for all materials (non-textual, no access for indexing).
  - Comprehensive metadata is an alternative, but available for very few of the materials.
- What Architecture to Use?
  - Few collections support an established search protocol (e.g., Z39.50)

Search system general features
- Implement a query language that includes most features that are common in commercial and Web search engines.
- Periodically harvest the MR (via OAI-PMH) to incorporate the latest changes in the library.
- Allow search on resources’ metadata as well as textual content, when available.
- Communication with portals is done via the Simple Digital Library Interoperability Protocol (SDLIP).
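
The combination of metadata search, optional full text, and periodic re-harvesting can be sketched with a toy inverted index. This is not the search engine used by the NSDL and it supports only AND queries; the class and method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> identifiers
        self.docs = {}                    # identifier -> set of indexed terms

    def add(self, identifier, metadata, full_text=None):
        """Index a record; re-adding an identifier replaces the old entry,
        which is what a periodic re-harvest needs."""
        self.remove(identifier)
        text = " ".join(metadata.values())
        if full_text:
            # Textual content is indexed only when it is available.
            text += " " + full_text
        terms = set(tokenize(text))
        self.docs[identifier] = terms
        for term in terms:
            self.postings[term].add(identifier)

    def remove(self, identifier):
        for term in self.docs.pop(identifier, ()):
            self.postings[term].discard(identifier)

    def search(self, query):
        """Return identifiers containing every query term, sorted."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

A record with no accessible content is still findable through its metadata, while one with full text also matches words that appear only in the body.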

Search Architecture
[Diagram] Components: Metadata Repository, Search and Discovery Server, OAI Harvester, “Document” generator, http/ftp Harvester, Search Engine, SDLIP Wrapper, Portal, Content. Connections: OAI, SDLIP, http/ftp.

Persistent Archive for the NSDL
- Provide a persistent copy of the resources identified in the NSDL repository
- Provide a mechanism to retrieve prior versions of resources
- Verify availability of on-line digital resources that have presence in MR

Persistent Archive Approach
- Use data grid technology to:
  - Implement a persistent logical name space for registering resources
  - Manage archiving of modules on distributed storage systems
- Use OAI harvesting to extract metadata from the NSDL repository
- Crawl the web to retrieve resources
- Provide OAI interface for reporting validation results
- Manage the persistent archive through a separate information repository
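
The ideas of a persistent logical name space, retrievable prior versions, and an availability check can be sketched in a few lines. Everything here is an assumption for illustration: an in-memory dictionary stands in for the data grid and its distributed storage, and the `fetch` callable stands in for the web crawler.

```python
import hashlib


class PersistentArchive:
    def __init__(self):
        # Logical name -> list of (sha256 hex digest, content bytes), oldest first.
        self.versions = {}

    def archive(self, logical_name, content):
        """Store content under a logical name; a new version is kept only
        when the content differs from the latest one. Returns the version count."""
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(logical_name, [])
        if not history or history[-1][0] != digest:
            history.append((digest, content))
        return len(history)

    def retrieve(self, logical_name, version=-1):
        """Return a stored version (0 is the oldest, -1 the latest)."""
        return self.versions[logical_name][version][1]


def validate(archive, resources, fetch):
    """Try to fetch each resource; archive what is reachable and report status.

    resources maps logical names to URLs; fetch(url) returns bytes or
    raises OSError when the resource cannot be retrieved.
    """
    report = {}
    for name, url in resources.items():
        try:
            content = fetch(url)
        except OSError:
            report[name] = "unavailable"
            continue
        archive.archive(name, content)
        report[name] = "available"
    return report
```

Because resources are registered under logical names rather than URLs, an archived copy stays retrievable after the original location changes or disappears.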

Experience thus far
- OAI – low barrier?
- XML flakiness
- Sets
- Identifiers
- Limitations of basic Dublin Core
- Metadata quality and trust
- Resource granularity

Closing Thoughts
- We have only just begun!
- Automation is key to scalability: Metadata generation, Longevity/preservation, Quality and selection, Collection development
- The NSDL needs to be more than data: Knowledge, Curricula, Community collaboration