Building a large-scale digital library for education Carl Lagoze Common Solutions Group January 16, 2003.

Slides:



Advertisements
Similar presentations
Open Archives Initiative Where we are, Where we are going Carl Lagoze 4 th OAF Workshop September, 2003.
Advertisements

DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
EXtensible Catalog David Lindahl University of Rochester.
1. The Digital Library Challenge The Hybrid Library Today’s information resources collections are “hybrid” Combinations of - paper and digital format.
CNRIS CNRIS 2.0 Challenges for a new generation of Research Information Systems.
1 Building the NSDL William Y. Arms Cornell University Thinking aloud about the NSDL.
Building Reliable Distributed Information Spaces Carl Lagoze CS /22/2002.
Web Crawling/Collection Aggregation CS431, Spring 2004, Carl Lagoze April 5 Lecture 19.
1 DLESE in Context: Educational Computing, Digital Libraries and Scientific Education William Y. Arms Cornell University.
1 NSDL The National Science Foundation's National Digital Library for Science, Mathematics, Engineering and Technology Education [a.k.a. Smete, NSDL, Learns,...]
SCORM-NSDL Workshop May 18, Educational Materials are Scattered across the Internet NASA Math Forum State standards Scientific American Ask.
Mixed content, mixed metadata: Information discovery in the NSDL.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
The National Science Digital Library (NSDL) An Introduction and Overview ASEE 2005 Annual Conference.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
Institutional Repositories Tools for scholarship Mary Westell University of Calgary AMTEC Conference May 26, 2005.
1 William Y. Arms September 26, 2002 A Research Program for Information Science with the NSDL as an Example.
Corporation For National Research Initiatives NSF SMETE Library Building the SMETE Library: Getting Started William Y. Arms.
1 An introduction to the NSDL William Y. Arms Cornell University.
Data-PASS Shared Catalog Micah Altman & Jonathan Crabtree 1 Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate.
Digital Library Architecture and Technology
Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University.
Introduction to Digital Libraries hussein suleman uct cs honours 2004.
Chinese-European Workshop on Digital Preservation, Beijing July 14 – Network of Expertise in Digital Preservation 1 Trusted Digital Repositories,
W w w. i l u m i n a – d l i b. o r g iLumina: A Digital Library of Educational Resources for Science & Mathematics National Science Digital Library All-Projects.
1 The NSDL: A Case Study in Interoperability William Y. Arms Cornell University.
LIS 506 (Fall 2006) LIS 506 Information Technology Week 11: Digital Libraries & Institutional Repositories.
Preserving Digital Collections for Future Scholarship Oya Y. Rieger Cornell University
Ensemble Computing in the National Science Digital Library (NSDL)
Ms. Irene Onyancha ISTD/Library & Information Management Services United Nations Economic Commission for Africa The Second Session of the Committee on.
Creating and Operating a Digital Library for Information and Learning– the GROW Project Muniram Budhu Department of Civil Engineering & Engineering Mechanics.
Fedora Content Models for the National Science Digital Library Data Repository Fedora User’s Group Meeting Copenhagen, September 28, 2005 Carl Lagoze Cornell.
NSDL: OAI and a large- scale digital library Carl Lagoze, Cornell University NSDL Director of Technology
Themes Architecture Content Metadata Interoperability Standards Knowledge Organisation Systems Use and Users Legal and Economic Issues The Future.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
Metadata Lessons Learned Katy Ginger Digital Learning Sciences University Corporation for Atmospheric Research (UCAR)
19/10/20151 Semantic WEB Scientific Data Integration Vladimir Serebryakov Computing Centre of the Russian Academy of Science Proposal: SkTech.RC/IT/Madnick.
1 CS 430 / INFO 430 Information Retrieval Lecture 24 Architecture of Information Retrieval Systems.
Semantic Web, Web Services and Museums: Mapping the Road to Implementation John Perkins “MESMUSES Workshop” Florence, June 16-17, 2003.
Enhancing Science Education through a Partnership of Digital Libraries Presentation for the Indo-US Workshop on Digital Libraries National Science Foundation,
CONTENT DISCOVERY, SERVICES, AND SUSTAINED ACCESS Timothy Cole, William Mischo, Beth Sandore, Sarah Shreeves ~ University of Illinois Library
Extending Access To Information Resource Discovery Service William E. Moen, Ph.D. Kathleen R. Murray, Ph.D. School of Library and Information Sciences.
1 A Very Large Digital Library Technology Demonstration William Y. Arms Cornell University.
Recent Developments in CLARIN-NL Jan Odijk P11 LREC, Istanbul, May 23,
1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
1 The NSDL Program Stephen Griffin National Science Foundation.
Search Interoperability, OAI, and Metadata Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond Grainger Engineering Library April.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
“A Library outranks any other one thing a community can do to benefit its people.” --Andrew Carnegie.
Metadata and OAI DLESE OAI Workshop April 29-30, 2002 Katy Ginger Presentation available at:
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Metadata and OAI DLESE OAI Workshop June 29 to July 2, 2002 Katy Ginger Presentation available at:
Millman—Nov 04—1 An Update on Digital Libraries David Millman Director of Research & Development Academic Information Systems Columbia University
NSDL & Access Management David Millman Columbia University Jan ‘02.
A Resource Discovery Service for the Library of Texas Requirements, Architecture, and Interoperability Testing William E. Moen, Ph.D. Principal Investigator.
Open Archive Forum Rachel Heery UKOLN, University of Bath UKOLN is funded by Resource: The Council for Museums, Archives.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
1 CS 430: Information Discovery Lecture 13 Case Study: the NSDL.
Joseph JaJa, Mike Smorul, and Sangchul Song
NSDL: OAI and a large-scale digital library
CS 430 / INFO 430 Information Retrieval
NSDL Data Repository (NDR)
Building a large-scale digital library for education
Technical Issues in Sustainability
Presentation transcript:

Building a large-scale digital library for education Carl Lagoze Common Solutions Group January 16, 2003

What is the NSDL?  A library of exemplary collections and services with practical educational value  A center of innovation in digital libraries applied to education  A community center, focused on digital- library-enabled science education  A network of NSDL-funded projects

browsing searching annotating curriculum building filtering quality rating Building service, collaboration, and knowledge layers over a variety of resources for a variety of users Open Access Web Publishers NSF-funded Collections

1996 Vision articulated by NSF's Division of Undergraduate Education 1997 National Research Council workshop 1998 Preliminary grants through Digital Libraries Initiative SMETE-Lib workshop 1999 NSDL Solicitation Core Integration demonstration projects + 23 others funded large Core Integration System project funded 2002More than 80 independent projects funded 2003Core Integration funding fixed until 2006 Short History of the NSDL

NSF Grant Structure  Collections  Develop and maintain content  Services  For users, collection providers, core integration  Targeted research  Core Integration  Organizational, economic, technical  $US5M of total $US25M total budget

 A collaborative project University Corporation for Atmospheric Research-Dave Fulker Cornell University-William Arms Columbia University-Kate Wittenberg  With additional partners Eastern Michigan University Syracuse University U Mass-Amherst UC-Santa Barbara San Diego Supercomputer Center  Director of Technology -Carl Lagoze NSDL CI Technical Organization

It is possible to build a very large digital library with a small staff. But... Every aspect of the library must be planned with scalability in mind. Some compromises will be made. Automation is key. Core Integration Philosophy

Perspective on the Budget

Resources for Core Integration Core Integration Budget $4-6 million Staff Management Diffuse How can a small team, without direct management control, create a very large-scale digital library?

 Aggregation rather than collection  Core integration team will not manage any collections  Spectrum of interoperability  Accommodate diversity of participation models  Open interfaces and standards permitting plug in of array of value-added services  One library many portals  Accommodate multiple quality and selection metrics  Tailor presentation of content and nature of services to audience needs  Open toolkit of software and services for library building NSDL technical mantras

LevelAgreementsExample FederationStrict use of standardsAACR, MARC (syntax, semantic, Z and business) HarvestingDigital libraries exposeOpen Archives metadata; simplemetadata harvesting protocol and registry GatheringDigital libraries do not Web crawlers cooperate; services mustand search engines seek out information Spectrum of interoperability

 This is a big task that no one has done before!  Work on the priorities  Focus on one point on spectrum of interoperability  Metadata harvesting  Incorporate NSF funded collections and selected other collections  Leverage existing (or at least emerging) technologies and protocols  OAI, uPortal, Shibboleth, SDLIP, InQuery  Provide reliable base level services  Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence  Plant some seeds for the future  Machine-assisted metadata generation  Automated collection aggregation  Web gathering strategies Translating to first release goals

 Central storage of all metadata about all resources in the NSDL  Defines the extent of NSDL collection  Metadata includes collections, items, annotations, etc.  MR main functions  Aggregation  Normalization  redistribution  Ingest of metadata by various means  Harvesting, manual, automatic, cross-walking  Open access to MR contents for service builders via OAI- PMH Metadata Repository

Metadata Strategy  Collect and redistribute any native (XML) metadata format  Provide crosswalks to Dublin Core from eight standard formats  Dublin Core, DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGCD, EAD  Concentrate on collection-level metadata  Use automatic generation to augment item- level metadata

Importing metadata into the MR Collections Harvest Staging area Cleanup and crosswalks Database load Metadata Repository

Exporting metadata from the MR

Metadata Triage

What to Index? When possible, full text indexing is excellent, but full text indexing is not possible for all materials (non-textual, no access for indexing). Comprehensive metadata is an alternative, but available for very few of the materials. What Architecture to Use? Few collections support an established search protocol (e.g., Z39.50) Searching

 Implement a query language that includes most features that are common in commercial and Web search engines.  Periodically harvest the MR (via OAI-PMH) to incorporate the latest changes in the library.  Allow search on resources’ metadata as well as textual content, when available.  Communication with portals is done via the Simple Digital Library Interoperability Protocol (SDLIP). Search system general features

Search Architecture Metadata Repository Content Portal Search Engine SDLIP Wrapper SDLIP OAI Harvester OAI Search and Discovery Server http/ftp Harvester http/ftp “Document” generator

Persistent Archive for the NSDL  Provide a persistent copy of the resources identified in the NSDL repository  Provide a mechanism to retrieve prior versions of resources  Verify availability of on-line digital resources that have presence in MR

Persistent Archive Approach  Use data grid technology to:  Implement a persistent logical name space for registering resources  Manage archiving of modules on distributed storage systems  Use OAI harvesting to extract metadata from the NSDL repository  Crawl the web to retrieve resources  Provide OAI interface for reporting validation results  Manage the persistent archive through a separate information repository

Access Management  Authentication: user identity established by origin servers at home institution—NSDL central will run an origin if no other home available  Authorization: access classes of users, collections, & services established by NSDL community  anonymous and pseudo-anonymous access available  Internet2 “Shibboleth” framework satisfies these requirements

Access Management Flow browsercollection institution’s authentication and authorization service (e.g., Kerberos & LDAP) 1. attempt to access collection 2. redirected back to local login 3. login to local jurisdiction 4. attempt access again 5. confirm request valid organizational boundary

The Problem Cannot handcraft every web page Must be usable on a very wide range of equipment and with a very diverse group of users The Solution Data driven portals using channels (components that encapsulate a library function). Current NSDL portal technology is uPortal, a free, shareable portal being developed by a college and university consortium. Initial NSDL channels will include simple and advanced Search, Browse, News, Exhibits, Help, and Login/Registration. User Interfaces

Demonstration

We have only just begun…  Funding through 2006  Provide infrastructure that both:  Advances state-of-the-art of digital libraries  Reliably delivers services and resources to targeted users  Making this possible through  Integration of work of partners (NSDL and external)  Co-development with partners  Internal development

Long-term technical capabilities: Facilities for Collaboration  All users can contribute resources to the library  Collections (favorites), value added enhancements (curricula), original contributions  Community formation, long and short term  Persistence of results of community formation

Long-term technical capabilities: Management of Entities  Resources  Services  Relationships  Users

Long-term technical capabilities: Discovery of Entities  Capabilities for humans and agents  Searching through structured queries  Browsing of indexes, vocabularies, classifications

Long-term technical capabilities: Relationship Management  Relationships are first-class objects  Annotations, collections, equivalence, inclusion  Facilities  Identification  Discovery  Persistence  Evolution  Relationships of relationships

Long-term technical capabilities: Knowledge layered on data  Ontologies, classification schemes, taxonomies, standards, and authority lists  Organize resources within concept spaces  Cross-walk and establish relationships among concept spaces

Long-term technical capabilities: Control of entities  Access management for controlling the dissemination of intellectual property.  Mechanisms controlling disclosure of information with the goal of protecting privacy (i.e. COPPA)  Mechanisms for limiting inappropriate actions and entities

Long-term technical capabilities: Customization and Personalization  Portals that provide specialized user interfaces and aggregation of collections and services in the library.  Mechanisms for users and communities to specialize their library experience.  Mechanisms to automatically adapt library behavior to user needs and abilities.

Long-term technical capabilities: Accessibility  Platform  Connectivity  Physical Ability  Language

Long-term technical capabilities: Measurement  Usage of the main NSDL portal and supported portals.  Performance of core services and network connections.  Popularity of various resources.  Reliability of access to various resources.  Data and metadata quality.  User demographics (where possible)

Realizing Goals and Capabilities: Building & supporting infrastructure  Maintain and evolve the metadata repository  Maintain and evolve the main portal  Define, disseminate and support a service integration architecture  Develop, integrate, support core services:  Search and discovery  Persistence  Metadata and data normalization & enhancement  Authentication  Annotation  Resource access

Realizing Goals and Capabilities: Defining and building exemplars  General theme: collaborative spaces for specialized communities, disciplines, resources  Motivations:  Develop real products meeting needs of real audiences  Extrapolate from special cases to general infrastructure  Build essential partnerships

Realizing Goals and Capabilities: Defining and building exemplars  Primary life science education  Eisenhower National Clearinghouse  Undergraduate math education  Math Forum  Secondary geospatial education  Alexandria digital library

How do we do this:  Constructing targeted portals/libraries  Primary life science education  Undergraduate mathematics education  Secondary geospatial education  To build generalized architecture  Collaborative spaces  Knowledge management  Automatic data and metadata management

Some Closing Thoughts  Difficulty of building stability on shifting sands  What is low-barrier infrastructure?  Barriers to ‘simple’ OAI and Dublin Core have been relatively high  Multiple problems with metadata from distributed sources  Correctness  Trust  Information content  Resource granularity and identity  Automation is the key to success