Building Digital Libraries Made Easy: Toward Open Digital Libraries ICADL 2002 – Singapore – Dec. 2002 Edward A. Fox (with Hussein Suleman, Ming Luo)

1 Building Digital Libraries Made Easy: Toward Open Digital Libraries. ICADL 2002 – Singapore – Dec. 2002. Edward A. Fox (with Hussein Suleman, Ming Luo), fox@vt.edu, http://fox.cs.vt.edu. Virginia Tech, Blacksburg, VA, USA. [Logo labels: CS, DLRL, Internet TIC, NDLTD, CITIDEL, NSDL, …]

2 Acknowledgements (Selected) Sponsors: ACM, Adobe, DLF, IBM, Mellon Foundation, Microsoft, NSF (Grants CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, … Faculty/Staff (now): Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Joe Futrelle, Lee Giles, Martin Halbert, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Muhammad Zubair, … Students: Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Paul Mather, Ryan Richardson, Priya Shivakumar, Wensi Xi, Liang Xu, Baoping Zhang, …

3 Outline Overview, Problem Experience: Case Study Projects Open Archives Initiative Hussein Suleman Dissertation DL in a Box, OCKHAM Summary and Conclusion

4 Overview We address the problem of how to develop DLs; build on experience in building many DLs; strive for simplicity as per OCKHAM initiative; build upon the Open Archives Initiative; demonstrate our approach in diverse situations; and invite all to use DL-in-a-box and help build Open Digital Libraries.

5 Problem Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are:
1. The library budget won’t allow purchase of a commercial DL system.
2. Unless the development effort is local, there won’t be any control.
3. DLs are extensions of DBMSs, so they are simple applications to develop.
4. Since DLs operate on the Web, one must adopt the newest W3C proposal.

6 Problem – cont’d
5. Since technology moves so quickly, it is essential to follow the latest fad.
6. CS students always develop from scratch.
7. This team knows it can do it better.
8. This system must have more capabilities than any other system.
9. This DL has to be more flexible and extensible.
10. This is the right system architecture – at last!

7 Outline Overview, Problem Experience: Case Study Projects Open Archives Initiative Hussein Suleman Dissertation DL in a Box, OCKHAM Summary and Conclusion

8 Experience: Case Study Projects: AmericanSouth.org, NDLTD, CSTC, JERIC, CITIDEL, NSDL, Digital Library in a Box

9 AmericanSouth.org Domain: culture and history of the southern region of America (USA) Genre: diverse distributed collections at a dozen universities Submission & Collection: local sites → Emory University (for SOLINET)

10 Networked Digital Library of Theses and Dissertations (NDLTD) Domain: graduate education and research Genre: electronic theses and dissertations (ETDs) Submission & Collection: local sites → www.ndltd.org, www.theses.org

11 Computer Science Teaching Center (CSTC) Domain: teaching computer science Genre: courseware Submission & Collection: www.cstc.org

12 CS Teaching Center (CSTC): Lessons Learned Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units. Learners benefit from having well-crafted modules that have been reviewed and tested. Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.

14 Browsing (2)

18 ACM Journal of Educational Resources in Computing (JERIC) Domain: teaching computer science Genre: courseware, scholarly articles Submission & Collection: CSTC, ACM Digital Library

19 JERIC: Journal of Educational Resources in Computing. Accessible from www.cstc.org, www.acm.org, and www.citidel.org. ACM and SIGCSE support. Refereed and interactive. Part of ACM Digital Library.

20 Computing and Information Technology Interactive Digital Educational Library (CITIDEL) Domain: computing / information technology Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, technical reports, … Submission & Collection: sub/partner collections → www.citidel.org

21 CITIDEL Team An NSDL Collection Track project Led by Virginia Tech, with co-PIs: Fox (director, DL systems) Lee (history) Perez (user interface, Spanish support) Partners College of New Jersey (Knox) Hofstra (Impagliazzo) Villanova (Cassel) Penn State (Giles)

22 Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes
Size of Collection    Number of Collections Identified
1-5 items             100-300
6-100 items           50
101-999 items         20-35
+1000 items           10-25

23 Multi-dimensional Categorization

24 CITIDEL Collection Sources [Diagram labels: metadata; fulltext; CSTC; JERIC; ResearchIndex (NEC’s data processed w. R.I.); ACM; ACM DL; SIGCSE proceedings; IEEE-CS; NCSTRL; experts’ finding aids; Borner’s info viz software repository; …]

25 CITIDEL Collection Building [Diagram labels: activities: Creating, Composing, Submitting, Nominating, Crawling, Classifying, Searching, Browsing; tools: VIADUCT, GetSmart, Crawlifier; relations between them: thru, aided by, after, using, include]

26 Overview of CITIDEL architecture

27 Distributed repository structure

28 Digital library architecture for local and interoperable CITIDEL services

29 National Science Digital Library (NSDL) Domain: undergraduate and K-12 education, etc. Genre: educational resources Submission & Collection: sites of 90 projects → www.nsdl.org

30 NSDL Information Architecture (developed by the Technical Infrastructure Workgroup) [Diagram labels: Core NSDL “Bus”; User Interfaces: portals & clients; Usage Enhancement: CI services (annotation, discussion, personalization, authentication, browsing), core services: information retrieval, other NSDL services; Collection Building: core collection-building services (harvesting, protocols), core services: metadata gathering; NSDL collections; special databases; referenced items & collections]

31 Digital Library in a Box Domain: helping DL projects Genre: any domain, but especially those involved in NSDL (since it is funded in part through NSDL, with U. FL and NCSA) Software and Documentation: http://dlbox.nudl.org

32 Outline Overview, Problem Experience: Case Study Projects Open Archives Initiative Hussein Suleman Dissertation DL in a Box, OCKHAM Summary and Conclusion

33 Open Archives Initiative OAI www.openarchives.org openarchives@openarchives.org

34 The World According to OAI [Diagram labels: Service Providers (discovery, current awareness, preservation); Data Providers; metadata harvesting between them]

35 Technical Umbrella for Practical Interoperability… that can be exploited by different communities: Reference Libraries, Publishers, E-Print Archives, Museums

36 Tiered Model of Interoperability Mediator services Metadata harvesting Document models

37 OAI – Black Box Perspective [Diagram labels: Services: Browse, Summarize, Search, Visualize; Metadata: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7; Docs: DO]

38 Aggregation through OAI Harvesting [Diagram labels: ArchiveLite Sites; NCSTRL; Eprints; IEEE-CS, ACM, …; Own: History, ResearchIndex, CSTC, …; CITIDEL; Active]

39 Protocol for Metadata Harvesting Service Requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords. Features: Metadata Multiplicity; Date/Time Ranges; Sets (with semantics depending on local data providers); Resumption Tokens.
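
The six service requests above are plain HTTP GET requests whose arguments travel in the query string. A minimal sketch of how a harvester might build them, assuming the OAI-PMH v2.0 argument names (metadataPrefix, from, until, set); the base URL and the set name "theses" are hypothetical:

```python
from urllib.parse import urlencode

# The six service requests (verbs) named on the slide.
VERBS = ("Identify", "ListMetadataFormats", "ListSets",
         "GetRecord", "ListIdentifiers", "ListRecords")


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH service request."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # "from" is a Python keyword, so callers write from_ instead.
        params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Hypothetical data provider address, used only for illustration.
BASE_URL = "http://archive.example.org/oai"

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc",
                    from_="2002-01-01", until="2002-12-31", set="theses"))
```

The date range and set arguments correspond to the "Date/Time Ranges" and "Sets" features listed on the slide; what a set means is left to each data provider.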

40 NDLTD OAI Example

41 Outline Overview, Problem Experience: Case Study Projects Open Archives Initiative Hussein Suleman Dissertation DL in a Box, OCKHAM Summary and Conclusion

42 Open Digital Library (ODL) Hypothesis (Hussein Suleman) Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems? Maybe … if Digital Libraries can be modeled as networks of extended Open Archives, where each extended Open Archive is a source of data and/or a provider of services.

43 Example Architecture (NDLTD) [Diagram labels: Virginia Tech, PhysNet, CalTech, Dresden, Humboldt, Duisburg, MIT Filter, MIT Browse, Union Catalog, Search, Recent, User Interface; legend: OAI/ODL archive, OAI/ODL protocol]

44 ODL Demonstration - FrontPage

45 ODL Demonstration - Search

46 ODL Demonstration - Browse

47 Hussein Suleman’s Thesis Summary: Open Digital Libraries (DLs); Open Archives Initiative (OAI); Protocol for Metadata Harvesting (PMH). Extending OAI-PMH provides the glue for building componentized DLs. Lightweight protocols connect the components to support modular systems with good efficiency.

48 Research in a Nutshell We build extensible modular systems with customizable services. This supports interoperability and allows distributed development. This is in use in www.cstc.org, AmericanSouth.org, www.citidel.org, … Components include search, browse, annotate, editorial support, union, filter, whats-new, submit, rate, recommend, …

49 [Diagram: users on one side and digital objects (programs, documents, images, videos) on the other, with a question mark between them]

50 [Diagram: the same digital objects, now reached through a componentized digital library whose components are shown as question marks]

51 [Diagram: the same digital objects, now reached through an open digital library; connection labels: OA, PMH, XPMH]

52 ODL Component Requirements. Search: retrieve a list of items; index new items. Annotate: add annotation to item; retrieve a list of annotations for an item.
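
The requirements above amount to two small interfaces. A minimal in-memory sketch of what they ask for, not the ODL implementation itself; the class names, method names, and the whitespace-based term matching are all assumptions made for illustration:

```python
class SearchComponent:
    """Hypothetical search component: index new items, retrieve a list of items."""

    def __init__(self):
        self._index = {}  # term -> set of item identifiers

    def index(self, item_id, text):
        """Index a new item under each lower-cased word of its text."""
        for term in text.lower().split():
            self._index.setdefault(term, set()).add(item_id)

    def search(self, query):
        """Return the sorted identifiers of items containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


class AnnotateComponent:
    """Hypothetical annotation component: add and list annotations per item."""

    def __init__(self):
        self._notes = {}  # item identifier -> list of annotations

    def annotate(self, item_id, note):
        """Add an annotation to an item."""
        self._notes.setdefault(item_id, []).append(note)

    def annotations(self, item_id):
        """Return a copy of the list of annotations for an item."""
        return list(self._notes.get(item_id, []))


search = SearchComponent()
search.index("etd-1", "Open Digital Libraries")
search.index("etd-2", "Digital Preservation")
print(search.search("digital"))
```

In the ODL approach these operations are exposed over protocol requests rather than direct method calls; the sketch only shows the operations each component has to support.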

53 Open Digital Library Components. Running now: XML-File (data provider from file system); Union, search, browse, recent, filter; E-journal/review, Submit, Edit, Annotation. Class projects: High performance multilingual search; Recommender, Rating; Mirroring (see JCDL’02). Working with NCSA: from DB, unstructured text. Others discussed: Classification/categorization; DL-Viz interconnection (VIDI – Jun Wang ETD).

54 Open Digital Library: Extended [Diagram labels: XML File Coll. & Data Provider 1, 2, 3; OAI-PMH Data Provider; OAIB (NCSA: from RDBMS); Submit Archive; Harvest from data providers; DBUnion Archive Merger Component; Filter; DBBrowse Browse Engine; IRDB-1 Search Engine; What’s New Engine; Recommend Rate Engine; Annotation Engine; IRDB-2 Search Engine. Roles: Metadata Search Service Provider; Metadata Browse Service Provider; What’s New Service Provider; Annotation Search Service Provider; Recommend & Rate Service Provider]

55 Example Open Digital Library: Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) [Diagram labels: ETD collections (ETD-1, ETD-2, ETD-3, ETD-4, among other digital objects); components: Union, Filter, Search, Browse, Recent; protocols: PMH, ODLUnion, ODLSearch, ODLBrowse, ODLRecent; USER INTERFACE; students and researchers]

56 Digital Library for the Computer Science Teaching Center (www.cstc.org)

57 CSTC User Interface

58 Open Digital Library Component [Diagram labels: Open Archive; Extended Open Archive]

59 Layer 1: OAI-PMH (Protocol for Metadata Harvesting). Transfer stream of metadata from one archive or component to another. Service Requests: Identify, ListSets, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords.
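
A long metadata stream is usually delivered in pages, with a resumption token linking one page to the next. A minimal sketch of a harvesting loop, assuming OAI-PMH v2.0 response XML; `canned_fetch` is a hypothetical stand-in for an HTTP GET so that the example runs without a network, and the identifiers and archive address are invented:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH v2.0 responses (an assumption about the version).
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:  # absent or empty token: the list is complete
            return
        # In v2.0 a resumption token is sent without the other arguments.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


def canned_fetch(url):
    """Hypothetical stand-in for an HTTP GET; serves two fixed pages."""
    last_page = "resumptionToken=page2" in url
    identifier = "oai:example:2" if last_page else "oai:example:1"
    token = "" if last_page else "page2"
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"<header><identifier>{identifier}</identifier></header>"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


print(list(harvest_identifiers("http://archive.example.org/oai", canned_fetch)))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library.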

60 Layer 2: Extended OAI-PMH. OAI-PMH + extensions for general-purpose inter-component communication. Added generic containers to every response for additional information. Added “PutRecord” to submit a record. Increased granularity to support times as well as dates (same as OAI-PMH v2.0). Ignored DC requirement.
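
The granularity extension can be illustrated with the two datestamp forms that OAI-PMH v2.0 defines: day granularity (YYYY-MM-DD) and seconds granularity (YYYY-MM-DDThh:mm:ssZ, in UTC). A small sketch; the function name and its `granularity` parameter are hypothetical, and PutRecord is not sketched because the slide does not give its arguments:

```python
from datetime import datetime, timezone


def datestamp(moment, granularity="seconds"):
    """Format a timezone-aware datetime as an OAI-PMH v2.0 style datestamp."""
    utc = moment.astimezone(timezone.utc)
    if granularity == "day":
        return utc.strftime("%Y-%m-%d")
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


updated = datetime(2002, 12, 11, 9, 30, 0, tzinfo=timezone.utc)
print(datestamp(updated, "day"))
print(datestamp(updated))
```

With times available, a component can ask another for changes made since its last request a few minutes earlier, which day granularity cannot express.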

61 Layer 3: ODL Protocols. Specialized protocol semantics for different components, e.g.: Search component uses ODLSearch protocol: ListRecords and ListIdentifiers embed query terms in “set” parameter. Annotation component uses ODLAnnotate protocol: ListRecords and ListIdentifiers specify the item for which annotations are requested in the “set” parameter; PutRecord adds an annotation to an item.
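
Reusing the “set” parameter means a search or annotation lookup is still an ordinary harvesting request. A minimal sketch of the idea; the slide does not give the exact syntax, so the `odlsearch1/query/start/stop` layout, the `oai_dc` prefix, and the URLs are assumptions made for illustration:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_PREFIX = "odlsearch1"  # assumed marker, not taken from the slide


def search_request(base_url, query, start=1, stop=10):
    """Build a ListIdentifiers request carrying the query in "set"."""
    set_spec = f"{SEARCH_PREFIX}/{query}/{start}/{stop}"
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc",
              "set": set_spec}
    return f"{base_url}?{urlencode(params)}"


def parse_search_set(url):
    """Recover (query, start, stop) on the search component's side."""
    set_spec = parse_qs(urlsplit(url).query)["set"][0]
    _prefix, rest = set_spec.split("/", 1)
    query, start, stop = rest.rsplit("/", 2)
    return query, int(start), int(stop)


def annotations_request(base_url, item_id):
    """Build a ListRecords request naming the annotated item in "set"."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc",
              "set": item_id}
    return f"{base_url}?{urlencode(params)}"


url = search_request("http://dl.example.org/search", "digital libraries", 1, 5)
print(url)
print(parse_search_set(url))
```

Splitting the prefix from the left and the result range from the right lets the query itself contain a slash without breaking the parse.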

62 Performance Optimizations: Caching of responses. Persistent CGI mechanisms: FastCGI, SpeedyCGI. Request multiple records in a single operation (proposed).
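
Because every inter-component request is a URL, response caching can be keyed on the request URL alone. A minimal sketch of such a cache with a time-to-live; the class, its parameters, and the five-minute default are hypothetical and not taken from the ODL software:

```python
import time


class ResponseCache:
    """Cache component responses by request URL for a limited time."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # function: url -> response body
        self._ttl = ttl_seconds  # how long a cached response stays valid
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # url -> (time stored, response body)

    def get(self, url):
        """Return a fresh cached response, or fetch and store a new one."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body


calls = []


def slow_component(url):
    """Hypothetical stand-in for a request to another component."""
    calls.append(url)
    return f"response to {url}"


cache = ResponseCache(slow_component)
cache.get("http://dl.example.org/browse?verb=ListSets")
cache.get("http://dl.example.org/browse?verb=ListSets")
print(len(calls))
```

The trade-off is freshness: a response cached for five minutes may miss records that arrived in the meantime, so the time-to-live should reflect how often the underlying archive changes.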

63 What have we accomplished? Complete protocol-level separation among components within the DL; seamless integration with little “glue”; simple extensions of OAI-PMH; modular and portable components; efficient in speed, but not as efficient in storage.

64 Outline Overview, Problem Experience: Case Study Projects Open Archives Initiative Hussein Suleman Dissertation DL in a Box, OCKHAM Summary and Conclusion

65 Digital Library In A Box, http://dlbox.nudl.org. Part of NSF’s National Science Digital Library (www.nsdl.org). Offers “Shrink-wrap” Open Digital Library Components as Open Source Software. Users install ready-made digital library solutions, or build their own from snap-together components.

67 OCKHAM: Simplicity (a la OCCAM’s razor). Support by Mellon and DLF. Next meeting in Atlanta, Jan. 8, 2003. Four main ideas: 1. Components 2. Lightweight protocols 3. Open reference models (e.g., 5S, OAIS) 4. Community perspective and involvement

68 5S Layers: Societies, Scenarios, Spaces, Structures, Streams

69 Outline Overview, Problem Experience: Case Study Projects Open Archives Initiative Hussein Suleman Dissertation DL in a Box, OCKHAM Summary and Conclusion

70 It is possible to build DLs easily. The ODL approach to this has been developed and validated in a number of settings. Everyone is invited to: use ODL components; refine or add ODL components, protocols; join ODL and OCKHAM. For more information see:

71 (Somewhat) Open Issues Is this scalable? Portable? Extensible? Can we define all popular DL services using such a methodology? (completeness problem) Can we define DLs as configurations of ODL components? (composition problem) Is OAI-PMH a good baseline protocol? Can we design a better baseline protocol upon which to base harvesting and repository access? To what degree is an ODL network equivalent to a monolithic system? (comparison problem)

72 Ultimate Goal: Package different configurations into instant DL systems or subsystems. DL building = component configuration. All DLs speak the same language(s). Basic services are trivial to provide, so more effort is spent on advanced capabilities of DLs.

73 Selected Links: CITIDEL – www.citidel.org; NCSTRL – www.ncstrl.org; NDLTD – www.ndltd.org; NSDL – www.nsdl.org; Open Archives Initiative – www.openarchives.org, www.openarchives.org/OAI/openarchivesprotocol.htm, www.dlib.vt.edu/projects/OAI/

74 More Links: Hussein Suleman’s Dissertation – http://purl.org/net/hsdiss/odl.pdf; Repository Explorer – http://purl.org/net/oai_explorer; DL Courseware – http://ei.cs.vt.edu/~dlib; Virginia Tech Digital Library Research Laboratory (DLRL) – www.dlib.vt.edu; Listservs – dl-in-a-box-l@listserv.vt.edu, ockham-sys@listserv.cc.emory.edu
