Case Studies in the US National Science Digital Library (NSDL): DL-in-a-box, CITIDEL, OCKHAM ICADL2003, Dec, 8-11, 2003 Kuala Lumpur, Malaysia Edward A.


1 Case Studies in the US National Science Digital Library (NSDL): DL-in-a-box, CITIDEL, OCKHAM ICADL2003, Dec, 8-11, 2003 Kuala Lumpur, Malaysia Edward A. Fox fox@vt.edu CS / DLRL, Virginia Tech, USA http://fox.cs.vt.edu http://www.dlib.vt.edu

2 ACKNOWLEDGEMENTS Helpful sponsorship by many organizations, especially Adobe, AOL, CONACyT, DFG, FIPSE (US Dept. Education), IBM, Mellon, Microsoft, NSF (IIS-9986089, 0086227, 0080748, 0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, VTLS, many governments (Australia, Brazil, Germany, India, …), … Colleagues at Virginia Tech (faculty, staff, students), and collaborators at many universities –Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Lee Giles, Martin Halbert, Rex Hartson, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Layne Watson, … –Yuxin Chen, Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Ming Luo, Paul Mather, Ryan Richardson, Rao Shen, Hussein Suleman, Wensi Xi, Baoping Zhang, Qinwei Zhu, …

3 Outline Context Digital Libraries for Education (DLE) National Science Digital Library (NSDL) OAI, ODL, DL-in-a-box OCKHAM CITIDEL (incl. GrapeZone, PIPE) Conclusions

5 Information Life Cycle Authoring Modifying Organizing Indexing Storing Retrieving Distributing Networking Retention / Mining Accessing Filtering Using Creating

6 Digital Libraries in Education Analytical Survey, ed. Leonid Kalinichenko © 2003, www.iite-unesco.org, info@iite.ru Transforming the Way to Learn DLs of Educational Resources & Services Integrated/Virtual Learning Environment Educational Metadata Current DLEs: US (NSDL, DLESE, CITIDEL, NDLTD), Europe (Scholnet, Cyclades), UK (Distributed National Electronic Resource)

7 Digital Libraries in Education - 2 Advanced Frameworks & Methodologies –Instructional course development with learning module repositories, Learning Object reuse –Community organization around DLEs –Other content for science and research –Cyberinfrastructure, data grids –Curriculum-based interfaces (see Krowne et al.) –Concept-based organization of learning materials and courses (CMs, ontologies)

8 DLEs: Future Vision (p. 6) Global learning environment of the future: Student-centered Interactive and dynamic Enabling group work on real world problems Enabling students to determine their own learning routes (styles, personalization) Supporting lifelong learning

9 DLEs: Objectives (p. 11) Long-range: lifelong/distance/anytime-anywhere Intermediate goals –Support for students, teachers, parents –Enhanced student performance –More students excited about science –More Internet-based science educational resources with increased quality and comprehensiveness, easy to discover and retrieve, preserved and universally available

10 DLEs: Guiding Principles (p. 12) Driven by educational and science needs Facilitating educational innovation Stable, reliable, permanent Accessible to all Leveraging prior research: DL, courseware, … Adaptable to new technologies Supporting decentralized services Resource integration thru tools/organization

11 “The network is the library.” NSDL Visioning: Learning Environments and Resources Network for STEM Education

12 [Concept map: NSDL Tracks include CI (Core Integration), Services, Collections, Research. Other nodes: CITIDEL, GetSmart, Concept Maps, OCKHAM, P2P libraries. Link labels: include, supports.]

13 Expectations of NSDL Program Tracks Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form Targeted (Applied) Research: have immediate impact on one or more of the other three tracks

14 Collections Discovery of content Classification and cataloguing Acquisition and/or linking; referencing Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged Access to massive real-time or archived datasets Software tool suites for analysis, modeling, simulation, or visualization Reviewed commentary on learning materials and pedagogy

15 Services Help services, frequently asked questions, etc. Synchronous/asynchronous collaborative learning environments using shared resources Mechanisms for building personal annotated digital information spaces Reliability testing for applets or other digital learning objects Audio, image, and video search capability Metadata system translation Community feedback mechanisms

16 NSDL Information Architecture Essentially as developed by the Technical Infrastructure Workgroup [Figure labels: User Interfaces; Usage Enhancement; Collection Building; Portals & Clients; CI Services (annotation, discussion, personalization, authentication, browsing); Core Services (information retrieval, metadata gathering); Core Collection-Building Services (harvesting, protocols); NSDL Services; Other NSDL Services; NSDL Collections; Special Databases; referenced items & collections; Core NSDL “Bus”]

17 OAI, ODL, DL-in-a-box Open Archives Initiative –since 1999, www.openarchives.org Open Digital Libraries –since 2001, from www.dlib.vt.edu –with Hussein Suleman (now U. Cape Town) DL-in-a-box –NSDL support since 2001 –Aimed to help new collections / services projects –http://dlbox.nudl.org

18 Open Archives Initiative (OAI) Advocacy for interoperability Standard for transferring metadata among digital libraries –Protocol for Metadata Harvesting (PMH) Simplicity Generality Extensibility Support for PMH => Open Archive (OA)
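As an illustration of what metadata harvesting looks like in practice, here is a minimal Python sketch, assuming OAI-PMH 2.0 and Dublin Core (`oai_dc`) records. The base URL `http://example.org/oai`, the helper names `build_request` and `titles`, and the trimmed sample response are all hypothetical and are not taken from the slides; a real response also carries `responseDate` and `request` elements.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL: base URL plus the verb and its arguments."""
    params = {"verb": verb}
    for key, value in args.items():
        # 'from' is a Python keyword, so callers pass from_ and we strip the underscore.
        params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)

def titles(response_xml):
    """Extract every dc:title from a harvested response."""
    root = ET.fromstring(response_xml.strip())
    return [element.text for element in root.iter("{%s}title" % DC_NS)]

# Hypothetical, trimmed ListRecords response used only for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2003-12-08</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(build_request("http://example.org/oai", "ListRecords",
                        metadataPrefix="oai_dc", from_="2003-01-01"))
    print(titles(SAMPLE_RESPONSE))
```

A service provider would issue such requests against each data provider and store the returned metadata locally; the six protocol verbs are Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord.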

19 OAI = Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities Museums

20 OAI – Repository Perspective Required: Protocol DO MDO

21 OAI – Black Box Perspective [Figure: open archives OA 1 through OA 7]

22 Tiered Model of Interoperability Mediator services Metadata harvesting Document models

23 Discovery Current Awareness Preservation Service Providers Data Providers Metadata harvesting The World According to OAI

24 [Figure: users and digital objects (Program, Document, Image, Video), with a "?"]

25 [Figure: digital library as a monolithic and/or custom-built web-based application, shown with the same digital objects and "?" marks]

26 [Figure: componentized digital library, shown with the same digital objects and many "?" marks]

27 [Figure: open digital library, shown with the same digital objects; labels: OA, PMH, XPMH]

28 Open Digital Library Protocol Extended OAI-PMH Protocol for Metadata Harvesting

29 Open Digital Library Component Extended OPEN ARCHIVE OPEN ARCHIVE

30 Open Digital Library Deployments NDLTD (www.ndltd.org) Computer Science Teaching Center (www.cstc.org) Computing and Information Technology Interactive Digital Educational Library (www.citidel.org) Open Archives Distributed (NSF, DFG) – enhancements to PhysNet OCKHAM Open to others through DL-in-a-box

31 Open Digital Library Network of Extended Open Archives where each node acts as either a provider of data, services or both. Component = Node Protocol = Arc
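The "component = node, protocol = arc" view can be sketched as a small directed graph. This is only an illustrative Python model: the `Network` class, its method names, and the example wiring (loosely modelled on the labels of the ETD example in this deck) are assumptions, not part of the ODL software.

```python
from collections import defaultdict

class Network:
    """Components are nodes; each arc records the protocol a consumer uses to reach a provider."""

    def __init__(self):
        self.arcs = defaultdict(list)  # consumer -> [(protocol, provider), ...]

    def connect(self, consumer, protocol, provider):
        self.arcs[consumer].append((protocol, provider))

    def upstream(self, node):
        """Return every component that `node` depends on, directly or indirectly."""
        seen, stack = set(), [node]
        while stack:
            for _protocol, provider in self.arcs.get(stack.pop(), []):
                if provider not in seen:
                    seen.add(provider)
                    stack.append(provider)
        return seen

if __name__ == "__main__":
    net = Network()
    # Hypothetical wiring: two data providers feed a union, which feeds a search component.
    net.connect("Union", "PMH", "ETD-1")
    net.connect("Union", "PMH", "ETD-2")
    net.connect("Search", "ODLUnion", "Union")
    net.connect("UserInterface", "ODLSearch", "Search")
    print(sorted(net.upstream("UserInterface")))
```

Because every arc is a protocol rather than a function call, any node can in principle be replaced by another implementation that speaks the same protocol.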

32 Open Digital Library Components Running now –XML-File (data provider from file system) –Search: simple or in-memory (Essex) or generalized –Union, browse, recent, filter –E-journal/review, Submit, Edit, Annotation –Recommender, Rating; Mirroring (see JCDL’02) –Working with NCSA: from DB, unstructured text Others in process –Classification/categorization –Registry (and other connections with web services)

33 Example Open Digital Library: ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) [Figure labels: ETD collections (ETD-1 to ETD-4) among other digital objects (Program, Document, Image, Video); components Search, Filter, Union, Recent, Browse; protocols PMH, ODLRecent, ODLBrowse, ODLUnion, ODLSearch; USER INTERFACE; Students and researchers]

34 Open Digital Library: Extended [Figure labels: Harvest from data providers; XML File Coll. & Data Provider 1, 2, 3; DBUnion Archive Merger Component; DBBrowse Browse Engine (as Metadata Browse Service Provider); IRDB-1 Search Engine (as Metadata Search Service Provider); What’s New Engine (as What’s New Service Provider); OAI-PMH Data Provider; Submit Archive; OAIB (NCSA: from RDBMS); Filter; Recommend Rate Engine (as Recommend & Rate Service Provider); Annotation Engine; IRDB-2 Search Engine (as Annotation Search Service Provider)]

35 New ODL Component: Generalized Search Platform CS6604 Clients: Patrick Fan, Wensi Xi Group Members: Ming Luo, Rui Yang, Xiaoyan Yu

36 Introduction Background –The importance of search service in a digital library –Problems of search engines in DLRL: IRDB: low search effectiveness, insufficient parsing component; ESSEX: less scalability due to in-memory index; MARIAN: low search efficiency

37 Algorithms Phrase Searching Algorithms –Adjacency of terms Ranking Functions –Okapi (baseline) –GP-based ranking function
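The two ideas on this slide, phrase matching by term adjacency and an Okapi baseline, can be sketched in a few lines of Python. The slides do not say which Okapi variant was used, so the sketch assumes one common BM25 formulation with a non-negative IDF and the usual defaults `k1=1.2`, `b=0.75`; whitespace tokenization and the function names are likewise assumptions.

```python
import math
from collections import Counter

def phrase_match(phrase, doc):
    """True if the phrase terms occur adjacently, in order, in the document."""
    terms, tokens = phrase.lower().split(), doc.lower().split()
    return any(tokens[i:i + len(terms)] == terms
               for i in range(len(tokens) - len(terms) + 1))

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a common BM25 formulation."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

if __name__ == "__main__":
    docs = ["digital library search",
            "genetic programming ranking",
            "digital library digital library"]
    print(bm25_scores("digital library", docs))
    print(phrase_match("digital library", docs[0]))
```

A production engine would use a positional inverted index rather than scanning token lists, but the adjacency test is the same.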

38 Genetic Programming (GP) A problem solving system designed based on principles of evolution and heredity

39 An Example of GP-based RF (log (+ (* df (log (log (* (* (/ n df) (* (* (/ n df) (* (* df_max_Col tf) (+ df_max_Col tf_avg))) (* (/ tf tf_max) (log tf_avg_Col)))) (* (/ (* (* (/ n df) (* (* df_max_Col tf) (+ df_max_Col tf_avg))) (* (/ tf tf_max) (log tf_avg_Col))) (+ (* length df) tf_avg_Col)) (log tf_avg_Col)))))) (+ (* (* df_max_Col tf) (/ (* (* (/ (/ (* tf 6.720) (/ df N)) (* df_max_Col tf)) (* (* tf N) (+ df_max_Col tf_avg))) (* (/ tf tf_max) (log tf_avg_Col))) (+ (* length df) (* (* (/ tf tf_max) (+ (* length df) (* 2.812 1))) tf_avg)))) (+ (/ df tf_avg) tf))))
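A ranking function in this prefix notation can be evaluated with a very small interpreter. The Python sketch below is an assumption about how such an expression might be scored, not the project's code: in particular, the "protected" division and logarithm (returning a fixed value instead of failing on zero) are a common GP convention that the slides do not confirm, and the feature values in the example are made up.

```python
import math

def tokenize(text):
    """Split a prefix expression into parentheses, operators, and operands."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested list from the token stream, consuming tokens as it goes."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return node
    try:
        return float(token)   # numeric constant such as 6.720
    except ValueError:
        return token          # operator or feature name such as tf or df

def evaluate(node, features):
    """Evaluate a parsed expression against one document's feature values."""
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return features[node]
    op, args = node[0], [evaluate(arg, features) for arg in node[1:]]
    if op == "+":
        return args[0] + args[1]
    if op == "*":
        return args[0] * args[1]
    if op == "/":
        return args[0] / args[1] if args[1] != 0 else 1.0        # protected division (assumed)
    if op == "log":
        return math.log(abs(args[0])) if args[0] != 0 else 0.0   # protected log (assumed)
    raise ValueError("unknown operator: " + op)

if __name__ == "__main__":
    # A fragment that appears inside the expression on the slide.
    fragment = "(* (/ tf tf_max) (log tf_avg_Col))"
    tree = parse(tokenize(fragment))
    print(evaluate(tree, {"tf": 2.0, "tf_max": 4.0, "tf_avg_Col": math.e}))
```

Genetic programming then searches over such trees, keeping the ones whose rankings score best on training queries.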

40 Parser Flexibility –TREC Style SGML/HTML –Configurable tagging Abbreviation and number detection Case sensitive Phrase parsing

41 Interface –(I) 1. Receive user query 2. Send query to search engine 3. Get ranked list 4. Search database 5. Get document information 6. Return results to user [Figure: Servlet linked to the Search Engine by Socket (steps 2, 3) and to the Database by JDBC (steps 4, 5); steps 1 and 6 at the Servlet]

42 Interface –(II) As an ODL component 1. Receive user query thru ODL’s XOAI searching protocol 2. Send query to search engine 3. Get ranked list 4. Request metadata 5. Get metadata 6. Return results in format complying with ODL’s searching protocol [Figure: Perl Adaptor linked to the Search Engine by Socket (steps 2, 3) and to the OAI data provider (steps 4, 5); steps 1 and 6 at the Perl Adaptor]
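Both interfaces follow the same six-step flow; only the metadata source differs (a database in the first, an OAI data provider in the second). A minimal Python sketch of that flow follows, with in-memory stand-ins for the search engine and the metadata source. All names here are hypothetical, and the real components communicate over sockets, JDBC, or the ODL protocol rather than function calls.

```python
def toy_search_engine(index):
    """Return a search function that ranks document ids by query-term overlap."""
    def search(query):
        terms = set(query.lower().split())
        scored = [(len(terms & set(text.lower().split())), doc_id)
                  for doc_id, text in index.items()]
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for score, doc_id in ranked if score > 0]
    return search

def handle_query(query, search_engine, metadata_source, limit=10):
    """Adaptor: take a query (step 1) and return ranked, described results (step 6)."""
    ranked_ids = search_engine(query)[:limit]                       # steps 2 and 3
    records = [metadata_source(doc_id) for doc_id in ranked_ids]    # steps 4 and 5
    return [{"rank": i + 1, **record} for i, record in enumerate(records)]

if __name__ == "__main__":
    index = {"a": "digital library components",
             "b": "genetic programming",
             "c": "open digital library protocol"}
    metadata = {doc_id: {"id": doc_id, "title": text.title()}
                for doc_id, text in index.items()}
    for hit in handle_query("digital library protocol",
                            toy_search_engine(index), metadata.get):
        print(hit)
```

Passing the search engine and metadata source in as parameters mirrors the design on the slides: the adaptor stays the same while the back ends are swapped.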

43 OCKHAM Initiative, Contact Info Supported by DL Federation, Mellon, NSF, … P2P University Network involving: Emory, Notre Dame, U. Arizona, Virginia Tech, … PI: Martin Halbert Phone 404-727-2204 Email: mhalber@emory.edu OCKHAM URL: http://ockham.library.emory.edu

44 The Problem Digital library development is complex and expensive. Various DL development communities (in the USA at least) are not working together well. Results exhibit much incompatibility, little common practice, slow progress, and no leverage on investment. If this continues, we are just going to languish and fester.

45 Lightweight Protocols “Lightweight”, or relatively small and simple, protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive. The success of protocols considered lightweight is illuminating. Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

46 Reference Models Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement Explored in CS6604 class project with 2 focus groups: librarians, education experts

47 Current Focus: Peer-to-Peer (P2P) Lightweight (Protocol) Reference Models Builds on successful example of the OAI PMH, clearly understood minimalist concept of metadata distribution, implemented in simple protocols (e.g., ODL) Leads to developing simple reference models of specific subsystems, with associated simple protocols and standards Testing in NSDL, connecting university libraries to support teaching & learning

48 OCKHAM Proposed Services Alerting Browsing Cataloging Conversion OAI – Z39.50 Pathfinding Registry – prototype in CS6604 now (plus others such as from adapted ODL)

49 Computing and Information Technology Interactive Digital Educational Library Technical Development Content Collection Edward Fox (director) John A. N. Lee Manuel Pérez-Quiñones Community Development John Impagliazzo Assessment Lillian Cassel Search Engines C. Lee Giles CSTC Deborah Knox http://www.citidel.org/

50 CITIDEL -> NSDL CITIDEL is a collection project in the US National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL (www.nsdl.org)

51 Multi-dimensional Categorization

52 CITIDEL: Computing & Information Technology Interactive Digital Education Library

57 CITIDEL Technology Features Component architecture (Open Digital Library) Re-use and compose re-deployable digital library components. Built Using Open Standards & Technologies OAI: Used to collect DL Resources and DL Interoperability XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, …) Perl: Component Integration ESSEX: Search Engine Functionality Very fast, utilizing in-memory processing Includes snap-shots for persistence Multi-scheming Integrates multiple classifications / views through maps, closure
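One way to read "multi-scheming through maps, closure" is that categories in different classification schemes are linked by mappings, and a transitive closure makes a resource filed under one category reachable from every category that maps to it. The Python sketch below illustrates that reading only; it is an assumption about the idea, not CITIDEL's implementation, and the scheme and category names are invented.

```python
def closure(edges):
    """Return, for each category, every category reachable through the mappings."""
    reach = {}
    for src, dst in edges:
        reach.setdefault(src, set()).add(dst)
        reach.setdefault(dst, set())
    changed = True
    while changed:                      # repeat until no new reachability is found
        changed = False
        for src in reach:
            new = set()
            for mid in reach[src]:
                new |= reach[mid]
            if not new <= reach[src]:
                reach[src] |= new
                changed = True
    return reach

if __name__ == "__main__":
    # Hypothetical mappings between two schemes and within one hierarchy.
    edges = [("schemeA:information-systems", "schemeA:digital-libraries"),
             ("schemeA:digital-libraries", "schemeB:DL"),
             ("schemeB:DL", "schemeB:DL-interoperability")]
    for category, reachable in sorted(closure(edges).items()):
        print(category, "->", sorted(reachable))
```

With the closure precomputed, browsing any category can list resources filed under all categories it reaches, regardless of which scheme they were originally classified in.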

58 Programming Team DL Project Logan Hanks, logan@vt.edu Mike Scarborough, mscarbor@vt.edu Stafford Fuller, stfuller@vt.edu Problem Description: VT has multiple programming teams, and has sent a team to the ACM world finals every year for the past decade. Each week during the semester, the teams practice using a problem set from a past regional or international contest. Each practice generates multiple solutions for each problem. What is needed is a digital library to collect these solutions and serve as a reference.

59 Programming Team DL Project Deliverables: Problem statement and solution importer/archiver. Classification framework for problems and solutions. Search engine for the DL to locate problems and solutions by their relevance to a set of classes given as input. Web interface for browsing problems and solutions as well as accessing all of the above deliverables. Integration with CITIDEL. Requirements: Importing and classifying problem statements and solutions. Solutions should be classified based on what algorithms and methods they use and what problems they solve. Interface for browsing problem statements and their solutions. A search engine for finding problem statements or solutions based on their classifications.

60 Searching CITIDEL searching, which is driven by the ESSEX search engine for relevance computation (fast, in-RAM processing with checkpoints), also provides a list of relevant categories within the classification schemes. Browsing and Searching with Filters Users are placed in chosen sub-communities. They can filter results based on these sub-communities. Also there is further customization. Alternatively, users may view all results. Users may set up multiple filters for simple or complex filtering based on many factors such as education level, role, resource type, language, source, and much more. This allows users to get exactly what they are or are not looking for in the digital library. At any time, users are free to disable these filters or see results excluded by them.
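The filtering behaviour described here (several filters combined, with the option to disable them or inspect what they excluded) can be sketched as composable predicates. This Python example is illustrative only: the field names `level`, `language`, and `type` and the sample records are hypothetical, chosen to echo the factors listed on the slide.

```python
def make_filter(field, allowed):
    """Build a predicate that keeps records whose `field` is one of `allowed`."""
    allowed = set(allowed)
    return lambda record: record.get(field) in allowed

def apply_filters(records, filters, enabled=True):
    """Return (kept, excluded); with filters disabled, everything is kept."""
    if not enabled:
        return list(records), []
    kept, excluded = [], []
    for record in records:
        target = kept if all(f(record) for f in filters) else excluded
        target.append(record)
    return kept, excluded

if __name__ == "__main__":
    records = [
        {"id": 1, "level": "undergraduate", "language": "en", "type": "lecture"},
        {"id": 2, "level": "graduate", "language": "es", "type": "thesis"},
        {"id": 3, "level": "undergraduate", "language": "es", "type": "applet"},
    ]
    filters = [make_filter("level", ["undergraduate"]),
               make_filter("language", ["es"])]
    kept, excluded = apply_filters(records, filters)
    print("kept:", [r["id"] for r in kept])
    print("excluded:", [r["id"] for r in excluded])
```

Returning the excluded records alongside the kept ones is what lets an interface offer "see results excluded by your filters" without running the search again.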

62 Enjoy in GrapeZone Derived from Carrot 2 project (http://www.cs.put.poznan.pl/dweiss/carrot/index.php/index.xml?lang=en) Online Grape: cluster search results from CITIDEL Offline Grape: cluster a static collection

63 Cluster search results from CITIDEL

64 Cluster a member collection (from a content source) in CITIDEL The Computer Science Teaching Center (CSTC) NDLTD-Computing ACM Digital Library …

65 Cluster CSTC

66 Cluster NDLTD-Computing

67 Cluster ACM

68 MOCA Algorithm

69 PIPE: Personalization by Partial Evaluation Interactions at existing web sites are predefined by the site designer Personalization is achieved by the designer’s anticipation of users’ expectations PIPE allows automatic personalization of a web site without designer anticipation –Recognized with the 2001 New Century Technology Council Innovation award

70 CITIDEL + PIPE Adds interaction personalization to CITIDEL Automatically handles multi-modal conversion to cell phone, PDA, etc. Can be adapted to any digital data set; only requires an XML file of content with hierarchy maintained.

71 PIPE provides Mixed-Initiative Interaction Involves an extra specification window (e.g., a toolbar) system-initiated + user-initiated modes of interaction Traditional browser: the user merely clicks on available hyperlinks. PIPE window: the user can type in any information out-of-turn Can also mix-n-match
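The out-of-turn idea can be illustrated loosely in Python: treat the site as a hierarchy of labelled choices, and when the user supplies a label before the site asks for it, prune the hierarchy to the branches that contain that label and drop the level where it was found. This is a toy illustration under that assumption, not the PIPE implementation; the function `specialize` and the sample hierarchy are invented.

```python
def specialize(tree, term):
    """Prune `tree` to branches containing `term`, collapsing the level where it appears."""
    if not isinstance(tree, dict):
        return None                    # a leaf cannot satisfy the term
    if term in tree:
        return tree[term]              # the choice is made; skip this level
    result = {}
    for label, subtree in tree.items():
        pruned = specialize(subtree, term)
        if pruned is not None:
            result[label] = pruned
    return result or None              # None means no branch matched

if __name__ == "__main__":
    # Hypothetical hierarchy: resource type first, then language.
    site = {
        "Documents": {"English": ["doc-en-1"], "Spanish": ["doc-es-1"]},
        "Videos": {"English": ["vid-en-1"]},
    }
    # The user types "Spanish" before choosing a resource type.
    print(specialize(site, "Spanish"))
    # Clicking a top-level link is the in-turn case of the same operation.
    print(specialize(site, "Documents"))
```

In this sketch, in-turn clicks and out-of-turn input go through the same function, which is one way to picture how the two modes can be mixed.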

72 Features of PIPE Applicable to many information system technologies web sites (even third-party) Digital Libraries (currently working on CITIDEL integration) voice-activated systems (e.g., pizza ordering, movie information, and flight reservation services) PIPE is available for licensing and is ready for commercialization, through VTIP PIPE has been featured in IEEE Internet Computing, IEEE IT Professional, and the Appian Web Personalization Report.

73 PIPE system architecture

74 Conclusions UNESCO analytical survey: DLE in every nation NSDL as an example; case studies inside it OAI -> ODL -> DL-in-a-box -> OCKHAM as framework for collaboration on services CITIDEL to highlight NSDL collection efforts –Many sources for computing resources –Software deployed from above efforts, refined, and then the results made available for reuse –Even class projects can lead to useful DL components!

