Overview of Digital Libraries: From Requirements to Theory to System to Projects JCDL 2003 – Houston, TX, USA Tutorial – May 27, 2003 Edward A. Fox


1 Overview of Digital Libraries: From Requirements to Theory to System to Projects JCDL 2003 – Houston, TX, USA Tutorial – May 27, 2003 Edward A. Fox fox@vt.edu http://fox.cs.vt.edu CS DLRL Internet TIC Virginia Tech, Blacksburg, VA, USA

2 Acknowledgements (Selected) Sponsors: ACM, Adobe, IBM, Microsoft, NLM, NSF, OCLC, SOLINET, SURA, US Dept. of Ed. (FIPSE), … VT Faculty/Staff: Marc Abrams, Tony Atkins, Debra Dudley, John Eaton, H. Rex Hartson, Deborah Hix, JAN Lee, Mann-Ho Lee, Gail McMillan, James Powell, Naren Ramakrishnan, Shalini Urs, … VT Students: Fernando Das Neves, Robert France, Marcos Goncalves, Neill Kipp, Paul Mather, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Wensi Xi, Ye Zhou…

3 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

4 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

5 Virginia Tech Background Largest university in Virginia, land-grant, football, town population 35K plus 26K students Blacksburg Electronic Village, since 1992, with > 80% of community on Internet Net.Work.Virginia, with sites for education, research, government LMDS, Local Multipoint Distribution Service, gigabit wireless networking - 1/3 of Virginia Math Emporium, 500 workstations Faculty Development Initiative, round 3 Torgersen Hall, $30M Advanced Communications and Information Technology Center, with DLRL

6 Internet Technology Innovation Center Supported by Virginia’s Center for Innovative Technology Statewide University Partners - Governing Board: Christopher Newport University William Winter, William Muir, Virginia Electronic Commerce Technology Center / Southeastern Virginia Network (VECTEC/SEVAnet) George Mason University Steven Ruth, International Center for Applied Studies in IT (ICASIT) Old Dominion University – Kurt Maly (CS Head), … University of Virginia Alf Weaver, Internet Commerce Group (InterCom) Jim French, Internet Digital Library Virginia Tech Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS Scott Midkiff, Center for Wireless Telecomm. (CWT), VTISC, ECpE

7 ITIC @ VT Research Areas Collaboration (e.g., group decision support) Community networking (e.g., BEV) Internet access (e.g., statewide network) Information services (e.g., digital libraries) Modeling and simulation (e.g., Web traffic) Usability (e.g., human factors engineering) Virtual environments (e.g., CAVE, visualization)

8 Digital Libraries --- Virginia Tech MARIAN (NLM, NSF) CS DL Prototype - ENVISION (NSF, ACM) TULIP (Elsevier, OCLC) BEV History Base (NSF, Blacksburg) DL for CS Education - EI (NSF, ACM) WATERS, NCSTRL (NSF) NDLTD (SURA, US Dept. of Education, NSF) CSTC (NSF, ACM), CRIM (NSF, SIGMM) WCA (Log) Repository (W3C) VT-PetaPlex-1 (Knowledge Systems) NSDL (NSF): CITIDEL, DL-in-a-Box, GetSmart AmericanSouth.Org (Mellon)

9 DL Examples IBM Digital Library Virtua (www.vtls.com) Greenstone (www.greenstone.org) Eprints (www.eprints.org) Many systems in NSF DLI projects VT systems: MARIAN, CSTC, NDLTD Work on ODL, DL-in-a-box, CITIDEL, NCSTRL

10 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

11 Digital Libraries SGML (1985) PDF (1992) NSF DLI (1994) Library Cancellations (1988) University Scholarly Electronic Pub. (1988) Info. Literacy (1995) Improving Education Internet (1984) WWW (1994) Multimedia (1986)

12 Synchronous Scholarly Communication Same time, Same or different place

13 Asynchronous, Digital Library Mediated Scholarly Communication Different time and/or place

14 Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/ Information Life Cycle

15 Information Life Cycle Authoring Modifying Organizing Indexing Storing Retrieving Distributing Networking Retention / Mining Accessing Filtering Using Creating

16 Computing (flops) Digital content Communications (bandwidth, connectivity) Locating Digital Libraries in Computing and Communications Technology Space Digital Libraries technology trajectory: intellectual access to globally distributed information less more

17

18 Integrated CCLINC Translingual Information System DARPA Extraction What is the north korean movement in the front line? CCLINC SERVER Info Detection Summarization It seems that North Korea launch a missile again After North Korea launched a Daipodong missile last month, NK is perceived to proceed to an additional test launch. Korea, US and Japan enter into an alert state, and prepare for a joint response policy. Korea estimates that the additional launch will be on 09/05. Japan estimates that NK’s missile range is short. US information says that there is no sign of launch yet. Translation What is the status of nk missile launch against japan? BugHanI IlBonE Ddo MiSaIlEul BalSaHan Deus HaDa 2-way Speech Translation

19 Structured Video Browser (making video into hypermedia) www.learn.umd.edu IBrowse Expository multimedia Narrative Structures

20 ICU Information and Communication University Users Web Search Engines WWW Servlet Engine Web Server OS DB Search Server Servlet MPEG-7 Description Module 1 2 3 4 5 3’ 4’ 5’ MPEG-7 Image Library Systems Tech. t MPEG-7 Image Library Systems

21 t MPEG-7 Video Library Systems Tech. ICU Information and Communication University MPEG-7 Video Library Systems Tech. Video Data Description Generator Description Schemes Design Tool Description Scheme Meta Database Video Database Retrieval Server Module Player Presentation Module Architecture

22 About enumerate founders know numbers, computers, and the Web. They saw the Web’s potential to solve a big problem: Working with numeric information is too hard!

23 How does enumerate help? RDL
- Value: What is the number?
- Format: $100 [in thousands]
- Semantics: How it translates or corresponds to other formats
- Provenance: created by whom and when?
- Measure: scale about the number
  - Units: feet, meters, $, pounds, RBI
  - Magnitude: thousands, millions, billions
  - Modifiers: has the number been manipulated?
- Structure: relationship of numbers to each other
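A minimal sketch of how the attributes on this slide could travel with a single number. The class name, field names, and scale table are illustrative assumptions, not the actual RDL schema.

```python
from dataclasses import dataclass

# Hypothetical scale factors for the "magnitude" attribute (not taken from RDL).
MAGNITUDES = {"units": 1, "thousands": 1_000, "millions": 1_000_000, "billions": 1_000_000_000}


@dataclass(frozen=True)
class DescribedNumber:
    """One number plus the descriptive attributes named on the slide."""
    value: float              # Value: what is the number?
    units: str                # Measure: feet, meters, $, pounds, ...
    magnitude: str = "units"  # Measure: thousands, millions, billions
    created_by: str = ""      # Provenance: who created it
    created_on: str = ""      # Provenance: when it was created
    modified: bool = False    # Modifiers: has the number been manipulated?

    def absolute(self) -> float:
        """Return the value with its magnitude applied."""
        return self.value * MAGNITUDES[self.magnitude]

    def formatted(self) -> str:
        """Render in the slide's style, e.g. '$100 [in thousands]'."""
        if self.units == "$":
            text = f"${self.value:g}"
        else:
            text = f"{self.value:g} {self.units}"
        if self.magnitude == "units":
            return text
        return f"{text} [in {self.magnitude}]"


revenue = DescribedNumber(value=100, units="$", magnitude="thousands")
print(revenue.formatted())  # $100 [in thousands]
print(revenue.absolute())   # 100000
```

Keeping the magnitude separate from the stored value is one way to let the same record be displayed as written while still being comparable with numbers stated at a different scale.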

24 The enumerate formula

25 enumerate’s interactive data can be analyzed instantly

26 Standards Protocols/federation Z39.50, CIMI Dienst, NCSTRL OAI protocol Metadata TEI: inline, detailed (structure in stream) MARC: two-level, fine-grained Dublin Core: high-level, 15 elements RDF: describing resources/collections, annotation OAMS -> DC and others used in OAI
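As a small illustration of "Dublin Core: high-level, 15 elements", the sketch below lists the fifteen element names of the Dublin Core Metadata Element Set and flags record keys that fall outside them. The sample record and the helper name `check_dc_record` are assumptions made for the example.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def check_dc_record(record: dict) -> list:
    """Return the sorted list of keys that are not Dublin Core elements."""
    return sorted(key for key in record if key not in DC_ELEMENTS)


# Hypothetical record; the values are invented for illustration.
etd = {
    "title": "An Example Thesis",
    "creator": "A. Student",
    "type": "Text",
    "format": "application/pdf",
    "language": "en",
    "department": "Computer Science",  # not a Dublin Core element
}
print(check_dc_record(etd))  # ['department']
```

A check like this shows why a high-level scheme is easy to share across repositories: anything outside the fifteen names has to be mapped or carried in a richer format.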

27 AmericanSouth.Org – Roles, Content
SOLINET | Libraries (Data Providers) | Scholars
Intellectual Organization; Controlled vocabulary; Metadata extension development; Collection Decisions; Selection Criteria; Controlled vocabulary
Central Server Maintenance | Local Server Maintenance | Provision of Context
Metadata Repository | Metadata Creation/Maintenance | Organizational Structure and Annotation Tools
Central Interface Design/Maintenance | Local Interface Design/Maintenance | Selection of Other Annotation Tools
Central Indices Creation/Maintenance | Local Indices | Selection of Thesauri
Coordination of Metadata Gateway Development | Gateway Implementation | Concept Mapping
Digital Objects

28 Content Area Description | Audio | Digital | Finding Aid | MSS | Other | Photo | Video | MF | Print | Total
African-American cultural life 64694123101872
Agricultural crisis of late 19th century 113114819
Codification of segregation laws 13211816
Configuration of white supremacy 13331920
Cultural values and activities 31517415152071
Disenfranchising movements 122121615
Educational movements 61118621352798
Emergence of Holiness & Pentecostal Groups 111710
Emergence of new musical forms 311128
Emergence of organized groups expressing farmers concerns 221813
Expansion of Southern evangelical Protestant Churches 31939112359

29 Content Area Description | Audio | Digital | Finding Aid | MSS | Other | Photo | Video | MF | Print | Total
Expansion of industrial activity 61251051452
Forms of inter-racialism 11121410
Great Migration & its relationship to worsened race relations in the South 33
Growth of business 151211351552
Growth of cities & towns 11512413121857
Interplay of economic interest among regions 114121616
Local literature 312174733168
Lost Cause monument movement 3238
Political relationships between Populist & other groups 12249

30 Content Area Description | Audio | Digital | Finding Aid | MSS | Other | Photo | Video | MF | Print | Total
Popular magazines & newspapers 221131735
Reactions of African-American leaders to Segregation 212412111024
Relationship among Southern Populists & those in the West 11
Relationship between new racial system of 1890s and other 241815
Role of immigration 112642925
Survival of African-American communities & Culture 22157121333
Women’s Groups 21101514933
Total Each Format 411451161381331379301831

31 Digital Libraries Shorten the Chain from Editor Publisher A&I Consolidator Library Reviewer

32 DLs Shorten the Chain to Author Reader Digital Library Editor Reviewer Teacher Learner Librarian

33 Digital Libraries --- Objectives World Lit.: 24hr / 7day / from desktop Integrated “super” information systems: 5S: streams, structures, spaces, scenarios, societies Ubiquitous, Higher Quality, Lower Cost Education, Knowledge Sharing, Discovery Disintermediation -> Collaboration Universities Reclaim Property Interactive Courseware, Student Works Scalable, Sustainable, Usable, Useful

34 Benefits Ease of use Effectiveness “The benefits of digital libraries will not be appreciated unless they are easy to use effectively.” - IITA Workshop report

35 DLs: Why of Global Interest? National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly Knowledge and information are essential to economic and technological growth, education DL - a domain for international collaboration wherein all can contribute and benefit which leverages investment in networking which provides useful content on Internet & WWW which will tie nations and peoples together more strongly and through deeper understanding

36 Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Reagan Moore, Ed Fox, June 2002, for NSF

37 Libraries of the Future JCR Licklider, 1965, MIT Press World Nation State City Community

38 DL Challenges Preservation - so people will trust DLs Supporting infrastructure - networks,... Scalability, sustainability, interoperability DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM,... Need tools & methods to make them easier to build

39 DL Examples IBM Digital Library Virtua (www.vtls.com) Greenstone (www.greenstone.org) Eprints (www.eprints.org) Many systems in NSF DLI projects VT systems: MARIAN, CSTC, NDLTD Work on ODL, DL-in-a-box, CITIDEL, NCSTRL

40 Definitions Library ++ (library+archive+museum+…) Distributed information system + organization + effective interface User community + collection + services Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation

41 DL Services/Activities Taxonomy (Gonçalves) Browsing Collaborating Customizing Filtering Providing access Recommending Requesting Searching Visualizing Annotating Classifying Clustering Evaluating Extracting Indexing Measuring Publicizing Rating Reviewing (peer) Surveying Translating (language) Conserving Converting Copying/Replicating Emulating Renewing Translating (format) Acquiring Cataloging Crawling (focused) Describing Digitizing Federating Harvesting Purchasing Submitting PreservationalCreational Add Value Repository-Building Information Satisfaction Services Infrastructure Services

42 Definition: Digital Libraries are complex systems that help satisfy info needs of users (societies) provide info services (scenarios) organize info in usable ways (structures) present info in usable ways (spaces) communicate info with users (streams)

43 5S Layers Societies Scenarios Spaces Structures Streams

44 5S Model: Examples, Objectives
Models | Examples | Objectives
Stream | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spatial | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

45 5S Model: Definitions
5S | Definition
Streams | Sequences of elements of an arbitrary type
Structures | Labeled directed graphs
Spatial | Sets and operations on those sets
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement
Societies | Sets of communities and relationships among them
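The five definitions on this slide map fairly directly onto ordinary data structures. The sketch below is only one illustrative reading, with invented sample data and helper names (`successors`, `dot`, `run_scenario`); it is not the formal 5S model.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Stream: a sequence of elements of an arbitrary type (here, characters).
stream: Sequence[str] = "Digital libraries"

# Structure: a labeled directed graph, stored as (source, label, target) edges.
structure: Set[Tuple[str, str, str]] = {
    ("thesis", "hasPart", "chapter1"),
    ("thesis", "hasPart", "chapter2"),
    ("chapter1", "next", "chapter2"),
}


def successors(graph: Set[Tuple[str, str, str]], node: str, label: str) -> List[str]:
    """Nodes reachable from `node` over one edge carrying `label`."""
    return sorted(t for s, l, t in graph if s == node and l == label)


# Space: a set plus operations on it (here, term-count vectors with a dot product).
def dot(u: Dict[str, int], v: Dict[str, int]) -> int:
    return sum(u[term] * v[term] for term in u.keys() & v.keys())


# Scenario: a sequence of events, each of which modifies the state of a computation.
State = Dict[str, object]
Event = Callable[[State], State]


def run_scenario(events: List[Event], state: State) -> State:
    for event in events:
        state = event(state)
    return state


def submit_query(state: State) -> State:
    return {**state, "query": "5S"}


def retrieve(state: State) -> State:
    return {**state, "results": ["doc1", "doc2"]}


# Society: a set of communities and the relationships among them.
communities = {"learners", "teachers"}
relationships = {("teachers", "advise", "learners")}

final = run_scenario([submit_query, retrieve], {"user": "guest"})
print(successors(structure, "thesis", "hasPart"))
print(dot({"dl": 2, "oai": 1}, {"dl": 3, "etd": 1}))
print(final)
```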

46 Overview of 5S and DL formal definitions and compositions (Gonçalves)

47 Semantic relationships among DL concepts: Partial concept map (Gonçalves)

48 5S Framework and DL Development (Gonçalves)

49 5SL: Stream Model: ETD example text/xml UTF-8 application/pdf ENG...

50 5SL: Scenario Model Example - ETD Submission ETDReviewerETDWorkflowManager Repository Login(password) CheckSubmittedETDs ETDList Identifier get(Identifier, Submission) ETDReviewPage CheckETDFiles ETD GraduateSchool *GetFeesInfo FeesInfo [decision=accept] add(ETD, ETDCollection) [decision=reject] communicateProblem [while reviewNextETD=True] Accepted Rejected getDecision decision FeesInfo
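The review loop in this scenario (fetch each submitted ETD, obtain a decision, then either add it to the collection or communicate a problem) can be sketched as below. The function names, the record fields `files_ok` and `fees_paid`, and the decision rule are assumptions for illustration; the slide does not specify them.

```python
from typing import Callable, Dict, List, Tuple


def review_submissions(submitted: Dict[str, dict],
                       decide: Callable[[dict], str]) -> Tuple[List[str], List[str]]:
    """Walk the review loop; return (accepted ids, rejected ids)."""
    collection: List[str] = []
    problems: List[str] = []
    for identifier in sorted(submitted):   # ETDList, one Identifier at a time
        etd = submitted[identifier]        # get(Identifier, Submission)
        decision = decide(etd)             # getDecision
        if decision == "accept":
            collection.append(identifier)  # add(ETD, ETDCollection)
        else:
            problems.append(identifier)    # communicateProblem
    return collection, problems


def decide(etd: dict) -> str:
    """Hypothetical rule: accept only if the files check out and fees are paid."""
    return "accept" if etd["files_ok"] and etd["fees_paid"] else "reject"


submitted = {
    "etd-1": {"files_ok": True, "fees_paid": True},
    "etd-2": {"files_ok": False, "fees_paid": True},
}
accepted, rejected = review_submissions(submitted, decide)
print(accepted, rejected)
```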

51 Basic elements of DL services definition

52 5SLGen: Automatic DL Generation

53 MARIAN DL Generation MARIAN Digital Library Generator 5SL Design XML PARSERS: DOM, SAX MARIAN API Component Pool Class managers LoaderUser interfaces Indexing Classes Resource Manager Configuration and Processing Classes

54 MARIAN DL Generation Statistics
Code Generation: NDLTD Union Archive | Generated Lines of code
Indexing Classes | 154
Class Managers and ClassIDs | 342
Collection Loader and Handler | 361
Document presentation and User Interfaces | 800

55 5SLGen for ODL

56 Information Life Cycle (plus quality dimensions from 5S perspective – Gonçalves)

57 Overview of 5SGraph Workspace (instance model) Structured toolbox (metamodel)

58

59 DL Standardized Log Format - Design
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information

60 The Digital Library Standardized Log Format (cont.) Specification Collection of extensive, flat set of attributes query event registering transaction session error browse actiontimestamp Machine information help search update Sorting rule search catalog collection Result cutoff response

61 The Digital Library Standardized Log Format - Structure Top Level Hierarchy Log Log Entry Transaction SessionId MachineInfo TimeStamp Statement...

62 The Digital Library Standardized Log Format – Structure (cont.) Decomposition of statement into different types AdmInfo Statement SessionInfo Event ErrorInfo HelpInfo RegisterInfo

63 The Digital Library Standardized Log Format – Structure (cont.) Decomposition of event AdmInfo Statement SessionInfo Event ErrorInfo HelpInfo RegisterInfo Action StatusInfo Search Browse StoreSysInfo Update

64 The Digital Library Standardized Log Format – Structure (cont.) Search Attributes Search QueryString TimeFrame PresentationInfo SearchBy Format NumberOfResults SortBy CutOff Collection Catalog
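To make the hierarchy on the log-format slides concrete, the sketch below builds one search entry as XML with Python's standard `xml.etree.ElementTree`. The element names come from the slides; the exact nesting, the sample values, and the helper name `search_entry` are assumptions, since the slides do not give the schema.

```python
import xml.etree.ElementTree as ET


def search_entry(session_id: str, machine: str, timestamp: str,
                 query: str, collection: str) -> ET.Element:
    """Build one log entry recording a search action (assumed nesting)."""
    entry = ET.Element("LogEntry")
    transaction = ET.SubElement(entry, "Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "MachineInfo").text = machine
    ET.SubElement(transaction, "TimeStamp").text = timestamp
    statement = ET.SubElement(transaction, "Statement")
    event = ET.SubElement(statement, "Event")
    action = ET.SubElement(event, "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "Collection").text = collection
    ET.SubElement(search, "QueryString").text = query
    return entry


# Hypothetical values, invented for illustration.
entry = search_entry("s-001", "host.example.edu", "2003-05-13T16:04:00",
                     "digital libraries", "CITIDEL")
print(ET.tostring(entry, encoding="unicode"))
```

Nesting the search under Statement, Event, and Action keeps every entry self-describing, so a log analyzer can select all searches with a single path expression.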

65 DL Log Tool and Implementation

66 Creating the Clickstream Stats
Pipeline: User Activity File -> Clickstream stat generator -> Clickstream stats -> Visualizer (GUI) -> Visualizations
Sample user activity file:
User 4532; 25 Logons, 22 Logoffs, 3 accesses from .edu, 2 hits from .gov, history: Logon page [13 may 2003, 16:00] -> Browse page [13 may 2003, 16:02] -> search page [13 may 2003, 16:04] -> results page [13 may 2003, 16:04] -> view document 254 page [13 may 2003, 16:07] -> download page [13 may 2003, 16:10] -> logoff page [13 may 2003, 16:11]
User 4555432; 3 Logons, 0 Logoffs, 1 access from .mil, 2 hits from .com, history: Logon page [12 may 2003, 12:00] -> Browse page [12 may 2003, 12:02] -> logoff page [12 may 2003, 12:03]
Etc.
Step 1: The user activity file (an intermediate statistics file) is input into a clickstream stat generator.
Step 2: The clickstream generator produces aggregate statistics (e.g., the average transition time from browse to results page, the average time spent on each page).
Step 3: The GUI outputs the clickstream stats in both text and visual format. For example, the average path through a website can be displayed, and each page of a website can include hit statistics and time-on-page statistics.
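A minimal sketch of the aggregate step, computing the average time spent on each page from the two sample histories on this slide. The function name, the page labels, and the convention that time on a page is the gap until the next page in the same history are assumptions; the slide does not define the statistic.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

History = List[Tuple[str, datetime]]


def time_on_page(histories: Dict[str, History]) -> Dict[str, float]:
    """Average minutes spent on each page, over all users' histories."""
    totals: Dict[str, float] = defaultdict(float)
    visits: Dict[str, int] = defaultdict(int)
    for history in histories.values():
        # Time on a page = gap until the next page in the same history.
        for (page, start), (_, end) in zip(history, history[1:]):
            totals[page] += (end - start).total_seconds() / 60
            visits[page] += 1
    return {page: totals[page] / visits[page] for page in totals}


# The two sample histories from the slide, with shortened page labels.
histories = {
    "4532": [
        ("logon", datetime(2003, 5, 13, 16, 0)),
        ("browse", datetime(2003, 5, 13, 16, 2)),
        ("search", datetime(2003, 5, 13, 16, 4)),
        ("results", datetime(2003, 5, 13, 16, 4)),
        ("view document", datetime(2003, 5, 13, 16, 7)),
        ("download", datetime(2003, 5, 13, 16, 10)),
        ("logoff", datetime(2003, 5, 13, 16, 11)),
    ],
    "4555432": [
        ("logon", datetime(2003, 5, 12, 12, 0)),
        ("browse", datetime(2003, 5, 12, 12, 2)),
        ("logoff", datetime(2003, 5, 12, 12, 3)),
    ],
}

stats = time_on_page(histories)
print(stats["logon"])   # 2.0 minutes, averaged over both users
print(stats["browse"])  # 1.5 minutes
```

The final page of each history has no following timestamp, so it contributes no time-on-page figure under this convention.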

67 OCKHAM Simplicity (a la OCCAM’s razor) Support by Mellon and DLF Next meeting in Atlanta Jan. 8, 2003 Four main ideas: 1. Components 2. Lightweight protocols 3. Open reference models (e.g., 5S, OAIS) 4. Community perspective and involvement

68 Problem Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are: 1. The library budget won’t allow purchase of a commercial DL system. 2. Unless the development effort is local, there won’t be any control. 3. DLs are extensions of DBMSs, so they are simple applications to develop. 4. Since DLs operate on the Web, one must adopt the newest W3C proposal.

69 Problem – cont’d 5. Since technology moves so quickly, it is essential to follow the latest fad. 6. CS students always develop from scratch. 7. This team knows it can do it better. 8. This system must have more capabilities than any other system. 9. This DL has to be more flexible and extensible. 10. This is the right system architecture – at last!

70 Problem Approach We address the problem of how to develop DLs; build on experience in building many DLs; strive for simplicity as per OCKHAM initiative; build upon the Open Archives Initiative; demonstrate our approach in diverse situations; and invite all to use DL-in-a-box and help build Open Digital Libraries.

71 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

72 CS -> CSTC -> CRIM NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - http://www.cstc.org/ College of NJ, U. Ill. Springfield, Virginia Tech Focus initially on labs, visualization, multimedia Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)

73 CS Teaching Center (CSTC) Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units. Learners benefit from having well-crafted modules that have been reviewed and tested. Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built. ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org

74

75 Browsing (1)

76 Browsing (2)

77

78

79

80 SMETE Library -> NSDL (from www.dlib.org to NSF DLI-2) Context: Global movement toward Digital Libraries (see April 1998 CACM) NSF 00-44 effort: Science, Mathematics, Engineering, and Technology Education Digital Library (focussed on undergraduates) 3 workshops, yearly increasing funds / new calls NSDL will operate as a distributed federation, with separate parts for each key discipline, and should lead to a global effort.

81 Selected NSDL Early Projects/Topics
COLLEGIS Res. Inst. | IMS, CS, Math, Viz., …
Columbia University | Earth sciences
Stanford University | Medicine (images)
U. California Berkeley | Engineering
University of Maryland | K-12 education
U. Texas at Austin | Physical anthropology

82 Computing and Information Technology Interactive Digital Educational Library (CITIDEL) Domain: computing / information technology Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), … Submission & Collection: sub/partner collections -> www.citidel.org

83 www.CITIDEL.org Led by Virginia Tech, with co-PIs: Fox (director, DL systems) Lee (history) Perez (user interface, Spanish support) Partners College of New Jersey (Knox) Hofstra (Impagliazzo) Villanova (Cassel) Penn State (Giles)

84 Overview of CITIDEL architecture

85 Distributed repository structure

86 Digital library architecture for local and interoperable CITIDEL services

87

88

89 CITIDEL: Computing & Information Technology Interactive Digital Educational Library
Web Page: www.citidel.org
Future Developments
- Expanding further to cover Information Technology
- Advanced searching and additional services
- Expansion of the collection from many sources
- Collaborative Internationalization and Translation System
- Assessment and evaluation
- Workshops
Technology Features
- Component architecture (Open Digital Library): re-use and compose re-deployable digital library components.
- Built using open standards and technologies: XSL and XML for interface rendering; Perl for component integration; ESSEX for search engine functionality.
- Open Archives Initiative: used to collect DL resources and for DL interoperability.
User Features
- Very large collection: over 400,000 resources from ACM, SIGCSE, JERIC, CSTC, NDLTD, NCSTRL, DBLP.
- Filtered browsing and searching: filters based on user-selected sub-communities; also allows customization, in addition to views of all results.
- Multiclassification browser: supports browsing based on curricula (familiar, professional society approved) in computing and related disciplines, as well as on classification schemes.
- Activity collection creation & tools: faculty and students can extract resource references from CITIDEL search collections into learning activity templates, for sharing and interchange (with versioning). VIADUCT assists in the development of a totally independent, self-generated, educational resource collection within CITIDEL. IA VT is based on Utah State’s Instructional Architect.
1. The core of CITIDEL is the collection data support. This consists centrally of a union catalog, metadata cache, semantic links table, integration tables, and more.
2. The harvesting system populates the union catalog and the secondary tables from the contents of remote digital library collections, over Open Archives.
3. For collections which lack an Open Archive provider, ad hoc importing facilities must be constructed.
4. CITIDEL serves up the contents of its union catalog via an Open Archives data provider, giving other digital libraries (NSDL) access to CITIDEL's metadata.
5. The application layer data support consists of non-content-related tables and personalization tables, such as a table of users and preferences.
6. The filtering system relies on extensive database support for speed.
7. The service modules tackle the DL features of search engine, recombination into annotated and enriched lists, creation of pedagogical activities utilizing DL resources, and posting messages to DL resources.
8. The CITIDEL application ties it all together in a single user interface. Most presentation (but not all) is handled here.
Browsing and Searching with Filters: Users are placed in chosen sub-communities and can filter results based on these sub-communities, with further customization available. Alternatively, users may view all results. Users may set up multiple filters for simple or complex filtering based on many factors such as education level, role, resource type, language, source, and much more. This allows users to get exactly what they are or are not looking for in the digital library. At any time, users are free to disable these filters or see results excluded by them.
Multiclassification Browser: The multiclassification browser allows users to browse through the CITIDEL collections based on professional society approved curricula in computing as well as classification schemes. As users span many disciplines related to computing, they may browse within the scheme with which they are most familiar. Resources are cross-classified wherever possible through these schemes. The current schemes include the 2001 ACM/IEEE-CS Computing Curricula, the 1998 ACM Computing Classification System, the Computing Research Repository Subject Areas, and the 2000 AMS Mathematics Subject Classification.
Searching: CITIDEL searching, which is driven by the ESSEX search engine for relevance computation, also provides a list of relevant categories within the classification schemes (see sidebar, left).
CITIDEL Front Page
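The filter behavior described for CITIDEL (several filters combined, with the option to disable them or inspect what they excluded) might be sketched as follows. The field names, sample records, and the function `apply_filters` are invented for illustration and do not reflect CITIDEL's actual database-backed implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

Resource = Dict[str, str]


def apply_filters(resources: Iterable[Resource],
                  filters: Dict[str, Set[str]]) -> Tuple[List[Resource], List[Resource]]:
    """Split resources into (shown, excluded) under the active filters.

    A resource is shown only if, for every filtered field, its value is in
    the allowed set. Excluded items are returned too, so a user can still
    ask to see what the filters removed.
    """
    shown: List[Resource] = []
    excluded: List[Resource] = []
    for resource in resources:
        passes = all(resource.get(field) in allowed
                     for field, allowed in filters.items())
        (shown if passes else excluded).append(resource)
    return shown, excluded


# Hypothetical records and field names, invented for illustration.
results = [
    {"id": "r1", "level": "undergraduate", "type": "lab", "language": "en"},
    {"id": "r2", "level": "graduate", "type": "paper", "language": "en"},
    {"id": "r3", "level": "undergraduate", "type": "paper", "language": "es"},
]
filters = {"level": {"undergraduate"}, "language": {"en", "es"}}

shown, excluded = apply_filters(results, filters)
print([r["id"] for r in shown])     # ['r1', 'r3']
print([r["id"] for r in excluded])  # ['r2']

# Disabling the filters (an empty filter set) shows every result.
unfiltered, _ = apply_filters(results, {})
```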

90 CITIDEL -> NSDL A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL -> LEARNS

91 National Science Digital Library (NSDL) Domain: undergraduate and K-12 education, etc. Genre: educational resources Submission & Collection: sites of 90 projects -> www.nsdl.org

92 Advancing Education Community Building Digital Libraries Educational Resources Sharing through supported by

93 NSDL Information Architecture (essentially as developed by the Technical Infrastructure Workgroup)
Diagram labels: User Interfaces; Usage Enhancement; Collection Building; Core NSDL “Bus”
Diagram elements: Portals & Clients; NSDL Services and Other NSDL Services; CI Services (annotation, discussion, personalization, authentication, browsing); Core Services (information retrieval, metadata gathering); Core Collection-Building Services (harvesting, protocols); NSDL Collections; Special Databases; referenced items & collections

94 “The network is the library.” A Learning Environments and Resources Network for SMET Education (LEARNS)

95 LEARNS Connects: Users: students, educators, life-long learners Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories;... Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate;...

96 LEARNS Supports: Users Content Tools (profiles) (metadata) (protocols) Learning communities Customizable collections Application services

97 LEARNS Enables: Environments for Communication Collaboration Creation Validation Evaluation Recognition... Discovery Stability Reliability Reusability Interoperability Customizability... of Resources AND

98 Goal Core Integration Track (FY00 pilots, FY01 full) Collections Track Services Track Targeted Research Track LEARNS operational by 2002

99 Expectations of NSDL Program Tracks Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form Targeted (Applied) Research: have immediate impact on one or more of the other three tracks

100 Collections Discovery of content Classification and cataloguing Acquisition and/or linking; referencing Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged Access to massive real-time or archived datasets Software tool suites for analysis, modeling, simulation, or visualization Reviewed commentary on learning materials and pedagogy

101 Services Help services, frequently asked questions, etc. Synchronous/asynchronous collaborative learning environments using shared resources Mechanisms for building personal annotated digital information spaces Reliability testing for applets or other digital learning objects Audio, image, and video search capability Metadata system translation Community feedback mechanisms

102

103 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

104 A Digital Library Case Study Domain: graduate education, research Genre: ETDs = electronic theses & dissertations Submission: http://etd.vt.edu Collection: http://www.theses.org Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

105 Alphabet Soup - Factoring NDLTD = ND LTD (Paul Mather – from UK) NDLTD = NDL TD (Edie Rasmussen) (Later, Networked University Digital Library = NUDL)

106 The Networked Digital Library of Theses and Dissertations www.NDLTD.org Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative Training Authors Expanding Access Preserving Knowledge Improving Graduate Education Enhancing Scholarly Communication Empowering Students & Universities

107 Grad Program IT Ed. (Tech) Library NDLTD

108 Media ETD Web Site http://www.ndltd.org/ ETDs Got Your Interest? Graduate Students Singapore AM Chronicle of Higher Ed. National Public Radio NY Times... U. Laval

109 Key Ideas: Networked infrastructure Scalability Education is the rationale University collaboration Workflow, automation Authors must submit Maximal Access PDF, SGML, MM, MARC, DC, URNs, Federated search Standards 8th graders vs. grads

110 What are the long term goals? 400K US students / year getting grad degrees are exposed / involved 200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …) Dramatic increase in knowledge sharing: literature reviews, bibliographies, … Services providing lifelong access for students: browse, search, prior searches, citation links Hundreds/thousands of downloads / year / work

111 ETDs: Library Goals Improve library services Better turn-around time Always available Reduce work catalog from e-text eliminate handling: mailing to UMI, bindery prep, check-out, check-in, reshelving, etc. Save space

112 Grad Student Workstation? Record all work with NDLTD, return to prior situation, prepare bibliography Powerful (multilingual, text, image) searching, browsing (with categories), following citation links Support collaboration with others in same field: help with literature review, sharing tools and data sets, applying their methods

113 What are we doing? Aiding universities to enhance grad educ., publishing and IPR efforts: to help improve the availability and content of theses and dissertations Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive) Demonstrating how for other organizations

114 NDLTD Computer Resources Research Literature Student Prepares Thesis/Dissertation

115 Student Defends & Finalizes ETD My Thesis ETD

116 Student Gets Committee Signatures and Submits ETD Signed Grad School

117 Graduate School Approves ETD, Student is Graduated Ph.D.

118 Library Catalogs ETD, Access is Opened to the New Research WWW NDLTD

119 Available at VT Information http://scholar.lib.vt.edu/theses Automated submission system ready for customization http://scholar.lib.vt.edu/ETD-db/ Student guidelines, training materials, FAQs, multimedia educational materials http://etd.vt.edu NDLTD: Network educational institutions Annual conferences: Berlin 2003, U of Kentucky 2004 http://www.ndltd.org

120 ETDs at Virginia Tech Partnership: Library, Graduate School, and Faculty Approved by university governance - Mar. 1996 Full implementation - Jan. 1997 Web submission Students: http://etd.vt.edu Programmers: http://scholar.lib.vt.edu/ETD-db/ Workshops for students (and faculty) Over 5000 ETDs approved

121 How are ETDs managed? Graduate student creates ETD Word processor, multimedia Saves as PDF, usually Graduate student submits ETD Directly to library server/permanent archive Archiving fee replaces binding fee Graduate School approves E-mails author, advisor, UMI (VT scripts) Authors/advisors prescribe Internet access Library catalogs and archives UMI downloads

122 Archiving ETDs Every 15 minutes back-ups made of not-yet-approved submissions Hourly back-ups of newly approved ETDs Weekly back-ups of entire ETD collection Copies stored on-site and off-site
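The three backup tiers above can be written down as a small table of intervals. This is a minimal sketch only: the labels, `BACKUP_INTERVALS`, and `due_backups` are hypothetical names chosen for illustration, and the slide does not describe how the actual VT scripts are implemented.

```python
from datetime import datetime, timedelta

# Intervals taken from the slide; the keys are hypothetical labels.
BACKUP_INTERVALS = {
    "pending_submissions": timedelta(minutes=15),  # not-yet-approved ETDs
    "newly_approved": timedelta(hours=1),          # newly approved ETDs
    "entire_collection": timedelta(weeks=1),       # whole ETD collection
}

def due_backups(last_run, now):
    """Return the labels whose interval has elapsed since their last run."""
    return sorted(
        name
        for name, interval in BACKUP_INTERVALS.items()
        if now - last_run[name] >= interval
    )

# Illustrative timestamps only.
now = datetime(2003, 5, 27, 12, 0)
last_run = {
    "pending_submissions": now - timedelta(minutes=20),
    "newly_approved": now - timedelta(minutes=30),
    "entire_collection": now - timedelta(days=8),
}
print(due_backups(last_run, now))
```

A scheduler would call something like `due_backups` periodically; storing copies on-site and off-site is a separate concern not modeled here.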

123 VT ETD Cataloging same as current cataloging policies, except: author-assigned keywords (not LCSH) generic (not LC) call no. fields/subfields as required for computer files full abstracts time savings cataloger familiar with computer files equipment, software for word processing 5 minutes avg. (10-15 minutes for paper TDs)

124 Library Costs $12/vol. for paper thesis processing catalog, bind, security strip, label, shelve @950 vols./yr. = $11,466 $3.20/vol. ETD processing cataloging @950 vols./yr. = $3040 $.07/vol. shelving $.04/vol. circulation
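The per-volume figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch with illustrative constant and function names. Note that 950 × $12 is $11,400, whereas the slide prints $11,466; that figure matches 950 × ($12 + $.07), so it may include the shelving cost, but the slide does not say so.

```python
# Per-volume costs and yearly volume as given on the slide.
PAPER_COST_PER_VOL = 12.00   # catalog, bind, security strip, label, shelve
ETD_COST_PER_VOL = 3.20      # cataloging only
SHELVING_PER_VOL = 0.07      # listed separately on the slide
VOLUMES_PER_YEAR = 950

def annual_cost(cost_per_volume, volumes=VOLUMES_PER_YEAR):
    """Yearly processing cost, rounded to cents."""
    return round(cost_per_volume * volumes, 2)

paper = annual_cost(PAPER_COST_PER_VOL)
etd = annual_cost(ETD_COST_PER_VOL)
paper_with_shelving = annual_cost(PAPER_COST_PER_VOL + SHELVING_PER_VOL)

print("paper:", paper)
print("paper incl. shelving (assumption):", paper_with_shelving)
print("ETD:", etd)
print("difference (paper - ETD):", round(paper - etd, 2))
```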

125 Costs/Savings at VT Graduate School stopped shipping to the library 3000 copies of paper TDs/year Library stopped binding, shelving, and circulating 3000 copies of TDs/year 166 ft of shelf space saved/year by the library VT used existing equipment in Library (vs. start-up costs for staff, hardware and software from a zero-base estimate: $65,000 – see http://scholar.lib.vt.edu/theses/)

126 Popular Works 1996 458 Seevers, Gary L. Identification of Criteria for Delivery of Theological Education Through Distance Education: An International Delphi Study (Ph.D., Educational Research and Evaluation, April 1993; 1353Kb) 432 Hohauser, Robyn Lisa. The Social Construction of Technology: The Case of LSD (MS in Science and Technology Studies, Feb. 1995; 244Kb) 390 Childress, Vincent William. The Effects of Technology Education, Science, and Mathematics Integration Upon Eighth Grader's Technological Problem-Solving Ability (Ph.D. in Vocational and Technical Education, July 1994; 285Kb) 310 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2Mb) 287 Sprague, Milo D. A High Performance DSP Based System Architecture for Motor Drive Control ( MS in Electrical Engineering, May 1993; 878Kb) 165 Wallace, Richard A. Regional Differences in the Treatment of Karl Marx by the Founders of American Academic Sociology (MS in Sociology, Nov. 1993; 479Kb) 150 McKeel, Scott Andrew. Numerical Simulation of the Transition Region in Hypersonic Flow (Ph.D. in Aerospace Engineering, Feb. 1996; 3Mb)

127 Popular Works 1997 9920 Liu, Xiangdong. Analysis and Reduction of Moire Patterns in Scanned Halftone Pictures (Ph.D. in Computer Science, May 1996; 6.6Mb) 7656 Petrus, Paul. Novel Adaptive Array Algorithms and Their Impact on Cellular System Capacity (Ph.D. in Electrical Engineering, March 1997; 5Mb) 2781 Agnes, Gregory Stephen. Performance of Nonlinear Mechanical, Resonant-Shunted Piezoelectric, and Electronic Vibration Absorbers for Multi-Degree-of-Freedom Structures (Ph.D. in Engineering Mechanics, Sept. 1997; ? + 7926Kb) 2492 Gonzalez, Reinaldo J. Raman, Infrared, X-ray, and EELS Studies of Nanophase Titania (Ph.D. in Physics, July 1996; 4607Kb) 1877 Shih, Po-Jen. On-Line Consolidation of Thermoplastic Composites (Ph.D. in Engineering Mechanics, Feb. 1997; 3.3Mb) 1791 Saldanha, Kevin J. Performance Evaluation of DECT in Different Radio Environments (MS in Electrical Engineering, Aug. 1996; 3.2Mb) 1431 DeVaux, David. A Tutorial on Authorware (MS in CS, April 1996; 2.3Mb) 1394 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2518Kb)

128 ETD Benefits: Low margin, high use Incorporate ETDs with other digital library activities Ejournals, online class materials, digital images, etc. Additional equipment, staff may not be necessary http://scholar.lib.vt.edu/theses/data/setup.html Use VT programs, scripts, etc. http://scholar.lib.vt.edu/ETD-db/ Online accesses vs. circulation of copies VT theses 1990-1994, combined average circulation per copy: 2.24/yr VT dissertations 1990-1994, combined average circulation per copy: 3.2/yr

129 Access to VT’s ETDs http://scholar.lib.vt.edu/theses/

130 Why are ETDs so popular? User surveys 67% found VT ETDs easily 61% found them by searching 22% browsed by department 16% browsed by author 53% downloaded 1 or more ETDs Author surveys Conversion and submission processes less difficult than anticipated Over half plan to publish articles from their ETDs Why did they restrict access? http://lumiere.lib.vt.edu/surveys/

131 http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/

132

133 Brief History of ETD Meetings 1987 mtg in Ann Arbor: UMI, VT, … 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each 1993 mtg in Atlanta to start Monticello Electronic Library (regional, US Southeast): SURA, SOLINET 1994 mtg at VT: std: PDF + SGML + multimedia objects 1996 funding by SURA, US Dept. of Education (FIPSE) 1997 meetings in UK, Germany,... 1998 – 1st symposium – Memphis (20) 1999 – 2nd symposium – Blacksburg (70) 2000 – 3rd symposium – St. Petersburg (225) 2001 – 4th symposium – Caltech (200) 2002 – 5th symposium – BYU, Provo, Utah 2003 – 6th symposium – Berlin (215) 2004 – 7th symposium – U. Kentucky 2005 – 8th symposium – Sydney, Australia

134 NDLTD Membership As of 5/17/2003 there were at least: 176 members, including: 155 individual universities 6 consortia 21 institutional members

135 National / Regional Projects Australia U. New South Wales (lead) U. of Melbourne U. of Queensland U. of Sydney Australian National U. Curtin U. of Technology Griffith U. Belgium Brazil Germany Humboldt University (lead) 3 other universities 5 learned societies: Math, Physics, Chemistry, Sociology, Education 1 computing center 2 major libraries India Lithuania Spain: Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es: 9 sites Sudan UK (British Library, JISC, Edinburgh) UNESCO (especially Latin America, Eastern Europe, Africa) USA: CIC (“Big 10”) Ohio: OhioLINK: 79 colleges/univs SOLINET …

136 OhioLINK Statewide Consortium Represents 79 colleges, universities, libraries Public Universities Private Universities and Colleges 2-Year Colleges Only a few (e.g., Miami U. of Ohio) are also NDLTD members on their own

137 US University Members Air University (Alabama) Baylor University Boston University Brigham Young University Caltech Clemson University College of William & Mary Concordia University (Illinois) Drexel University – required 4/2002 East Carolina University East Tenn. State U. – required 1/2001 Florida Institute of Technology Florida International University Florida State University Florida Tech George Washington University Georgetown University Johns Hopkins University Louisiana State University – required 1/2002 Marshall University (W. Va.) Miami University of Ohio Michigan Tech Mississippi State University MIT Montana State University Naval Postgraduate School (CA) New Jersey Inst. of Technology New Mexico Tech North Carolina State University – required 9/2002 Northwestern University Penn. State University Regis University Rochester Institute of Tech. Texas A&M U. of Central Florida U. of Colorado Health Science Center U. of Florida – required 8/2001 U. of Georgia – required 9/2001 U. of Hawaii, Manoa U. of Illinois, Urbana-Champaign U. of Iowa U. of Kentucky – required in CS only U. of Maine – required in CS, Spatial Info Sci/Eng U. of Missouri-Columbia U. of North Texas – required since 8/99 U. of Oklahoma U. of Nevada, Las Vegas U. of New Orleans U. of North Texas – required 8/1999 U. of Oklahoma U. of Pittsburgh U. of Rochester U. of South Florida – required 8/2002 U. of Tennessee, Knoxville U. of Tennessee, Memphis U. of Texas at Austin – required 6/2001 U. of Virginia – required 1/2003 U. of West Florida U. of Wisconsin - Madison – part reqt 12/1999 Vanderbilt U. Virginia Commonwealth U. Virginia Tech - required 1/97 Wake Forest U. West Virginia U. - required 8/1998 Western Kentucky U. – required 9/2004 Western Michigan U. Worcester Polytechnic Inst. – required 7/2002 Yale U.

138 Other Countries (selected) Australia Belgium Brazil Canada Chile China, Hong Kong Colombia Finland France Germany Greece India Italy Jamaica Korea Lithuania Mexico Netherlands Norway Poland Russia Singapore S. Africa S. Korea Spain Sudan Sweden Taiwan Thailand UK Venezuela

139 Institutional Members Australian Digital Theses Program British Library Cinemedia Coalition for Networked Information (CNI) Committee on Institutional Cooperation (CIC) Consorci de Biblioteques Universitàries de Catalunya Diplomica.com Dissertation.com Dissertationen Online (Germany) ETDweb, a Division of Answer4.com Ibero-American Science & Technology Education Consortium (ISTEC) MathDISS International National Documentation Centre (NDC), Greece National Library of Canada National Library of Portugal OCLC Online Computer Library Center Office of Scientific and Technical Info (US Dept of Energy) OhioLINK Organization of American States (SEDI/OAS) Southeastern Library Network (SOLINET) Sudanese National Electronic Library UNESCO (www.unesco.org/webworld/etd)

140 UNESCO and ETDs Promoting the use of the Internet as a tool for disseminating scientific knowledge Facilitating the transfer of ETD expertise from developed to developing countries 1998: Member of the NDLTD Steering Committee 1999: First UNESCO ETD meeting on ETD internationalisation 2002: “ UNESCO Guide to Electronic Theses and Dissertations ” 2003: Model training programmes and training courses 2003: Sponsor pilot projects 2003: Pilot projects (Africa, Europe, Latin-America)

141 For professional societies Like “writing across the curriculum”, e.g., Chemical Markup Language, MathML, … Besides writing: computing/communications, information literacy, personal digital library management, tool use, research methods, collaboration, archiving/preservation Data sets, communities of users of them Classification systems / browsing / searching NRC’s “Issues for Science and Engineering Researchers in the Digital Age”, 57 pages

142 Relationship with publishers Concern of faculty and students that still wish to publish books or journal articles, voiced: campus, Chronicle, NPR, Times Solution: Approval Form gives students, faculty choices on access, when to change access condition; use IPR controls in DL Solution: by case, work with publishers and publisher associations to increase access AAP, AAUP AAAS, ACM, ACS, Elsevier,...

143 Some responses from publishers ACM: need to acknowledge copyright Elsevier: need to acknowledge copyright IEEE-CS: endorse initiative ACS: After first publication, can release Textbook publishers: different market, manuscript significantly reworked General: restricting access to local campus will not cause any problems

144 How does this relate to ProQuest/UMI? Generally, they are independent decisions. 1987 UMI workshop was first to explore ETDs. UMI wrote support letter for US Dept. of Ed. proposal. UMI is on Board of Directors (formerly Steering Committee). ProQuest Direct pilot of scanning works started 1/1/97, with free 2 yr access to front part. We are collaborating on: accepting electronic author submissions standards (e.g., representation)

145 ETD Initiative (and UMI) Students Learn about DL, EPub TDs become more expressive N. Amer. (T)Ds are accessible, archived Global TDs become more accessible, archived UMI Universities

146 User Search Support (multilingual, XML) Note: All groups shown are connected with NDLTD.

147 www.theses.org James Powell student project, D-Lib Magazine description in Sept. 1998 XML description of each site type of search engine / service language coverage (for resource discovery) Adding Z39.50 gateway capability and integrating with MARIAN, along with Harvest and Open Archives protocols

148 Access Approaches Goal: Maximize access and services, e.g., by encouraging: UMI centralized services VTLS: free union collection of ETD metadata OCLC: free union collection of TD metadata Distributed service: Dienst, Z39.50 Regional services (e.g., OhioLINK) Local servers with browse, search From local catalogs to local archives WWW robot indexing and search services

149 Access Possibilities Web search engines library catalog clients www.theses.org www.openarchives.org 3rd Party Services (e.g., UMI) Virginia Tech National Library of Portugal CBUC (Spain) OhioLINK MIT National Projects: AU, GE, …

150 Why might a university want to be involved? To improve graduate education / better prepare your students / increase their knowledge (epub, DLs, IPR) and visibility To enhance university infrastructure (DL) To unlock university information To save money for students and for the university / improve workflow To build an important digital library (of ETDs)

151 NDLTD Members and ETD-MS NDLTD members will Share metadata for their ETDs Providing that in either ETD-MS Or if they use a version of MARC locally, work to have that eventually shared in either MARC21 or UNIMARC Run OAI, either locally or in consortia, so their metadata can be harvested, according to necessary terms and conditions

152 Complex to Simple MARC ($50) Dublin Core (DC) + thesis

153 ETD-MS ETD Metadata Standard XML-encoded metadata standard (content and encoding) for Electronic Theses and Dissertations (ETDs) in part conforming to Dublin Core (DC) using RDF using UNICODE Will specify relationship with MARC

154 ETD-MS Schema Includes Elements not in dces (Dublin Core Element Set) e.g., thesis.degree Elements with wildly divergent semantics e.g., thesis.advisor rather than dc.contributor Relationships to other elements Controlled vocabularies e.g., {Bachelors, Masters, Doctorate, Other} for thesis.degree.level Labels in multiple languages
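The combination of Dublin Core elements, thesis-specific elements, and a controlled vocabulary can be illustrated with a short sketch. The element names `thesis.advisor` and `thesis.degree.level` and the four vocabulary values come from the slide; everything else is an assumption made for illustration, including the `<record>`/`<field>` wrapper, the `dc.title` and `dc.creator` spellings, and the function name. This is not the actual ETD-MS XML serialization.

```python
import xml.etree.ElementTree as ET

# Controlled vocabulary for thesis.degree.level, as listed on the slide.
DEGREE_LEVELS = {"Bachelors", "Masters", "Doctorate", "Other"}

def build_record(title, creator, advisor, degree_level):
    """Build a toy metadata record; the XML layout here is hypothetical."""
    if degree_level not in DEGREE_LEVELS:
        raise ValueError(f"degree level not in controlled vocabulary: {degree_level!r}")
    record = ET.Element("record")
    fields = [
        ("dc.title", title),                  # Dublin Core element
        ("dc.creator", creator),              # Dublin Core element
        ("thesis.advisor", advisor),          # used instead of dc.contributor
        ("thesis.degree.level", degree_level),
    ]
    for name, value in fields:
        ET.SubElement(record, "field", name=name).text = value
    return record

record = build_record(
    "A Tutorial on Authorware", "DeVaux, David", "(advisor name)", "Masters"
)
print(ET.tostring(record, encoding="unicode"))
```

Validating against the vocabulary at record-creation time is one design choice; a harvester could equally validate on ingest.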

155 ETD Encoding Decisions Text UNICODE (with language identifiers) Structure MARC (MARC-21 or UNIMARC) PLUS XML / RDF / DC + ETD Multimedia Following international standards Other schemes may not be amenable to preservation

156 RDF for ETDs WWW Consortium (W3C)’s RDF: Resource Description Framework NUDL ETD metadata realized as an RDF application profile Specifying elements from DC element set Plus new elements from a registered ETD schema Constraints & policies attached to both (e.g., “Full title,” “Name as it appears on title page,” “Repeatable”) Links to authority records encoded as URIs XML syntax as per RDF standard

157 OCLC and ETD-MS Identify TDs in WorldCat (4.3M) Through OAI make available metadata for WorldCat TDs in both DC and ETD-MS Provide an authority service for personal names for NDLTD Coordinate with other authority services such as LC

158 VTLS and ETD-MS Support NDLTD through a union catalog service implemented with Virtua Accept metadata in MARC21 or UNIMARC, and help identify other converters for other types Accept metadata in one other format, namely ETD-MS, collected using OAI (harvesting) Accept data in various character sets, with UNICODE preferred, but in some cases the submitter may be required to convert

159 Union Catalog (with Vinod Chachra, Thom Hickey)

160 NDLTD Union Catalog Statistics 1. Participating Countries So far ETDs from 7 countries are included in the database. Canada Germany Greece Korea Portugal Spain U.S. UK to be added by June 30, 2002. Brazil to be added soon.

161 NDLTD Union Catalog Statistics 2. Interface Languages in Union Catalog The language here is the language of the interface The VTLS NDLTD Union Catalog has 14 languages: English, Arabic, Catalan, Chinese, French, German, Hebrew, Korean, Polish, Portuguese, Russian, Slovak, Spanish and Swedish Example follows

162 German

163 NDLTD Union Catalog Statistics 3. Languages in the Union Catalog The language here is the language of the content of ETD The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English German Greek Korean Portuguese Spanish Examples follow

164 Language = German; hits = 137

165 Full record display

166 Language = Greek

167 In Greek In English

168 Union Catalog Creation

169 NDLTD Union Catalog Architecture TD OAI Repository ETD OAI Repository WorldCat VT ODL Demo Search/Browse Virtua Union Catalog email FTP OAI-PMH 20+ sites OCLC VTLS SRU/SRW (search) Try: Z39.50 harvest

170 OCLC Capabilities Harvesting OAI-PMH versions 1.1 and 2.0 Harvestable sets Sets by institution Searching SRU (Z39.50 on the Web) VTLS Virginia Tech Open Digital Library demo Unicode support

171 OCLC Statistics 19 Sources 61,998 records Probably some overlap Adding 1-2 new sites/month

172 Multiple objectives Sharing research results Decrease costs, increase services Increase knowledge of users Adding to author knowledge/skills Epub, DL, IPR Enhancing organization’s infrastructure CS department, library University, Laboratory

173 Some Barriers at Universities Lethargy; Not invented here (esp. large univ’s) Anger with unfunded, added, required work Last straw: using more frustrating technology Lack of experience in working together: graduate school, library, computing staff Lack of interest in (quality of) student work More loyalty to discipline than to campus Unwillingness to accept responsibility for $ problems with libraries, publishers

174 How can a university get involved? Select planning/implementation team Graduate School Library Computing / Information Technology Institutional Research / Educ. Tech. Fill in online form, giving us contact names www.ndltd.org/join Adapt Virginia Tech (or other) solution Build interest and consensus Start trial / allow optional submission

175 Contact Our Project Team E-mail etd@ndltd.org Phone Call Visit Video Tape

176 Convene Local Planning Group ETD

177 Build Local ETD Site Digital Library Policies Inspection/Approval Workshop/Training ETD

178 Support Offered Software, documentation, tech support Email, listservs (etd-l@listserv.vt.edu) UNESCO training, Guide (www.etdguide.org) NDLTD Committees Conference Membership Software Distribution Standards

179 Why ETD? Short Answer For Students: Gain knowledge and skills for the Information Age Richer communication (digital information, multimedia, …) For Universities: Easy way to enter the digital library field and benefit thereby For the World: Global digital library – large, useful, many services General: Save time and money Increased visibility for all associated with research results

180 The Process? Short Answer For Students: Plan on ETD from day 1 Secure knowledge from: workshops, online info, colleagues Work with faculty to plan approach PDF? XML? TEI? Multi/hypermedia? Data sets? Viz? Get signed approval form: access, ©, proxy assignment After defense and approval, submit ETD to university For Universities: Form team Adapt solution from work at other universities, attend ETD conference Pilot -> Option -> Requirement

181 Future Work - 1 of 2 Working with publishers to increase level of access as much as possible -> joint awards Interoperability tests to provide integrated services Study with testbed that emerges, to improve information retrieval, browsing, interface, and other types of user support Evaluation, improving learning experience, spread further as worldwide initiative, sustainable support and coordination

182 Future Work - 2 of 2 Adding services currently prototyped annotation and SDI (routing) capabilities fulltext search, crawling Adding other services planned building and using citation database (w. SFX) implementing plagiarism check (like “SCAM”) Further development of NDLTD Inc. as nonprofit charitable educational institution promoting education and digital libraries

183 Spirit of NDLTD Help make a better (smaller) world Win-win-win (everyone can benefit) Have fun helping others Helpers/teachers learn more than those they work with Cooperation, friendly competition When you “1-up” VT, share your software, documents! “Doing better” requires both “doing”, “better” Balance (and build on standards) New, popular, powerful, expressive, exciting, “better” Doable, feasible, learnable, affordable, sharable, preservable We can always do more, enhancing quality and knowledge!

184 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

185 Portals and DLs Reengineering PhysNet in the uPortal framework by Ye Zhou (Dept. of Computer Science, MS thesis, Virginia Tech, May 2003). Hypothesis: DLs can be modeled as a set of interactive and non-interactive components with well-defined inter-component communication protocols. Offering a customizable User Interface (UI) toolkit can facilitate the process to build a DL. Distributed DL services can be achieved with the enablement of a web service on each individual component. To prove the hypothesis above, we designed, implemented, and tested a framework in a portal reengineering project.

186 PhysNet Re-engineering

187 PhysNet Re-Engineering Screen shot

188 New Services: PACS Recommender

189 Browse user interface with uPortal

190 Browse user interface with uPortal – cont’d

191 uPortal project, http://www.udel.edu/uportal/

192 Weiner, K., “Introduction to uPortal 2.1”, JA-SIG Conference, Dec. 2002

193 Accessibility Activities / Plans Interface design (simple, 3D, VR) Usability studies Generic multi-lingual support Support for those with disabilities Hybrid collection (paper, MARC, abstracts, full-text, multimedia) Disciplinary classifications, tools Visualization of results, collection

194 CAVE Experiments Use a familiar metaphor building / floor / room / shelf / book Rearrange orderings / shelving use categories, clustering, ranking use visualization: colors and gaps study space mappings: physical, logical Simplify movement for key tasks

195

196 CAVE-ETD CAVE-ETD is a simulation of a library that runs in a CAVE (VR environment). Populated with a subset of ETD records. Main Foyer room

197 Book Browsing

198 Reading Book Abstract

199 ENVISION NSF “A User-Centered Database from the Computer Science Literature” (1991-93) Collected bib/typesetter data, converted to SGML Scanned thousands of page images MARIAN search engine - can be made available (also applied to the Virginia Tech library catalog) used as part of a prototype object-based DL, with tailored visualization interface (L. Nowell dissertation)

200 Envision Results Window

201

202

203

204

205 Envision – New Version

206 Envision – New Versions - Clusters

207 SPIRE Visualization

208 VIDI: A Lightweight Protocol Between Visualization Systems and Digital Libraries Jun Wang Virginia Tech CS MS Thesis Spring 2002

209 Problem Concerned Scenario DL 1 DL 2 DL 3 VIS 1 VIS 2

210 VIDI Protocol Design Features Enabling interoperability Lightweight Extended OAI Protocol Flexible implementations enabled General XML, HTTP Standard time formats Dual usage of commands Simple and Easy!

211 VIDI Protocol Request Verbs Identify (DL, VIS) ListMetadataFormats (DL) ListVisdataFormats (VIS) ListTransformers (VIS) RequestResultSet (DL)

212 Extend OAI Protocol OAI: GetRecord, ListIdentifiers, ListRecords, ListSets OAI & VIDI: Identify, ListMetadataFormats VIDI: ListVisdataFormats, ListTransformers, RequestResultSet
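The VIDI request verbs and the side (DL or VIS) that answers each one can be captured in a small lookup table. This is a sketch under stated assumptions: the slides say VIDI extends OAI over HTTP but do not give the wire syntax, so the `verb=` query-string style is borrowed from OAI-PMH, and the base URL, the `query` argument, and the function name are hypothetical.

```python
from urllib.parse import urlencode

# Which side answers each verb, as listed on the "Request Verbs" slide.
VIDI_VERBS = {
    "Identify": {"DL", "VIS"},
    "ListMetadataFormats": {"DL"},
    "ListVisdataFormats": {"VIS"},
    "ListTransformers": {"VIS"},
    "RequestResultSet": {"DL"},
}

def vidi_request(base_url, role, verb, **params):
    """Build a request URL, refusing verbs the target role does not answer."""
    if role not in VIDI_VERBS.get(verb, set()):
        raise ValueError(f"{verb!r} is not answered by a {role} endpoint")
    return base_url + "?" + urlencode({"verb": verb, **params})

# Hypothetical endpoints, for illustration only.
print(vidi_request("http://dl.example.org/vidi", "DL", "RequestResultSet", query="etd"))
print(vidi_request("http://vis.example.org/vidi", "VIS", "ListTransformers"))
```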

213 Implementation Roles and Times Implementing protocol Devising general approaches to protocol use for DL-VIS environments Applying protocol in representative cases ENVISION-ODL ENVISION-MARIAN

214 Implementation Process 1. Analyze metadata format in DL 2. Analyze visdata format in VIS 3. Write transformer (if not in registry) 4. Decide on command flow 5. Implement protocol commands
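Step 3 of the process above (write a transformer if none is in the registry) could look roughly like the following. This is a sketch, not the VIDI implementation: the registry structure, the `envision_points` visdata format name, and the record fields are all assumptions made for illustration; only `oai_dc` is a real OAI metadata prefix.

```python
# Registry keyed by (metadata format, visdata format); the structure is hypothetical.
TRANSFORMERS = {}

def register(metadata_format, visdata_format):
    """Decorator that adds a transformer function to the registry."""
    def decorator(func):
        TRANSFORMERS[(metadata_format, visdata_format)] = func
        return func
    return decorator

@register("oai_dc", "envision_points")
def dc_to_points(record):
    # Map a Dublin Core style record onto fields a visualization might plot.
    return {
        "label": record["title"],
        "x": int(record["date"][:4]),   # year taken from an ISO date string
        "group": record["creator"],
    }

def transform(records, metadata_format, visdata_format):
    """Look up a transformer (steps 1-3) and apply it to every record."""
    key = (metadata_format, visdata_format)
    if key not in TRANSFORMERS:
        raise LookupError(f"no transformer registered for {key}; one must be written")
    return [TRANSFORMERS[key](record) for record in records]

sample = [{"title": "Novel Adaptive Array Algorithms", "date": "1997-03-01", "creator": "Petrus, Paul"}]
print(transform(sample, "oai_dc", "envision_points"))
```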

215 Command Flow Used In Prototype

216 ENVISION-ODL (II) Connect ENVISION with: ODL A DL implementing OAI protocol, which means we can issue OAI requests and receive responses to retrieve the data

217 ENVISION-MARIAN Connect ENVISION with: MARIAN A DL having multiple collections (NDLTD, DIRLINE, CITIDEL, VT catalog,…) User Authentication

218 Future Work SOM decoupling DLVIS Transformer

219 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

220 MARIAN Multiple Access Retrieval of Information with Annotations (Marian the Librarian …) Evolved from CODER system to a distributed Online Public Access Catalog (OPAC), then DL backend, now becoming a full DL system From C/C++ to Java Future: NDLTD, NUDL, PetaPlex Use for campus collection management Use for www.theses.org as centralized system with gateway services: OAI, Harvest, Z39.50, …

221 MARIAN Digital Library Search & Retrieval System Principles Network representation Class-based retrieval Weight-valued functions and weighted sets Interoperability System: wrappers and harvesting Syntax: OAI standards (XML, Unicode, …) Structure: information networks Semantics: class-based retrieval : collection views

222 MARIAN Layers Database Layer Search Engine Layer User Information Layer User Interface Layer User

223 MARIAN Architecture

224 System & Syntactic Interoperability

225 MARIAN – Part of Class Hierarchy

226 Structural Interoperability through Information Networks

227 PhysDis Collection View

228

229 MARIAN Parallelism

230

231

232

233

234

235

236 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

237 Open Archives Initiative OAI www.openarchives.org openarchives@openarchives.org

238 OAI Philosophy Self-archiving = submission mechanism Long-term storage system = archive Open interface = harvesting mechanism Data provider + service provider Start with “gray literature” e-prints/pre-prints, reports, dissertations, …

239 Open Archives Initiative (OAI) xxx@LANL, high-energy physics (Ginsparg, 1991) CSTR + WATERS = NCSTRL (Lagoze, 1994) xxx + NCSTRL = CoRR collaboration (1998) Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe – led by LANL, CNI, DLF, Mellon --> OAI Santa Fe Convention (see Feb. D-Lib Magazine article) Follow-on mtgs: 6/3@San Antonio, 9/21@Lisbon (ECDL) Archives -> Open Archives Support unique archive identifiers Implement Open Archives metadata set (DC, using XML) Implement OA harvesting protocol (derived from Dienst protocol) Register the archive Build tools, layer other services: linking, searching, …

240 Open Archives (protoproto) ArXiv & Los Alamos National Lab CogPrints & U. Southampton NACA & NASA (reports) NCSTRL & Cornell U. NDLTD & Virginia Tech RePEc & U. Surrey Total of around 200K records

241 Original Open Archives Members American Physical Society California Digital Library Caltech Coalition for Networked Info. Cornell University Harvard University Library of Congress Los Alamos Nat’l Lab Mellon Foundation NASA Langley Research Cntr Old Dominion University Stanford University U. of Ghent U. of Surrey U. of Southampton Vanderbilt University Virginia Tech Washington University

242 Open Archives Future EconWPA (U. Washington) e-biomed -> PubMed Central (NIH) PubScience (DOE) Clinical Medicine Netprints (+ other HighWire Press holdings ) University ePub (California Digital Library) All public e-prints (MIT) Scholar’s Forum (Caltech) Int’l: CERN, Germany, India, Mexico, … Goal: millions of books/articles/reports / yr

243 Harvesting vs. Federation Competing approaches to interoperability Federation is when services are run remotely on remote data (e.g. Federated searching) Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g. Union catalogues) Federation requires more effort at each remote source but is easier for the local system and vice versa for harvesting OAI currently focuses on harvesting
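The contrast between the two approaches can be shown with a toy example. This is a sketch only: the in-memory `REMOTE_SOURCES` dictionary stands in for real remote archives, and all names and records are invented for illustration.

```python
# Toy stand-ins for remote archives; contents are invented.
REMOTE_SOURCES = {
    "archive_a": [{"id": "a1", "title": "Digital library interoperability"}],
    "archive_b": [
        {"id": "b1", "title": "Federated search in libraries"},
        {"id": "b2", "title": "Moire patterns in scanned pictures"},
    ],
}

def matches(record, term):
    return term.lower() in record["title"].lower()

def remote_search(source, term):
    """Stands in for a search service that runs at the remote source."""
    return [r for r in REMOTE_SOURCES[source] if matches(r, term)]

def federated_search(term):
    # Federation: the query goes to every remote source at search time,
    # so each source must run its own search service.
    hits = []
    for source in REMOTE_SOURCES:
        hits.extend(remote_search(source, term))
    return hits

def harvest():
    # Harvesting: metadata is copied to the destination ahead of time,
    # so each source only has to expose its records.
    return [dict(r, source=s) for s, records in REMOTE_SOURCES.items() for r in records]

def local_search(union_catalog, term):
    # The service runs locally over the harvested union catalog.
    return [r for r in union_catalog if matches(r, term)]

union_catalog = harvest()
print([r["id"] for r in federated_search("librar")])
print([r["id"] for r in local_search(union_catalog, "librar")])
```

Both paths find the same records here; the difference is where the search work happens and when the data moves.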

244 Metadata vs. Data Data refers to digital objects or digital representations of objects Metadata is information about the objects (e.g. title, author, etc.) OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects

245 Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities Museums

246 OAI – Repository Perspective Required: Protocol DO MDO

247 OAI – Black Box Perspective OA 1 OA 2 OA 4 OA 3 OA 5 OA 6 OA 7

248 OAI – Black Box Perspective OA 1 OA 2 OA 4 OA 3 OA 5 OA 6 OA 7 Browse Summarize Search Visualize DO Services: Docs: Metadata:

249 Aggregation through OAI Harvesting ArchiveLite Sites NCSTRL Eprints IEEE-CS, ACM, … Own: History, ResearchIndex, CSTC, … CITIDEL Active

250 Tiered Model of Interoperability Mediator services Metadata harvesting Document models

251 Repository of Digital Objects Repository Access Protocol handle Digital object terms and conditions

252 Approaches to Open Archives Build By Discipline Build By Institution

253 Approaches to Open Archives Build By Discipline Build By Institution Author Category Interdisciplinary Year Language Query …

256 Author's tools www.physik.uni-oldenburg.de/EPS/mmm

258 Discovery Current Awareness Preservation Service Providers Data Providers Metadata harvesting The World According to OAI

260 Mechanisms Sharing Join federation, run software Make metadata and archive available Aggregating By discipline By institution By genre Automating Workflow Harvesting and providing services Federated searching Dynamic linking (e.g., with SFX (OpenURLs))

261 VT View of the Open Archives Initiative (OAI) Enable sharing of publication metadata and full-text by digital libraries Standardize low-level mechanisms to share contents of libraries Build higher-level user-centric and administrative services in meta-libraries Install organizational mechanisms to support the technical processes Insights from 5S (streams, structures, scenarios)

262 Virginia Tech Projects MARC XML-DTD Computer Science Teaching Center (CSTC) W3C Web Characterization Repository OAI Repository Explorer NDLTD Open Digital Libraries, XOAI-PMH

263 MARC XML-DTD XML Transport format for US-MARC records Standardized metadata exchange format for traditional library services joining OAI

264 Protocol for Metadata Harvesting Service Requests Identify ListMetadataFormats ListSets GetRecord ListIdentifiers ListRecords Metadata Multiplicity Date/Time Ranges Sets (with semantics depending on local data providers) Resumption Tokens
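
The six service requests above travel as ordinary HTTP requests with a `verb` parameter. The sketch below builds such request URLs; the base URL `http://example.org/oai`, the set name `cs`, and the helper name `build_request` are assumptions for illustration, and no network call is made.

```python
from urllib.parse import urlencode

# The six service requests (verbs) listed on the slide.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "GetRecord", "ListIdentifiers", "ListRecords",
}


def build_request(base_url, verb, **args):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass 'from_' instead.
            params[key.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Selective harvest: Dublin Core records in one set, changed since a given date.
url = build_request(
    "http://example.org/oai", "ListRecords",
    metadataPrefix="oai_dc", from_="2003-01-01", set="cs",
)
print(url)
```

A harvester would issue this URL, parse the XML response, and repeat the request with the returned resumption token until the list is complete.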

266 Key Features of the OAI Metadata Harvesting Protocol definitions & concepts repository record identifier datestamp set protocol features HTTP encoding metadata prefix & schema flow control protocol requests supporting requests harvesting requests

267 [Figure: repository and harvester boxes linked by the OAI protocol. Labels: support, data harvesting, data items]

268 identifiers oai-identifier = oai:archive-identifier:record-identifier Registered URI Scheme Archive Identifier: Registered within OAI Unique ID within archive: (syntax is archive-specific) example = oai:ncstrl:ncstrl.cornellcs/TR94-1418 locally unique key for extracting a record from a repository
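
The three-part identifier syntax can be taken apart as follows. A minimal sketch: `OaiIdentifier` and `parse_oai_identifier` are hypothetical names, and the only input used is the example from the slide.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str   # always "oai"
    archive: str  # archive identifier, registered within OAI
    record: str   # unique ID within the archive (archive-specific syntax)


def parse_oai_identifier(text):
    """Split oai:archive-identifier:record-identifier into its parts."""
    # Split on the first two colons only: the record part is
    # archive-specific and may itself contain colons.
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {text!r}")
    return OaiIdentifier(*parts)


ident = parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
print(ident.archive, ident.record)
```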

269 selective harvesting - datestamps [Figure: repository; harvest of records within a date range]

270 selective harvesting - sets [Figure: repository with sets S1 and S2; harvest of records within set S1]
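
Selective harvesting by datestamp and by set, as on the two slides above, amounts to filtering a repository's records. The sketch below assumes a hypothetical in-memory `Record` type and made-up identifiers; a real repository would apply the same tests when answering a request.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    identifier: str
    datestamp: date                 # date of last change
    sets: frozenset = frozenset()   # sets the record belongs to


def select(records, from_=None, until=None, set_spec=None):
    """Return the records inside the date range and, if given, the set."""
    chosen = []
    for record in records:
        if from_ is not None and record.datestamp < from_:
            continue
        if until is not None and record.datestamp > until:
            continue
        if set_spec is not None and set_spec not in record.sets:
            continue
        chosen.append(record)
    return chosen


repository = [
    Record("oai:demo:1", date(2002, 11, 5), frozenset({"S1"})),
    Record("oai:demo:2", date(2003, 2, 14), frozenset({"S1", "S2"})),
    Record("oai:demo:3", date(2003, 4, 1), frozenset({"S2"})),
]
in_range = select(repository, from_=date(2003, 1, 1), until=date(2003, 3, 31))
in_s1 = select(repository, set_spec="S1")
```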

274 OAI Tools Related resources, e.g., XML, Unicode Servers and utilities, e.g., ARC, Kepler, EPrints XML Schema Validator Repository Explorer Interactive Browsing Testing of parameters Multiple views of data Multilingual support Automatic test suite

275 ARC (arc.cs.odu.edu)

277 Kepler Architecture

278 OAI-based NCSTRL architecture

281 XSV Schema Validator

282 OAI Repository Explorer Serves as a compliancy test Allows browsing of open archives using only OAI protocol Sends requests on behalf of user, parses and checks responses and displays browsable interface Will detect most discrepancies in protocol http://purl.org/net/explorer

283 RE 1.3

285 Request, Response – OAI, VT ETDs

286 Case Study: NCSTRL Costs/Benefits
Stakeholders | Sample Potential Cost | Sample Potential Benefit
Providers:
Faculty | Lower value for P&T | Faster publishing
Students | Less recognition | Broader set of outlets
Practitioners | Limited relevance | Ease of publishing, > quantity
Users:
Faculty | Lower quality of work | Broader access to resources
Students | Higher access costs (vs. department available material) | Lower access costs (vs. journal available material)
Departments | New maintenance costs | Broader visibility
University libraries | Additional access costs | Access to new resources
Practitioners | More difficult access | Access to new resources

287 The OAI Static Repository Model Components of the model: The static repository: a well-defined structure, an XML file with information similar to that in OAI-PMH responses, accessible at a persistent network location. The static repository gateway: makes one or more Static Repositories harvestable, assigns a unique base URL to each such Static Repository, and responds to OAI-PMH requests.

288 The OAI Static Repository Model

289 DL Components User Interfaces Workflow Mgr DBMS Search Engines, Classifiers, … Data, MM Info Gateways Repository Rights Mgr MM/ HT Renderer

290 Open Digital Library (ODL) Hypothesis (Hussein Suleman) Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems? Maybe … if Digital Libraries can be modeled as networks of extended Open Archives, where each extended Open Archive is a source of data and/or a provider of services.

291 Open Digital Libraries XOAI-PMH Dissertation work of Hussein Suleman (member of OAI technical committee) Extending the OAI protocol Supporting rapid development of DLs using networks of components Demonstrated with NDLTD, CSTC Described in Dec. 2001 D-Lib Magazine article, and article submitted for publication

292 [Figure: users, a "?", and digital objects labeled Program, Document, Image, Video]

293 [Figure: the same digital objects (Program, Document, Image, Video) under the label "componentized digital library", surrounded by "?" marks]

294 [Figure: the same digital objects (Program, Document, Image, Video) under the label "open digital library", with labels OA, PMH, XPMH]

295 Component System Approach (Open) DL = Network of Extended OAs Local Archive Data Input Remote Archive Browse Metadata Repository Search Recommend Resource Discovery User Interface OAI/ODL archive OAI/ODL protocol legend

296 Example Architecture (NDLTD) Humboldt Duisburg MIT Filter MIT Browse Union Catalog Search Recent User Interface OAI/ODL archive OAI/ODL protocol legend Virginia Tech PhysNet CalTech Dresden

297 ODL Demonstration - FrontPage

298 ODL Demonstration - Search

299 ODL Demonstration - Browse

300 ODL Component Requirements Search Retrieve a list of items Index new items Annotate Add annotation to item Retrieve a list of annotations for an item
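
The operations required of the Search and Annotate components can be written down as two small interfaces. This is an in-memory sketch only: in ODL the components are network services reached through the extended OAI protocol, and the class and method names here (`SearchComponent`, `AnnotateComponent`, and so on) are hypothetical.

```python
from collections import defaultdict


class SearchComponent:
    """Search: index new items, retrieve a list of items."""

    def __init__(self):
        self._index = defaultdict(set)  # term -> ids of items containing it

    def index_item(self, item_id, text):
        for term in text.lower().split():
            self._index[term].add(item_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # An item matches only if it contains every query term.
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


class AnnotateComponent:
    """Annotate: add an annotation to an item, list an item's annotations."""

    def __init__(self):
        self._annotations = defaultdict(list)  # item id -> annotation texts

    def add_annotation(self, item_id, text):
        self._annotations[item_id].append(text)

    def list_annotations(self, item_id):
        return list(self._annotations.get(item_id, []))


search = SearchComponent()
search.index_item("etd-1", "digital library interoperability")
search.index_item("etd-2", "digital video retrieval")
notes = AnnotateComponent()
notes.add_annotation("etd-1", "Useful survey of OAI.")
```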

301 Open Digital Library Components Running now XML-File (data provider from file system) Union, search, browse, recent, filter E-journal/review, Submit, Edit, Annotation Class projects High performance multilingual search Recommender, Rating; Mirroring (see JCDL’02) Working with NCSA: from DB, unstructured text Others discussed Classification/categorization DL-Viz interconnection (VIDI – Jun Wang ETD)

302 Harvest from data providers DBUnion Archive Merger Component DBBrowse Browse Engine IRDB-1 Search Engine As Metadata Search Service Provider As Metadata Browse Service Provider XML File Coll. & Data Provider 1 XML File Coll. & Data Provider 2 XML File Coll. & Data Provider 3 Open Digital Library: Extended What’s New Engine As What’s New Service Provider OAI-PMH Data Provider Submit Archive OAIB (NCSA: from RDBMS) Filter Recommend Rate Engine Annotation Engine IRDB-2 Search Engine As Annotation Search Service Provider As Recommend & Rate Service Provider

303 Example Open Digital Library: Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) [Figure: ETD collections (ETD-1 to ETD-4, among Program, Document, Image, Video objects); components Search, Filter, Union, Recent, Browse; protocols PMH, ODLRecent, ODLBrowse, ODLUnion, ODLSearch; USER INTERFACE; students and researchers]

304 Digital Library for the Computer Science Teaching Center (www.cstc.org)

305 Digital Library in a Box Domain: helping DL projects Genre: any domain, but especially those involved in NSDL (since it is funded in part through NSDL, with U. FL and NCSA) Software and Documentation: http://dlbox.nudl.org

306 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

307 Topical Outline: Digital Library Courseware http://ei.cs.vt.edu/~dlib/ WWW pages or large PDF copy files Online quizzes based on book by Michael Lesk (Morgan Kaufmann Publishers) Contents based on book, with several other popular topics added (e.g., agents) Separate pages to supplement: Definitions, Resources (People, Projects), and References

308 Topical Outline - Foundations Early visions Definitions Resources References Projects

309 Topical Outline - Foundations Early visions: “a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” - Bush, V., “As We May Think”, Atlantic Monthly, 176(1):101-108, Jul. 1945

310 Topical Outline - Foundations Definitions: A digital library (DL) can be described as an electronic information storage system focused on meeting the information seeking needs of its users. - Levy, D., and Marshall, C. C., “Going Digital: A Look at Assumptions Underlying Digital Libraries”, Communications of the ACM, 38(4):78-84, 1995.

311 Topical Outline - Foundations Definitions: Association of Research Libraries, http://sunsite.berkeley.edu/ARL/definition.html: The digital library is not a single entity. The digital library requires technology to link the resources of many. The linkages between the many digital libraries and information services are transparent to the end users. Universal access to digital libraries and information services is a goal.

312 Topical Outline - Foundations Definitions: Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities - Digital Library Federation, “A Working Definition of Digital Library”, Apr. 1999. http://www.clir.org/diglib/dldefinition.htm

313 Topical Outline - Foundations Definitions: A library that maintains all, or a substantial part, of its collection in computer accessible form as an alternative, supplement, or complement to the conventional printed and microfilm materials that currently dominate library collections. Used in this context, the term "collection" denotes the documents that a library acquires or maintains - Saffady, W., “Digital Library Concepts and Technologies for the Management of Library Collections: An Analysis of Methods and Costs”, Library Technology Reports 31.3:221-380, 1995

314 Topical Outline – IR Areas Search, Retrieval, Resource Discovery Information storage and retrieval Boolean vs. natural language Search engines Indexing, phrases, thesauri, concepts Federated search and harvesting, OAI Integrating links and ratings Crawlers, spiders, metasearch, fusion Details following – Li Wang indep. study

315 Logical View of Document + Indexing from Baeza-Yates, R., and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley, 1999

316 Retrieval Process from Baeza-Yates, R., and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley, 1999

317 PACS Automatic Classification Classifier Trained Model Classification Scheme Classifier Trained Model Classification Scheme Selector J2EE Server Container Classification Servlet

318 PACS Automatic Classification Online

319 What is a Crawler? A Program An Important Module For Web Search Engine Crawls On The Web According To Its Algorithm Retrieves Web Pages Gets Useful Information Stores The Web Pages For Future Refining

320 Jobs For Threads Get A New URL From Buffer Contact The Server For File Type Download The File Parse The Web Page Put New URLs Into Buffer
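
The per-thread jobs listed above can be sketched as a worker loop around a shared URL buffer. To keep the example runnable without a network, the "web" is a hypothetical in-memory dictionary (`FAKE_WEB`), and the names `crawl` and `job` are likewise assumptions, not taken from the crawler described in the talk.

```python
import queue
import re
import threading

# Hypothetical stand-in for the web: URL -> (file type, page body).
FAKE_WEB = {
    "http://a.example/": (
        "text/html",
        '<a href="http://a.example/b">b</a> <a href="http://a.example/i.png">i</a>',
    ),
    "http://a.example/b": ("text/html", '<a href="http://a.example/">home</a>'),
    "http://a.example/i.png": ("image/png", ""),
}
LINK = re.compile(r'href="([^"]+)"')


def crawl(seed, n_threads=3):
    buffer = queue.Queue()   # shared URL buffer
    seen = {seed}            # URLs already placed in the buffer
    pages = {}               # stored pages, for later refining
    lock = threading.Lock()

    def job():
        while True:
            url = buffer.get()                    # get a new URL from the buffer
            try:
                # Look up file type and body (stands in for contacting the
                # server and downloading the file).
                file_type, body = FAKE_WEB.get(url, (None, ""))
                if file_type != "text/html":
                    continue                      # skip non-HTML and unknown URLs
                links = LINK.findall(body)        # parse the web page
                with lock:
                    pages[url] = body
                    for link in links:            # put new URLs into the buffer
                        if link not in seen:
                            seen.add(link)
                            buffer.put(link)
            finally:
                buffer.task_done()

    buffer.put(seed)
    for _ in range(n_threads):
        threading.Thread(target=job, daemon=True).start()
    buffer.join()            # wait until every buffered URL has been handled
    return pages


pages = crawl("http://a.example/")
```

New URLs are added to the buffer before the current task is marked done, so `buffer.join()` returns only once no work is left.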

321 Advanced Functions Backward Linkage Information Collector A Web Page

322 Topical Outline - Multimedia Multiple media types, representations Text, audio, image, video, graphics, animation Capture, digitization, standards, interchange Compression, content-based retrieval Playback (Real), SMIL, QoS JPEG, MPEG (and versions)

323 Document Models, Representations, and Accesses Doc = stream + structure + use-scenario; hybrid (paper/electronic), digital only Multilingual: content, summary, metadata Multimedia: structure, quality (QoS), search Structured: MARC, SGML, by user: MVD Distributed collection: Kleisli, CIMI, Z39.50 Federated search: collecting, picking site(s), parallel search / fall-back, fusing results Access: IPR, payment, security, scenarios

324 Topical Outline - Architectures Distributed, centralized Modular, componentized Bus (InfoBus), hierarchical, star Mediators, wrappers (TSIMMIS) Light weight protocols Architecture of OAI and XOAI

325 Architectural Issues Internet middleware Independent system / part of federation Decompositions vary search engine, browser, DBMS, MM support repository, handle server, client information resources + mediators, bus or agent collection + client with workspace/environment Metrics: e.g., for federated search

326 Sornil & Mather Dissertations Mather: efficiently handling very large numbers of objects of varying sizes Sornil: efficiently handling IR for very large dynamic collections, large numbers of users, high transaction rates, large inverted files modeling and simulation data organization parallelization of algorithms, alone and in combination for retrieval (related) tasks

327 OAI and I2-DSI (Ryan Richardson) OAI – metadata harvesting OAI-PMH Data providers/Service providers I2-DSI – mirroring and replication Can we put them together to get benefits of both? Is this the first of many higher level mirroring schemes for Internet2, that provide independence from lower level representation issues?

328 I2-DSI interface

329 I2-DSI Architectural Diagram Mirror User Internet OAI Server Distributed Director

330 I2-DSI Technical Issues Resumption tokens Mirrors are stateless, so any mirror can answer (any part of) a request Every client request to the mirror is logged, to enable log comparison among the mirror archives Mirroring time is dependent on # of records sent in each chunk Can we connect this or something similar with LOCKSS (lots of copies keep stuff safe)?
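
One way to let any stateless mirror answer any part of a request is to carry the position of the next chunk inside the resumption token itself. The slide does not say how I2-DSI builds its tokens, so the scheme below is an assumption for illustration; `CHUNK`, `encode_token`, `decode_token`, and `list_chunk` are hypothetical names.

```python
import base64
import json

CHUNK = 2  # records sent in each chunk (hypothetical value)


def encode_token(state):
    """Pack the harvesting state into an opaque, URL-safe token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def list_chunk(records, token=None):
    """Answer one request; needs no memory of earlier requests."""
    offset = decode_token(token)["offset"] if token else 0
    chunk = records[offset:offset + CHUNK]
    next_offset = offset + CHUNK
    # No token is returned once the list is complete.
    next_token = (
        encode_token({"offset": next_offset})
        if next_offset < len(records) else None
    )
    return chunk, next_token


# Two "mirrors" holding identical copies; requests alternate between them.
mirrors = [["r1", "r2", "r3", "r4", "r5"], ["r1", "r2", "r3", "r4", "r5"]]
harvested, token, turn = [], None, 0
while True:
    chunk, token = list_chunk(mirrors[turn % 2], token)
    harvested.extend(chunk)
    turn += 1
    if token is None:
        break
```

A larger `CHUNK` means fewer round trips per harvest, which is the dependence on chunk size the slide mentions.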

331 Topical Outline – Interfaces Taxonomy of interface components Workflow Visualization Environments Design Usability testing

332 Topical Outline – Metadata MARC Dublin Core RDF IMS OAI (Open Archives Initiative) Crosswalks, mappings Ontologies Topic maps, concept maps

333 Automatic Generation of Concept Maps Ryan Richardson, Rao Shen Concept maps are a valuable pedagogical tool (Novak & Gowin, 1984). Are concept maps a good summarization tool? Answer: Yes, they are at least a good supplement to abstracts, according to an experiment we did last semester. Unfortunately, making concept maps by hand is tedious for more than a few documents. Can we generate useful concept maps automatically, for both English and Spanish documents?

334 Generation by term co-occurrence Procedure for Spanish documents: determine part-of-speech for each word; collapse all inflected forms to root form; concatenate noun phrases into one “concept”; remove some stopwords (others are crosslinks). Future: use synonym sets to further collapse words. Use Agrawal’s algorithm for association rules to find related concepts. Can translate node/link labels into English automatically, if desired.
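
The last step, finding related concepts from co-occurrence, can be illustrated with a much simplified stand-in for association-rule mining. The sketch below only counts pairs of concepts that occur in the same sentence and keeps links that pass support and confidence thresholds; it is not Agrawal's full algorithm, and the function name, thresholds, and toy input are assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def concept_links(sentences, min_support=2, min_confidence=0.6):
    """Return (source, target, confidence) links between co-occurring concepts.

    Each sentence is a list of concepts, already reduced to root forms.
    """
    concept_count = Counter()
    pair_count = Counter()
    for concepts in sentences:
        unique = sorted(set(concepts))
        concept_count.update(unique)
        pair_count.update(combinations(unique, 2))

    links = []
    for (a, b), together in pair_count.items():
        if together < min_support:
            continue  # the pair co-occurs too rarely
        for source, target in ((a, b), (b, a)):
            # Confidence of the rule source -> target.
            confidence = together / concept_count[source]
            if confidence >= min_confidence:
                links.append((source, target, round(confidence, 2)))
    return sorted(links)


# Made-up toy input: one list of concepts per sentence.
toy_sentences = [
    ["macondo", "familia"],
    ["macondo", "familia", "soledad"],
    ["macondo", "soledad"],
    ["familia"],
]
links = concept_links(toy_sentences)
```

Each surviving link would become an edge in the concept map, with the concepts as nodes.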

335 Automatically generated concept map This map was extracted from a Spanish essay on “Cien Años de Soledad” (100 Years of Solitude)

336 References Background: http://cmap.coginst.uwf.edu/info/ GetSmart: http://ai8.bpa.arizona.edu:8080/aicm/index.html http://ai.bpa.arizona.edu/go/mlir/ Agrawal's association rules algorithm: http://citeseer.nj.nec.com/agrawal94fast.html Gaines and Shaw: http://ksi.cpsc.ucalgary.ca/articles/ConceptMaps/ CM.html Japanese Work: http://www.icce2001.org/cd/pdf/P06/JP031.pdf Singapore work: http://textmining.krdl.org.sg/people/kanagasa

337 Topical Outline – Epub, SGML, XML Authoring Rendering, presenting Structure Tagging, Markup, DOM Semi-structured information Dual-publishing, eBooks Styles (XSL, XSLT) Structure queries

338 Topical Outline – Databases Extending database technology Structured and unstructured info Multimedia databases Link databases Performance Replicated storage, I2-DSI (details following)

339 Topical Outline – Agents Protocols Knowledge interchange Negotiation, registries Distributed issues Ontologies (standard upper) Webbots (automatic indexing)

340 Topical Outline – Economics E-commerce Sustainability Preservation and archiving DLF, Besser, Lorie, Gladney Self-archiving Open collections Economic models, business plans

341 Topical Outline – IPR Intellectual property rights (IPR) Legal issues Terms and conditions Copyright Patents, trademarks Distributed rights management Security

342 Topical Outline – Social Issues Cooperation, collaboration Annotation, ratings Digital divide Educational applications Cultural heritage Museums (AMICO) Organizational acceptance Personalization Internationalization

343 Social Capital? Increase local interchange among students, faculty, library, graduate school Increase international understanding, building many more invisible colleges, with students more empowered Connect graduate researchers with undergrads, who can access ETDs / them Facilitate direct university collaboration, explicitly, in reshaping publishing world

344 Collaborative Development (Joan Lippincott)

345 Why Collaboration? Expertise in aspects of the digital environment Pooling of resources

346 Collaboration and digital projects Distributed systems Digital course content Digital library resources Delivery of services Development of policies

347 Collaborations involve: Shared goals Common vision Shared vocabulary

348 Two views of an ETD program Have staff scan Implement now Increase university visibility Teach students to write and submit ETDs Implement soon Develop electronic authors

349 In a collaboration... Each contributes resources Partners acknowledge and value contributions Partners develop a clear process Group and individual accountability

350 ETD project participants Academic administrators Faculty Students Staff Graduate school / provost / registrar Information technologists Librarians

351 Collaboration and NDLTD Common goals of members Diverse sets of skills and expertise Need for strategies and tactics to surmount any problems -> advocacy

352 Collaborative project strategy Champion initiates project Leadership establishes initial goal and parameters Issue a call for participants Conduct procedure to select participants

353 Collaborative project strategy Initial meeting Develop shared goals Develop clear process Continue work at institutions Establish communication channels Establish project milestones Evaluate progress, refine approach

354 Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links

355 Selected Links - http://fox.cs.vt.edu CITIDEL www.citidel.org NCSTRL www.ncstrl.org NDLTD www.ndltd.org and etdguide.org NSDL www.nsdl.org Virginia Tech Digital Library Courseware http://ei.cs.vt.edu/~dlib Virginia Tech Digital Library Research Laboratory (DLRL) http://www.dlib.vt.edu (5S, 5SL, AmericanSouth.Org, CSTC, ENVISION, MARIAN, NDLTD, NSDL, OAI, ODL) Virginia Tech DLRL OAI Projects http://www.dlib.vt.edu/projects/OAI/ Repository Explorer http://purl.org/net/oai_explorer

356 More Links ARC Cross-Archive Search Service http://arc.cs.odu.edu/ Dublin Core Metadata Initiative www.dublincore.org E-Prints DL-in-a-box www.eprints.org Open Archives Initiative http://www.openarchives.org OAI Metadata Harvesting Protocol http://www.openarchives.org/OAI/openarchivesprotocol.htm XML Schema Validator http://www.w3.org/2001/03/webdata/xsv XML Tools at W3C http://www.w3.org/XML/#software

