Published by Gervais Douglas, modified over 8 years ago
1
Overview of Digital Libraries: From Requirements to Theory to System to Projects
JCDL 2003 Tutorial, Houston, TX, USA, May 27, 2003
Edward A. Fox, fox@vt.edu, http://fox.cs.vt.edu
CS, DLRL, Internet TIC, Virginia Tech, Blacksburg, VA, USA
2
Acknowledgements (Selected)
Sponsors: ACM, Adobe, IBM, Microsoft, NLM, NSF, OCLC, SOLINET, SURA, US Dept. of Ed. (FIPSE), …
VT Faculty/Staff: Marc Abrams, Tony Atkins, Debra Dudley, John Eaton, H. Rex Hartson, Deborah Hix, JAN Lee, Mann-Ho Lee, Gail McMillan, James Powell, Naren Ramakrishnan, Shalini Urs, …
VT Students: Fernando Das Neves, Robert France, Marcos Goncalves, Neill Kipp, Paul Mather, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Wensi Xi, Ye Zhou, …
3
Outline
Virginia Tech context
Why DLs?
What are DLs? (5S theory)
Case Study: CSTC -> CITIDEL -> NSDL
Case Study: NDLTD
Accessibility and Visualization
DL Software: MARIAN
Interoperability: OAI, ODL
Topical Outline
Selected Links
5
Virginia Tech Background
Largest university in Virginia; land-grant; football; town population 35K plus 26K students
Blacksburg Electronic Village, since 1992, with > 80% of community on Internet
Net.Work.Virginia, with sites for education, research, government
LMDS (Local Multipoint Distribution Service): gigabit wireless networking (1/3 of Virginia)
Math Emporium, 500 workstations
Faculty Development Initiative, round 3
Torgersen Hall: $30M Advanced Communications and Information Technology Center, with DLRL
6
Internet Technology Innovation Center
Supported by Virginia’s Center for Innovative Technology
Statewide University Partners (Governing Board):
Christopher Newport University: William Winter, William Muir, Virginia Electronic Commerce Technology Center / Southeastern Virginia Network (VECTEC/SEVAnet)
George Mason University: Steven Ruth, International Center for Applied Studies in IT (ICASIT)
Old Dominion University: Kurt Maly (CS Head), …
University of Virginia: Alf Weaver, Internet Commerce Group (InterCom); Jim French, Internet Digital Library
Virginia Tech: Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS; Scott Midkiff, Center for Wireless Telecomm. (CWT), VTISC, ECpE
7
ITIC @ VT Research Areas
Collaboration (e.g., group decision support)
Community networking (e.g., BEV)
Internet access (e.g., statewide network)
Information services (e.g., digital libraries)
Modeling and simulation (e.g., Web traffic)
Usability (e.g., human factors engineering)
Virtual environments (e.g., CAVE, visualization)
8
Digital Libraries at Virginia Tech
MARIAN (NLM, NSF)
CS DL Prototype: ENVISION (NSF, ACM)
TULIP (Elsevier, OCLC)
BEV History Base (NSF, Blacksburg)
DL for CS Education: EI (NSF, ACM)
WATERS, NCSTRL (NSF)
NDLTD (SURA, US Dept. of Education, NSF)
CSTC (NSF, ACM), CRIM (NSF, SIGMM)
WCA (Log) Repository (W3C)
VT-PetaPlex-1 (Knowledge Systems)
NSDL (NSF): CITIDEL, DL-in-a-Box, GetSmart
AmericanSouth.Org (Mellon)
9
DL Examples
IBM Digital Library
Virtua (www.vtls.com)
Greenstone (www.greenstone.org)
Eprints (www.eprints.org)
Many systems in NSF DLI projects
VT systems: MARIAN, CSTC, NDLTD
Work on ODL, DL-in-a-box, CITIDEL, NCSTRL
10
Outline
Virginia Tech context
Why DLs?
What are DLs? (5S theory)
Case Study: CSTC -> CITIDEL -> NSDL
Case Study: NDLTD
Accessibility and Visualization
DL Software: MARIAN
Interoperability: OAI, ODL
Topical Outline
Selected Links
11
Digital Libraries and related developments (diagram): Internet (1984), SGML (1985), Multimedia (1986), Library Cancellations (1988), University Scholarly Electronic Pub. (1988), PDF (1992), WWW (1994), NSF DLI (1994), Info. Literacy (1995), Improving Education
12
Synchronous Scholarly Communication
Same time, same or different place
13
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
14
Information Life Cycle
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries, http://www-lis.gseis.ucla.edu/DL/
15
Information Life Cycle (diagram): Creating, Authoring, Modifying, Organizing, Indexing, Storing, Retrieving, Distributing, Networking, Accessing, Filtering, Using, Retention / Mining
16
Locating Digital Libraries in Computing and Communications Technology Space (diagram)
Axes (less to more): Computing (flops); Digital content; Communications (bandwidth, connectivity)
Digital Libraries technology trajectory: intellectual access to globally distributed information
18
Integrated CCLINC Translingual Information System (DARPA) (diagram)
CCLINC server functions: Info Detection, Extraction, Summarization, Translation, 2-way Speech Translation
Example query: "What is the north korean movement in the front line?"
Example summary: "It seems that North Korea launch a missile again. After North Korea launched a Daipodong missile last month, NK is perceived to proceed to an additional test launch. Korea, US and Japan enter into an alert state, and prepare for a joint response policy. Korea estimates that the additional launch will be on 09/05. Japan estimates that NK’s missile range is short. US information says that there is no sign of launch yet."
Example translation: "What is the status of nk missile launch against japan?"; Korean (romanized): "BugHanI IlBonE Ddo MiSaIlEul BalSaHan Deus HaDa" ("It seems that North Korea has launched a missile at Japan again")
19
Structured Video Browser (making video into hypermedia), www.learn.umd.edu
IBrowse: expository multimedia, narrative structures
20
MPEG-7 Image Library Systems Technology, ICU (Information and Communication University) (architecture diagram)
Components: Users, Web Search Engines, WWW, Web Server, Servlet Engine, Servlet, Search Server, MPEG-7 Description Module, DB, OS
21
MPEG-7 Video Library Systems Technology, ICU (Information and Communication University) (architecture diagram)
Components: Video Data, Description Generator, Description Schemes Design Tool, Description Scheme, Meta Database, Video Database, Retrieval Server Module, Player, Presentation Module
22
About enumerate
The founders of enumerate know numbers, computers, and the Web. They saw the Web’s potential to solve a big problem: working with numeric information is too hard!
23
How does enumerate help? RDL
Value: What is the number?
Format: $100 [in thousands]
Semantics: How it translates or corresponds to other formats
Provenance: Created by whom and when?
Measure: Scale information about the number:
Units: feet, meters, $, pounds, RBI
Magnitude: thousands, millions, billions
Modifiers: Has the number been manipulated?
Structure: Relationship of numbers to each other
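The RDL attributes above amount to a small record attached to each number. The Python sketch below is a minimal illustration under stated assumptions: the class names `Measure` and `AnnotatedNumber` and the `MAGNITUDES` table are hypothetical, not enumerate's actual RDL format. It shows how the Magnitude attribute lets a displayed value such as "$100 [in thousands]" be normalized to an absolute figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical multipliers for the "Magnitude" attribute (an assumption, not enumerate's schema).
MAGNITUDES: Dict[str, int] = {
    "units": 1,
    "thousands": 1_000,
    "millions": 1_000_000,
    "billions": 1_000_000_000,
}


@dataclass
class Measure:
    units: str                                           # Units: feet, meters, $, pounds, RBI
    magnitude: str = "units"                             # Magnitude: key into MAGNITUDES
    modifiers: List[str] = field(default_factory=list)   # Modifiers: how the number was manipulated


@dataclass
class AnnotatedNumber:
    value: float                        # Value: what is the number?
    display_format: str                 # Format: e.g. "$100 [in thousands]"
    measure: Measure                    # Measure: units, magnitude, modifiers
    semantics: Optional[str] = None     # Semantics: correspondence to other formats
    provenance: Optional[str] = None    # Provenance: created by whom and when?
    related: List["AnnotatedNumber"] = field(default_factory=list)  # Structure: links to other numbers

    def absolute_value(self) -> float:
        """Scale the stored value by its magnitude, e.g. 100 in thousands -> 100000."""
        return self.value * MAGNITUDES[self.measure.magnitude]


revenue = AnnotatedNumber(
    value=100,
    display_format="$100 [in thousands]",
    measure=Measure(units="$", magnitude="thousands"),
    provenance="hypothetical example",
)
print(revenue.absolute_value())  # 100000
```

Keeping the display format separate from the numeric value means a number can be re-rendered or converted without losing its original presentation.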
24
The enumerate formula
25
enumerate’s interactive data can be analyzed instantly
26
Standards
Protocols/federation: Z39.50, CIMI; Dienst, NCSTRL; OAI protocol
Metadata:
TEI: inline, detailed (structure in stream)
MARC: two-level, fine-grained
Dublin Core: high-level, 15 elements
RDF: describing resources/collections, annotation
OAMS -> DC and others used in OAI
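To make "Dublin Core: high-level, 15 elements" and its use in OAI concrete, here is a minimal Python sketch that serializes a flat record in the `oai_dc` format that OAI data providers must support. The function name `make_oai_dc` and the sample values are hypothetical; the element names and the two namespace URIs are those of Dublin Core 1.1 and OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def make_oai_dc(fields: dict) -> str:
    """Serialize {element: value or list of values} as an oai_dc XML record.

    Every DC element is optional and repeatable; keys outside the 15 elements
    are rejected because simple Dublin Core allows nothing else.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for v in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return ET.tostring(root, encoding="unicode")


record = make_oai_dc({
    "title": "An Example Thesis",      # hypothetical values
    "creator": ["Doe, Jane"],
    "type": "Text",
    "language": "en",
})
print(record)
```

Real `oai_dc` records usually also carry an `xsi:schemaLocation` attribute pointing at the OAI schema; it is omitted here to keep the sketch short.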
27
AmericanSouth.Org: Roles, Content
Intellectual Organization: Controlled vocabulary, Metadata extension development
Collection Decisions: Selection Criteria, Controlled vocabulary
SOLINET: Central Server Maintenance; Metadata Repository; Central Interface Design/Maintenance; Central Indices Creation/Maintenance; Coordination of Metadata Gateway Development
Libraries (Data Providers): Local Server Maintenance; Metadata Creation/Maintenance; Local Interface Design/Maintenance; Local Indices; Gateway Implementation; Digital Objects
Scholars: Provision of Context; Organizational Structure and Annotation Tools; Selection of Other Annotation Tools; Selection of Thesauri; Concept Mapping
28
Content Area Description; formats: Audio, Digital, Finding Aid, MSS, Other, Photo, Video, MF, Print, Total
African-American cultural life: 64694123101872
Agricultural crisis of late 19th century: 113114819
Codification of segregation laws: 13211816
Configuration of white supremacy: 13331920
Cultural values and activities: 31517415152071
Disenfranchising movements: 122121615
Educational movements: 61118621352798
Emergence of Holiness & Pentecostal Groups: 111710
Emergence of new musical forms: 311128
Emergence of organized groups expressing farmers’ concerns: 221813
Expansion of Southern evangelical Protestant Churches: 31939112359
29
Content Area Description; formats: Audio, Digital, Finding Aid, MSS, Other, Photo, Video, MF, Print, Total
Expansion of industrial activity: 61251051452
Forms of inter-racialism: 11121410
Great Migration & its relationship to worsened race relations in the South: 33
Growth of business: 151211351552
Growth of cities & towns: 11512413121857
Interplay of economic interest among regions: 114121616
Local literature: 312174733168
Lost Cause monument movement: 3238
Political relationships between Populist & other groups: 12249
30
Content Area Description; formats: Audio, Digital, Finding Aid, MSS, Other, Photo, Video, MF, Print, Total
Popular magazines & newspapers: 221131735
Reactions of African-American leaders to Segregation: 212412111024
Relationship among Southern Populists & those in the West: 11
Relationship between new racial system of 1890s and other: 241815
Role of immigration: 112642925
Survival of African-American communities & Culture: 22157121333
Women’s Groups: 21101514933
Total Each Format: 411451161381331379301831
31
Digital Libraries Shorten the Chain
From (diagram): Editor, Reviewer, Publisher, A&I, Consolidator, Library
32
DLs Shorten the Chain
To (diagram): Author, Editor, Reviewer, Digital Library, Librarian, Teacher, Learner, Reader
33
Digital Libraries: Objectives
World Lit.: 24hr / 7day / from desktop
Integrated “super” information systems: 5S: streams, structures, spaces, scenarios, societies
Ubiquitous, higher quality, lower cost
Education, knowledge sharing, discovery
Disintermediation -> collaboration
Universities reclaim property
Interactive courseware, student works
Scalable, sustainable, usable, useful
34
Benefits
Ease of use
Effectiveness
“The benefits of digital libraries will not be appreciated unless they are easy to use effectively.” (IITA Workshop report)
35
DLs: Why of Global Interest?
National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
Knowledge and information are essential to economic and technological growth and to education
DLs are a domain for international collaboration:
in which all can contribute and benefit
which leverages investment in networking
which provides useful content on the Internet & WWW
which will tie nations and peoples together more strongly and through deeper understanding
36
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long-term view, perspective, documentation, recording, facilitating, interpretation, understanding
Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Reagan Moore, Ed Fox, June 2002, for NSF
37
Libraries of the Future, J.C.R. Licklider, 1965, MIT Press
Diagram levels: World, Nation, State, City, Community
38
DL Challenges
Preservation, so people will trust DLs
Supporting infrastructure: networks, ...
Scalability, sustainability, interoperability
DL industry: critical mass by covering libraries, archives, museums, corporate info, govt info, personal info
“Quality WWW” integrating IR, HT, MM, ...
Need tools & methods to make DLs easier to build
39
DL Examples
IBM Digital Library
Virtua (www.vtls.com)
Greenstone (www.greenstone.org)
Eprints (www.eprints.org)
Many systems in NSF DLI projects
VT systems: MARIAN, CSTC, NDLTD
Work on ODL, DL-in-a-box, CITIDEL, NCSTRL
40
Definitions
Library ++ (library + archive + museum + …)
Distributed information system + organization + effective interface
User community + collection + services
Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation
41
DL Services/Activities Taxonomy (Gonçalves)
Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
Infrastructure Services:
Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
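A taxonomy like this can be held as a nested mapping so that a system can report which category a given service falls under. The Python sketch below is a hypothetical encoding: the grouping follows our reading of the taxonomy figure, and the names `TAXONOMY`, `index_services`, and `SERVICE_INDEX` are illustrative only.

```python
# Hypothetical encoding of the DL services taxonomy: inner dicts are categories,
# lists hold the service names that belong to the enclosing category.
TAXONOMY = {
    "Information Satisfaction Services": [
        "Browsing", "Collaborating", "Customizing", "Filtering",
        "Providing access", "Recommending", "Requesting", "Searching",
        "Visualizing",
    ],
    "Infrastructure Services": {
        "Add Value": [
            "Annotating", "Classifying", "Clustering", "Evaluating",
            "Extracting", "Indexing", "Measuring", "Publicizing", "Rating",
            "Reviewing (peer)", "Surveying", "Translating (language)",
        ],
        "Repository-Building": {
            "Preservational": [
                "Conserving", "Converting", "Copying/Replicating",
                "Emulating", "Renewing", "Translating (format)",
            ],
            "Creational": [
                "Acquiring", "Cataloging", "Crawling (focused)", "Describing",
                "Digitizing", "Federating", "Harvesting", "Purchasing",
                "Submitting",
            ],
        },
    },
}


def index_services(tree, path=()):
    """Flatten the nested taxonomy into {service name: tuple of category names}."""
    index = {}
    if isinstance(tree, dict):
        for category, subtree in tree.items():
            index.update(index_services(subtree, path + (category,)))
    else:
        for service in tree:
            index[service] = path
    return index


SERVICE_INDEX = index_services(TAXONOMY)
print(SERVICE_INDEX["Harvesting"])
# ('Infrastructure Services', 'Repository-Building', 'Creational')
```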
42
Definition: Digital Libraries are complex systems that
help satisfy info needs of users (societies)
provide info services (scenarios)
organize info in usable ways (structures)
present info in usable ways (spaces)
communicate info with users (streams)
43
5S Layers: Societies, Scenarios, Spaces, Structures, Streams
44
5S Model: Examples, Objectives
Model | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spatial | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, who use those services; and relationships among them
45
5S Model: Definitions
5S | Definition
Streams | Sequences of elements of an arbitrary type
Structures | Labeled directed graphs
Spatial | Sets and operations on those sets
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement
Societies | Sets of communities and relationships among them
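The five informal definitions map naturally onto everyday data types. The Python sketch below is a toy rendering under our own assumptions, not the formal set-theoretic definitions from Gonçalves' work: every class and function name is hypothetical, and Euclidean distance is chosen merely as one example of an operation on a space.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Streams: sequences of elements of an arbitrary type (here, characters of a text).
Stream = Tuple[str, ...]


@dataclass
class Structure:
    """Structures: a labeled directed graph stored as (source, label, target) triples."""
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_edge(self, source: str, label: str, target: str) -> None:
        self.edges.add((source, label, target))

    def targets(self, source: str, label: str) -> Set[str]:
        return {t for (s, lab, t) in self.edges if s == source and lab == label}


Point = Tuple[float, ...]


@dataclass
class Space:
    """Spaces: a set plus operations on it; here, Euclidean distance between points."""
    points: FrozenSet[Point]

    def distance(self, a: Point, b: Point) -> float:
        return math.dist(a, b)


# Scenarios: sequences of events; each event maps one state to the next.
State = Dict[str, str]
Event = Callable[[State], State]


def run_scenario(state: State, events: List[Event]) -> State:
    for event in events:
        state = event(state)
    return state


@dataclass
class Society:
    """Societies: communities (name -> members) and relationships among them."""
    communities: Dict[str, Set[str]]
    relationships: Set[Tuple[str, str, str]]  # (community, relation, community)


# Usage with hypothetical data.
title: Stream = tuple("ETD")
thesis = Structure()
thesis.add_edge("thesis", "hasPart", "chapter1")
thesis.add_edge("thesis", "hasPart", "chapter2")
space = Space(frozenset({(0.0, 0.0), (3.0, 4.0)}))
final = run_scenario({"status": "draft"}, [
    lambda s: {**s, "status": "submitted"},
    lambda s: {**s, "status": "approved"},
])
society = Society({"students": {"alice"}, "reviewers": {"bob"}},
                  {("reviewers", "reviews_work_of", "students")})
print(thesis.targets("thesis", "hasPart"), space.distance((0.0, 0.0), (3.0, 4.0)), final)
```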
46
Overview of 5S and DL formal definitions and compositions (Gonçalves)
47
Semantic relationships among DL concepts: Partial concept map (Gonçalves)
48
5S Framework and DL Development (Gonçalves)
49
5SL: Stream Model: ETD example
text/xml, UTF-8; application/pdf; ENG...
50
5SL: Scenario Model Example: ETD Submission (sequence diagram)
Participants: ETDReviewer, ETDWorkflowManager, Repository, GraduateSchool
Messages: Login(password); CheckSubmittedETDs; ETDList; Identifier; get(Identifier, Submission); ETDReviewPage; CheckETDFiles; ETD; *GetFeesInfo; FeesInfo; getDecision; decision
Branches: [decision=accept] add(ETD, ETDCollection), Accepted; [decision=reject] communicateProblem, Rejected
Loop: [while reviewNextETD=True]
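The scenario above is essentially a review loop with a two-way branch. A minimal Python sketch of that control flow follows, assuming hypothetical classes `ETD` and `Repository` and a hypothetical decision rule (accept only when the files check out and the fees are paid); in the actual scenario the decision comes from the human reviewer, and login is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ETD:
    identifier: str
    files_ok: bool   # outcome of CheckETDFiles (assumed to be a simple pass/fail)
    fees_paid: bool  # stands in for FeesInfo obtained from the GraduateSchool


@dataclass
class Repository:
    submitted: Dict[str, ETD] = field(default_factory=dict)
    etd_collection: List[str] = field(default_factory=list)  # identifiers of accepted ETDs

    def check_submitted_etds(self) -> List[str]:
        """CheckSubmittedETDs: return a snapshot of the ETDList."""
        return list(self.submitted)

    def get(self, identifier: str) -> ETD:
        return self.submitted[identifier]

    def add(self, etd: ETD) -> None:
        """add(ETD, ETDCollection): move an accepted ETD into the collection."""
        self.etd_collection.append(etd.identifier)
        del self.submitted[etd.identifier]


def review_submissions(repo: Repository,
                       communicate_problem: Callable[[ETD], None]) -> Dict[str, str]:
    """Loop [while reviewNextETD=True] over every submitted ETD and record the outcome."""
    outcomes: Dict[str, str] = {}
    for identifier in repo.check_submitted_etds():
        etd = repo.get(identifier)
        # Hypothetical decision rule: accept only if files check out and fees are paid.
        if etd.files_ok and etd.fees_paid:
            repo.add(etd)
            outcomes[identifier] = "Accepted"
        else:
            communicate_problem(etd)
            outcomes[identifier] = "Rejected"
    return outcomes


repo = Repository({"etd-1": ETD("etd-1", True, True), "etd-2": ETD("etd-2", False, True)})
problems: List[str] = []
print(review_submissions(repo, lambda etd: problems.append(etd.identifier)))
# {'etd-1': 'Accepted', 'etd-2': 'Rejected'}
```

Iterating over a snapshot of the submitted identifiers lets `add` remove accepted ETDs from the pending set without disturbing the loop.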
51
Basic elements of DL services definition
52
5SLGen: Automatic DL Generation
53
MARIAN DL Generation (diagram)
5SL Design; XML parsers: DOM, SAX; MARIAN Digital Library Generator
MARIAN API Component Pool: Class managers, Loader, User interfaces, Indexing Classes, Resource Manager, Configuration and Processing Classes
54
MARIAN DL Generation Statistics: Code Generation for the NDLTD Union Archive
Generated code | Lines of code
Indexing Classes | 154
Class Managers and ClassIDs | 342
Collection Loader and Handler | 361
Document presentation and User Interfaces | 800
55
5SLGen for ODL
56
Information Life Cycle (plus quality dimensions from 5S perspective – Gonçalves)
57
Overview of 5SGraph
Workspace (instance model); Structured toolbox (metamodel)
59
DL Standardized Log Format: Design
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information
60
The Digital Library Standardized Log Format (cont.)
Specification: collection of an extensive, flat set of attributes (diagram): query, event, registering, transaction, session, error, browse, action, timestamp, machine information, help, search, update, sorting rule, search catalog, collection, result cutoff, response
61
The Digital Library Standardized Log Format: Structure
Top-level hierarchy (diagram): Log, Log Entry, Transaction, SessionId, MachineInfo, TimeStamp, Statement, ...
62
The Digital Library Standardized Log Format: Structure (cont.)
Decomposition of Statement into different types: AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo
63
The Digital Library Standardized Log Format: Structure (cont.)
Decomposition of Event (diagram): Statement types AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, RegisterInfo; Event -> Action, StatusInfo; lower level: Search, Browse, Store, SysInfo, Update
64
The Digital Library Standardized Log Format: Structure (cont.)
Search attributes (diagram): Search -> QueryString, TimeFrame, PresentationInfo, SearchBy; PresentationInfo -> Format, NumberOfResults, SortBy, CutOff; SearchBy -> Collection, Catalog
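The hierarchy above can be emitted as XML. The Python sketch below builds one search log entry with the standard `xml.etree.ElementTree` module. The element names come from the slides, but the exact nesting (for example, placing SessionId, MachineInfo, and TimeStamp inside Transaction) and the function name `search_log_entry` are our assumptions, not the published log schema.

```python
import xml.etree.ElementTree as ET


def search_log_entry(session_id: str, machine: str, timestamp: str, query: str,
                     collection: str, sort_by: str = "relevance",
                     cutoff: int = 20) -> ET.Element:
    """Build one log entry for a search action along the (assumed) nesting
    Log > LogEntry > Transaction > Statement > Event > Action > Search."""
    log = ET.Element("Log")
    transaction = ET.SubElement(ET.SubElement(log, "LogEntry"), "Transaction")
    ET.SubElement(transaction, "SessionId").text = session_id
    ET.SubElement(transaction, "MachineInfo").text = machine
    ET.SubElement(transaction, "TimeStamp").text = timestamp
    statement = ET.SubElement(transaction, "Statement")
    action = ET.SubElement(ET.SubElement(statement, "Event"), "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "QueryString").text = query
    ET.SubElement(ET.SubElement(search, "SearchBy"), "Collection").text = collection
    presentation = ET.SubElement(search, "PresentationInfo")
    ET.SubElement(presentation, "SortBy").text = sort_by
    ET.SubElement(presentation, "CutOff").text = str(cutoff)
    return log


# Hypothetical values for illustration.
entry = search_log_entry("s-42", "example.edu", "2003-05-13T16:04:00",
                         "digital libraries", "ETD")
print(ET.tostring(entry, encoding="unicode"))
```

A structured entry like this can be queried by path (for example, all QueryString elements) instead of being parsed out of free-text server logs.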
65
DL Log Tool and Implementation
66
Creating the Clickstream Stats Visualizer (GUI)
Pipeline (diagram): User Activity File -> Clickstream stat generator -> Clickstream stats -> Visualizations
Sample user activity file:
User 4532; 25 Logons, 22 Logoffs, 3 accesses from .edu, 2 hits from .gov, history: Logon page [13 may 2003, 16:00] -> Browse page [13 may 2003, 16:02] -> search page [13 may 2003, 16:04] -> results page [13 may 2003, 16:04] -> view document 254 page [13 may 2003, 16:07] -> download page [13 may 2003, 16:10] -> logoff page [13 may 2003, 16:11]
User 4555432; 3 Logons, 0 Logoffs, 1 access from .mil, 2 hits from .com, history: Logon page [12 may 2003, 12:00] -> Browse page [12 may 2003, 12:02] -> logoff page [12 may 2003, 12:03]
Etc.
Step 1: The user activity file (an intermediate statistics file) is input into a clickstream stat generator.
Step 2: The clickstream generator produces aggregate statistics (e.g., average transition time from browse to results page, average time spent on each page).
Step 3: The GUI outputs the clickstream stats in both text and visual format. For example, the average path through a website can be displayed, and each page of a website can include hit statistics and time-on-page statistics.
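Step 2 above (aggregate statistics such as time on page and transition times) can be sketched in a few lines of Python. The parser assumes the history format of the sample user activity file (page name, then "[day month year, HH:MM]", steps joined by "->"); the function names are hypothetical, and time on a page is measured until the next click.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple

MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# One step such as "Browse page [13 may 2003, 16:02]", following the sample file above.
STEP = re.compile(
    r"\s*(?P<page>.+?)\s*\[(?P<day>\d{1,2}) (?P<mon>[A-Za-z]{3})[A-Za-z]* "
    r"(?P<year>\d{4}), (?P<hh>\d{1,2}):(?P<mm>\d{2})\]\s*")


def parse_history(history: str) -> List[Tuple[str, datetime]]:
    """Turn 'page [date, time] -> page [date, time] -> ...' into (page, datetime) pairs."""
    steps = []
    for part in history.split("->"):
        m = STEP.fullmatch(part)
        if m is None:
            raise ValueError(f"unparseable step: {part!r}")
        when = datetime(int(m["year"]), MONTHS[m["mon"].lower()], int(m["day"]),
                        int(m["hh"]), int(m["mm"]))
        steps.append((m["page"].lower(), when))
    return steps


def clickstream_stats(histories: List[str]):
    """Average minutes spent on each page and per (from, to) transition.

    Time on a page runs until the next click, so the last page of each
    history (usually the logoff page) gets no time-on-page value.
    """
    on_page: Dict[str, List[float]] = defaultdict(list)
    transitions: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for history in histories:
        steps = parse_history(history)
        for (page, t0), (next_page, t1) in zip(steps, steps[1:]):
            minutes = (t1 - t0).total_seconds() / 60
            on_page[page].append(minutes)
            transitions[(page, next_page)].append(minutes)
    return ({p: mean(v) for p, v in on_page.items()},
            {k: mean(v) for k, v in transitions.items()})


histories = [
    "Logon page [13 may 2003, 16:00] -> Browse page [13 may 2003, 16:02] -> "
    "search page [13 may 2003, 16:04] -> results page [13 may 2003, 16:04] -> "
    "view document 254 page [13 may 2003, 16:07] -> download page [13 may 2003, 16:10] -> "
    "logoff page [13 may 2003, 16:11]",
    "Logon page [12 may 2003, 12:00] -> Browse page [12 may 2003, 12:02] -> "
    "logoff page [12 may 2003, 12:03]",
]
avg_page, avg_transition = clickstream_stats(histories)
print(avg_page["browse page"])  # 1.5
```

The month table is spelled out instead of using `strptime`'s `%b`, so parsing does not depend on the machine's locale.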
67
OCKHAM
Simplicity (à la Occam’s razor)
Supported by Mellon and DLF
Next meeting in Atlanta, Jan. 8, 2003
Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
68
Problem
Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are:
1. The library budget won’t allow purchase of a commercial DL system.
2. Unless the development effort is local, there won’t be any control.
3. DLs are extensions of DBMSs, so they are simple applications to develop.
4. Since DLs operate on the Web, one must adopt the newest W3C proposal.
69
Problem (cont’d)
5. Since technology moves so quickly, it is essential to follow the latest fad.
6. CS students always develop from scratch.
7. This team knows it can do it better.
8. This system must have more capabilities than any other system.
9. This DL has to be more flexible and extensible.
10. This is the right system architecture, at last!
70
Problem Approach
We address the problem of how to develop DLs; build on experience in building many DLs; strive for simplicity as per the OCKHAM initiative; build upon the Open Archives Initiative; demonstrate our approach in diverse situations; and invite all to use DL-in-a-box and help build Open Digital Libraries.
71
Outline
Virginia Tech context
Why DLs?
What are DLs? (5S theory)
Case Study: CSTC -> CITIDEL -> NSDL
Case Study: NDLTD
Accessibility and Visualization
DL Software: MARIAN
Interoperability: OAI, ODL
Topical Outline
Selected Links
72
CS -> CSTC -> CRIM
NSF and the ACM Education Committee are funding a 2-year project, “A Computer Science Teaching Center” (CSTC), http://www.cstc.org/
Participants: College of NJ, U. Ill. Springfield, Virginia Tech
Focus initially on labs, visualization, multimedia
The multimedia part is also supported by a second grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)
73
CS Teaching Center (CSTC)
Instead of building large, expensive multimedia packages that become obsolete and are difficult to reuse, concentrate on small knowledge units.
Learners benefit from having well-crafted modules that have been reviewed and tested.
Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
ACM support led to the Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org
75
Browsing (1)
76
Browsing (2)
80
SMETE Library -> NSDL (from www.dlib.org to NSF DLI-2)
Context: global movement toward Digital Libraries (see April 1998 CACM)
NSF 00-44 effort: Science, Mathematics, Engineering, and Technology Education Digital Library (focused on undergraduates)
3 workshops, yearly increasing funds / new calls
NSDL will operate as a distributed federation, with separate parts for each key discipline, and should lead to a global effort.
81
Selected NSDL Early Projects/Topics
COLLEGIS Res. Inst.: IMS, CS, Math, Viz., …
Columbia University: Earth sciences
Stanford University: Medicine (images)
U. California Berkeley: Engineering
University of Maryland: K-12 education
U. Texas at Austin: Physical anthropology
82
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
Domain: computing / information technology
Genre: one-stop shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
Submission & Collection: sub/partner collections
www.citidel.org
83
www.CITIDEL.org
Led by Virginia Tech, with co-PIs:
Fox (director, DL systems)
Lee (history)
Perez (user interface, Spanish support)
Partners:
College of New Jersey (Knox)
Hofstra (Impagliazzo)
Villanova (Cassel)
Penn State (Giles)
84
Overview of CITIDEL architecture
85
Distributed repository structure
86
Digital library architecture for local and interoperable CITIDEL services
89
CITIDEL: Computing & Information Technology Interactive Digital Educational Library
Web Page: www.citidel.org

Future Developments
Expanding further to cover Information Technology
Advanced searching and additional services
Expansion of the collection from many sources
Collaborative Internationalization and Translation
System Assessment and evaluation
Workshops

Technology Features
Component architecture (Open Digital Library): reuse and compose redeployable digital library components.
Built using open standards & technologies:
XSL and XML: interface rendering
Perl: component integration
ESSEX: search engine functionality
Open Archives Initiative: used to collect DL resources and for DL interoperability

User Features
Very large collection: over 400,000 resources from ACM, SIGCSE, JERIC, CSTC, NDLTD, NCSTRL, DBLP
Filtered browsing and searching: filters based on user-selected sub-communities; also allows customization in addition to views of all results.
Multiclassification browser: supports browsing based on curricula (familiar, professional-society approved) in computing and related disciplines, as well as on classification schemes.
Activity collection creation & tools: faculty and students can extract resource references from CITIDEL search collections into learning activity templates, for sharing and interchange (with versioning).
VIADUCT assists in the development of a totally independent, self-generated, educational resource collection within CITIDEL. IA VT is based on Utah State’s Instructional Architect.

1. The core of CITIDEL is the collection data support. This consists centrally of a union catalog, metadata cache, semantic links table, integration tables, and more.
2. The harvesting system populates the union catalog and the secondary tables from the contents of remote digital library collections, over Open Archives.
3. For collections that lack an Open Archives provider, ad hoc importing facilities must be constructed.
4. CITIDEL serves up the contents of its union catalog via an Open Archives data provider, giving other digital libraries (e.g., NSDL) access to CITIDEL’s metadata.
5. The application-layer data support consists of non-content-related tables and personalization tables, such as a table of users and preferences.
6. The filtering system relies on extensive database support for speed.
7. The service modules implement the DL features: the search engine, recombination into annotated and enriched lists, creation of pedagogical activities utilizing DL resources, and posting messages to DL resources.
8. The CITIDEL application ties it all together in a single user interface. Most (but not all) presentation is handled here.

Browsing and Searching with Filters
Users are placed in chosen sub-communities and can filter results based on these sub-communities, with further customization available. Alternatively, users may view all results. Users may set up multiple filters for simple or complex filtering based on many factors such as education level, role, resource type, language, source, and much more. This allows users to get exactly what they are or are not looking for in the digital library. At any time, users are free to disable these filters or see the results excluded by them.

Multiclassification Browser
The multiclassification browser allows users to browse through the CITIDEL collections based on professional-society-approved curricula in computing as well as on classification schemes. Because users span many computing-related disciplines, they may browse within the scheme with which they are most familiar. Resources are cross-classified wherever possible through these schemes. The current schemes include the 2001 ACM/IEEE-CS Computing Curricula, the 1998 ACM Computing Classification System, the Computing Research Repository Subject Areas, and the 2000 AMS Mathematics Subject Classification.

Searching
CITIDEL searching, which is driven by the ESSEX search engine for relevance computation, also provides a list of relevant categories within the classification schemes (see sidebar, left).
CITIDEL Front Page
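The filtering behavior described above (stacked filters, the option to disable them, and access to the results they exclude) can be sketched in Python. This is a hypothetical illustration, not CITIDEL's implementation, which the slides say relies on database support for speed; `Resource`, `apply_filters`, and the sample records are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Resource:
    title: str
    education_level: str
    resource_type: str
    language: str


# A filter is a (name, predicate) pair; users may stack several of them.
Filter = Tuple[str, Callable[[Resource], bool]]


def apply_filters(results: List[Resource], filters: List[Filter],
                  enabled: bool = True) -> Tuple[List[Resource], List[Resource]]:
    """Return (kept, excluded); excluded results stay available for display on request."""
    if not enabled:
        return list(results), []
    kept: List[Resource] = []
    excluded: List[Resource] = []
    for resource in results:
        if all(predicate(resource) for _, predicate in filters):
            kept.append(resource)
        else:
            excluded.append(resource)
    return kept, excluded


# Hypothetical search results and user-selected filters.
results = [
    Resource("Sorting tutorial", "undergraduate", "tutorial", "en"),
    Resource("Compiler thesis", "graduate", "thesis", "en"),
    Resource("Sorting tutorial, Spanish edition", "undergraduate", "tutorial", "es"),
]
filters: List[Filter] = [
    ("undergraduate only", lambda r: r.education_level == "undergraduate"),
    ("English only", lambda r: r.language == "en"),
]
kept, excluded = apply_filters(results, filters)
print([r.title for r in kept])      # ['Sorting tutorial']
print([r.title for r in excluded])  # ['Compiler thesis', 'Sorting tutorial, Spanish edition']
```

Returning the excluded items instead of discarding them supports the "see results excluded by them" option without a second query.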
90
CITIDEL -> NSDL
A collection project in the National STEM (science, technology, engineering, and mathematics) Education Digital Library (NSDL) -> LEARNS
91
National Science Digital Library (NSDL)
Domain: undergraduate and K-12 education, etc.
Genre: educational resources
Submission & Collection: sites of 90 projects
www.nsdl.org
92
Advancing Education through Community Building and Sharing Educational Resources, supported by Digital Libraries (diagram)
93
NSDL Information Architecture, essentially as developed by the Technical Infrastructure Workgroup (diagram)
Layers: User Interfaces; Usage Enhancement; Collection Building; Core NSDL “Bus”
Components: Portals & Clients; NSDL Services; Other NSDL Services; CI Services (annotation, discussion, personalization, authentication, browsing); Core Services (information retrieval, metadata gathering); Core Collection-Building Services (harvesting, protocols); NSDL Collections (referenced items & collections); Special Databases
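The harvesting and metadata-gathering services in this architecture rest on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Below is a minimal Python sketch of the harvester side, assuming OAI-PMH 2.0 and a hypothetical endpoint http://example.org/oai. One function builds request URLs for the six protocol verbs; the other extracts the resumptionToken used to page through large result lists. It checks only the most common argument rules and is not a full protocol validator.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url: str, verb: str, metadata_prefix: Optional[str] = None,
                    from_date: Optional[str] = None, set_spec: Optional[str] = None,
                    identifier: Optional[str] = None,
                    resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 GET request URL."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params["resumptionToken"] = resumption_token
        return f"{base_url}?{urlencode(params)}"
    if verb in {"ListRecords", "ListIdentifiers", "GetRecord"} and metadata_prefix is None:
        raise ValueError(f"{verb} requires metadataPrefix")
    if verb == "GetRecord" and identifier is None:
        raise ValueError("GetRecord requires identifier")
    for key, value in (("identifier", identifier), ("metadataPrefix", metadata_prefix),
                       ("from", from_date), ("set", set_spec)):
        if value is not None:
            params[key] = value
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a list response, or None when the list is complete."""
    token = ET.fromstring(response_xml).find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


print(oai_request_url("http://example.org/oai", "ListRecords",
                      metadata_prefix="oai_dc", from_date="2003-01-01"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-01-01
```

A harvester would loop: send the first ListRecords request, store the records, and keep requesting with the returned token until `next_resumption_token` yields None.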
94
“The network is the library.”
A Learning Environments and Resources Network for SMET Education (LEARNS)
95
LEARNS Connects:
Users: students, educators, life-long learners
Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g., applets); interactive (virtual, remote) laboratories; ...
Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate; ...
96
LEARNS Supports:
Users (profiles): learning communities
Content (metadata): customizable collections
Tools (protocols): application services
97
LEARNS Enables:
Environments for communication, collaboration, creation, validation, evaluation, recognition, ...
AND
Discovery, stability, reliability, reusability, interoperability, customizability, ... of resources
98
Goal: LEARNS operational by 2002
Core Integration Track (FY00 pilots, FY01 full)
Collections Track
Services Track
Targeted Research Track
99
Expectations of NSDL Program Tracks
Core Integration: coordinate a distributed alliance of resource collection and service providers, and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme/specialty
Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form
Targeted (Applied) Research: have immediate impact on one or more of the other three tracks
100
Collections
Discovery of content
Classification and cataloguing
Acquisition and/or linking; referencing
Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged
Access to massive real-time or archived datasets
Software tool suites for analysis, modeling, simulation, or visualization
Reviewed commentary on learning materials and pedagogy
101
Services
Help services, frequently asked questions, etc.
Synchronous/asynchronous collaborative learning environments using shared resources
Mechanisms for building personal annotated digital information spaces
Reliability testing for applets or other digital learning objects
Audio, image, and video search capability
Metadata system translation
Community feedback mechanisms
103
Outline
Virginia Tech context
Why DLs?
What are DLs? (5S theory)
Case Study: CSTC -> CITIDEL -> NSDL
Case Study: NDLTD
Accessibility and Visualization
DL Software: MARIAN
Interoperability: OAI, ODL
Topical Outline
Selected Links
104
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD), http://www.ndltd.org
105
Alphabet Soup: Factoring
NDLTD = ND LTD (Paul Mather, from UK)
NDLTD = NDL TD (Edie Rasmussen)
(Later, Networked University Digital Library = NUDL)
106
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
107
NDLTD (diagram): Grad Program, IT Ed. (Tech), Library
108
Media (collage): ETD Web Site, http://www.ndltd.org/; “ETDs Got Your Interest?”; Graduate Students; Singapore AM; Chronicle of Higher Ed.; National Public Radio; NY Times ...; U. Laval
109
Key Ideas:
Networked infrastructure
Scalability
Education is the rationale
University collaboration
Workflow, automation
Authors must submit
Maximal access
Standards: PDF, SGML, MM, MARC, DC, URNs, federated search
8th graders vs. grads
110
What are the long term goals? 400K US students / year getting grad degrees are exposed / involved 200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …) Dramatic increase in knowledge sharing: literature reviews, bibliographies, … Services providing lifelong access for students: browse, search, prior searches, citation links Hundreds/thousands of downloads / year / work
111
ETDs: Library Goals Improve library services: better turn-around time; always available Reduce work: catalog from e-text; eliminate handling (mailing to UMI, bindery prep, check-out, check-in, reshelving, etc.) Save space
112
Record all work with NDLTD, return to prior situation, prepare bibliography Powerful (multilingual, text, image) searching, browsing (with categories), following citation links Support collaboration with others in same field: help with literature review, sharing tools and data sets, applying their methods Grad Student Workstation?
113
Aiding universities to enhance grad educ., publishing and IPR efforts: to help improve the availability and content of theses and dissertations Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive) Demonstrating how for other organizations What are we doing?
114
NDLTD Computer Resources Research Literature Student Prepares Thesis/Dissertation
115
Student Defends & Finalizes ETD My Thesis ETD
116
Student Gets Committee Signatures and Submits ETD Signed Grad School
117
Graduate School Approves ETD, Student is Graduated Ph.D.
118
Library Catalogs ETD, Access is Opened to the New Research WWW NDLTD
119
Available at VT Information http://scholar.lib.vt.edu/theses Automated submission system ready for customization http://scholar.lib.vt.edu/ETD-db/ Student guidelines, training materials, FAQs, multimedia educational materials http://etd.vt.edu NDLTD: Network educational institutions Annual conferences: Berlin 2003, U of Kentucky 2004 http://www.ndltd.org
120
ETDs at Virginia Tech Partnership: Library, Graduate School, and Faculty Approved by university governance: Mar. 1996 Full implementation: Jan. 1997 Web submission Students: http://etd.vt.edu Programmers: http://scholar.lib.vt.edu/ETD-db/ Workshops for students (and faculty) Over 5000 ETDs approved
121
How are ETDs managed? Graduate student creates ETD Word processor, multimedia Saves as PDF, usually Graduate student submits ETD Directly to library server/permanent archive Archiving fee replaces binding fee Graduate School approves E-mails author, advisor, UMI (VT scripts) Authors/advisors prescribe Internet access Library catalogs and archives UMI downloads
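The submission-to-cataloging sequence above is essentially a small workflow with automated notifications. The following Python sketch models it as a linear state machine; the state names, the `Etd` class, and the `outbox` list standing in for e-mail are illustrative assumptions, not the actual VT scripts.

```python
from enum import Enum, auto
from typing import List


class EtdState(Enum):
    # Hypothetical state names for the workflow described on the slide
    DRAFT = auto()
    SUBMITTED = auto()
    APPROVED = auto()
    CATALOGED = auto()


# Each state may only advance to the next step in the workflow
TRANSITIONS = {
    EtdState.DRAFT: EtdState.SUBMITTED,      # student submits to library server
    EtdState.SUBMITTED: EtdState.APPROVED,   # Graduate School approves
    EtdState.APPROVED: EtdState.CATALOGED,   # library catalogs and archives
}


class Etd:
    def __init__(self, author: str, advisor: str, access: str):
        self.author = author
        self.advisor = advisor
        self.access = access  # access level chosen by author/advisor (free-form here)
        self.state = EtdState.DRAFT
        self.outbox: List[str] = []  # recipients who would be e-mailed

    def advance(self) -> EtdState:
        nxt = TRANSITIONS.get(self.state)
        if nxt is None:
            raise ValueError(f"no step after {self.state.name}")
        self.state = nxt
        if nxt is EtdState.APPROVED:
            # On approval, notify author, advisor, and UMI
            self.outbox += [self.author, self.advisor, "UMI"]
        return nxt
```

Keeping the transitions in one table makes it easy to refuse out-of-order steps, such as cataloging an ETD the Graduate School has not yet approved.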
122
Archiving ETDs Backups of not-yet-approved submissions every 15 minutes Hourly backups of newly approved ETDs Weekly backups of entire ETD collection Copies stored on-site and off-site
123
VT ETD Cataloging same as current cataloging policies, except: author-assigned keywords (not LCSH) generic (not LC) call no. fields/subfields as required for computer files full abstracts time savings cataloger familiar with computer files equipment, software for word processing 5 minutes avg. (10-15 minutes for paper TDs)
124
Library Costs $12/vol. for paper thesis processing catalog, bind, security strip, label, shelve @950 vols./yr. = $11,466 $3.20/vol. ETD processing cataloging @950 vols./yr. = $3040 $.07/vol. shelving $.04/vol. circulation
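The slide's totals can be checked with a few lines of arithmetic. This sketch uses only the figures given above (950 volumes per year, $11,466 for paper processing, $3.20 per ETD for cataloging) and leaves out the per-volume shelving and circulation figures, whose scope the slide does not spell out. Note that $11,466 / 950 is about $12.07, so the "$12/vol." figure is rounded.

```python
# Figures taken from the slide (about 950 volumes per year)
VOLUMES_PER_YEAR = 950
PAPER_TOTAL = 11_466    # paper: catalog, bind, security strip, label, shelve
ETD_PER_VOLUME = 3.20   # ETD: cataloging only

etd_total = ETD_PER_VOLUME * VOLUMES_PER_YEAR
paper_per_volume = PAPER_TOTAL / VOLUMES_PER_YEAR
annual_savings = PAPER_TOTAL - etd_total

print(f"ETD processing: ${etd_total:,.0f}/yr")
print(f"Paper processing: ${paper_per_volume:.2f}/vol")
print(f"Annual savings: ${annual_savings:,.0f}")
```

Run as written, this prints $3,040/yr for ETD processing, $12.07/vol for paper processing, and annual savings of $8,426.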
125
Costs/Savings at VT Graduate School stopped shipping to the library 3000 copies of paper TDs/year Library stopped binding, shelving, and circulating 3000 copies of TDs/year 166 ft of shelf space saved/year by the library VT used existing equipment in Library (vs. start-up costs for staff, hardware and software from a zero-base estimate: $65,000 – see http://scholar.lib.vt.edu/theses/)
126
Popular Works 1996
458 Seevers, Gary L. Identification of Criteria for Delivery of Theological Education Through Distance Education: An International Delphi Study (Ph.D., Educational Research and Evaluation, April 1993; 1353Kb)
432 Hohauser, Robyn Lisa. The Social Construction of Technology: The Case of LSD (MS in Science and Technology Studies, Feb. 1995; 244Kb)
390 Childress, Vincent William. The Effects of Technology Education, Science, and Mathematics Integration Upon Eighth Grader's Technological Problem-Solving Ability (Ph.D. in Vocational and Technical Education, July 1994; 285Kb)
310 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2Mb)
287 Sprague, Milo D. A High Performance DSP Based System Architecture for Motor Drive Control (MS in Electrical Engineering, May 1993; 878Kb)
165 Wallace, Richard A. Regional Differences in the Treatment of Karl Marx by the Founders of American Academic Sociology (MS in Sociology, Nov. 1993; 479Kb)
150 McKeel, Scott Andrew. Numerical Simulation of the Transition Region in Hypersonic Flow (Ph.D. in Aerospace Engineering, Feb. 1996; 3Mb)
127
Popular Works 1997
9920 Liu, Xiangdong. Analysis and Reduction of Moire Patterns in Scanned Halftone Pictures (Ph.D. in Computer Science, May 1996; 6.6Mb)
7656 Petrus, Paul. Novel Adaptive Array Algorithms and Their Impact on Cellular System Capacity (Ph.D. in Electrical Engineering, March 1997; 5Mb)
2781 Agnes, Gregory Stephen. Performance of Nonlinear Mechanical, Resonant-Shunted Piezoelectric, and Electronic Vibration Absorbers for Multi-Degree-of-Freedom Structures (Ph.D. in Engineering Mechanics, Sept. 1997; ? + 7926Kb)
2492 Gonzalez, Reinaldo J. Raman, Infrared, X-ray, and EELS Studies of Nanophase Titania (Ph.D. in Physics, July 1996; 4607Kb)
1877 Shih, Po-Jen. On-Line Consolidation of Thermoplastic Composites (Ph.D. in Engineering Mechanics, Feb. 1997; 3.3Mb)
1791 Saldanha, Kevin J. Performance Evaluation of DECT in Different Radio Environments (MS in Electrical Engineering, Aug. 1996; 3.2Mb)
1431 DeVaux, David. A Tutorial on Authorware (MS in CS, April 1996; 2.3Mb)
1394 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2518Kb)
128
ETD Benefits: Low margin, high use Incorporate ETDs with other digital library activities Ejournals, online class materials, digital images, etc. Additional equipment, staff may not be necessary http://scholar.lib.vt.edu/theses/data/setup.html Use VT programs, scripts, etc. http://scholar.lib.vt.edu/ETD-db/ Online accesses vs. circulation of copies VT theses 1990-1994, combined average circulation per copy: 2.24/yr VT dissertations 1990-1994, combined average circulation per copy: 3.2/yr
129
Access to VT’s ETDs http://scholar.lib.vt.edu/theses/
130
Why are ETDs so popular? User surveys 67% found VT ETDs easily 61% found them by searching 22% browsed by department 16% browsed by author 53% downloaded 1 or more ETDs Author surveys Conversion and submission processes less difficult than anticipated Over half plan to publish articles from their ETDs Why did they restrict access? http://lumiere.lib.vt.edu/surveys/
131
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
133
Brief History of ETD Meetings
1987 mtg in Ann Arbor: UMI, VT, …
1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each
1993 mtg in Atlanta to start Monticello Electronic Library (regional, US Southeast): SURA, SOLINET
1994 mtg at VT: std: PDF + SGML + multimedia objects
1996 funding by SURA, US Dept. of Education (FIPSE)
1997 meetings in UK, Germany,...
1998 – 1st symposium – Memphis (20)
1999 – 2nd symposium – Blacksburg (70)
2000 – 3rd symposium – St. Petersburg (225)
2001 – 4th symposium – Caltech (200)
2002 – 5th symposium – BYU, Provo, Utah
2003 – 6th symposium – Berlin (215)
2004 – 7th symposium – U. Kentucky
2005 – 8th symposium – Sydney, Australia
134
NDLTD Membership As of 5/17/2003 there were at least: 176 members, including: 155 individual universities 6 consortia 21 institutional members
135
National / Regional Projects Australia U. New South Wales (lead) U. of Melbourne U. of Queensland U. of Sydney Australian National U. Curtin U. of Technology Griffith U. Belgium Brazil Germany Humboldt University (lead) 3 other universities 5 learned societies: Math, Physics, Chemistry, Sociology, Education 1 computing center 2 major libraries India Lithuania Spain: Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es: 9 sites Sudan UK (British Library, JISC, Edinburgh) UNESCO (especially Latin America, Eastern Europe, Africa) USA: CIC (“Big 10”) Ohio: OhioLINK: 79 colleges/univs SOLINET …
136
OhioLINK Statewide Consortium Represents 79 colleges, universities, libraries Public Universities Private Universities and Colleges 2-Year Colleges Only a few (e.g., Miami U. of Ohio) are also NDLTD members on their own
137
US University Members Air University (Alabama) Baylor University Boston University Brigham Young University Caltech Clemson University College of William & Mary Concordia University (Illinois) Drexel University – required 4/2002 East Carolina University East Tenn. State U. – required 1/2001 Florida Institute of Technology Florida International University Florida State University Florida Tech George Washington University Georgetown University Johns Hopkins University Louisiana State University – required 1/2002 Marshall University (W. Va.) Miami University of Ohio Michigan Tech Mississippi State University MIT Montana State University Naval Postgraduate School (CA) New Jersey Inst. of Technology New Mexico Tech North Carolina State University – required 9/2002 Northwestern University Penn. State University Regis University Rochester Institute of Tech. Texas A&M U. of Central Florida U. of Colorado Health Science Center U. of Florida – required 8/2001 U. of Georgia – required 9/2001 U. of Hawaii, Manoa U. of Illinois, Urbana-Champaign U. of Iowa U. of Kentucky – required in CS only U. of Maine – required in CS, Spatial Info Sci/Eng U. of Missouri-Columbia U. of Nevada, Las Vegas U. of New Orleans U. of North Texas – required 8/1999 U. of Oklahoma U. of Pittsburgh U. of Rochester U. of South Florida – required 8/2002 U. of Tennessee, Knoxville U. of Tennessee, Memphis U. of Texas at Austin – required 6/2001 U. of Virginia – required 1/2003 U. of West Florida U. of Wisconsin - Madison – part reqt 12/1999 Vanderbilt U. Virginia Commonwealth U. Virginia Tech - required 1/97 Wake Forest U. West Virginia U. - required 8/1998 Western Kentucky U. – required 9/2004 Western Michigan U. Worcester Polytechnic Inst. – required 7/2002 Yale U.
138
Other Countries (selected) Australia Belgium Brazil Canada Chile China, Hong Kong Colombia Finland France Germany Greece India Italy Jamaica Korea Lithuania Mexico Netherlands Norway Poland Russia Singapore S. Africa S. Korea Spain Sudan Sweden Taiwan Thailand UK Venezuela
139
Institutional Members Australian Digital Theses Program British Library Cinemedia Coalition for Networked Information (CNI) Committee on Institutional Cooperation (CIC) Consorci de Biblioteques Universitàries de Catalunya Diplomica.com Dissertation.com Dissertationen Online (Germany) ETDweb, a Division of Answer4.com Ibero-American Science & Technology Education Consortium (ISTEC) MathDISS International National Documentation Centre (NDC), Greece National Library of Canada National Library of Portugal OCLC Online Computer Library Center Office of Scientific and Technical Info (US Dept of Energy) OhioLINK Organization of American States (SEDI/OAS) Southeastern Library Network (SOLINET) Sudanese National Electronic Library UNESCO (www.unesco.org/webworld/etd)
140
UNESCO and ETDs Promoting the use of the Internet as a tool for disseminating scientific knowledge Facilitating the transfer of ETD expertise from developed to developing countries 1998: Member of the NDLTD Steering Committee 1999: First UNESCO ETD meeting on ETD internationalisation 2002: “UNESCO Guide to Electronic Theses and Dissertations” 2003: Model training programmes and training courses 2003: Sponsor pilot projects 2003: Pilot projects (Africa, Europe, Latin-America)
141
For professional societies Like “writing across the curriculum”, e.g., Chemical Markup Language, MathML, … Besides writing: computing/communications, information literacy, personal digital library management, tool use, research methods, collaboration, archiving/preservation Data sets, communities of users of them Classification systems / browsing / searching NRC’s “Issues for Science and Engineering Researchers in the Digital Age”, 57 pages
142
Relationship with publishers Concern of faculty and students that still wish to publish books or journal articles, voiced: campus, Chronicle, NPR, Times Solution: Approval Form gives students, faculty choices on access, when to change access condition; use IPR controls in DL Solution: by case, work with publishers and publisher associations to increase access AAP, AAUP AAAS, ACM, ACS, Elsevier,...
143
Some responses from publishers ACM: need to acknowledge copyright Elsevier: need to acknowledge copyright IEEE-CS: endorse initiative ACS: After first publication, can release Textbook publishers: different market, manuscript significantly reworked General: restricting access to local campus will not cause any problems
144
How does this relate to ProQuest/UMI? Generally, they are independent decisions. 1987 UMI workshop was first to explore ETDs. UMI wrote support letter for US Dept. of Ed. proposal. UMI is on Board of Directors (formerly Steering Committee). ProQuest Direct pilot of scanning works started 1/1/97, with free 2 yr access to front part. We are collaborating on: accepting electronic author submissions standards (e.g., representation)
145
ETD Initiative (and UMI) Students Learn about DL, EPub TDs become more expressive N. Amer. (T)Ds are accessible, archived Global TDs become more accessible, archived UMI Universities
146
User Search Support (multilingual, XML) Note: All groups shown are connected with NDLTD.
147
www.theses.org James Powell student project, D-Lib Magazine description in Sept. 1998 XML description of each site type of search engine / service language coverage (for resource discovery) Adding Z39.50 gateway capability and integrating with MARIAN, along with Harvest and Open Archives protocols
148
Access Approaches Goal: Maximize access and services, e.g., by encouraging: UMI centralized services VTLS: free union collection of ETD metadata OCLC: free union collection of TD metadata Distributed service: Dienst, Z39.50 Regional services (e.g., OhioLINK) Local servers with browse, search From local catalogs to local archives WWW robot indexing and search services
149
Access Possibilities Web search engines library catalog clients www.theses.org www.openarchives.org 3rd Party Services (e.g., UMI) Virginia Tech National Library of Portugal CBUC (Spain) OhioLINK MIT National Projects: AU, GE, …
150
Why might a university want to be involved? To improve graduate education / better prepare your students / increase their knowledge (epub, DLs, IPR) and visibility To enhance university infrastructure (DL) To unlock university information To save money for students and for the university / improve workflow To build an important digital library (of ETDs)
151
NDLTD Members and ETD-MS NDLTD members will Share metadata for their ETDs Providing that in either ETD-MS Or if they use a version of MARC locally, work to have that eventually shared in either MARC21 or UNIMARC Run OAI, either locally or in consortia, so their metadata can be harvested, according to necessary terms and conditions
152
Complex to Simple MARC ($50) → Dublin Core (DC) + thesis
153
ETD-MS ETD Metadata Standard XML-encoded metadata standard (content and encoding) for Electronic Theses and Dissertations (ETDs) in part conforming to Dublin Core (DC) using RDF using UNICODE Will specify relationship with MARC
154
ETD-MS Schema Includes Elements not in dces (Dublin Core Element Set) e.g., thesis.degree Elements with wildly divergent semantics e.g., thesis.advisor rather than dc.contributor Relationships to other elements Controlled vocabularies e.g., {Bachelors, Masters, Doctorate, Other} for thesis.degree.level Labels in multiple languages
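To make the schema features concrete, here is a minimal Python sketch that emits an ETD-MS-style record and enforces the controlled vocabulary for thesis.degree.level. The element names follow the dotted labels on this slide; the `etd` root element, the flat layout, and the function name are assumptions, and the namespaces, RDF encoding, and multilingual labels of the actual standard are not reproduced.

```python
import xml.etree.ElementTree as ET

# Controlled vocabulary for thesis.degree.level, as listed on the slide
DEGREE_LEVELS = {"Bachelors", "Masters", "Doctorate", "Other"}


def build_etd_record(title: str, creator: str, advisor: str,
                     degree_name: str, degree_level: str) -> str:
    """Build a hypothetical, flat ETD-MS-style record as an XML string."""
    if degree_level not in DEGREE_LEVELS:
        raise ValueError(
            f"thesis.degree.level must be one of {sorted(DEGREE_LEVELS)}")
    root = ET.Element("etd")
    fields = [
        ("dc.title", title),                 # from the DC element set
        ("dc.creator", creator),
        ("thesis.advisor", advisor),         # more specific than dc.contributor
        ("thesis.degree.name", degree_name), # element not in the DC set
        ("thesis.degree.level", degree_level),
    ]
    for name, value in fields:
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting values outside the vocabulary at creation time keeps union catalogs from having to reconcile variants such as "PhD" and "Ph.D." later.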
155
ETD Encoding Decisions Text UNICODE (with language identifiers) Structure MARC (MARC-21 or UNIMARC) PLUS XML / RDF / DC + ETD Multimedia Following international standards Other schemes may not be amenable to preservation
156
RDF for ETDs WWW Consortium (W3C)’s RDF: Resource Description Framework NUDL ETD metadata realized as an RDF application profile Specifying elements from DC element set Plus new elements from a registered ETD schema Constraints & policies attached to both (e.g., “Full title,” “Name as it appears on title page,” “Repeatable”) Links to authority records encoded as URIs XML syntax as per RDF standard
157
OCLC and ETD-MS Identify TDs in WorldCat (4.3M) Through OAI make available metadata for WorldCat TDs in both DC and ETD-MS Provide an authority service for personal names for NDLTD Coordinate with other authority services such as LC
158
VTLS and ETD-MS Support NDLTD through a union catalog service implemented with Virtua Accept metadata in MARC21 or UNIMARC, and help identify other converters for other types Accept metadata in one other format, namely ETD-MS, collected using OAI (harvesting) Accept data in various character sets, with UNICODE preferred, but in some cases the submitter may be required to convert
159
Union Catalog (with Vinod Chachra, Thom Hickey)
160
NDLTD Union Catalog Statistics 1. Participating Countries So far ETDs from 7 countries are included in the database. Canada Germany Greece Korea Portugal Spain U.S. UK to be added by June 30, 2002. Brazil to be added soon.
161
NDLTD Union Catalog Statistics 2. Interface Languages in Union Catalog The language here is the language of the interface The VTLS NDLTD Union Catalog has 14 languages: English, Arabic, Catalan, Chinese, French, German, Hebrew, Korean, Polish, Portuguese, Russian, Slovak, Spanish and Swedish Example follows
162
German
163
NDLTD Union Catalog Statistics 3. Languages in the Union Catalog The language here is the language of the content of ETD The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English German Greek Korean Portuguese Spanish Examples follow
164
Language = German; hits = 137
165
Full record display
166
Language = Greek
167
In Greek In English
168
Union Catalog Creation
169
NDLTD Union Catalog Architecture TD OAI Repository ETD OAI Repository WorldCat VT ODL Demo Search/Browse Virtua Union Catalog email FTP OAI-PMH 20+ sites OCLC VTLS SRU/SRW (search) Try: Z39.50 harvest
170
OCLC Capabilities Harvesting OAI-PMH versions 1.1 and 2.0 Harvestable sets Sets by institution Searching SRU (Z39.50 on the Web) VTLS Virginia Tech Open Digital Library demo Unicode support
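SRU expresses a Z39.50-style search as an ordinary HTTP GET. The sketch below builds a searchRetrieve request using the SRU 1.1 parameter names and a CQL query; the endpoint `http://example.org/sru` is a placeholder, since the slide does not give OCLC's service address, and the `dc.title` index is an assumption about what a given server supports.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, start: int = 1,
                   maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL (SRU 1.1 parameter names)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # a CQL query string
        "startRecord": start,          # 1-based position of first record
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; not OCLC's actual service address
print(sru_search_url("http://example.org/sru", 'dc.title any "moire"'))
```

Because the whole request is a URL, the same search can be issued from a browser, a script, or a portal such as the VT Open Digital Library demo without a Z39.50 client library.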
171
OCLC Statistics 19 Sources 61,998 records Probably some overlap Adding 1-2 new sites/month
172
Multiple objectives Sharing research results Decrease costs, increase services Increase knowledge of users Adding to author knowledge/skills Epub, DL, IPR Enhancing organization’s infrastructure CS department, library University, Laboratory
173
Some Barriers at Universities Lethargy; Not invented here (esp. large univ’s) Anger with unfunded, added, required work Last straw: using more frustrating technology Lack of experience in working together: graduate school, library, computing staff Lack of interest in (quality of) student work More loyalty to discipline than to campus Unwillingness to accept responsibility for $ problems with libraries, publishers
174
How can a university get involved? Select planning/implementation team Graduate School Library Computing / Information Technology Institutional Research / Educ. Tech. Fill in online form, giving us contact names www.ndltd.org/join Adapt Virginia Tech (or other) solution Build interest and consensus Start trial / allow optional submission
175
Contact Our Project Team E-mail etd@ndltd.org Phone Call Visit Video Tape
176
Convene Local Planning Group ETD
177
Build Local ETD Site Digital Library Policies Inspection/Approval Workshop/Training ETD
178
Support Offered Software, documentation, tech support Email, listservs (etd-l@listserv.vt.edu) UNESCO training, Guide (www.etdguide.org) NDLTD Committees Conference Membership Software Distribution Standards
179
Why ETD? Short Answer For Students: Gain knowledge and skills for the Information Age Richer communication (digital information, multimedia, …) For Universities: Easy way to enter the digital library field and benefit thereby For the World: Global digital library – large, useful, many services General: Save time and money Increased visibility for all associated with research results
180
The Process? Short Answer For Students: Plan on ETD from day 1 Secure knowledge from: workshops, online info, colleagues Work with faculty to plan approach PDF? XML? TEI? Multi/hypermedia? Data sets? Viz? Get signed approval form: access, ©, proxy assignment After defense and approval, submit ETD to university For Universities: Form team Adapt solution from work at other universities, attend ETD conference Pilot -> Option -> Requirement
181
Future Work - 1 of 2 Working with publishers to increase level of access as much as possible -> joint awards Interoperability tests to provide integrated services Study with testbed that emerges, to improve information retrieval, browsing, interface, and other types of user support Evaluation, improving learning experience, spread further as worldwide initiative, sustainable support and coordination
182
Future Work - 2 of 2 Adding services currently prototyped annotation and SDI (routing) capabilities fulltext search, crawling Adding other services planned building and using citation database (w. SFX) implementing plagiarism check (like “SCAM”) Further development of NDLTD Inc. as nonprofit charitable educational institution promoting education and digital libraries
183
Spirit of NDLTD Help make a better (smaller) world Win-win-win (everyone can benefit) Have fun helping others Helpers/teachers learn more than those they work with Cooperation, friendly competition When you “1-up” VT, share your software, documents! “Doing better” requires both “doing”, “better” Balance (and build on standards) New, popular, powerful, expressive, exciting, “better” Doable, feasible, learnable, affordable, sharable, preservable We can always do more, enhancing quality and knowledge!
184
Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links
185
Portals and DLs Reengineering PhysNet in the uPortal framework by Ye Zhou (Dept. of Computer Science, MS thesis, Virginia Tech, May 2003). Hypothesis: DLs can be modeled as a set of interactive and non-interactive components with well-defined inter-component communication protocols. Offering a customizable User Interface (UI) toolkit can make it easier to build a DL. Distributed DL services can be achieved by exposing each individual component as a web service. To evaluate this hypothesis, we designed, implemented, and tested a framework in a portal reengineering project.
186
PhysNet Re-engineering
187
PhysNet Re-Engineering Screen shot
188
New Services: PACS Recommender
189
Browse user interface with uPortal
190
Browse user interface with uPortal – cont’d
191
uPortal project, http://www.udel.edu/uportal/
192
Weiner, K., “Introduction to uPortal 2.1”, JA-SIG Conference, Dec. 2002
193
Accessibility Activities / Plans Interface design (simple, 3D, VR) Usability studies Generic multi-lingual support Support for those with disabilities Hybrid collection (paper, MARC, abstracts, full-text, multimedia) Disciplinary classifications, tools Visualization of results, collection
194
CAVE Experiments Use a familiar metaphor building / floor / room / shelf / book Rearrange orderings / shelving use categories, clustering, ranking use visualization: colors and gaps study space mappings: physical, logical Simplify movement for key tasks
196
CAVE-ETD CAVE-ETD is a simulation of a library that runs in a CAVE (VR environment). Populated with a subset of ETD records. Main Foyer room
197
Book Browsing
198
Reading Book Abstract
199
ENVISION NSF “A User-Centered Database from the Computer Science Literature” (1991-93) Collected bib/typesetter data, converted to SGML Scanned thousands of page images MARIAN search engine - can be made available (also applied to the Virginia Tech library catalog) used as part of a prototype object-based DL, with tailored visualization interface (L. Nowell dissertation)
200
Envision Results Window
205
Envision – New Version
206
Envision – New Versions - Clusters
207
SPIRE Visualization
208
VIDI: A Lightweight Protocol Between Visualization Systems and Digital Libraries Jun Wang Virginia Tech CS MS Thesis Spring 2002
209
Problem Concerned Scenario DL 1 DL 2 DL 3 VIS 1 VIS 2
210
VIDI Protocol Design Features Enabling interoperability Lightweight Extended OAI Protocol Flexible implementations enabled General XML, HTTP Standard time formats Dual usage of commands Simple and Easy!
211
VIDI Protocol Request Verbs Identify (DL, VIS) ListMetadataFormats (DL) ListVisdataFormats (VIS) ListTransformers (VIS) RequestResultSet (DL)
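The verb list above also records which side of a DL-VIS pair answers each request. A minimal Python sketch that encodes those roles and rejects misdirected requests follows; the slide does not specify the arguments or HTTP binding of each verb, so only the role assignment is modeled, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Which role answers each VIDI request verb, per the slide's verb list
VIDI_VERBS: Dict[str, Set[str]] = {
    "Identify": {"DL", "VIS"},
    "ListMetadataFormats": {"DL"},
    "ListVisdataFormats": {"VIS"},
    "ListTransformers": {"VIS"},
    "RequestResultSet": {"DL"},
}


def verbs_for(role: str) -> List[str]:
    """Return, sorted, the verbs a DL or VIS system is expected to answer."""
    return sorted(v for v, roles in VIDI_VERBS.items() if role in roles)


def check_request(role: str, verb: str) -> None:
    """Reject verbs that are unknown or belong to the other role."""
    if verb not in VIDI_VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if role not in VIDI_VERBS[verb]:
        raise ValueError(f"{verb} is not answered by a {role} system")
```

Identify is the only verb both sides share, which is what lets either party discover the other before any data moves.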
212
Extend OAI Protocol
OAI: GetRecord, ListIdentifiers, ListRecords, ListSets
OAI & VIDI: Identify, ListMetadataFormats
VIDI: ListVisdataFormats, ListTransformers, RequestResultSet
213
Implementation Roles and Times Implementing protocol Devising general approaches to protocol use for DL-VIS environments Applying protocol in representative cases ENVISION-ODL ENVISION-MARIAN
214
Implementation Process 1.Analyze metadata format in DL 2.Analyze visdata format in VIS 3.Write transformer (if not in registry) 4.Decide on command flow 5.Implement protocol commands
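Step 3 presupposes a registry of transformers keyed by the DL's metadata format and the VIS tool's visdata format. The following Python sketch shows one way to organize such a registry; the `scatter` visdata format, the year-based mapping, and all function names are hypothetical illustrations rather than the VIDI prototype's code.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
VisItem = Dict[str, object]
Transformer = Callable[[Record], VisItem]

# Hypothetical registry: (metadata format, visdata format) -> transformer
REGISTRY: Dict[Tuple[str, str], Transformer] = {}


def register(metadata_format: str, visdata_format: str):
    """Decorator that records a transformer under its format pair."""
    def wrap(fn: Transformer) -> Transformer:
        REGISTRY[(metadata_format, visdata_format)] = fn
        return fn
    return wrap


@register("oai_dc", "scatter")  # "scatter" is an invented visdata format
def dc_to_scatter(rec: Record) -> VisItem:
    # Place each record at its publication year, labelled by its title
    return {"label": rec.get("title", ""), "x": int(rec["date"][:4])}


def transform(records: List[Record], metadata_format: str,
              visdata_format: str) -> List[VisItem]:
    key = (metadata_format, visdata_format)
    if key not in REGISTRY:
        # Step 3: no registered transformer, so one must be written
        raise LookupError(f"no transformer for {key}")
    fn = REGISTRY[key]
    return [fn(r) for r in records]
```

Keying on the format pair keeps the DL and the VIS tool independent: either side can change, and only the matching transformer needs to be rewritten.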
215
Command Flow Used In Prototype
216
ENVISION-ODL (II) Connect ENVISION with: ODL A DL implementing OAI protocol, which means we can issue OAI requests and receive responses to retrieve the data
217
ENVISION-MARIAN Connect ENVISION with: MARIAN A DL having multiple collections (NDLTD, DIRLINE, CITIDEL, VT catalog,…) User Authentication
218
Future Work SOM decoupling DLVIS Transformer
219
Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links
220
MARIAN Multiple Access Retrieval of Information with Annotations (Marian the Librarian …) Evolved from CODER system to a distributed Online Public Access Catalog (OPAC), then DL backend, now becoming a full DL system From C/C++ to Java Future: NDLTD, NUDL, PetaPlex Use for campus collection management Use for www.theses.org as centralized system with gateway services: OAI, Harvest, Z39.50, …
221
MARIAN Digital Library Search & Retrieval System Principles Network representation Class-based retrieval Weight-valued functions and weighted sets Interoperability System: wrappers and harvesting Syntax: OAI standards (XML, Unicode, …) Structure: information networks Semantics: class-based retrieval : collection views
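Weighted sets can be pictured as mappings from objects to weights that are combined as evidence accumulates and read out best-first. The sketch below is an illustration under stated assumptions: it combines weights with fuzzy-set max and min, chosen here for simplicity, which are not necessarily the weight-valued functions MARIAN actually uses; the `WeightedSet` class is hypothetical.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Tuple


class WeightedSet:
    """A set of objects, each carrying a weight in [0, 1]."""

    def __init__(self, weights: Optional[Dict[Hashable, float]] = None):
        self.w: Dict[Hashable, float] = dict(weights or {})

    def union(self, other: "WeightedSet") -> "WeightedSet":
        # Illustrative OR rule: keep the larger weight (fuzzy-set max)
        keys = self.w.keys() | other.w.keys()
        return WeightedSet(
            {k: max(self.w.get(k, 0.0), other.w.get(k, 0.0)) for k in keys})

    def intersection(self, other: "WeightedSet") -> "WeightedSet":
        # Illustrative AND rule: keep the smaller weight (fuzzy-set min)
        keys = self.w.keys() & other.w.keys()
        return WeightedSet({k: min(self.w[k], other.w[k]) for k in keys})

    def top(self, k: int) -> List[Tuple[Hashable, float]]:
        """Return the k best-weighted objects, highest weight first."""
        return heapq.nlargest(k, self.w.items(), key=lambda item: item[1])
```

For example, combining a title match set with an author match set via `union` and calling `top(2)` returns the two strongest candidates without sorting the whole set.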
222
MARIAN Layers Database Layer Search Engine Layer User Information Layer User Interface Layer User
223
MARIAN Architecture
224
System & Syntactic Interoperability
225
MARIAN – Part of Class Hierarchy
226
Structural Interoperability through Information Networks
227
PhysDis Collection View
229
MARIAN Parallelism
236
Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links
237
Open Archives Initiative OAI www.openarchives.org openarchives@openarchives.org
238
OAI Philosophy Self-archiving = submission mechanism Long-term storage system = archive Open interface = harvesting mechanism Data provider + service provider Start with “gray literature” e-prints/pre-prints, reports, dissertations, …
239
Open Archives Initiative (OAI) xxx@LANL, high-energy physics (Ginsparg, 1991) CSTR + WATERS = NCSTRL (Lagoze, 1994) xxx + NCSTRL = CoRR collaboration (1998) Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe – led by LANL, CNI, DLF, Mellon --> OAI Santa Fe Convention (see Feb. D-Lib Magazine article) Follow-on mtgs: 6/3@San Antonio, 9/21@Lisbon (ECDL) Archives -> Open Archives Support unique archive identifiers Implement Open Archives metadata set (DC, using XML) Implement OA harvesting protocol (derived from Dienst protocol) Register the archive Build tools, layer other services: linking, searching, …
240
Open Archives (protoproto) ArXiv & Los Alamos National Lab CogPrints & U. Southampton NACA & NASA (reports) NCSTRL & Cornell U. NDLTD & Virginia Tech RePEc & U. Surrey Total of around 200K records
241
Original Open Archives Members American Physical Society California Digital Library Caltech Coalition for Networked Info. Cornell University Harvard University Library of Congress Los Alamos Nat’l Lab Mellon Foundation NASA Langley Research Cntr Old Dominion University Stanford University U. of Ghent U. of Surrey U. of Southampton Vanderbilt University Virginia Tech Washington University
242
Open Archives Future EconWPA (U. Washington) e-biomed -> PubMed Central (NIH) PubScience (DOE) Clinical Medicine Netprints (+ other HighWire Press holdings ) University ePub (California Digital Library) All public e-prints (MIT) Scholar’s Forum (Caltech) Int’l: CERN, Germany, India, Mexico, … Goal: millions of books/articles/reports / yr
243
Harvesting vs. Federation Competing approaches to interoperability Federation is when services are run remotely on remote data (e.g., federated searching) Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g., union catalogues) Federation requires more effort at each remote source but is easier for the local system, and vice versa for harvesting OAI currently focuses on harvesting
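The trade-off can be seen in a toy Python sketch: federated search contacts every source for every query, while harvesting copies metadata once and answers queries locally. The `Source` callables and the in-memory catalog are hypothetical stand-ins for remote servers and a union catalog.

```python
from typing import Callable, Dict, List

# Hypothetical remote source: takes a query string, returns matching titles
Source = Callable[[str], List[str]]


def federated_search(sources: Dict[str, Source], query: str) -> List[str]:
    """Federation: forward the query to every remote source at search time."""
    hits: List[str] = []
    for name, search in sources.items():
        hits += [f"{name}: {title}" for title in search(query)]
    return hits


def harvest(catalog: Dict[str, List[str]], name: str,
            records: List[str]) -> None:
    """Harvesting: copy a source's metadata into the local catalog once."""
    catalog[name] = list(records)


def local_search(catalog: Dict[str, List[str]], query: str) -> List[str]:
    """Search the harvested copies without contacting any remote source."""
    q = query.lower()
    return [f"{name}: {title}" for name, titles in catalog.items()
            for title in titles if q in title.lower()]
```

Both approaches can return the same hits; the difference lies in where the work happens and how fresh the results are, since a harvested copy is only as current as the last harvest.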
244
Metadata vs. Data Data refers to digital objects or digital representations of objects Metadata is information about the objects (e.g. title, author, etc.) OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects
245
Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities Museums
246
OAI – Repository Perspective Required: Protocol DO MDO
247
OAI – Black Box Perspective: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7
248
OAI – Black Box Perspective: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7 Services: Browse, Summarize, Search, Visualize Docs: DO Metadata
249
Aggregation through OAI Harvesting Archive Lite Sites, NCSTRL, Eprints, IEEE-CS, ACM, … Own: History, ResearchIndex, CSTC, … CITIDEL Active
250
Tiered Model of Interoperability Mediator services Metadata harvesting Document models
251
Repository of Digital Objects Repository Access Protocol handle Digital object terms and conditions
252
Approaches to Open Archives Build By Discipline Build By Institution
253
Approaches to Open Archives Build By Discipline Build By Institution Author Category Interdisciplinary Year Language Query …
256
Author’s tools www.physik.uni-oldenburg.de/EPS/mmm
258
Discovery Current Awareness Preservation Service Providers Data Providers Metadata harvesting The World According to OAI
260
Mechanisms Sharing Join federation, run software Make metadata and archive available Aggregating By discipline By institution By genre Automating Workflow Harvesting and providing services Federated searching Dynamic linking (e.g., with SFX (OpenURLs))
261
VT View of the Open Archives Initiative (OAI)
- Enable sharing of publication metadata and full-text by digital libraries
- Standardize low-level mechanisms to share contents of libraries
- Build higher-level user-centric and administrative services in meta-libraries
- Install organizational mechanisms to support the technical processes
- Insights from 5S (streams, structures, scenarios)
262
Virginia Tech Projects
- MARC XML-DTD
- Computer Science Teaching Center (CSTC)
- W3C Web Characterization Repository
- OAI Repository Explorer
- NDLTD
- Open Digital Libraries, XOAI-PMH
263
MARC XML-DTD
- XML transport format for US-MARC records
- Standardized metadata exchange format for traditional library services joining OAI
264
Protocol for Metadata Harvesting
- Service requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords
- Metadata multiplicity
- Date/time ranges
- Sets (with semantics depending on local data providers)
- Resumption tokens
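A minimal sketch of flow control with resumption tokens, assuming a hypothetical in-memory repository in place of real HTTP calls. The offset-valued tokens are an illustrative assumption; a real data provider issues opaque tokens, and the harvester simply sends them back.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One response page: (records, resumptionToken); no token means the list is complete.
Page = Tuple[List[str], Optional[str]]
Fetch = Callable[[Dict[str, str]], Page]


def make_fake_repository(records: List[str], page_size: int) -> Fetch:
    """Hypothetical stand-in for HTTP requests to a data provider.

    Its tokens are plain offsets; real tokens are opaque strings chosen by the repository.
    """
    def fetch(params: Dict[str, str]) -> Page:
        start = int(params.get("resumptionToken", "0"))
        end = start + page_size
        token = str(end) if end < len(records) else None
        return records[start:end], token
    return fetch


def list_records(fetch: Fetch, metadata_prefix: str) -> List[str]:
    """Issue ListRecords, then keep resuming until no token is returned."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    harvested: List[str] = []
    while True:
        records, token = fetch(params)
        harvested.extend(records)
        if not token:
            return harvested
        # A resumed request carries only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}


repo = make_fake_repository([f"rec{i}" for i in range(7)], page_size=3)
print(list_records(repo, "oai_dc"))  # 7 records, fetched in 3 requests
```

Because the harvester only echoes the token, the repository stays free to choose the page size and the token format.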
266
Key Features of the OAI Metadata Harvesting Protocol
- Definitions & concepts: repository, record, identifier, datestamp, set
- Protocol features: HTTP encoding, metadata prefix & schema, flow control
- Protocol requests: supporting requests, harvesting requests
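With HTTP encoding, every request is a set of key=value arguments sent to the repository's base URL. A small sketch of building such a GET request; the base URL http://example.org/oai is a placeholder, and the helper name is hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# The six OAI-PMH request verbs.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "GetRecord", "ListIdentifiers", "ListRecords"}


def oai_request_url(base_url: str, verb: str, args: Dict[str, str]) -> str:
    """Encode an OAI-PMH request as an HTTP GET URL on the repository's base URL.

    `args` is a dict rather than keyword arguments because "from" is a Python keyword.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


url = oai_request_url("http://example.org/oai", "ListRecords",
                      {"metadataPrefix": "oai_dc", "from": "2002-01-01"})
print(url)
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-01-01
```

A set argument such as `physics:hep` would be percent-encoded (the colon becomes `%3A`), and the server decodes it back.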
267
Diagram labels: repositories, OAI protocol, harvesters, support data, harvesting data items
268
Identifiers
- oai-identifier = oai:archive-identifier:record-identifier
- Registered URI scheme
- Archive identifier: registered within OAI
- Record identifier: unique ID within the archive (syntax is archive-specific)
- Example: oai:ncstrl:ncstrl.cornellcs/TR94-1418
- A locally unique key for extracting a record from a repository
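A sketch of splitting an oai-identifier into its parts. The character rule for the archive identifier (letters, digits, hyphens, no colon) is an assumption for illustration; the record identifier is treated as opaque, so it may contain slashes or further colons.

```python
import re
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    archive_id: str  # registered within OAI
    record_id: str   # unique within the archive; syntax is archive-specific


# Assumption: the archive identifier contains no colon, so the record identifier
# is everything after the second colon.
_OAI_ID = re.compile(r"oai:([A-Za-z0-9-]+):(.+)")


def parse_oai_identifier(value: str) -> OaiIdentifier:
    match = _OAI_ID.fullmatch(value)
    if match is None:
        raise ValueError(f"not an oai-identifier: {value!r}")
    return OaiIdentifier(match.group(1), match.group(2))


print(parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418"))
# OaiIdentifier(archive_id='ncstrl', record_id='ncstrl.cornellcs/TR94-1418')
```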
269
Selective harvesting - datestamps
Diagram: a harvester requests only those records in a repository whose datestamps fall within a date range
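Datestamp-based selective harvesting amounts to a range filter. This sketch assumes day-granularity datestamps and treats both bounds as inclusive, as OAI-PMH 2.0 does for its `from` and `until` arguments; the record list is made up.

```python
from datetime import date
from typing import List, Optional, Tuple

Record = Tuple[str, date]  # (identifier, datestamp)


def select_by_datestamp(records: List[Record],
                        from_date: Optional[date] = None,
                        until_date: Optional[date] = None) -> List[str]:
    """Identifiers whose datestamp falls within [from_date, until_date].

    Both bounds are inclusive; a missing bound leaves that side of the range open.
    """
    return [rid for rid, stamp in records
            if (from_date is None or stamp >= from_date)
            and (until_date is None or stamp <= until_date)]


records = [("r1", date(2002, 12, 31)), ("r2", date(2003, 1, 15)), ("r3", date(2003, 5, 27))]
print(select_by_datestamp(records, from_date=date(2003, 1, 1)))    # ['r2', 'r3']
print(select_by_datestamp(records, until_date=date(2003, 1, 15)))  # ['r1', 'r2']
```

A harvester can keep the date of its last successful harvest and pass it as the lower bound next time, so that it fetches only what changed (incremental harvesting).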
270
Selective harvesting - sets
Diagram: a harvester requests only those records in a repository that belong to a given set (S1 or S2)
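Set-based selection can be sketched the same way. This assumes the colon-separated hierarchical setSpecs of OAI-PMH 2.0 and the common reading that a request for a set also returns records in its subsets; since set semantics are left to each data provider, treat the membership rule as an assumption. The records are invented.

```python
from typing import Dict, List, Set


def in_set(record_sets: Set[str], requested: str) -> bool:
    """True if a record is in `requested` or in one of its subsets.

    setSpecs are colon-separated paths, so "S1:theses" lies below "S1".
    """
    return any(s == requested or s.startswith(requested + ":") for s in record_sets)


def select_by_set(records: Dict[str, Set[str]], requested: str) -> List[str]:
    return sorted(rid for rid, sets in records.items() if in_set(sets, requested))


records = {"r1": {"S1"}, "r2": {"S1:theses"}, "r3": {"S2"},
           "r4": {"S1", "S2"}, "r5": {"S10"}}
print(select_by_set(records, "S1"))  # ['r1', 'r2', 'r4']  (S10 is not below S1)
print(select_by_set(records, "S2"))  # ['r3', 'r4']
```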
274
OAI Tools
- Related resources, e.g., XML, Unicode
- Servers and utilities, e.g., ARC, Kepler, EPrints
- XML Schema Validator
- Repository Explorer: interactive browsing, testing of parameters, multiple views of data, multilingual support, automatic test suite
275
ARC (arc.cs.odu.edu)
277
Kepler Architecture
278
OAI-based NCSTRL architecture
281
XSV Schema Validator
282
OAI Repository Explorer
- Serves as a compliance test
- Allows browsing of open archives using only the OAI protocol
- Sends requests on behalf of the user, parses and checks the responses, and displays a browsable interface
- Detects most discrepancies from the protocol
- http://purl.org/net/explorer
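A few of the checks such a tool performs can be sketched with Python's standard ElementTree parser. This assumes OAI-PMH 2.0 responses and checks only a handful of structural rules (root element, `responseDate`, `request`, and either an `error` or an element named after the verb); the real Repository Explorer does far more, including schema validation and parameter testing. The sample response is invented.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def check_response(xml_text: str, verb: str) -> List[str]:
    """Return the problems found in one OAI-PMH 2.0 response (empty list if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems: List[str] = []
    if root.tag != OAI + "OAI-PMH":
        problems.append(f"unexpected root element {root.tag}")
    if root.find(OAI + "responseDate") is None:
        problems.append("missing responseDate")
    if root.find(OAI + "request") is None:
        problems.append("missing request")
    # A response carries either error elements or an element named after the verb.
    if root.find(OAI + "error") is None and root.find(OAI + verb) is None:
        problems.append(f"neither an error nor a {verb} element")
    return problems


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-05-27T12:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify><repositoryName>Example</repositoryName></Identify>
</OAI-PMH>"""
print(check_response(SAMPLE, "Identify"))  # []
```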
283
RE 1.3
285
Request, Response – OAI, VT ETDs
286
Case Study: NCSTRL Costs/Benefits
| Stakeholders | | Sample potential cost | Sample potential benefit |
|---|---|---|---|
| Providers | Faculty | Lower value for P&T | Faster publishing |
| | Students | Less recognition | Broader set of outlets |
| | Practitioners | Limited relevance | Ease of publishing, > quantity |
| Users | Faculty | Lower quality of work | Broader access to resources |
| | Students | Higher access costs (vs. department available material) | Lower access costs (vs. journal available material) |
| | Departments | New maintenance costs | Broader visibility |
| | University libraries | Additional access costs | Access to new resources |
| | Practitioners | More difficult access | Access to new resources |
287
The OAI Static Repository Model
Components of the model:
- The static repository: an XML file with a well-defined structure, holding information similar to that in OAI-PMH responses, accessible at a persistent network location
- The static repository gateway: makes one or more static repositories harvestable, assigns a unique base URL to each such static repository, and responds to OAI-PMH requests
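A sketch of the gateway's bookkeeping, assuming a hypothetical in-memory gateway and a base-URL convention (gateway URL, a slash, then the static repository URL without its `http://` prefix) taken from our reading of the OAI static repository guidelines; check the specification before relying on the exact form. Parsing the XML file and answering OAI-PMH requests are left out.

```python
from typing import Dict


class StaticRepositoryGateway:
    """Hypothetical in-memory gateway that intermediates static repositories.

    Assumed base-URL convention: gateway URL + "/" + static repository URL
    with its "http://" prefix removed.
    """

    def __init__(self, gateway_url: str) -> None:
        self.gateway_url = gateway_url.rstrip("/")
        self._static_urls: Dict[str, str] = {}  # base URL -> static repository URL

    def register(self, static_url: str) -> str:
        """Make a static repository harvestable and return its unique base URL."""
        prefix = "http://"
        if not static_url.startswith(prefix):
            raise ValueError("expected an http:// URL for the static repository")
        base_url = f"{self.gateway_url}/{static_url[len(prefix):]}"
        self._static_urls[base_url] = static_url
        return base_url

    def resolve(self, base_url: str) -> str:
        """Find which static XML file to read when an OAI-PMH request arrives."""
        return self._static_urls[base_url]


gateway = StaticRepositoryGateway("http://gateway.example.org/oai")
base = gateway.register("http://www.example.edu/collections/mini.xml")
print(base)  # http://gateway.example.org/oai/www.example.edu/collections/mini.xml
```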
288
The OAI Static Repository Model
289
DL Components
User interfaces; workflow manager; DBMS; search engines, classifiers, …; data, MM; information gateways; repository; rights manager; MM/HT renderer
290
Open Digital Library (ODL) Hypothesis (Hussein Suleman)
Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?
Maybe … if digital libraries can be modeled as networks of extended Open Archives, where each extended Open Archive is a source of data and/or a provider of services.
291
Open Digital Libraries, XOAI-PMH
- Dissertation work of Hussein Suleman (member of the OAI technical committee)
- Extending the OAI protocol
- Supporting rapid development of DLs using networks of components
- Demonstrated with NDLTD, CSTC
- Described in a Dec. 2001 D-Lib Magazine article, and in an article submitted for publication
292
[Figure: users and a collection of digital objects (programs, documents, images, videos), with a question mark between them]
293
[Figure: componentized digital library: the same digital objects, surrounded by question marks]
294
[Figure: open digital library: the same digital objects, with labels OA, PMH, and XPMH]
295
Component System Approach
(Open) DL = network of extended OAs
Diagram components: local archive, data input, remote archive, browse, metadata repository, search, recommend, resource discovery, user interface
Legend: OAI/ODL archive; OAI/ODL protocol
296
Example Architecture (NDLTD)
Archives: Virginia Tech, Humboldt, Duisburg, MIT, PhysNet, CalTech, Dresden
Components: filter (MIT), browse, union catalog, search, recent, user interface
Legend: OAI/ODL archive; OAI/ODL protocol
297
ODL Demonstration - FrontPage
298
ODL Demonstration - Search
299
ODL Demonstration - Browse
300
ODL Component Requirements
- Search: retrieve a list of items; index new items
- Annotate: add an annotation to an item; retrieve a list of annotations for an item
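The two required component behaviours can be sketched as plain Python classes. In ODL each component is itself an extended open archive reached through protocol requests; the class and method names here are hypothetical stand-ins for those requests, and the index is a toy Boolean AND index.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class SearchComponent:
    """Hypothetical Search component: index new items, retrieve a list of items."""

    def __init__(self) -> None:
        self._index: DefaultDict[str, Set[str]] = defaultdict(set)  # term -> ids

    def index_item(self, identifier: str, text: str) -> None:
        for term in text.lower().split():
            self._index[term].add(identifier)

    def search(self, query: str) -> List[str]:
        """Identifiers of items containing every query term (Boolean AND)."""
        postings = [self._index.get(term, set()) for term in query.lower().split()]
        if not postings:
            return []
        return sorted(set.intersection(*postings))


class AnnotateComponent:
    """Hypothetical Annotate component: add annotations, list them per item."""

    def __init__(self) -> None:
        self._notes: DefaultDict[str, List[str]] = defaultdict(list)

    def add_annotation(self, identifier: str, note: str) -> None:
        self._notes[identifier].append(note)

    def list_annotations(self, identifier: str) -> List[str]:
        return list(self._notes.get(identifier, []))


search_component = SearchComponent()
search_component.index_item("etd-1", "Digital library theses")
search_component.index_item("etd-2", "Digital video retrieval")
print(search_component.search("digital theses"))  # ['etd-1']
```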
301
Open Digital Library Components
- Running now: XML-File (data provider from file system); union, search, browse, recent, filter; e-journal/review, submit, edit, annotation
- Class projects: high-performance multilingual search; recommender, rating; mirroring (see JCDL'02)
- Working with NCSA: from DB, unstructured text
- Others discussed: classification/categorization; DL-Viz interconnection (VIDI, Jun Wang ETD)
302
Open Digital Library: Extended (diagram)
- Data providers: XML File Collection & Data Provider 1, 2, and 3; OAI-PMH Data Provider; Submit Archive; OAIB (NCSA: from RDBMS)
- Harvest from data providers into DBUnion, the archive merger component
- DBBrowse browse engine, as metadata browse service provider
- IRDB-1 search engine, as metadata search service provider
- What's New engine, as what's-new service provider
- Filter
- Recommend & rate engine, as recommend & rate service provider
- Annotation engine, with IRDB-2 search engine as annotation search service provider
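The DBUnion merger step can be sketched as a dictionary merge over harvested records. The record shape and the rule of keeping the newest datestamp when the same identifier arrives twice (for example via two routes) are assumptions for illustration, not the documented behaviour of the ODL Union component.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (OAI identifier, datestamp "YYYY-MM-DD", title).
Record = Tuple[str, str, str]


def merge_union(*archives: Iterable[Record]) -> Dict[str, Record]:
    """Merge harvested records; for a duplicate identifier keep the newest datestamp.

    ISO-8601 dates of the same granularity sort correctly as plain strings.
    """
    union: Dict[str, Record] = {}
    for archive in archives:
        for record in archive:
            identifier, datestamp, _title = record
            if identifier not in union or datestamp > union[identifier][1]:
                union[identifier] = record
    return union


provider_1 = [("oai:a:1", "2003-01-01", "Old title"), ("oai:a:2", "2003-02-01", "Thesis")]
provider_2 = [("oai:a:1", "2003-03-01", "New title"), ("oai:b:1", "2003-01-05", "Report")]
union = merge_union(provider_1, provider_2)
print(len(union), union["oai:a:1"][2])  # 3 New title
```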
303
Example Open Digital Library: Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Figure labels: ETD collections (ETD-1 to ETD-4) among other digital objects; components Union, Search, Browse, Recent, Filter (ODLUnion, ODLSearch, ODLBrowse, ODLRecent) connected via PMH; user interface for students and researchers]
304
Digital Library for the Computer Science Teaching Center (www.cstc.org)
305
Digital Library in a Box
- Domain: helping DL projects
- Genre: any domain, but especially projects involved in NSDL (since it is funded in part through NSDL, with U. FL and NCSA)
- Software and documentation: http://dlbox.nudl.org
306
Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links
307
Topical Outline: Digital Library Courseware
- http://ei.cs.vt.edu/~dlib/
- WWW pages or large PDF copy files
- Online quizzes based on the book by Michael Lesk (Morgan Kaufmann Publishers)
- Contents based on the book, with several other popular topics added (e.g., agents)
- Separate pages to supplement: Definitions, Resources (People, Projects), and References
308
Topical Outline - Foundations
- Early visions
- Definitions
- Resources
- References
- Projects
309
Topical Outline - Foundations Early visions: “a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” - Bush, V., “As We May Think”, Atlantic Monthly, 176(1):101-108, Jul. 1945
310
Topical Outline - Foundations
Definitions: A digital library (DL) can be described as an electronic information storage system focused on meeting the information seeking needs of its users. - Levy, D., and Marshall, C. C., “Going Digital: A Look at Assumptions Underlying Digital Libraries”, Communications of the ACM, 38(4):78-84, 1995.
311
Topical Outline - Foundations
Definitions: Association of Research Libraries, http://sunsite.berkeley.edu/ARL/definition.html:
- The digital library is not a single entity.
- The digital library requires technology to link the resources of many.
- The linkages between the many digital libraries and information services are transparent to the end users.
- Universal access to digital libraries and information services is a goal.
312
Topical Outline - Foundations Definitions: Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities - Digital Library Federation, “A Working Definition of Digital Library”, Apr. 1999. http://www.clir.org/diglib/dldefinition.htm
313
Topical Outline - Foundations Definitions: A library that maintains all, or a substantial part, of its collection in computer accessible form as an alternative, supplement, or complement to the conventional printed and microfilm materials that currently dominate library collections. Used in this context, the term "collection" denotes the documents that a library acquires or maintains - Saffady, W., “Digital Library Concepts and Technologies for the Management of Library Collections: An Analysis of Methods and Costs”, Library Technology Reports 31.3:221-380, 1995
314
Topical Outline – IR Areas
Search, Retrieval, Resource Discovery
- Information storage and retrieval
- Boolean vs. natural language
- Search engines
- Indexing, phrases, thesauri, concepts
- Federated search and harvesting, OAI
- Integrating links and ratings
- Crawlers, spiders, metasearch, fusion
- Details follow (Li Wang, independent study)
315
Logical View of Document + Indexing from Baeza-Yates, R., and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley, 1999
316
Retrieval Process from Baeza-Yates, R., and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley, 1999
317
PACS Automatic Classification
Diagram labels: classification servlet in a J2EE server container; selector; two classifiers, each with a trained model and a classification scheme
318
PACS Automatic Classification Online
319
What is a Crawler?
- A program, and an important module of a web search engine
- Crawls the web according to its algorithm
- Retrieves web pages and extracts useful information
- Stores the web pages for future refining
320
Jobs For Threads
1. Get a new URL from the buffer
2. Contact the server for the file type
3. Download the file
4. Parse the web page
5. Put new URLs into the buffer
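The five thread jobs map directly onto a worker loop over a shared, thread-safe buffer. This sketch assumes a tiny hypothetical in-memory web in place of HTTP (a real crawler might learn the file type from an HTTP HEAD request), extracts links with a crude regular expression, and omits robots.txt, politeness delays, and network error handling.

```python
import queue
import re
import threading
from typing import Dict, Set, Tuple

# Hypothetical in-memory "web": URL -> (content type, body).
WEB: Dict[str, Tuple[str, str]] = {
    "http://a.example/": ("text/html",
                          '<a href="http://a.example/b">b</a> <a href="http://a.example/c.pdf">c</a>'),
    "http://a.example/b": ("text/html", '<a href="http://a.example/">home</a>'),
    "http://a.example/c.pdf": ("application/pdf", "%PDF-1.4"),
}
LINK = re.compile(r'href="([^"]+)"')  # crude link extraction, for illustration only


def crawl(seed: str, num_threads: int = 3) -> Set[str]:
    buffer: "queue.Queue[str]" = queue.Queue()  # shared URL buffer
    seen: Set[str] = {seed}
    stored: Set[str] = set()
    lock = threading.Lock()
    buffer.put(seed)

    def worker() -> None:
        while True:
            url = buffer.get()                   # 1. get a new URL from the buffer
            try:
                if url not in WEB:
                    continue                     #    unreachable URL: skip it
                content_type = WEB[url][0]       # 2. ask the server for the file type
                if content_type != "text/html":
                    continue                     #    only HTML pages are crawled
                body = WEB[url][1]               # 3. download the file
                with lock:
                    stored.add(url)              #    store it for later refining
                for link in LINK.findall(body):  # 4. parse the page for links
                    with lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    buffer.put(link)             # 5. put new URLs into the buffer
            finally:
                buffer.task_done()

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    buffer.join()  # returns once every queued URL has been processed
    return stored


print(sorted(crawl("http://a.example/")))
# ['http://a.example/', 'http://a.example/b']
```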
321
Advanced Functions
Backward linkage information collector (diagram label: a web page)
322
Topical Outline - Multimedia
- Multiple media types, representations
- Text, audio, image, video, graphics, animation
- Capture, digitization, standards, interchange
- Compression, content-based retrieval
- Playback (Real), SMIL, QoS
- JPEG, MPEG (and versions)
323
Document Models, Representations, and Accesses
- Doc = stream + structure + use-scenario; hybrid (paper/electronic), digital only
- Multilingual: content, summary, metadata
- Multimedia: structure, quality (QoS), search
- Structured: MARC, SGML, by user: MVD
- Distributed collection: Kleisli, CIMI, Z39.50
- Federated search: collecting, picking site(s), parallel search / fall-back, fusing results
- Access: IPR, payment, security, scenarios
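The federated-search steps after picking sites can be sketched as: query the selected sites in parallel, fall back by skipping any site that fails, then fuse the result lists. For fusion this uses CombSUM over min-max normalized scores, one standard choice among several; the sites and scores are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

SearchFn = Callable[[str], Dict[str, float]]  # query -> {document id: score}


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one site's scores to [0, 1] so sites become comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def federated_search(query: str, sites: Dict[str, SearchFn]) -> List[str]:
    fused: Dict[str, float] = {}
    with ThreadPoolExecutor() as pool:
        # Parallel search: send the query to every selected site at once.
        futures = [pool.submit(search, query) for search in sites.values()]
        for future in futures:
            try:
                results = future.result()
            except Exception:
                continue  # fall back: skip a site that fails and use the others
            for doc, score in normalize(results).items():
                fused[doc] = fused.get(doc, 0.0) + score  # CombSUM fusion
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def broken_site(query: str) -> Dict[str, float]:
    raise RuntimeError("site unavailable")


sites: Dict[str, SearchFn] = {
    "site_a": lambda q: {"d1": 3.0, "d2": 2.0, "d5": 1.0},
    "site_b": lambda q: {"d2": 10.0, "d3": 5.0, "d4": 0.0},
    "site_c": broken_site,
}
print(federated_search("metadata", sites))  # ['d2', 'd1', 'd3', 'd4', 'd5']
```

Document d2 ranks first because both sites return it; normalizing first keeps site_b's larger raw scores from dominating.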
324
Topical Outline - Architectures
- Distributed, centralized
- Modular, componentized
- Bus (InfoBus), hierarchical, star
- Mediators, wrappers (TSIMMIS)
- Lightweight protocols
- Architecture of OAI and XOAI
325
Architectural Issues
- Internet middleware
- Independent system / part of federation
- Decompositions vary:
  - search engine, browser, DBMS, MM support
  - repository, handle server, client
  - information resources + mediators, bus or agent
  - collection + client with workspace/environment
- Metrics: e.g., for federated search
326
Sornil & Mather Dissertations
- Mather: efficiently handling very large numbers of objects of varying sizes
- Sornil: efficiently handling IR for very large dynamic collections, large numbers of users, high transaction rates, and large inverted files
  - modeling and simulation
  - data organization
  - parallelization of algorithms, alone and in combination, for retrieval (related) tasks
327
OAI and I2-DSI (Ryan Richardson)
- OAI: metadata harvesting (OAI-PMH; data providers/service providers)
- I2-DSI: mirroring and replication
- Can we put them together to get the benefits of both?
- Is this the first of many higher-level mirroring schemes for Internet2 that provide independence from lower-level representation issues?
328
I2-DSI interface
329
I2-DSI Architectural Diagram
Diagram labels: mirror, user, Internet, OAI server, distributed director
330
I2-DSI Technical Issues
- Resumption tokens: mirrors are stateless, so any mirror can answer (any part of) a request
- Every client request to a mirror is logged, to enable log comparison among the mirror archives
- Mirroring time depends on the number of records sent in each chunk
- Can we connect this, or something similar, with LOCKSS (Lots Of Copies Keep Stuff Safe)?
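One way to keep mirrors stateless is to put the whole request state in the resumption token, so any mirror can answer the next chunk. A sketch under the assumption that all mirrors hold identical, identically ordered copies; the token format (base64-encoded JSON) and the function names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List, Optional, Tuple


def encode_token(state: Dict[str, Any]) -> str:
    """Pack the whole request state into an opaque, URL-safe resumption token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> Dict[str, Any]:
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def answer(records: List[str], metadata_prefix: Optional[str], token: Optional[str],
           chunk_size: int) -> Tuple[List[str], Optional[str]]:
    """Serve one chunk of a ListRecords-style request without any session state."""
    state = decode_token(token) if token else {"metadataPrefix": metadata_prefix, "offset": 0}
    start = int(state["offset"])
    end = start + chunk_size
    next_token = encode_token({**state, "offset": end}) if end < len(records) else None
    return records[start:end], next_token


def harvest_from_mirrors(mirrors: List[List[str]], chunk_size: int) -> Tuple[List[str], int]:
    """Send each successive request to a different mirror, round-robin."""
    harvested: List[str] = []
    token: Optional[str] = None
    requests = 0
    while True:
        mirror = mirrors[requests % len(mirrors)]
        chunk, token = answer(mirror, "oai_dc", token, chunk_size)
        harvested.extend(chunk)
        requests += 1
        if token is None:
            return harvested, requests


mirror_a = [f"r{i}" for i in range(5)]
mirror_b = list(mirror_a)  # an identical replica
print(harvest_from_mirrors([mirror_a, mirror_b], chunk_size=2))
# (['r0', 'r1', 'r2', 'r3', 'r4'], 3)
```

The request count also illustrates the chunk-size trade-off: with 5 records, a chunk size of 2 needs 3 requests, while a chunk size of 5 needs 1.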
331
Topical Outline – Interfaces
- Taxonomy of interface components
- Workflow
- Visualization
- Environments
- Design
- Usability testing
332
Topical Outline – Metadata
- MARC
- Dublin Core
- RDF
- IMS
- OAI (Open Archives Initiative)
- Crosswalks, mappings
- Ontologies
- Topic maps, concept maps
333
Automatic Generation of Concept Maps (Ryan Richardson, Rao Shen)
- Concept maps are a valuable pedagogical tool (Novak & Gowin, 1984)
- Are concept maps a good summarization tool?
  - Answer: yes; they are at least a good supplement to abstracts, according to an experiment we did last semester
- Unfortunately, making concept maps by hand is tedious for more than a few documents
- Can we generate useful concept maps automatically, for both English and Spanish documents?
334
Generation by term co-occurrence
Procedure for Spanish documents:
- Determine the part of speech for each word
- Collapse all inflected forms to the root form
- Concatenate noun phrases into one “concept”
- Remove some stopwords; others are crosslinks
- Future: use synonym sets to further collapse words
- Use Agrawal's algorithm for association rules to find related concepts
- Node/link labels can be translated into English automatically, if desired
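Agrawal's algorithm (Apriori) mines frequent itemsets of any size; concept-map links only need pairs, so this simplified sketch counts concept pairs directly and keeps the rules that pass support and confidence thresholds. Treating each sentence as a transaction, the thresholds, and the example concepts are all assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Rule]:
    """Rules A -> B between concepts that co-occur in the same transaction.

    support(A, B)      = fraction of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    n = len(transactions)
    single = Counter(c for t in transactions for c in t)
    pairs = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules: List[Rule] = []
    for (a, b), count in pairs.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


sentences = [
    {"macondo", "buendia family"},
    {"macondo", "solitude"},
    {"buendia family", "solitude", "macondo"},
    {"solitude"},
]
print(pair_rules(sentences, min_support=0.5, min_confidence=0.8))
# [('buendia family', 'macondo', 0.5, 1.0)]
```

Each surviving rule becomes a candidate link between two concept nodes; lowering the thresholds yields a denser map.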
335
Automatically generated concept map This map was extracted from a Spanish essay on “Cien Años de Soledad” (100 Years of Solitude)
336
References
- Background: http://cmap.coginst.uwf.edu/info/
- GetSmart: http://ai8.bpa.arizona.edu:8080/aicm/index.html; http://ai.bpa.arizona.edu/go/mlir/
- Agrawal's association rules algorithm: http://citeseer.nj.nec.com/agrawal94fast.html
- Gaines and Shaw: http://ksi.cpsc.ucalgary.ca/articles/ConceptMaps/CM.html
- Japanese work: http://www.icce2001.org/cd/pdf/P06/JP031.pdf
- Singapore work: http://textmining.krdl.org.sg/people/kanagasa
337
Topical Outline – Epub, SGML, XML
- Authoring
- Rendering, presenting
- Structure
- Tagging, markup, DOM
- Semi-structured information
- Dual-publishing, eBooks
- Styles (XSL, XSLT)
- Structure queries
338
Topical Outline – Databases
- Extending database technology
- Structured and unstructured info
- Multimedia databases
- Link databases
- Performance
- Replicated storage, I2-DSI (see the I2-DSI slides)
339
Topical Outline – Agents
- Protocols
- Knowledge interchange
- Negotiation, registries
- Distributed issues
- Ontologies (standard upper)
- Webbots (automatic indexing)
340
Topical Outline – Economics
- E-commerce
- Sustainability
- Preservation and archiving: DLF, Besser, Lorie, Gladney
- Self-archiving
- Open collections
- Economic models, business plans
341
Topical Outline – IPR
- Intellectual property rights (IPR)
- Legal issues
- Terms and conditions
- Copyright
- Patents, trademarks
- Distributed rights management
- Security
342
Topical Outline – Social Issues
- Cooperation, collaboration
- Annotation, ratings
- Digital divide
- Educational applications
- Cultural heritage
- Museums (AMICO)
- Organizational acceptance
- Personalization
- Internationalization
343
Social Capital?
- Increase local interchange among students, faculty, library, and graduate school
- Increase international understanding, building many more invisible colleges, with students more empowered
- Connect graduate researchers with undergrads, who can access the ETDs and the researchers themselves
- Facilitate direct, explicit university collaboration in reshaping the publishing world
344
Collaborative Development (Joan Lippincott)
345
Why Collaboration?
- Expertise in aspects of the digital environment
- Pooling of resources
346
Collaboration and digital projects
- Distributed systems
- Digital course content
- Digital library resources
- Delivery of services
- Development of policies
347
Collaborations involve:
- Shared goals
- Common vision
- Shared vocabulary
348
Two views of an ETD program
- View 1: have staff scan; implement now; increase university visibility
- View 2: teach students to write and submit ETDs; implement soon; develop electronic authors
349
In a collaboration...
- Each partner contributes resources
- Partners acknowledge and value contributions
- Partners develop a clear process
- Group and individual accountability
350
ETD project participants
- Academic administrators
- Faculty
- Students
- Staff
- Graduate school / provost / registrar
- Information technologists
- Librarians
351
Collaboration and NDLTD
- Common goals of members
- Diverse sets of skills and expertise
- Need for strategies and tactics to surmount any problems (advocacy)
352
Collaborative project strategy
- Champion initiates project
- Leadership establishes initial goal and parameters
- Issue a call for participants
- Conduct procedure to select participants
353
Collaborative project strategy (continued)
- Initial meeting
- Develop shared goals
- Develop clear process
- Continue work at institutions
- Establish communication channels
- Establish project milestones
- Evaluate progress, refine approach
354
Outline Virginia Tech context Why DLs? What are DLs? (5S theory) Case Study: CSTC -> CITIDEL-> NSDL Case Study: NDLTD Accessibility and Visualization DL Software: MARIAN Interoperability: OAI, ODL Topical Outline Selected Links
355
Selected Links - http://fox.cs.vt.edu
- CITIDEL: www.citidel.org
- NCSTRL: www.ncstrl.org
- NDLTD: www.ndltd.org and etdguide.org
- NSDL: www.nsdl.org
- Virginia Tech Digital Library Courseware: http://ei.cs.vt.edu/~dlib
- Virginia Tech Digital Library Research Laboratory (DLRL): http://www.dlib.vt.edu (5S, 5SL, AmericanSouth.Org, CSTC, ENVISION, MARIAN, NDLTD, NSDL, OAI, ODL)
- Virginia Tech DLRL OAI Projects: http://www.dlib.vt.edu/projects/OAI/
- Repository Explorer: http://purl.org/net/oai_explorer
356
More Links
- ARC Cross-Archive Search Service: http://arc.cs.odu.edu/
- Dublin Core Metadata Initiative: www.dublincore.org
- E-Prints DL-in-a-box: www.eprints.org
- Open Archives Initiative: http://www.openarchives.org
- OAI Metadata Harvesting Protocol: http://www.openarchives.org/OAI/openarchivesprotocol.htm
- XML Schema Validator: http://www.w3.org/2001/03/webdata/xsv
- XML Tools at W3C: http://www.w3.org/XML/#software