
1 Digital Libraries Lillian N. Cassel

2 A digital library An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. -- Wm Arms, Digital Libraries, 1999 A focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection. -- Witten and Bainbridge, How to Build a Digital Library, 2003

3 What is a library? An active exercise to explore what we know about, and think about, traditional libraries. How do we translate these characteristics to the digital world? –Is that the right model? Are we unnecessarily constraining the digital environment? Are there things that do not translate?

4 Vannevar Bush “As We May Think” (http://www.theatlantic.com/doc/194507/bush) Reflecting after WWII –The value of collaboration –The sad use of scientific expertise to invent the atomic bomb –The need for organization and access to information.

5 Memex: Vannevar Bush’s vision Image sources: kelty.rice.edu/375/images/memex/camera.jpg, http://www.knowledgesearch.org/presentations/etcon/images/memex.gif

6 MyLifeBits Gordon Bell and Microsoft http://www.guardian.co.uk/science/story/0,3605,1674359,00.html “Gordon Bell doesn't need to remember, but has no chance of forgetting. At the age of 71, he is recording as much of his life as modern technology will allow, storing it all on a vast database: a digital facsimile of a life lived. If he goes for a walk, a miniature camera that dangles from his neck snaps pictures every minute or so, immediately committing the scene to a memory built not of neurons but ones and noughts. If he wanders into a cafe, sensors note the change in light, the shift of temperature and squirrel the information away. Conversations are recorded and steps logged thanks to a GPS receiver carried with him.”

7 Related work Walden’s Path –http://www.csdl.tamu.edu/walden/ –System used by itself or as a service within a digital library –Allows a user to make a path through a set of related resources and save the path for reuse at a later time. Used to allow a teacher to “blaze a trail” through a collection of materials to help students find their way from a starting point to a goal. Also for recording personal trips through a collection of material to be revisited. How does that compare to a set of bookmarks?

8 Moving Forward We have looked at what a library is. Now: –How do we translate that to a digital entity? Information resources, including digital libraries, are very complex systems. –A formal model helps to capture the essence of the system and give special attention to specific areas –The model also gives developers of digital libraries a checklist of areas to consider and develop well.

9 The 5S model Streams –The flow of information in various formats Structures –Organizational aspects of the DL Spaces –Views of components; real or abstract images Scenarios –Services and behaviors Societies –Communities and relationships among them

10 5S summary
Model | Primitives | Formalisms | Objectives
Stream | Text; video; audio; software program | Sequences; types | Describes properties of the DL content, such as the encoding of textual material or particular forms of multimedia data
Structure | Collection; catalog; hypertext; document; metadata; organizational tools | Graphs; nodes; links; labels; hierarchies | Specifies organizational aspects of the DL content
Space | User interface; index; retrieval model | Sets; operations; vector space; measure space; probability space | Defines logical and presentational views of several DL components
Scenarios | Service; event; condition; action | Sequence diagrams; collaboration diagrams | Details the behavior of DL services
Societies | Community; managers; actors; classes; relationships; attributes; operators | Object-oriented modeling constructs; design patterns | Defines managers responsible for running DL services, actors that use those services, and relationships among them
Source: http://www.dlib.vt.edu/projects/5S-Model/

11 Etana - A DL for archeology

12 An example application of 5S - Etana: A DL for an archeological site [Diagram: Etana mapped onto the 5S model. Stream model: text, video, audio. Structure model: site, partition, sub-partition, locus, container, artifact; metadata; taxonomies (temporal, artifact-specific). Space model: geographic space, metric space, user interface. Scenario model: services (repository building, value added, information satisfaction, domain specific). Society model: archaeologist, general public, service manager. Other labels in the diagram: region, spatial, drawing, photo, 3D.] Source: E. A. Fox http://feathers.dlib.vt.edu/

13 Applying the model, informally Personal photos; movies, TV, media –Streams: What types of data? GIF, JPG, AVI? –Structures: How are the elements organized? Is there a hierarchy? Are there multiple structures? –Spaces: How would you index the items? How would you divide them into related groups? –Scenarios: What services would you provide? What information do we need to provide those services? –Societies: Who is the library intended to serve? Remember to include agents and other processes as well as users. In your group, choose one or the other (photos or movies/TV/media). Start with streams, scenarios, and societies.

14 More formally: Definitions Definition: A stream is a sequence whose co-domain is a non-empty set. Definition: A structure is a tuple (G, L, F) where G = (V, E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function.
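The stream and structure definitions above can be made concrete with a small data model. The following is a minimal Python sketch, not part of the 5S project's own tooling: the names `is_stream` and `Structure` are hypothetical, and it assumes the labeling function F may label both vertices and edges.

```python
import string
from dataclasses import dataclass, field


def is_stream(seq, codomain):
    """A stream: a sequence whose values all come from a non-empty co-domain."""
    return len(codomain) > 0 and all(x in codomain for x in seq)


@dataclass
class Structure:
    """A structure (G, L, F) with G = (V, E) a directed graph."""
    vertices: set                                 # V
    edges: set                                    # E: pairs (u, v), u and v in V
    labels: set                                   # L: allowed label values
    labeling: dict = field(default_factory=dict)  # F (assumed): vertex or edge -> label

    def is_well_formed(self):
        edges_ok = all(u in self.vertices and v in self.vertices
                       for u, v in self.edges)
        domain_ok = all(k in self.vertices or k in self.edges
                        for k in self.labeling)
        range_ok = all(lbl in self.labels for lbl in self.labeling.values())
        return edges_ok and domain_ok and range_ok


# A tiny document hierarchy: a book with two chapters.
book = Structure(
    vertices={"book", "ch1", "ch2"},
    edges={("book", "ch1"), ("book", "ch2")},
    labels={"Book", "Chapter", "hasPart"},
    labeling={"book": "Book", "ch1": "Chapter", "ch2": "Chapter",
              ("book", "ch1"): "hasPart", ("book", "ch2"): "hasPart"},
)
print(book.is_well_formed())                                            # True
print(is_stream("digital library", set(string.ascii_lowercase + " ")))  # True
```

The same `Structure` type could describe a catalog or a metadata record; only the vertex and label sets change.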

15 Definitions, cont’d Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space –A vector space is a representation for the set of elements in a collection. The vector for each element records characteristics held by that element; these characteristics both connect the element to others that are similar and distinguish it from those that are different. –We will do an exercise to illustrate
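As an illustration of the vector-space idea, each item can be represented by a vector of term counts, and similarity measured by the cosine of the angle between vectors. This is a minimal Python sketch under assumptions: the vocabulary, the item names, and the choice of raw term frequency with cosine similarity are illustrative choices, not something the 5S model prescribes.

```python
import math
from collections import Counter


def vectorize(text, vocabulary):
    """Term-frequency vector: one count per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = ["map", "coast", "photo", "family", "seventeenth", "century"]
items = {  # hypothetical collection items, described by short texts
    "visscher_map": "map of the coast seventeenth century map",
    "coast_photo": "photo of the coast",
    "family_photo": "family photo",
}
vectors = {name: vectorize(text, vocabulary) for name, text in items.items()}


def most_similar(name):
    """The other item whose vector is closest (by cosine) to item `name`."""
    others = (n for n in vectors if n != name)
    return max(others, key=lambda n: cosine(vectors[name], vectors[n]))


print(vectors["visscher_map"])       # [2, 1, 0, 0, 1, 1]
print(most_similar("coast_photo"))   # family_photo
```

Words outside the vocabulary ("of", "the") are simply ignored, which is one reason the choice of characteristics matters.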

16 Definitions - 3 Definition: A scenario is a sequence of related transition events $(e_1, e_2, \ldots, e_n)$ on state set $S$ such that $e_k = (s_k, s_{k+1})$ for $1 \le k \le n$. –More easily visualized, a scenario is a path in a directed graph $G = (S, \Sigma_e)$, where vertices correspond to states in the state set $S$ and directed edges are equivalent to events in a set of events $\Sigma_e$ and correspond to transitions between states. –Scenarios must be implemented to make a working system.
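Because a scenario is a path in the directed graph of states and events, checking whether a sequence of events is a valid scenario amounts to checking that each event is an allowed transition and that consecutive events chain together. A minimal Python sketch, assuming a hypothetical search service whose states and transitions are invented for illustration:

```python
def is_scenario(events, transitions):
    """Check that events (e_1, ..., e_n) form a scenario: a non-empty path in
    G = (S, Sigma_e), where each e_k = (s_k, s_{k+1}) is an allowed transition."""
    if not events:
        return False
    if any(e not in transitions for e in events):
        return False
    # Consecutive events must chain: the target of e_k is the source of e_{k+1}.
    return all(events[k][1] == events[k + 1][0] for k in range(len(events) - 1))


# Hypothetical transitions (the edge set Sigma_e) for a simple search service.
search_service = {
    ("idle", "query_entered"),
    ("query_entered", "results_shown"),
    ("results_shown", "item_viewed"),
    ("results_shown", "query_entered"),  # refine the query
}

happy_path = [("idle", "query_entered"),
              ("query_entered", "results_shown"),
              ("results_shown", "item_viewed")]
print(is_scenario(happy_path, search_service))  # True
print(is_scenario([("idle", "query_entered"),
                   ("results_shown", "item_viewed")], search_service))  # False
```

The second call fails because the events do not chain: the service never passes through `results_shown` between them.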

17 Definitions - 4 Definition: A society is a tuple $(C, R)$ where –$C = (c_1, c_2, \ldots, c_n)$ is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data); –$R = (r_1, r_2, \ldots, r_m)$ is a set of relationships, each relationship being a tuple $r_j = (e_j, i_j)$, where $e_j$ is a Cartesian product $c_{k_1} \times c_{k_2} \times \cdots \times c_{k_{n_j}}$ with $1 \le k_1 < k_2 < \cdots < k_{n_j} \le n$, which specifies the communities involved in the relationship, and $i_j$ is an activity.
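The society definition can be sketched with sets for communities and tuples for relationships. In this minimal Python sketch the community names, members, and activities are hypothetical; the dictionary order of `communities` stands in for the indices 1..n, so that the condition that community indices strictly increase within each relationship can be checked.

```python
from itertools import product

# Hypothetical communities c_1..c_n; dict order fixes the indices k.
communities = {
    "patrons": {"alice", "bob"},
    "librarians": {"carol"},
    "services": {"search", "browse"},
}
index = {name: k for k, name in enumerate(communities, start=1)}

# Each relationship r_j = (e_j, i_j): the communities in e_j, and an activity i_j.
relationships = [
    (("patrons", "services"), "uses"),
    (("librarians", "services"), "manages"),
]


def is_valid(relationship):
    """e_j must list existing communities with strictly increasing indices."""
    involved, _activity = relationship
    ks = [index.get(c) for c in involved]
    return None not in ks and all(a < b for a, b in zip(ks, ks[1:]))


def members(relationship):
    """All tuples in the Cartesian product c_k1 x c_k2 x ... for this r_j."""
    involved, _activity = relationship
    return list(product(*(sorted(communities[c]) for c in involved)))


print(all(is_valid(r) for r in relationships))  # True
print(members(relationships[1]))  # [('carol', 'browse'), ('carol', 'search')]
```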

18 The Digital Library Content Essential elements for a digital library –Users –Content –Services

19 Content - requirements Store –Organize –Describe Find Deliver

20 Describing the content How to describe content –Metadata: machine-readable description of anything What description? –Machine-readable description requires standard descriptive elements Dublin Core (http://dublincore.org/) –International standard –“a standard for cross-domain information resource description.” –15 descriptive elements Other metadata schemes –IEEE-LOM

21 Metadata What does metadata look like? Metadata is data about data –Information about a resource, encoded in the resource or associated with the resource. The language of metadata: XML –eXtensible Markup Language
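To show what XML metadata can look like, the sketch below serializes a few Dublin Core elements with Python's standard `xml.etree.ElementTree`. It assumes the commonly used Dublin Core element namespace `http://purl.org/dc/elements/1.1/`; the wrapper element name `record` is a hypothetical choice, and the values are borrowed from the map example discussed in the following slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize (element, value) pairs as dc:* children of a wrapper element."""
    record = ET.Element("record")  # hypothetical wrapper element name
    for element, value in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(record, encoding="unicode")


xml_text = dublin_core_record([
    ("title", "Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula "
              "multis in locis emendata"),
    ("creator", "Nicolaum Visscher"),
    ("date", "1996-04-17"),
    ("format", "image/gif"),
])
print(xml_text)
```

Because the element names come from a shared namespace, any program that knows Dublin Core can find `dc:creator` without knowing anything else about this collection.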

22 Google Books Project Michael A. Keller, Closing Keynote –Ida M. Green University Librarian at Stanford, –Director of Academic Information Resources, –Publisher of HighWire Press, and –Publisher of the Stanford University Press: "One good turn deserves another; how the Google Book Search project is benefiting everyone".

23 Google Books demo Full text - Life of Miguel de Cervantes Limited Preview - The Life of Miguel de Cervantes Saavedra Snippet View - "Discreción" in the Works of Cervantes: A Semantic Study

24 What has been accomplished As of September 2006 Nearly 30,000 Stanford books digitized –~1M books from all partner libraries Over 4,000 books identified as needing preservation treatment (& so not digitized) A great debate about copyright has started –Orphan works –What can an archive do to provide access –Defense of fair use underway This slide is taken from the presentation by Michael A. Keller at ECDL 2006

25 Original Principles If legally possible, digitize every book (9M volumes) in the Stanford libraries –Now digitizing with imprint dates up to 1963 Partner libraries (*added recently) –University of Michigan (similar to Stanford) –Harvard (public domain (?), maybe > 1M) –NYPL (public domain, unusual collections) –Oxford - Bodleian (earlier than 1885, ~ 1M titles) –University of California (similar to Stanford >6M) –(more to follow) This slide is taken from the presentation by Michael A. Keller at ECDL 2006

26 Purposes Digital preservation –Virtual Bookshelves under construction as part of the Stanford Digital Repository –For Stanford use only Other searching and research functions –Subtle searching (as in Socrates & HighWire) –Taxonomic (LCSH & HighWire) & Associative Searching (Takano) –Citation linking (HighWire & “InforTools” (Ebrary)) –Better navigation (through visualization?) (Grokker) Digitized books from all sources as test bed for new research; combine with articles, datasets, etc. for data mining & other transformative uses. This slide is taken from the presentation by Michael A. Keller at ECDL 2006

27 Some Conclusions Google Book Search –Is an indexing, not a publishing project –Offers substantial increases in access to contents of books in library collections by keyword searching –Offers publishers global marketing of their publications –Offers several useful services to readers Offers participating libraries –Digital copies of books on their shelves for preservation –New possibilities for services to local readers –New possibilities for research for local faculty & students This slide is taken from the presentation by Michael A. Keller at ECDL 2006

28 Google statement “Many of the books in Google Book Search come from authors and publishers who participate in our Partner Program. For these books, our partners decide how much of the book is browsable -- anywhere from a few sample pages to the whole book. For books that enter Book Search through the Library Project, what you see depends on the book's copyright status. We respect copyright law and the tremendous creative effort authors put into their work. If the book is in the public domain and therefore out of copyright, you can page through the entire book and even download it and read it offline. But if the book is under copyright, and the publisher or author is not part of the Partner Program, we only show basic information about the book, similar to a card catalog, and, in some cases, a few snippets -- sentences of your search terms in context. The aim of Google Book Search is to help you discover books and learn where to buy or borrow them, not read them online from start to finish. It's like going to a bookstore and browsing - with a Google twist.” http://books.google.com/support/bin/answer.py?answer=43729&topic=9259&hl=en

29 Other projects Open Content Alliance (Yahoo and the Internet Archive) The Internet Archive www.archive.org The European Digital Library (growing number of countries) Others Comments? Discussion?

30 A DL example Library of Congress American Memory project –http://memory.loc.gov/ammem/index.html –“American Memory provides free and open access through the Internet to written and spoken words, sound recordings, still and moving images, prints, maps, and sheet music that document the American experience. It is a digital record of American history and creativity. These materials, from the collections of the Library of Congress and other institutions, chronicle historical events, people, places, and ideas that continue to shape America, serving the public as a resource for education and lifelong learning.”

31 Dublin Core for a map Map found in the LOC American Memory collection –Map at http://memory.loc.gov/ammem/gmdhtml/gmdhome.html Dublin Core metadata illustration found at http://webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm –Part of a DL course at U. of Alabama

32 Go to the web site to explore what is there, including copyright information, title, history, etc.

33 Dublin Core: Title Name given, usually by the creator or publisher <META name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
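The META-tag convention used in these slides (a `DC.Element` name, plus optional `lang` or `scheme` attributes) is easy to generate programmatically. A minimal Python sketch; the helper name `dc_meta` is hypothetical, and it uses `html.escape` so that quotation marks in a value cannot break the attribute.

```python
from html import escape


def dc_meta(element, content, **attrs):
    """Render one Dublin Core element as a META tag (DC.Element naming)."""
    parts = [f'name="DC.{element}"', f'content="{escape(content)}"']
    # Extra attributes such as lang="la" or scheme="LCSH", in the order given.
    parts += [f'{key}="{escape(value)}"' for key, value in attrs.items()]
    return "<META " + " ".join(parts) + ">"


print(dc_meta("Subject",
              "Middle Atlantic States - Maps - Early works to 1800 - Facsimiles",
              scheme="LCSH"))
# <META name="DC.Subject" content="Middle Atlantic States - Maps - Early works to 1800 - Facsimiles" scheme="LCSH">
print(dc_meta("Description", 'A "new" map'))
# <META name="DC.Description" content="A &quot;new&quot; map">
```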

34 Dublin Core: Subject What the work is about: possibly keywords, or terms from a classification scheme if available. <META name="DC.Subject" content="Middle Atlantic States - Maps - Early works to 1800 - Facsimiles" scheme="LCSH"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm LCSH = Library of Congress Subject Headings

35 Dublin Core: Description Free-text description, abstract, etc. <META name="DC.Description" content="An (sic) historical map showing the coast of New Jersey as perceived in the seventeenth century"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

36 Dublin Core: Source Is this object derived from another? Is this map a part of a larger map? Is this text a variation or revision of another piece of text? <META name="DC.Source" content="G3715 1685.V5 1969" scheme="LCCN"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm LCCN = Library of Congress Call Number

37 Dublin Core: Language Language of the content of the resource For the map, there is no language content <META name="DC.Language" content="nl"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

38 Dublin Core: Relation To what other object(s) or collection is this object related? Does it also exist in another collection? Is it derived from another document or image? How is it related? <META name="DC.Relation" content="isPartOf http://lcweb2.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))+@field(COLLID+dsxpmap))"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

39 Dublin Core: Creator Person or organization responsible for the intellectual content of this object <META name="DC.Creator" content="Nicolaum Visscher"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

40 Dublin Core: Publisher Entity responsible for making the resource available in its present form Not shown in the example, but should be something like this: <META name="DC.Publisher" content="Library of Congress American Memory Project"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

41 Dublin Core: Contributor Any entity making a contribution to this object. Example: someone who added some information to the original document or image No entry for this map.

42 Dublin Core: Rights A pointer to a copyright notice, a rights management statement, or a rights server. <META name="DC.Rights" content="http://lcweb2.loc.gov/cgi-bin/ammemrr.pl?title=%3ca%20href%3d%22%2fammem%2fgmdhtml%2fdsxphome.html%22%3eDiscovery%20and%20Exploration%3c%2fa%3e&coll=gmd&div=&agg=g3715&default=ammem&dir=ammem">

43 Dublin Core: Date Date on which this object was made available in its present form, possibly the date it was entered into this digital collection. <META name="DC.DATE" content="1996-04-17" scheme="ISO 8601"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm Specify the date format so that others can interpret it correctly

44 Dublin Core: Type or Category What sort of thing is this? Some examples: home page, novel, poem, working paper, technical report, essay, dictionary, … Type should be selected from a controlled list. For example, see the DCMI Type Vocabulary: http://dublincore.org/documents/2006/08/28/dcmi-type-vocabulary/ Why is this recommended as a controlled vocabulary field?

45 DCMI Type Vocabulary Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text. See the official page for explanations of the categories. Note that Image is a broad category and MovingImage and StillImage are more restricted subcategories.
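One answer to the question on the previous slide: with a controlled vocabulary, software can reject values it does not recognize and can match narrower terms against broader ones. A minimal Python sketch, assuming only the twelve terms listed above and the Image/MovingImage/StillImage relationship noted on this slide; the function names are hypothetical.

```python
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})
# Narrower terms mapped to their broader term, as noted on the slide.
BROADER = {"MovingImage": "Image", "StillImage": "Image"}


def check_type(value):
    """Accept only terms from the controlled list."""
    if value not in DCMI_TYPES:
        raise ValueError(f"{value!r} is not a DCMI Type Vocabulary term")
    return value


def matches(value, query):
    """A search for `query` also finds items typed with a narrower term."""
    return value == query or BROADER.get(value) == query


print(check_type("StillImage"))          # StillImage
print(matches("StillImage", "Image"))    # True
try:
    check_type("image.photograph")       # free-text value from the next slide
except ValueError as err:
    print(err)   # 'image.photograph' is not a DCMI Type Vocabulary term
```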

46 Dublin Core: Type Category of this resource <META name="DC.Type" content="image.photograph"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

47 Dublin Core: Format The way the content is encoded. This tells what resource is needed to access this content. <META name="DC.Format" content="image/gif" scheme="IMT"> Internet MIME Types: http://www.ltsw.se/knbase/internet/mime.htp See also Internet Media Type: http://www.graphcomp.com/info/specs/mime.html
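In practice the DC.Format value can often be derived from a file name. This minimal Python sketch uses the standard `mimetypes` module; the helper name and file names are hypothetical, and the result depends on the extension table available to Python on the host system.

```python
import mimetypes


def dc_format(filename):
    """Build a DC.Format META tag from the media type guessed for a file name."""
    media_type, _encoding = mimetypes.guess_type(filename)
    if media_type is None:
        raise ValueError(f"no known media type for {filename!r}")
    return f'<META name="DC.Format" content="{media_type}" scheme="IMT">'


print(dc_format("visscher_map.gif"))
# <META name="DC.Format" content="image/gif" scheme="IMT">
print(dc_format("catalog.pdf"))
# <META name="DC.Format" content="application/pdf" scheme="IMT">
```

Guessing from the extension is only a convenience; a file whose extension is wrong will get a wrong Format value.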

48 Dublin Core: Unique ID The key for this object in the collection. I cannot find one for the map we are looking at, but the ID for the map of which it is a part is g3715 ct000001. The metadata specification for that would be <META name="DC.Id" content="g3715 ct000001"> Source: http://memory.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))+@field(COLLID+dsxpmap))

49 Dublin Core: Coverage The time, space, or other measurement of the scope or completeness of the object. No coverage entry is specified, but it might be this: <META name="DC.Coverage" content="North America, Eastern lands and coast, as viewed in late seventeenth century"> This example does not use a controlled vocabulary. Why would a controlled vocabulary be better?

50 International Consensus Recognition of International Scope of Resource Discovery on Web 17 Countries Currently Involved in DC Working Groups 50+ Implementation Projects in 10 Countries Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

51 Guide to Good Practice The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials http://www.nyu.edu/its/humanities/ninchguide/index.html

52 Legal and Technical Issues Legal: When is a resource available to digitize and make available? What requirements exist for controlling access? Technical: How do we control access to a resource that is stored online? –Policies –Encoding –Distribution limitations

53
Date of work | Protected from | Term
Created 1-1-78 or after | When work is fixed in tangible medium of expression | Life + 70 years (or, if a work of corporate authorship, the shorter of 95 years from publication or 120 years from creation)
Published before 1923 | In public domain | None
Published 1923-63 | When published with notice | 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain
Published 1964-77 | When published with notice | 28 years for first term; now automatic extension of 67 years for second term
Created before 1-1-78 but not published | 1-1-78, the effective date of the 1976 Act, which eliminated common law copyright | Life + 70 years or 12-31-2002, whichever is greater
Created before 1-1-78 but published between then and 12-31-2002 | 1-1-78, the effective date of the 1976 Act, which eliminated common law copyright | Life + 70 years or 12-31-2047, whichever is greater
Chart created by Lolly Gasaway. Updates at http://www.unc.edu/~unclng/public-d.htm
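The arithmetic in the chart can be sketched as a function that returns the last calendar year of protection for a published work (U.S. terms run through December 31 of the final year). This is a simplified Python sketch covering only the published-work rows of the chart, using the chart's own 1923 cutoff; it assumes works published in 1978 or later were also created then, ignores unpublished works and notice requirements, and the function names are hypothetical.

```python
def last_protected_year(pub_year, *, renewed=True, death_year=None,
                        creation_year=None):
    """Last calendar year of U.S. protection for a published work, per the
    chart's published-work rows; None means already in the public domain."""
    if pub_year < 1923:
        return None                         # published before 1923
    if pub_year <= 1963:
        # 28-year first term; the 67-year renewal applies only if renewed
        return pub_year + 28 + (67 if renewed else 0)
    if pub_year <= 1977:
        return pub_year + 28 + 67           # renewal term is automatic
    # Assumed: published 1978 or later and also created then.
    if death_year is not None:
        return death_year + 70              # life + 70 years
    if creation_year is None:
        raise ValueError("corporate works need creation_year")
    # Corporate authorship: the shorter of 95 years from publication
    # or 120 years from creation.
    return min(pub_year + 95, creation_year + 120)


def is_public_domain(pub_year, as_of, **details):
    """True if protection ended before calendar year `as_of`."""
    last = last_protected_year(pub_year, **details)
    return last is None or as_of > last


print(last_protected_year(1923))                    # 2018
print(last_protected_year(1950, renewed=False))     # 1978
print(last_protected_year(1990, death_year=2000))   # 2070
print(is_public_domain(1970, as_of=2024))           # False
```

This only encodes the chart's rules; as the previous slides note, real rights decisions still require checking the specific work.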

54 Works for hire Usual case -- works created by faculty are not the property of the university. –Faculty surrender copyright to publishers of journals and books –Some publishers allow faculty to retain copyright, giving the publisher specific limited rights to reproduce and distribute the work.

55 Fair use No clear, easy answers. The checksheet provided in the article is a good guide to the issues. Link to the checksheet: http://www.copyright.iupui.edu/checklist.htm

56 Moral rights Fair to the creator –Keep the identity of the creator of the work –Do not cut the work –Generally, be considerate of the person (or institution) that created the work.

57 Getting Permission With the best will in the world, getting the appropriate permissions is not always easy. –Identify who holds the rights –Get in touch with the rights holder –Get a suitable agreement to cover the needs of your use. Useful links: http://www.loc.gov/copyright/ http://www.utsystem.edu/OGC/IntellectualProperty/PERMISSN.HTM –Connections to various ways to discover and contact the rights holder of a work.

58 Checking copyright status Source: NINCH Guide to Good Practice. Chapter 4: Rights Management

59 Considering people depicted in the work Source: NINCH Guide to Good Practice. Chapter 4: Rights Management Copyright: Lauryn G. Grant

60 Technical issues Link the resource to the copyright statements Maintain that link when the resource is copied or used Approaches: –Steganography –Encryption –Digital Wrappers –Digital Watermarks
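Steganography and digital watermarks hide a rights statement inside the resource itself, so the link can travel with copies. The sketch below shows the simplest textbook form of this idea, least-significant-bit (LSB) embedding, on raw bytes standing in for pixel values. It is a toy under stated assumptions: the reader must know the message length, the helper names are hypothetical, and an LSB mark does not survive recompression or editing, which is what production watermarking schemes are designed to handle.

```python
def embed(carrier: bytes, message: bytes) -> bytearray:
    """Hide message bits in the least significant bit of each carrier byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    marked = bytearray(carrier)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit   # clear the LSB, then set it
    return marked


def extract(carrier: bytes, length: int) -> bytes:
    """Read back `length` bytes from the carrier's least significant bits."""
    message = bytearray()
    for n in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (carrier[n * 8 + bit_index] & 1)
        message.append(value)
    return bytes(message)


pixels = bytes(range(256))          # stand-in for image data
notice = b"(c) LOC"                 # hypothetical rights notice
marked = embed(pixels, notice)
print(extract(marked, len(notice)))                       # b'(c) LOC'
print(max(abs(a - b) for a, b in zip(pixels, marked)))    # 1
```

Each byte changes by at most one, which is why the mark is invisible in an image, and also why it is easy to destroy.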

61 Issues in Encryption Two general threats motivate protecting controlled content: passive listening and active interference. –Listening: intruder gains information, may not be detected. Effects indirect. –Active interference: Intruder may prevent delivery of the message to the intended recipient. Intruder may substitute a fake message for the intended one. Effects are direct and immediate. Less likely in the case of digital library content
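Active interference by substitution can be detected with a message authentication code. This minimal Python sketch uses the standard `hmac` and `hashlib` modules with a hypothetical shared key and hypothetical helper names; an HMAC only detects tampering and does not hide content from a passive listener, which would additionally require encryption.

```python
import hashlib
import hmac


def sign(key: bytes, content: bytes) -> str:
    """HMAC-SHA256 tag that the sender attaches to the content."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(key: bytes, content: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, content), tag)


key = b"hypothetical-shared-secret"
original = b"Map g3715 ct000001: public domain"
tag = sign(key, original)

print(verify(key, original, tag))                                    # True
print(verify(key, b"Map g3715 ct000001: all rights reserved", tag))  # False
```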

