Digital Libraries and Multimedia Searching MIT 026B Winter 2002
Today’s Information Environment Library catalogues Periodical databases Internet resources
HTML “Hypertext Markup Language” The language used for mounting documents on the World Wide Web, so that they can be formatted and presented in today’s browsers.
Memo to MIT 026 To: MIT 026 From: D.G. Campbell Date: March 25, 2002 It’s a pleasure working with you this term.
Features of Hypertext Markup Language Descriptive rather than procedural markup Based on format rather than content Creates documents that are designed for human beings to read, not for machines to manipulate in any meaningful way
What if…… Your computer could understand the semantic meaning of documents?
XML “Extensible Markup Language” A meta-language that can be used to create specialized markup for particular purposes.
What is a “memo”? A Memo consists of a Header and a Body; the Header contains Date, From, and To.
MIT 026 D.G. Campbell March 25, 2002 It’s a pleasure working with you this term.
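The memo above can be written as XML so that a program can pick out each part by name. The sketch below is only an illustration: the tag names (`memo`, `header`, `to`, `from`, `date`, `body`) are assumptions based on the memo structure described in the slides, not markup taken from the original deck. It parses the memo with Python's standard `xml.etree.ElementTree` module and then reassembles a display from the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the memo; tag names are assumed.
MEMO_XML = """\
<memo>
  <header>
    <to>MIT 026</to>
    <from>D.G. Campbell</from>
    <date>March 25, 2002</date>
  </header>
  <body>It's a pleasure working with you this term.</body>
</memo>
"""

memo = ET.fromstring(MEMO_XML)

# Because each element is labelled by meaning, fields are fetched by name.
fields = {
    "to": memo.findtext("header/to"),
    "from": memo.findtext("header/from"),
    "date": memo.findtext("header/date"),
    "body": memo.findtext("body"),
}


def render_memo(data):
    """Assemble a plain-text memo from the labelled data."""
    return "\n".join([
        f"To: {data['to']}",
        f"From: {data['from']}",
        f"Date: {data['date']}",
        "",
        data["body"],
    ])


print(render_memo(fields))
```

The same labelled data could be rendered in other layouts without touching the XML, which is the separation of content from format that HTML alone does not give.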
What if…… Your computer could understand data and use that data to produce a document?
“The Semantic Web” A vision of Tim Berners-Lee A transformation of the current World Wide Web in which: Information is semantically identified so that it can be retrieved more efficiently Data is semantically identified so that it can be assembled by the computer into meaningful displays
Web Portal A website that gathers together a wide range of content and services –List-servs –Search engines –Online shopping services
Web Portals of the Future? Hospital Information Systems E-Commerce Systems Information Systems
Digital Libraries A collection of digital resources that have been created and/or gathered by a particular administrative body The resources may have been in another format, or may have originated in digital form Sophisticated “library-style” searching is possible
Multimedia Information Retrieval Sound Video Images
Metadata “Data about data” Machine-understandable information about electronic resources
“The Dublin Core” A set of simple metadata elements that can easily be added to the header of a Web document. Any community can extend this “core” with additional elements of its own.
There are metadata sets for: Museums (CIMI) Archives (Encoded Archival Description) Geospatial Information (FGDC) Government Information (GILS) Art Works (CDWA) Literature (TEI)
The Dublin Core Title Creator Subject Description Publisher Contributor Date Type Format Identifier Source Language Relation Coverage Rights
What does metadata look like? Attributes with values
Example of Dublin Core Metadata
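The example on the original slide did not survive in the transcript. As a stand-in, here is a minimal sketch of how attribute/value pairs might be turned into `<meta>` tags for a Web page header. It assumes the common `DC.element` naming convention for embedding Dublin Core in HTML; the record values and the helper name `dublin_core_meta` are illustrative only.

```python
from html import escape


def dublin_core_meta(record):
    """Return one <meta> tag per Dublin Core element in the record."""
    tags = []
    for element, value in record.items():
        # escape() protects quotes and angle brackets inside the value.
        tags.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(tags)


# Illustrative record; the values are assumptions, not data from the slide.
record = {
    "title": "Digital Libraries and Multimedia Searching",
    "creator": "D.G. Campbell",
    "language": "en",
}

header_tags = dublin_core_meta(record)
print(header_tags)
```

Each tag is an attribute (the element name) paired with a value, which matches the "attributes with values" description of what metadata looks like.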
Where is metadata found? In a separate database, just like a library catalogue Embedded in the headers of the documents themselves
Metadata in a Web page header Dublin Core Metadata: Creator, Title, URL, Subject Community-Based Metadata: Scale, Map Type
What’s New about Metadata? It takes information retrieval out of the library. A lot of it will be done by Web authors and publishers. There’ll be a lot more variety.
Metadata Harvesting A system in which: –Organizations place their metadata in repositories –These repositories make the metadata available through a special interface –Software agents (robots) connect to different repositories, collect metadata records, and combine them into a new system
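The three steps above can be sketched in a few lines. This is a toy, in-memory simulation rather than a real harvesting protocol: the `Repository` class, its `list_records` method, and the sample records are all hypothetical, and a real system would fetch records over the network.

```python
class Repository:
    """A hypothetical metadata repository exposing a simple interface."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def list_records(self):
        # Return copies so a harvester cannot alter the repository's data.
        return [dict(record) for record in self._records]


def harvest(repositories):
    """Collect records from every repository into one combined list."""
    combined = []
    for repo in repositories:
        for record in repo.list_records():
            record["source_repository"] = repo.name  # remember the origin
            combined.append(record)
    return combined


def search_by_subject(records, subject):
    """Return the titles of harvested records with the given subject."""
    return [r["title"] for r in records if r["subject"] == subject]


# Illustrative repositories and records (assumed, not from the slides).
museum = Repository("museum", [
    {"title": "Songs of Innocence", "subject": "poetry"},
    {"title": "Map of London", "subject": "maps"},
])
archive = Repository("archive", [
    {"title": "Letters on Poetry", "subject": "poetry"},
])

combined = harvest([museum, archive])
print(search_by_subject(combined, "poetry"))
```

One search over the combined list reaches records from both repositories, which is the benefit harvesting is meant to provide.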
The Semantic Web [diagram]
New Developments in Information Retrieval Metadata for Information Retrieval –Metadata Harvesting –Semantic Web Multimedia Information Retrieval –Images –Sound
How do we retrieve non-textual information?
(dog* OR canine) AND (train*) Document 1: How to train your dog to fetch your slippers to retrieve objects. Document 2: Housetraining dogs the easy way. Document 3: Canine behaviour patterns and their effect on the training process.
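To see how a text query like this one is evaluated, here is a minimal sketch that runs it against the three sample documents. It assumes that `*` means right-hand truncation (the term matches any word that begins with the stem) and that matching is case-insensitive; the function names are illustrative.

```python
import re


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def matches_term(term, tokens):
    """Match a plain term exactly, or a truncated term (stem*) by prefix."""
    if term.endswith("*"):
        stem = term[:-1]
        return any(token.startswith(stem) for token in tokens)
    return term in tokens


def matches_query(text):
    """Evaluate (dog* OR canine) AND (train*) against one document."""
    tokens = tokenize(text)
    animal = matches_term("dog*", tokens) or matches_term("canine", tokens)
    action = matches_term("train*", tokens)
    return animal and action


documents = {
    1: "How to train your dog to fetch your slippers to retrieve objects.",
    2: "Housetraining dogs the easy way.",
    3: "Canine behaviour patterns and their effect on the training process.",
}

hits = [doc_id for doc_id, text in documents.items() if matches_query(text)]
print(hits)  # [1, 3]
```

Under these assumptions Document 2 is missed: "dogs" satisfies `dog*`, but "Housetraining" does not begin with "train", so `train*` fails. Text retrieval depends entirely on the words present, which is the difficulty that non-textual material raises.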
Image Retrieval 20th Century: –Photography, film, television 1965 onward: –Digital imaging 1980s onward: –Cost-effective digital imaging
Key Players in Image Retrieval Fields that are heavily image-dependent: –Medicine, Architecture, Engineering Geographic Information Systems Art galleries and museums Photograph libraries
Example: William Blake Archive
Problems as Archives Grow: Browsing Searching
Levels of Detail: Primitive attributes Logical attributes Abstract attributes
Current Methods of Image Retrieval Controlled Vocabularies –Art and Architecture Thesaurus –Library of Congress Thesaurus for Graphic Materials Classification –ICONCLASS
Content-Based Image Retrieval Colour Texture Shape Position
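Of the four features, colour is the simplest to illustrate. The sketch below compares images by their colour histograms using histogram intersection, one common similarity measure; it is an assumption that this resembles what any particular system does. Images are represented as plain lists of `(r, g, b)` tuples, and the sample "images" are invented for the example.

```python
from collections import Counter


def colour_histogram(pixels, levels=4):
    """Count pixels per coarse colour bin and normalise to proportions."""
    step = 256 // levels  # width of each bin on every colour channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {colour_bin: n / total for colour_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity from 0.0 (no shared colours) to 1.0 (identical mix)."""
    return sum(min(share, h2.get(colour_bin, 0.0))
               for colour_bin, share in h1.items())


def rank_by_colour(query_pixels, collection):
    """Return (name, score) pairs, most similar image first."""
    query_hist = colour_histogram(query_pixels)
    scores = [
        (name, histogram_intersection(query_hist, colour_histogram(pixels)))
        for name, pixels in collection.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Tiny invented images: mostly orange, mostly orange, and all green.
sunset = [(250, 120, 30)] * 6 + [(20, 20, 60)] * 2
collection = {
    "sunrise": [(240, 110, 20)] * 5 + [(30, 30, 50)] * 3,
    "forest": [(20, 130, 40)] * 8,
}

ranking = rank_by_colour(sunset, collection)
print(ranking)
```

No words are involved at any point: the query is itself an image, and the ranking comes only from pixel values. Texture, shape, and position would need their own feature extractors.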
10-Level Indexing SYNTAX FEATURES: 1. Type/Technique 2. Global Distribution 3. Local Structure 4. Global Composition SEMANTIC FEATURES: 5. Generic Object 6. Generic Scene 7. Specific Object 8. Specific Scene 9. Abstract Object 10. Abstract Scene
The hills are alive…..
Traditional Methods of Music Retrieval Standard Metadata Elements –Composer, lyricist, opus number, date of composition, etc.
Traditional Methods of Music Retrieval: the “Incipit” Beethoven, Ludwig van. Romance for violin and orchestra. Ed. Zino Francescatti. Opus 50.
Retrieving Music on the Basis of an Input Melody Retrieval is based on the variations in pitch from one note to another, rather than on the absolute pitch of the notes themselves Retrieval is enhanced by the directions of the intervals, up or down Retrieval often depends on a clear segmentation of the notes: “ta” or “da”
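The first two points can be made concrete with a short sketch. It assumes notes are given as MIDI note numbers (60 = middle C), which is a choice made for the example rather than something the slides specify, and it uses the opening of a familiar nursery tune as sample data.

```python
def intervals(pitches):
    """Pitch change, in semitones, between each pair of adjacent notes."""
    return [later - earlier for earlier, later in zip(pitches, pitches[1:])]


def contour(pitches):
    """Direction of each interval: U (up), D (down), or R (repeat)."""
    symbols = []
    for step in intervals(pitches):
        if step > 0:
            symbols.append("U")
        elif step < 0:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


# Opening of "Twinkle, Twinkle, Little Star": C C G G A A G.
in_c = [60, 60, 67, 67, 69, 69, 67]
# The same melody sung five semitones higher.
in_f = [pitch + 5 for pitch in in_c]

print(intervals(in_c))  # [0, 7, 0, 2, 0, -2]
print(contour(in_c))    # RURURD
print(intervals(in_c) == intervals(in_f))  # True
```

Because only the differences between notes are compared, a singer can start on any pitch and still produce the same interval sequence. The contour string discards interval size as well, so it also tolerates singers who get the direction right but the distance wrong.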
Problems Making the system “forgiving” to inexperienced singers Variations in popular tunes Size and complexity of database records
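One way a system might be made "forgiving" is to accept near matches instead of requiring an exact one. The sketch below ranks stored tunes by edit distance to the sung query. It assumes melodies are stored as contour strings of U (up), D (down), and R (repeat); the tune names and contours in the database are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def closest_tune(query, database):
    """Return (name, distance) for the stored contour nearest the query."""
    name = min(database, key=lambda n: edit_distance(query, database[n]))
    return name, edit_distance(query, database[name])


# Hypothetical database of contour strings.
database = {
    "tune_a": "RURURD",
    "tune_b": "UUDDUU",
    "tune_c": "DDRUDR",
}

# A sung query with one wrong direction (fifth symbol U instead of R).
best = closest_tune("RURUUD", database)
print(best)  # ('tune_a', 1)
```

Comparing the query against every record this way grows expensive as the database grows, which connects to the last problem on the slide: the size and complexity of the records.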