Published by Matthew Robertson. Modified over 8 years ago.
1
G. Marchionini, Univ. of Maryland
Electronic Environments
Cost trends: hardware cost < software cost < information cost < people time
Virtuality (transcend space)
Timeliness (minimize time)
Interactivity
Multimedia
Trends: resource sharing, collaboration, dynamic representation, the WWW
Critical need for text and multimedia management systems!
2
Information Seeking Perspective
Information seeking is a human-centered process
Strategies and tactics lie on a continuum from analytical search to browsing
Close coupling of queries, results, and usage
Interactive, iterative process
Information retrieval has focused on documents (not concepts or answers)
3
Electronic Text Retrieval
1. Text retrieval is more complex than data retrieval from a DBMS.
2. Distinguish searching for word matches from searching for concept matches.
3. Distinguish subject search from keyword search:
   Subject: search on a controlled vocabulary (e.g., LC subject headings). The results point to documents.
   Keyword: search all words in particular fields or text fragments. The results point to documents.
4. Distinguish exact-match retrieval from partial-match retrieval.
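The exact-match versus partial-match distinction in point 4 can be illustrated with a minimal sketch. The function names and the scoring rule (fraction of query terms found) are assumptions chosen for illustration, not something the slides specify.

```python
def exact_match(query_terms, doc_terms):
    """Exact match: the document qualifies only if it contains every query term."""
    return set(query_terms) <= set(doc_terms)


def partial_match(query_terms, doc_terms):
    """Partial match: fraction of distinct query terms found in the document."""
    query = set(query_terms)
    if not query:
        return 0.0
    return len(query & set(doc_terms)) / len(query)


# Toy document and query (illustrative data only).
doc = "text retrieval is more complex than data retrieval".split()
query = ["text", "retrieval", "concepts"]

is_exact = exact_match(query, doc)   # False: "concepts" is missing
score = partial_match(query, doc)    # 2 of 3 query terms are present
```

An exact-match system would drop this document entirely, while a partial-match system can still return it with a lower score.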
4
Approaches to Text Retrieval
1. Surrogate Search: Search a set of predefined words that point to related documents. Requires indexing via some controlled vocabulary.
   pros: natural transition from paper systems; computationally cheap
   cons: limited access; human indexing required
2. Full-Text Search: Search every word in every document.
   pros: broadens access; indexing can be automated
   cons: computationally expensive; matches words rather than concepts
3. Knowledge-Based Search: Search a set of concepts that are related to concepts in documents.
   pros: improved retrieval
   cons: computationally expensive; theoretical at present
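A small sketch can contrast the first two approaches. The collection, the subject headings, and the function names below are all hypothetical; the point is only that surrogate search consults human-assigned headings, while full-text search consults the words themselves.

```python
import re

# Hypothetical toy collection: each document carries human-assigned subject
# headings (the surrogate) in addition to its full text.
DOCS = {
    "Doc1": {
        "subjects": {"Information retrieval"},
        "text": "Ranking engines score documents against a query",
    },
    "Doc2": {
        "subjects": {"Zoology"},
        "text": "The aardvark is a nocturnal mammal; retrieval of field notes was slow",
    },
}


def surrogate_search(heading):
    """Return documents indexed under a controlled-vocabulary heading."""
    return sorted(d for d, rec in DOCS.items() if heading in rec["subjects"])


def full_text_search(word):
    """Return documents whose text contains the word (case-insensitive)."""
    word = word.lower()
    return sorted(
        d for d, rec in DOCS.items()
        if word in re.findall(r"[a-z]+", rec["text"].lower())
    )


by_subject = surrogate_search("Information retrieval")  # ['Doc1']
by_word = full_text_search("retrieval")                 # ['Doc2']
```

In this toy data the two searches disagree: Doc1 is about retrieval but never uses the word, and Doc2 uses the word in an unrelated sense, which is the "word rather than concept" drawback listed for full-text search.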
5
Full-Text Search
Full-Text Search: Search every word (or variant) in the document except stop words.
Methods:
Text scanning
Indexes (inverted files)
Vectors
Signatures
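Text scanning, the first method listed, can be sketched as a linear pass over every document. The stop-word list here is a tiny assumed sample, and the tokenizer (lowercase alphabetic runs) is a simplification; word variants are not handled.

```python
import re

# Assumed, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}


def tokenize(text):
    """Lowercase the text, split into alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def scan(docs, term):
    """Scan every document in turn; cost grows with the size of the collection."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]


# Toy collection (illustrative data only).
docs = {
    "Doc1": "The abacus is a counting tool",
    "Doc2": "An aardvark in the garden",
}

hits = scan(docs, "abacus")     # ['Doc1']
stop_hits = scan(docs, "the")   # []: stop words are never searchable
```

Because every query rereads the whole collection, scanning needs no index but becomes slow as the collection grows, which motivates the index-based methods in the list.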
6
Inverted File
Words point to word number, offset, surrogate, or document:
aardvark   *Doc 3, Doc 7, Doc 45, Doc 67, .....
abacus     Doc 2, Doc 16, Doc 33, Doc 45, Doc 67, .....
zygote     Doc 7, Doc 33, Doc 67, Doc 123, .....
Find all documents and then apply logical operators to combine.
Query either matches or does not match.
* actually Doc 3, Para 5, Word 45
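The lookup-then-combine procedure can be sketched with sets. The postings below use only the document numbers visible on the slide (the slide's lists are truncated, so these are not complete), postings are kept at document level instead of the paragraph and word positions the footnote mentions, and the function names are assumptions.

```python
# Postings copied from the slide's visible entries; the real lists continue.
INDEX = {
    "aardvark": {3, 7, 45, 67},
    "abacus": {2, 16, 33, 45, 67},
    "zygote": {7, 33, 67, 123},
}


def postings(word):
    """Look up the set of document numbers for a word (empty if unindexed)."""
    return INDEX.get(word, set())


def boolean_and(first, *rest):
    """Documents containing every listed word."""
    result = set(postings(first))
    for word in rest:
        result &= postings(word)
    return result


def boolean_or(first, *rest):
    """Documents containing at least one listed word."""
    result = set(postings(first))
    for word in rest:
        result |= postings(word)
    return result


def boolean_and_not(word, excluded):
    """Documents containing `word` but not `excluded`."""
    return postings(word) - postings(excluded)


all_three = boolean_and("aardvark", "abacus", "zygote")  # {67}
either = boolean_or("aardvark", "zygote")                # {3, 7, 33, 45, 67, 123}
abacus_only = boolean_and_not("abacus", "zygote")        # {2, 16, 45}
```

Each result is a plain set: a document is either in it or not, with no ranking, which is the "matches or does not match" behavior the slide describes.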
7
Vectors
Each document (or surrogate) is represented by a vector defined by every word in the collection.
Doc 1    0 0 1 1 0 0 ..... 0
Doc 2    0 0 0 0 1 1 ..... 0
.
Doc 7    1 0 0 1 0 0 ..... 1   (has aardvark and zygote)
.
Doc 33   0 1 0 0 0 0 ..... 1   (has abacus and zygote)
.
Doc 67   1 1 0 0 0 0 ..... 1   (has aardvark, abacus and zygote)
.
Doc N
Queries are expressed as vectors and matched to document vectors.
Degrees of matching are possible.
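A minimal sketch of vector matching follows. It shrinks the vocabulary to the three words named on the slide (an assumption; the slide's vectors span every word in the collection) and uses cosine similarity as the degree of match, which is one common choice and not necessarily the measure the author had in mind.

```python
import math

# Assumed three-word vocabulary; real vectors have one slot per collection word.
VOCAB = ["aardvark", "abacus", "zygote"]


def to_vector(words):
    """Binary vector: 1 where the vocabulary word is present, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in VOCAB]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Word membership taken from the slide's annotations for these documents.
DOC_VECTORS = {
    "Doc 7": to_vector(["aardvark", "zygote"]),
    "Doc 33": to_vector(["abacus", "zygote"]),
    "Doc 67": to_vector(["aardvark", "abacus", "zygote"]),
}


def rank(query_words):
    """Return document names ordered from best to worst match."""
    q = to_vector(query_words)
    scored = [(cosine(q, vec), name) for name, vec in DOC_VECTORS.items()]
    return [name for score, name in sorted(scored, key=lambda pair: -pair[0])]


ranking = rank(["aardvark", "zygote"])  # ['Doc 7', 'Doc 67', 'Doc 33']
```

Unlike the Boolean case, Doc 33 is still returned for this query even though it lacks "aardvark"; it simply ranks below the documents that share more terms with the query.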
8
Document Alternatives
Paragraphs, passages
SGML codes
Related problems:
–text summarization/auto abstracting
–auto categorization
9
Multimedia
Linguistic surrogates
Images
–color, texture, luminosity, shape
Video
–same as stills but add motion
Sound
–speaker attributes, pitch, duration
10
Retrieval Trends
1. More full-text databases (e.g., the Web!)
2. More statistical engines for ranking results (e.g., PLS, INQUERY, RetrievalWare, Topic)
3. Evolution in traditional markets (e.g., Dialog's Target, West's WIN, Mead's Freestyle)
4. WWW engines and services (Yahoo, Alta Vista, etc.)
5. Relevance feedback added
6. Multimedia developments