Intelligent Information Retrieval CS 336 Lisa Ballesteros Spring 2006.


1 Intelligent Information Retrieval CS 336 Lisa Ballesteros Spring 2006

2 What is Information Retrieval? Includes the following: –Organization –Storage/Representation –Manipulation/Analysis –Search/Retrieval How far back in history can we find examples?

3 IR Through the Ages 3rd Century BCE –Library of Alexandria 500,000 volumes catalogs and classifications 13th Century A.D. –First concordance of the Bible What is a concordance? 15th Century A.D. –Invention of printing 1600 –University of Oxford Library All books printed in England

4 IR Through the Ages 1755 –Johnson’s Dictionary Set standard for dictionaries Included common language Helped standardize spelling 1800 –Library of Congress 1828 –Webster’s Dictionary Significantly larger than previous dictionaries Standardized American spelling 1852 –Roget’s Thesaurus

5 IR Through the Ages 1876 –Dewey Decimal Classification 1880’s –Carnegie Public Libraries 1,681 built (first public library 1850) 1930’s –Punched card retrieval systems 1940’s –Bush’s Memex –Shannon’s Communication Theory –Zipf’s “Law”

6 Historical Summary 1960’s –Basic advances in retrieval and indexing techniques 1970’s –Probabilistic and vector space models –Clustering, relevance feedback –Large, on-line, Boolean information services –Fast string matching 1980’s –Natural Language Processing and IR –Expert systems and IR –Off-the-shelf IR systems

7 IR Through the Ages Late 1980’s –First mini-computer and PC systems incorporating “relevance ranking” Early 1990’s –information storage revolution 1992 –First large-scale information service incorporating probabilistic retrieval (West’s legal retrieval system)

8 IR Through the Ages Mid 1990’s to present –Multimedia databases 1994 to present –The Internet and Web explosion e.g. Google, Yahoo, Lycos, Infoseek (now Go) 1995 to present –Digital Libraries –Data Mining –Agents and Filtering –Knowledge and Distributed Intelligence –Information Organization –Knowledge Management

9 Historical Summary 1990’s –Large-scale, full-text IR and filtering experiments and systems (TREC) –Dominance of ranking –Many web-based retrieval engines –Interfaces and browsing –Multimedia and multilingual –Machine learning techniques

10 Trends in IR Technology [Figure: Time axis with ticks at 1970 and 1990 –Batch systems, Interactive systems, Database systems, Cheap storage, Internet, Multimedia. On-line Information axis –Gigabytes, Terabytes, Petabytes. Technologies –Boolean Retrieval and Filtering, Ranked Retrieval, Distributed Retrieval, Concept-Based Retrieval, Image and Video Retrieval, Information Extraction, Visualization, Summarization, Data Mining, Ranked Filtering] A 1-page Word document without any images = ~10 kilobytes (KB) of disk space. 1 terabyte = one hundred million image-free Word documents. 1 petabyte = one thousand terabytes.
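
The storage figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, which the slide does not specify, and the constant names are made up for illustration.

```python
# Assumption: decimal (SI) units, i.e. 1 KB = 10**3 bytes, 1 TB = 10**12 bytes.
KB = 10**3
TB = 10**12
PB = 10**15

# The slide's estimate for a one-page document without images.
DOC_SIZE_BYTES = 10 * KB

docs_per_terabyte = TB // DOC_SIZE_BYTES
terabytes_per_petabyte = PB // TB

print(docs_per_terabyte)       # 100000000 (one hundred million)
print(terabytes_per_petabyte)  # 1000
```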

11 Historical Summary The Future –Logic-based IR? –NLP? –Integration with other functionality –Distributed, heterogeneous database access –IR in context –“Anytime, Anywhere”

12 Information Retrieval Ad Hoc Retrieval –Given a query and a large database of text objects, find the relevant objects Distributed Retrieval –Many distributed databases Information Filtering –Given a text object from an information stream (e.g. newswire) and many profiles (long-term queries), decide which profiles match Multimedia Retrieval –Databases of other types of unstructured data, e.g. images, video, audio
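
Information filtering, as described on this slide, can be sketched as matching each incoming text object against a set of stored profiles. The version below is a deliberately simple illustration, not a description of any real system: it assumes a profile is a list of terms and that a profile matches only when all of its terms occur in the document. The function names and sample data are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matching_profiles(document, profiles):
    """Return the names of profiles whose terms all occur in the document."""
    doc_terms = set(tokenize(document))
    return sorted(name for name, terms in profiles.items()
                  if set(terms) <= doc_terms)

# Hypothetical long-term queries (profiles) and one item from a news stream.
profiles = {
    "markets": ["stock", "market"],
    "weather": ["storm", "forecast"],
}
story = "The stock market fell sharply as a storm closed the exchange."

matches = matching_profiles(story, profiles)
print(matches)  # ['markets']
```

Note the inversion relative to ad hoc retrieval: here the profiles are the stored collection and the document plays the role of the query.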

13 Information Retrieval Multilingual Retrieval –Retrieval in a language other than English Cross-language Retrieval –Query in one language (e.g. Spanish), retrieve documents in other languages (e.g. Chinese, French, and Spanish)

14 Information Retrieval Text Representation (Indexing) –given a text document, identify the concepts that describe the content and how well they describe it what makes a “good” representation? how is a representation generated from text? what are retrievable objects and how are they organized? Representing an Information Need (Query Formulation) –describe and refine information needs as explicit queries what is an appropriate query language? how can interactive query formulation and refinement be supported?
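
One common answer to "how is a representation generated from text?" is an inverted index that maps each term to the documents containing it. The sketch below assumes plain word tokens as the "concepts" and raw term frequency as the measure of how well a term describes a document. Both are simplifying assumptions, and the names are illustrative only.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to a dict of {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)

# Hypothetical two-document collection.
docs = {
    "d1": "Information retrieval finds relevant text.",
    "d2": "Retrieval of text and retrieval of images.",
}
index = build_inverted_index(docs)
print(index["retrieval"])  # {'d1': 1, 'd2': 2}
```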

15 Information Retrieval Comparing Representations (Retrieval) –compare text and information need representations to determine which documents are likely to be relevant what is a “good” model of retrieval? how is uncertainty represented? Evaluating Retrieved Text (Feedback) –present documents for user evaluation and modify query based on feedback what are good metrics? what constitutes a good experimental testbed?
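
As one possible illustration of comparing representations, the sketch below ranks documents by cosine similarity between tf-idf vectors. The slides mention vector space models but do not give a formula, so the weighting used here (tf × log(N/df)) is an assumption, as are the function name and the sample collection.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending cosine similarity."""
    doc_tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tf in doc_tfs.values() for term in tf)

    def weights(tf):
        # tf * idf with idf = log(N / df); terms unseen in the collection are dropped.
        return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t]}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = weights(Counter(tokenize(query)))
    scores = {doc_id: cosine(q, weights(tf)) for doc_id, tf in doc_tfs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical collection and query.
docs = {
    "d1": "boolean retrieval of legal documents",
    "d2": "ranked retrieval with probabilistic models",
    "d3": "cooking with seasonal vegetables",
}
ranking = rank("probabilistic ranked retrieval", docs)
print([doc_id for doc_id, _ in ranking])  # ['d2', 'd1', 'd3']
```

The score is a graded estimate rather than a yes/no match, which is one simple way of representing the uncertainty the slide asks about.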

16 Information Retrieval and Filtering [Diagram labels: Information Need –Representation –Query; Text Objects –Representation –Indexed Objects; Comparison; Retrieved Objects; Evaluation/Feedback]

17 Features of a Modern IR Product Effective “relevance ranking” Simple free text (“natural language”) query capability Boolean and proximity operators Term weighting Query formulation assistance Query by example Filtering Field-based retrieval Distributed architecture Index anything Fast retrieval Information Organization
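
Two of the listed features, Boolean and proximity operators, can be sketched over a positional index that records where each term occurs. This is an illustration under assumptions: the function names and the "within k word positions" definition of proximity are not taken from any particular product.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Map each term to {doc_id: [word positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def boolean_and(index, a, b):
    """Documents containing both terms."""
    return set(index.get(a, {})) & set(index.get(b, {}))

def near(index, a, b, k):
    """Documents where a and b occur within k word positions of each other."""
    hits = set()
    for doc_id in boolean_and(index, a, b):
        if any(abs(pa - pb) <= k
               for pa in index[a][doc_id]
               for pb in index[b][doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical two-document collection.
docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "retrieval of structured information from databases",
}
index = build_positional_index(docs)
and_hits = boolean_and(index, "information", "retrieval")
near_hits = near(index, "information", "retrieval", 1)
print(sorted(and_hits))   # ['d1', 'd2']
print(sorted(near_hits))  # ['d1']
```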

18 Typical Systems IR systems –Verity, Fulcrum, Excalibur Database systems –Oracle, Informix Web search and In-house systems –West, LEXIS/NEXIS, Dialog –Yahoo, Google, MSN, AskJeeves

19 IR vs. Database Systems Emphasis on effective, efficient retrieval of unstructured data IR systems typically have very simple schemas Query languages emphasize free text, although Boolean combinations of words are also common

20 IR vs. Database Systems Matching is more complex than with structured data (semantics less obvious) –easy to retrieve the wrong objects –need to measure accuracy of retrieval Less focus on concurrency control and recovery, although update is very important
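
The need to measure accuracy of retrieval is usually met with precision and recall. The slides do not name specific metrics, so treat the following as one standard choice rather than the course's definition. The sample document IDs are made up.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list and relevance judgments for one query.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d7"],
)
print(precision)  # 0.5 (2 of the 4 retrieved documents are relevant)
print(recall)     # 2 of the 3 relevant documents were retrieved
```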

