Lecture 8 Information Retrieval Introduction


1 Lecture 8 Information Retrieval Introduction

2 Information Retrieval Introduction
Databases
- Very formal and logical
- Input into them is (or can be) very tightly constrained
- In turn, DB queries are written assuming those constraints
Information Retrieval Systems
- Empirical
- Cognitive modeling: the way we think

3 Information Retrieval Introduction
Queries are based on 'things already there': words and documents. What are the characteristics of these things?
- How many words are there in the English language?
- What are the most common words? The least common words?
- How many 'documents' are there in the world?
- How many web pages are there?
- What kind of structure does the web have?
- How rapidly is it changing?
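
The questions about the most and least common words can be answered directly by counting. A minimal sketch, assuming a naive regex tokenizer (lowercase letters and apostrophes only) and a toy sample sentence; real collections are far larger, but the same pattern usually appears: a handful of words such as "the" dominate, while many words occur only once.

```python
import re
from collections import Counter

def word_frequencies(text: str) -> Counter:
    """Count lowercase word tokens; the regex is a simplifying assumption."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

sample = ("the cat sat on the mat and the dog sat on the log "
          "while a bird watched the cat")
freqs = word_frequencies(sample)

# Most common word and its count.
print(freqs.most_common(1))  # [('the', 5)]

# Least common words: those seen exactly once.
print(sorted(w for w, c in freqs.items() if c == 1))
# ['a', 'and', 'bird', 'dog', 'log', 'mat', 'watched', 'while']
```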

4 Information Retrieval Introduction
Users have:
- An information need
- A use for that information
In an IR system, the user interacts with the system iteratively, e.g. "Was this helpful?"
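
Iterating on "Was this helpful?" is usually called relevance feedback. One classic technique for it, Rocchio's method (not named on the slide), moves the query toward documents the user marked helpful and away from the ones marked unhelpful. A minimal sketch, assuming queries and documents are dicts of term weights; the defaults alpha=1.0, beta=0.75, gamma=0.15 are commonly quoted values and an assumption here, not something the lecture specifies.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    Vectors are dicts mapping term -> weight; terms whose weight drops to 0
    or below are removed from the updated query.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    updated = {}
    for term in terms:
        weight = alpha * query.get(term, 0.0)
        if relevant:
            weight += beta * sum(d.get(term, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            weight -= gamma * sum(d.get(term, 0.0) for d in nonrelevant) / len(nonrelevant)
        if weight > 0:
            updated[term] = weight
    return updated

# Hypothetical session: the user searched "jaguar", marked a page about the
# cat as helpful and a page about the car as unhelpful.
new_query = rocchio(
    {"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "cat": 1.0}],
    nonrelevant=[{"jaguar": 1.0, "car": 1.0}],
)
print(new_query)
```

The updated query now includes "cat" and has no weight on "car", so the next round of results should lean toward the meaning the user wanted.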

5 Information Retrieval Introduction
Similar, but not identical, architectures:
DBMS                          IR
Data                          Documents
DBMS                          IRS
Database Engine               Search Engine
Query Processor               Query Processor
UI: Queries & Reports         UI: Retrieved Output
Interface to another system   Interface to another system

6 Information Retrieval Introduction
- Documents: Medline, Westlaw, etc.
- Various retrieval methods: Boolean, ranked with weights, vector space
- IRS search engine: SilverPlatter, Dialog, Inktomi
- Query Processor
- UI: retrieved output, via web GUI or command line
- Interface to another system: post-processing, value add
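
To contrast Boolean retrieval with the vector space model, here is a minimal sketch over three toy documents. It assumes whitespace tokenization and raw term counts as weights (real systems use weighting schemes such as tf-idf); the function names are hypothetical. Boolean AND returns only exact matches, while cosine similarity gives every document a score that can be ranked.

```python
import math
from collections import Counter

def vectorize(text):
    """Raw term counts as weights (a simplifying assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = {
    "d1": "information retrieval ranks documents",
    "d2": "databases answer exact queries",
    "d3": "ranked retrieval uses weights",
}
query_terms = ["ranked", "retrieval"]

# Boolean AND: a document either contains every query term or is not returned.
boolean_hits = [d for d, text in docs.items() if set(query_terms) <= set(text.split())]

# Vector space: every document gets a score, so results can be ranked.
q = vectorize(" ".join(query_terms))
ranking = sorted(docs, key=lambda d: cosine(q, vectorize(docs[d])), reverse=True)

print(boolean_hits)  # ['d3']
print(ranking)       # ['d3', 'd1', 'd2']
```

Note that d1 shares only one term with the query: Boolean AND drops it entirely, while the vector space model still ranks it above d2, which shares nothing.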

7 IRS Components
- Document preparation & analysis
- Task definition
- Databases
- Indexing
- Search/retrieval engines
- Interfaces
- Usability & cognitive tools
- System evaluation

8 Document Preparation & Analysis
- Formatting tools: mapping to/from formats (XML, PDF, text, PostScript, etc.)
- Natural language processing/feature extraction:
  - Stemming
  - Parsing, word sense disambiguation, morphology
  - Tokenization
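
Tokenization and stemming can be illustrated in a few lines. This is a minimal sketch, assuming a regex tokenizer and a hypothetical, deliberately crude suffix-stripping rule set; it is not the Porter stemmer or any other published algorithm, which use far more careful rules.

```python
import re

# Hypothetical, deliberately crude suffix list, checked in order.
# Real systems use an established algorithm such as the Porter stemmer.
SUFFIXES = ("ing", "ed", "ly", "s")

def tokenize(text):
    """Tokenization: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token, min_stem=3):
    """Stemming: strip the first matching suffix if at least min_stem characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token

text = "Parsing documents, indexing terms: PDFs and XML files."
print([crude_stem(t) for t in tokenize(text)])
# ['pars', 'document', 'index', 'term', 'pdf', 'and', 'xml', 'file']
```

"Parsing" becoming "pars" shows why naive suffix stripping is not enough on its own; the min_stem guard only keeps very short words such as "is" intact.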

9 Task Definition
- Ad hoc
- Filtering, selective dissemination
- Cross-lingual retrieval
- Categorization
- Topic detection & tracking
- Redundancy reduction
- Info synthesis/value add
- Cross-doc/cross-time summarization
- Presentation/visualization
- Info delivery when & where needed
- Info assistance
- Decision support
- Online analysis
- Resource discovery

10 IR Databases
- Bibliographic
- Full text
- Multimedia
- Audio & video
- Web data

11 Human Indexing & Categorization
In Everything Is Miscellaneous, Weinberger describes three orders of categorization:
- 1st order: organizing the things themselves (made of atoms, they take up space), such as silverware in a drawer or books on a shelf
- 2nd order: organizing references to the things, such as a card catalogue that points to where a 1st-order thing sits physically (but doesn't necessarily say much about what's inside it)
- 3rd order: organizing information made of bits (it takes up virtually no space), which can get at what is 'inside' things
Reference: Weinberger, Everything Is Miscellaneous

12 Indexing
- Automatic indexing: algorithms to organize and weight the text in documents
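
One common way to organize and weight text is an inverted index with tf-idf weights. The slide does not name a weighting scheme, so tf-idf here is an assumption, as are the whitespace tokenizer and the function name. A minimal sketch:

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted index: term -> {doc_id: tf-idf weight}.

    tf is the raw count of the term in the document; idf = log(N / df),
    where N is the number of documents and df the number containing the term.
    """
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    df = Counter()
    for tf in counts.values():
        df.update(tf.keys())  # each document counts once per term
    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, tf in counts.items():
        for term, count in tf.items():
            index[term][doc_id] = count * math.log(n_docs / df[term])
    return index

docs = {
    "d1": "retrieval of text documents",
    "d2": "text indexing and text weighting",
    "d3": "database queries",
}
index = build_index(docs)
print(index["text"])      # in 2 of 3 docs; weighted higher in d2, where it occurs twice
print(index["database"])  # in only 1 doc, so it gets a higher idf
```

A term that appears in every document gets idf = log(1) = 0, so it contributes nothing when ranking; this is how tf-idf discounts very common words.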

13 Retrieval/Matching
- Boolean & exact match
- Weighted or partial match
- Link analysis
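
The three matching families can be sketched side by side. This assumes a toy inverted index mapping each term to a set of document ids; all function names are hypothetical. For link analysis it uses PageRank, one well-known algorithm that the slide does not name, computed by power iteration with the commonly used damping factor 0.85 (an assumption).

```python
def boolean_and(index, terms):
    """Boolean & exact match: documents containing every query term."""
    postings = [set(index.get(t, ())) for t in terms]
    return set.intersection(*postings) if postings else set()

def partial_match(index, terms):
    """Partial match: rank documents by how many query terms they contain."""
    scores = {}
    for t in terms:
        for doc in index.get(t, ()):
            scores[doc] = scores.get(doc, 0) + 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def pagerank(links, damping=0.85, iterations=50):
    """Link analysis: PageRank by power iteration.

    links maps page -> list of pages it links to; pages with no
    outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for out in links.values() for p in out})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            targets = out if out else pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

index = {"web": {"d1", "d2"}, "search": {"d2", "d3"}, "engine": {"d2"}}
query = ["web", "search", "engine"]
print(boolean_and(index, query))    # {'d2'}
print(partial_match(index, query))  # [('d2', 3), ('d1', 1), ('d3', 1)]
print(pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```

In the link graph above, page "b" receives links from both "a" and "c", so it ends up with the highest rank even though no page text is consulted at all.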

14 Interfaces
- Web GUI
- 'Local' GUI
- Command line
- Gesture: e.g., the James Bond film Quantum of Solace; Minority Report

15 Knowledge Tools
- Dictionaries, thesauri
- Gazetteers, CIA World Factbook
- Encyclopedias

16 Evaluation
What questions should we ask?
- Is the system actually used?
- Is it efficient?
- Is the system effective?
- Are users satisfied?
- Do they find relevant information? Complete information?
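
"Relevant information" and "complete information" correspond roughly to the standard measures precision and recall; that mapping is an interpretation, not something the slide states. A minimal sketch, assuming we already know which documents are truly relevant to a query (the document ids are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"], relevant=["d2", "d4", "d7"])
print(p, r, f1(p, r))  # precision 0.5, recall 2/3, F1 4/7
```

Here half of what the system returned was useful (precision 0.5), but it missed d7, so the answer was incomplete (recall 2/3).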

17 Reading
Read "As We May Think" by Vannevar Bush (1945).

