2002.10.24 - SLIDE 1IS 202 – FALL 2002 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002

SLIDE 1IS 202 – FALL 2002 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 SIMS 202: Information Organization and Retrieval Lecture 16: Boolean Information Retrieval

SLIDE 2IS 202 – FALL 2002 Lecture Overview Review –Introduction to Information Retrieval –The Information Seeking Process –History of IR Research IR System Structure (revisited) Central Concepts in IR Boolean Logic Boolean IR Systems Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 4IS 202 – FALL 2002 IR Topics for 202 The Search Process Information Retrieval Models Content Analysis/Zipf Distributions Evaluation of IR Systems –Precision/Recall –Relevance –User Studies System and Implementation Issues Web-Specific Issues User Interface Issues Special Kinds of Search

SLIDE 5IS 202 – FALL 2002 IR is an Iterative Process Repositories Workspace Goals

SLIDE 6IS 202 – FALL 2002 Berry-Picking Model Q0 Q1 Q2 Q3 Q4 Q5 A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89)

SLIDE 7IS 202 – FALL 2002 Restricted Form of the IR Problem The system has available only pre-existing, “canned” text passages. Its response is limited to selecting from these passages and presenting them to the user. It must select, say, 10 or 20 passages out of millions or billions!

SLIDE 8IS 202 – FALL 2002 Information Retrieval Revised Task Statement: Build a system that retrieves documents that users are likely to find relevant to their queries. This set of assumptions underlies the field of Information Retrieval.

SLIDE 9IS 202 – FALL 2002 Card-Based IR Systems Uniterm (Casey, Perry, Berry, Kent: 1958) –Developed and used from the mid-1940s (cards shown: EXCURSION, LUNAR)

SLIDE 10IS 202 – FALL 2002 Card Systems Batten Optical Coincidence Cards (“Peek-a-Boo Cards”), 1948 (cards shown: Lunar, Excursion)

SLIDE 11IS 202 – FALL 2002 Card Systems Zatocode (edge-notched cards) Mooers, 1951
Document 1 Title: lksd ksdj sjd sjsjfkl Author: Smith, J. Abstract: lksf uejm jshy ksd jh uyw hhy jha jsyhe
Document 200 Title: Xksd Lunar sjd sjsjfkl Author: Jones, R. Abstract: Lunar uejm jshy ksd jh uyw hhy jha jsyhe
Document 34 Title: lksd ksdj sjd Lunar Author: Smith, J. Abstract: lksf uejm jshy ksd jh uyw hhy jha jsyhe

SLIDE 12IS 202 – FALL 2002 Computer-Based Systems Bagley’s 1951 MS thesis from MIT suggested that searching 50 million item records, each containing 30 index terms would take approximately 41,700 hours –Due to the need to move and shift the text in core memory while carrying out the comparisons 1957 – Desk Set with Katharine Hepburn and Spencer Tracy – EMERAC

SLIDE 13IS 202 – FALL 2002 Historical Milestones in IR Research
1958 Statistical Language Properties (Luhn)
1960 Probabilistic Indexing (Maron & Kuhns)
1961 Term association and clustering (Doyle)
1965 Vector Space Model (Salton)
1968 Query expansion (Rocchio, Salton)
1972 Statistical Weighting (Sparck Jones)
Poisson Model (Harter, Bookstein, Swanson)
1976 Relevance Weighting (Robertson, Sparck Jones)
1980 Fuzzy sets (Bookstein)
1981 Probability without training (Croft)

SLIDE 14IS 202 – FALL 2002 Historical Milestones in IR Research (cont.)
1983 Linear Regression (Fox)
1983 Probabilistic Dependence (Salton, Yu)
1985 Generalized Vector Space Model (Wong, Raghavan)
1987 Fuzzy logic and RUBRIC/TOPIC (Tong, et al.)
1990 Latent Semantic Indexing (Dumais, Deerwester)
1991 Polynomial & Logistic Regression (Cooper, Gey, Fuhr)
1992 TREC (Harman)
1992 Inference networks (Turtle, Croft)
1994 Neural networks (Kwok)

SLIDE 15IS 202 – FALL 2002 Lecture Overview Review –Introduction to Information Retrieval –The Information Seeking Process –History of IR Research IR System Structure (revisited) Central Concepts in IR Boolean Logic Boolean IR Systems Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 16IS 202 – FALL 2002 Structure of an IR System (diagram of an Information Storage and Retrieval System, adapted from Soergel, p. 19)
–Search Line: Interest profiles & Queries; Formulating query in terms of descriptors; Storage of profiles; Store1: Profiles/Search requests
–Storage Line: Documents & data; Indexing (Descriptive and Subject); Storage of Documents; Store2: Document representations
–Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language)
–Comparison/Matching; Potentially Relevant Documents

SLIDE 20IS 202 – FALL 2002 Lecture Overview Review –Introduction to Information Retrieval –The Information Seeking Process –History of IR Research IR System Structure (revisited) Central Concepts in IR Boolean Logic Boolean IR Systems Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 21IS 202 – FALL 2002 Central Concepts in IR Documents Queries Collections Evaluation Relevance

SLIDE 22IS 202 – FALL 2002 Documents What do we mean by a document? –Full document? –Document surrogates? –Pages? Buckland (JASIS, Sept. 1997) “What is a Document” Are IR systems better called Document Retrieval systems? A document is a representation of some aggregation of information, treated as a unit

SLIDE 23IS 202 – FALL 2002 Collection A collection is some physical or logical aggregation of documents –A database –A Library –An index? –Others?

SLIDE 24IS 202 – FALL 2002 Queries A query is some expression of a user’s information needs Can take many forms –Natural language description of need –Formal query in a query language Queries may not be accurate expressions of the information need –Differences between conversation with a person and formal query expression

SLIDE 25IS 202 – FALL 2002 Evaluation Why Evaluate? What to Evaluate? How to Evaluate?

SLIDE 26IS 202 – FALL 2002 Why Evaluate? Determine if the system is desirable Make comparative assessments Others?

SLIDE 27IS 202 – FALL 2002 What to Evaluate? How much of the information need was satisfied How much was learned about a topic Incidental learning –How much was learned about the collection –How much was learned about other topics How inviting the system is

SLIDE 28IS 202 – FALL 2002 What to Evaluate? What can be measured that reflects users’ ability to use system? (Cleverdon 66)
–Coverage of Information
–Form of Presentation
–Effort required/Ease of Use
–Time and Space Efficiency
–Recall: proportion of relevant material actually retrieved
–Precision: proportion of retrieved material actually relevant
(Recall and Precision together are labeled “effectiveness”)
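The two effectiveness measures above can be sketched in a few lines. This is a minimal illustration, assuming relevance judgments are available as a set of document ids; the function name `precision_recall` and the sample ids are hypothetical, not part of the lecture.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Precision: proportion of retrieved material that is actually relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: proportion of relevant material that was actually retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents retrieved, 5 relevant in the collection.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7", "d8", "d9"})
print(p, r)  # 0.5 0.4
```

Returning 0.0 for an empty retrieved or relevant set is a convention chosen here to avoid division by zero; other conventions are possible.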

SLIDE 29IS 202 – FALL 2002 Relevance (introduction) In what ways can a document be relevant to a query?
–Answer precise question precisely: Who is buried in Grant’s Tomb? Grant
–Partially answer question: Where is Danville? Near Walnut Creek
–Suggest a source for more information: What is lymphedema? Look in this Medical Dictionary…
–Give background information
–Remind the user of other knowledge
–Others...

SLIDE 30IS 202 – FALL 2002 Relevance “Intuitively, we understand quite well what relevance means. It is a primitive ‘y’know’ concept, as is information for which we hardly need a definition. … if and when any productive contact [in communication] is desired, consciously or not, we involve and use this intuitive notion of relevance.” (Saracevic, 1975, p. 324)

SLIDE 31IS 202 – FALL 2002 Relevance How relevant is the document –for this user, for this information need. Subjective, but Measurable to some extent –How often do people agree a document is relevant to a query? How well does it answer the question? –Complete answer? Partial? –Background Information? –Hints for further exploration?

SLIDE 32IS 202 – FALL 2002 Relevance Research and Thought Review to 1975 by Saracevic Reconsideration of user-centered relevance by Schamber, Eisenberg and Nilan, 1990 Special Issue of JASIS on relevance (April 1994, 45(3))

SLIDE 33IS 202 – FALL 2002 Saracevic Relevance is considered as a measure of effectiveness of the contact between a source and a destination in a communications process –Systems view –Destinations view –Subject Literature view –Subject Knowledge view –Pertinence –Pragmatic view

SLIDE 34IS 202 – FALL 2002 Define your own relevance Relevance is the (A) gage of relevance of an (B) aspect of relevance existing between an (C) object judged and a (D) frame of reference as judged by an (E) assessor Where… From Saracevic, 1975 and Schamber 1990

SLIDE 35IS 202 – FALL 2002 A. Gages Measure Degree Extent Judgement Estimate Appraisal Relation

SLIDE 36IS 202 – FALL 2002 B. Aspect Utility Matching Informativeness Satisfaction Appropriateness Usefulness Correspondence

SLIDE 37IS 202 – FALL 2002 C. Object judged Document Document representation Reference Textual form Information provided Fact Article

SLIDE 38IS 202 – FALL 2002 D. Frame of reference Question Question representation Research stage Information need Information used Point of view request

SLIDE 39IS 202 – FALL 2002 E. Assessor Requester Intermediary Expert User Person Judge Information specialist

SLIDE 40IS 202 – FALL 2002 Schamber, Eisenberg and Nilan “Relevance is the measure of retrieval performance in all information systems, including full-text, multimedia, question-answering, database management and knowledge-based systems.” Systems-oriented relevance: Topicality User-Oriented relevance Relevance as a multi-dimensional concept

SLIDE 41IS 202 – FALL 2002 Schamber, et al. Conclusions “Relevance is a multidimensional concept whose meaning is largely dependent on users’ perceptions of information and their own information need situations. Relevance is a dynamic concept that depends on users’ judgements of the quality of the relationship between information and information need at a certain point in time. Relevance is a complex but systematic and measurable concept if approached conceptually and operationally from the user’s perspective.”

SLIDE 42IS 202 – FALL 2002 Froehlich Centrality and inadequacy of Topicality as the basis for relevance Suggestions for a synthesis of views

SLIDE 43IS 202 – FALL 2002 Janes’ View Topicality Pertinence Relevance Utility Satisfaction

SLIDE 44IS 202 – FALL 2002 Lecture Overview Review –Introduction to Information Retrieval –The Information Seeking Process –History of IR Research IR System Structure (revisited) Central Concepts in IR Boolean Logic Boolean IR Systems Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 45IS 202 – FALL 2002 Query Languages A way to express the question (information need) Types: –Boolean –Natural Language –Stylized Natural Language –Form-Based (GUI)

SLIDE 46IS 202 – FALL 2002 Simple query language: Boolean
–Terms + Connectors (or operators)
–terms: words; normalized (stemmed) words; phrases; thesaurus terms
–connectors: AND, OR, NOT

SLIDE 47IS 202 – FALL 2002 Boolean Queries
Cat
Cat OR Dog
Cat AND Dog
(Cat AND Dog)
(Cat AND Dog) OR Collar
(Cat AND Dog) OR (Collar AND Leash)
(Cat OR Dog) AND (Collar OR Leash)
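One common way to evaluate queries like these is to treat each term as the set of documents that contain it, so that AND becomes set intersection and OR becomes set union. The sketch below assumes a tiny made-up inverted index; `INDEX`, `postings`, and the document ids are hypothetical.

```python
# Toy inverted index: term -> set of ids of documents containing it (made-up data).
INDEX = {
    "cat": {1, 2, 5},
    "dog": {2, 3, 5},
    "collar": {3, 5, 6},
    "leash": {4, 6},
}


def postings(term):
    """Posting set for a term; unknown terms match no documents."""
    return INDEX.get(term.lower(), set())


# AND is set intersection (&), OR is set union (|).
cat_or_dog = postings("Cat") | postings("Dog")
cat_and_dog = postings("Cat") & postings("Dog")
either_pair = (postings("Cat") & postings("Dog")) | (postings("Collar") & postings("Leash"))
animal_and_gear = (postings("Cat") | postings("Dog")) & (postings("Collar") | postings("Leash"))

print(sorted(cat_or_dog))       # [1, 2, 3, 5]
print(sorted(cat_and_dog))      # [2, 5]
print(sorted(either_pair))      # [2, 5, 6]
print(sorted(animal_and_gear))  # [3, 5]
```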

SLIDE 48IS 202 – FALL 2002 Boolean Queries (Cat OR Dog) AND (Collar OR Leash) –Each of the following combinations works: (table marking which of Cat, Dog, Collar, Leash are present in each combination; column layout lost in extraction)

SLIDE 49IS 202 – FALL 2002 Boolean Queries (Cat OR Dog) AND (Collar OR Leash) –None of the following combinations work: (table marking which of Cat, Dog, Collar, Leash are present in each combination; column layout lost in extraction)
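The combinations that do and do not satisfy (Cat OR Dog) AND (Collar OR Leash) can be enumerated directly. This is a sketch, not a reconstruction of the slide tables: it simply tries every subset of the four terms. The helper name `matches` is hypothetical.

```python
from itertools import product

TERMS = ("Cat", "Dog", "Collar", "Leash")


def matches(present):
    """Evaluate (Cat OR Dog) AND (Collar OR Leash) for a dict of term -> bool."""
    return (present["Cat"] or present["Dog"]) and (present["Collar"] or present["Leash"])


works, fails = [], []
for flags in product([False, True], repeat=len(TERMS)):
    present = dict(zip(TERMS, flags))
    combo = [t for t in TERMS if present[t]]  # terms that occur in this document
    (works if matches(present) else fails).append(combo)

print(len(works), len(fails))  # 9 7
print(["Cat", "Collar"] in works, ["Cat", "Dog"] in fails)  # True True
```

A document needs at least one term from each parenthesized group, so 3 × 3 = 9 of the 16 possible combinations match.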

SLIDE 50IS 202 – FALL 2002 Boolean Logic (Venn diagram of sets A and B)

SLIDE 51IS 202 – FALL 2002 Boolean Queries
–Usually expressed as INFIX operators in IR: ((a AND b) OR (c AND b))
–NOT is a UNARY PREFIX operator: ((a AND b) OR (c AND (NOT b)))
–AND and OR can be n-ary operators: (a AND b AND c AND d)
–Some rules (De Morgan revisited):
NOT(a) AND NOT(b) = NOT(a OR b)
NOT(a) OR NOT(b) = NOT(a AND b)
NOT(NOT(a)) = a
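The rules above can be checked on document sets, assuming NOT is read as the complement relative to the whole collection. The collection `ALL_DOCS` and the sets `a`, `b`, `c` are made-up illustrations.

```python
ALL_DOCS = set(range(1, 9))  # hypothetical collection of 8 documents
a = {1, 2, 3, 4}             # documents containing term a
b = {3, 4, 5, 6}             # documents containing term b
c = {1, 5, 7}                # documents containing term c


def NOT(docs):
    """Complement of a document set relative to the whole collection."""
    return ALL_DOCS - docs


# De Morgan's laws and double negation, expressed with set operations.
assert NOT(a) & NOT(b) == NOT(a | b)
assert NOT(a) | NOT(b) == NOT(a & b)
assert NOT(NOT(a)) == a

# The example query ((a AND b) OR (c AND (NOT b))):
result = (a & b) | (c & NOT(b))
print(sorted(result))  # [1, 3, 4, 7]
```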

SLIDE 52IS 202 – FALL 2002 Boolean Logic (Venn diagram of three terms t1, t2, t3 with documents D1 to D11 placed in its regions; the eight regions m1 to m8 are the conjunctions of t1, t2, t3 with each term either present or negated; the negation bars and the document placement were lost in extraction)

SLIDE 53IS 202 – FALL 2002 Boolean Searching “Measurement of the width of cracks in prestressed concrete beams”
Formal Query: cracks AND beams AND Width_measurement AND Prestressed_concrete
Concepts: Cracks (C), Beams (B), Width measurement (W), Prestressed concrete (P)
Relaxed Query: (C AND B AND P) OR (C AND B AND W) OR (C AND W AND P) OR (B AND W AND P)
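The relaxed query is the OR of every three-term AND drawn from the four concepts, which is the same as requiring at least three of the four terms. A small sketch, with hypothetical helper names `relaxed_query` and `satisfies`:

```python
from itertools import combinations


def relaxed_query(terms, k):
    """Build the OR of every AND-combination of k terms, as a query string."""
    groups = ["(" + " AND ".join(combo) + ")" for combo in combinations(terms, k)]
    return " OR ".join(groups)


def satisfies(doc_terms, terms, k):
    """Equivalent test: at least k of the query terms occur in the document."""
    return len(set(terms) & set(doc_terms)) >= k


print(relaxed_query(["C", "B", "W", "P"], 3))
# (C AND B AND W) OR (C AND B AND P) OR (C AND W AND P) OR (B AND W AND P)
print(satisfies({"C", "B", "P"}, ["C", "B", "W", "P"], 3))  # True
print(satisfies({"C", "B"}, ["C", "B", "W", "P"], 3))       # False
```

The generated groups are the same four as on the slide, only in a different order.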

SLIDE 54IS 202 – FALL 2002 Pseudo-Boolean Queries A new notation, from web search
–+cat dog +collar leash
Does not mean the same thing! Need a way to group combinations.
Phrases:
–“stray cat” AND “frayed collar”
–+“stray cat” +“frayed collar”

SLIDE 55IS 202 – FALL 2002 Another View of IR (diagram components: Information need; text input; Parse; Query; Collections; Pre-process; Index; Rank)

SLIDE 56IS 202 – FALL 2002 Result Sets Run a query, get a result set Two choices –Reformulate query, run on entire collection –Reformulate query, run on result set Example: Dialog query (Redford AND Newman) -> S1 (document count lost in extraction); (S1 AND Sundance) -> S2, 898 documents

SLIDE 58IS 202 – FALL 2002 Feedback Queries (diagram components: Information need; text input; Parse; Query; Collections; Pre-process; Index; Rank; Reformulated Query; Re-Rank)

SLIDE 59IS 202 – FALL 2002 Ordering of Retrieved Documents Pure Boolean has no ordering
In practice:
–order chronologically
–order by total number of “hits” on query terms
What if one term has more hits than others? Is it better to have one of each term or many of one term?
Fancier methods have been investigated
–p-norm is most famous; usually impractical to implement; usually hard for user to understand
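Ordering by total hits can be sketched as follows. The documents and the whitespace tokenization are assumptions made for illustration; `total_hits` is a hypothetical helper.

```python
from collections import Counter


def total_hits(text, query_terms):
    """Total number of occurrences of all query terms in a document."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


# Made-up documents.
texts = {
    "d1": "cat cat cat collar",
    "d2": "cat dog collar leash",
    "d3": "dog dog",
}
query = ["cat", "dog", "collar"]

ranked = sorted(texts, key=lambda d: total_hits(texts[d], query), reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']
```

Here d1 (many hits on one term) outranks d2 (one hit on each term), which is exactly the situation the slide questions.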

SLIDE 60IS 202 – FALL 2002 Boolean Advantages –simple queries are easy to understand –relatively easy to implement Disadvantages –difficult to specify what is wanted –too much returned, or too little –ordering not well determined Dominant language in commercial systems until the WWW

SLIDE 61IS 202 – FALL 2002 Faceted Boolean Query Strategy: break query into facets (polysemous with earlier meaning of facets)
–conjunction of disjunctions: (a1 OR a2 OR a3) AND (b1 OR b2) AND (c1 OR c2 OR c3 OR c4)
–each facet expresses a topic: (“rain forest” OR jungle OR amazon) AND (medicine OR remedy OR cure) AND (Smith OR Zhou)

SLIDE 62IS 202 – FALL 2002 Faceted Boolean Query Query still fails if one facet missing Alternative: Coordination level ranking –Order results in terms of how many facets (disjuncts) are satisfied –Also called Quorum ranking, Overlap ranking, and Best Match Problem: Facets still undifferentiated Alternative: assign weights to facets
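Coordination level ranking can be sketched by counting how many facets a document satisfies instead of requiring all of them. The facets follow the rain forest example from the previous slide; the documents and the assumption that each document is already reduced to a set of index terms (phrases included) are hypothetical.

```python
# Each facet is a disjunction: any one of its alternatives satisfies it.
FACETS = [
    {"rain forest", "jungle", "amazon"},
    {"medicine", "remedy", "cure"},
    {"smith", "zhou"},
]


def coordination_level(doc_terms, facets):
    """Number of facets with at least one alternative present in the document."""
    return sum(1 for facet in facets if facet & doc_terms)


# Made-up documents, represented as sets of index terms.
indexed = {
    "d1": {"jungle", "cure", "zhou"},
    "d2": {"amazon", "remedy"},
    "d3": {"smith"},
    "d4": {"river"},
}

# Strict faceted Boolean query: every facet must be satisfied.
strict = [d for d, terms in indexed.items() if coordination_level(terms, FACETS) == len(FACETS)]
# Coordination level ranking: order by number of satisfied facets.
ranked = sorted(indexed, key=lambda d: coordination_level(indexed[d], FACETS), reverse=True)

print(strict)  # ['d1']
print(ranked)  # ['d1', 'd2', 'd3', 'd4']
```

Weighting facets, the alternative mentioned on the slide, would replace the count with a weighted sum.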

SLIDE 63IS 202 – FALL 2002 Proximity Searches Proximity: terms occur within K positions of one another –pen w/5 paper A “Near” function can be more vague –near(pen, paper) Sometimes order can be specified Also, Phrases and Collocations –“United Nations” “Bill Clinton” Phrase Variants –“retrieval of information” “information retrieval”
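A proximity test such as pen w/5 paper can be sketched with token positions. The exact meaning of w/5 varies between systems; this sketch assumes it means the two positions differ by at most 5. The function names and the sample sentence are hypothetical.

```python
def positions(tokens, term):
    """Positions at which a term occurs in a tokenized document."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def within(tokens, a, b, k, ordered=False):
    """True if a and b occur within k positions of one another.

    With ordered=True, a must come before b.
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= k:
                return True
    return False


tokens = "she put the pen down next to the paper".split()
print(within(tokens, "pen", "paper", 5))                # True
print(within(tokens, "pen", "paper", 3))                # False
print(within(tokens, "paper", "pen", 5, ordered=True))  # False
```

A production system would read positions from a positional index rather than rescanning the text for every query.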

SLIDE 64IS 202 – FALL 2002 Filters Filters: Reduce set of candidate docs Often specified simultaneously with query Usually restrictions on metadata
–restrict by: date range; internet domain (.edu, .com, .berkeley.edu); author; size
–limit number of documents returned
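Metadata filters of this kind can be sketched as a pass over the candidate documents. The record fields, the `apply_filters` function, and its parameters are hypothetical and only illustrate the idea.

```python
from datetime import date

# Made-up candidate documents with metadata.
RESULTS = [
    {"id": "d1", "date": date(2001, 5, 1), "domain": "berkeley.edu", "author": "Smith"},
    {"id": "d2", "date": date(1998, 3, 9), "domain": "example.com", "author": "Jones"},
    {"id": "d3", "date": date(2002, 1, 15), "domain": "mit.edu", "author": "Smith"},
]


def apply_filters(results, after=None, domain_suffix=None, author=None, limit=None):
    """Keep only candidates that pass every filter that was specified."""
    kept = []
    for record in results:
        if after is not None and record["date"] < after:
            continue
        if domain_suffix is not None and not record["domain"].endswith(domain_suffix):
            continue
        if author is not None and record["author"] != author:
            continue
        kept.append(record)
    # Optionally limit the number of documents returned.
    return kept if limit is None else kept[:limit]


recent_edu = apply_filters(RESULTS, after=date(2000, 1, 1), domain_suffix=".edu")
print([r["id"] for r in recent_edu])  # ['d1', 'd3']
```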

SLIDE 65IS 202 – FALL 2002 Lecture Overview Review –Introduction to Information Retrieval –The Information Seeking Process –History of IR Research IR System Structure (revisited) Central Concepts in IR Boolean Logic Boolean IR Systems Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 66IS 202 – FALL 2002 Boolean Systems Most of the commercial database search systems that pre-date the WWW are based on Boolean search –Dialog, Lexis-Nexis, etc. Most Online Library Catalogs are Boolean systems –E.g. MELVYL Database systems use Boolean logic for searching Many of the search engines sold for intranet search of web sites are Boolean

SLIDE 67IS 202 – FALL 2002 Why Boolean? Easy to implement Efficient searching across very large databases Easy to explain results –“Has to have all of the words…”

SLIDE 68IS 202 – FALL 2002 Assignment 8 Using Lexis-Nexis

SLIDE 69IS 202 – FALL 2002 Next Time Statistical Properties of Text Building access to IR systems -- indexes