
1 Basic IR: Queries A query is a statement of the user’s information need. The index is designed to map queries to the documents most likely to be relevant. The query’s type, content, and representation dictate what the index must do. Queries vary from single keywords through specialized query languages to exemplar documents.

2 Documents as Queries “Find other documents like this one.” The query is itself a document, so it can go through the same sort of pre-processing (e.g., stop word removal, stemming). Characteristics of such queries mimic those of documents.
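
As an illustration of treating a document as a query, here is a minimal Python sketch. The stop list, the suffix-stripping `crude_stem`, and the cosine ranking are assumptions chosen for brevity; the slides do not prescribe a particular stemmer or similarity measure.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def crude_stem(word):
    # Toy suffix stripper standing in for a real stemmer (e.g., Porter).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Same pipeline for queries and documents: tokenize, drop stop words, stem.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def more_like_this(query_doc, collection):
    # Rank documents by similarity to the query document; drop non-matches.
    q = preprocess(query_doc)
    scored = [(cosine(q, preprocess(text)), doc_id) for doc_id, text in collection.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]
```

Because the query passes through the same `preprocess` step as the collection, "indexing" in the query and "indexed" in a document reduce to the same term before comparison.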

3 Query Term Distribution in SavvySearch

4 Keyword Queries The query is composed of a set of keywords. Retrieve the documents that best match the keywords. Advantage: easy to use, supports fast indexing. Disadvantage: coarse, easy to lead astray (e.g., words with multiple meanings), difficult to express a complex information need.
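
A keyword query can be sketched as scoring each document by how many query keywords it contains. The function name and the overlap-count scoring are assumptions for illustration only; real engines weight terms (e.g., by frequency) rather than simply counting them.

```python
def keyword_search(keywords, docs):
    # Score each document by the number of distinct query keywords it contains.
    terms = {k.lower() for k in keywords}
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, doc_id))
    # Best match first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

docs = {
    "a": "jaguar speed in the wild",
    "b": "jaguar car top speed",
    "c": "python tutorial",
}
ranked = keyword_search(["jaguar", "speed"], docs)
```

Here the animal page and the car page match equally well, which is the "easy to lead astray" problem: the keywords alone cannot say which sense of "jaguar" the user meant.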

5 Boolean Queries Combine queries (keywords) with Boolean operators: OR: “children OR kids”; AND: “windows AND software”; BUT: “unix BUT solaris” (documents matching the first term but not the second). There is no standalone NOT! Advantage: more precise queries. Disadvantage: does not support ranking, less intuitive.
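
The three operators map directly onto set operations over an inverted index. The sketch below assumes whitespace tokenization and hypothetical helper names (`build_index`, `postings`, `op_or`, `op_and`, `op_but`); it is not taken from any particular system.

```python
from collections import defaultdict

def build_index(docs):
    # Inverted index: term -> set of ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def postings(index, term):
    # Look up a term without inserting missing keys into the index.
    return index.get(term.lower(), set())

def op_or(a, b):
    return a | b   # documents matching either operand

def op_and(a, b):
    return a & b   # documents matching both operands

def op_but(a, b):
    return a - b   # documents matching the first operand but not the second

docs = {
    1: "unix solaris admin",
    2: "unix linux kernel",
    3: "windows software",
    4: "kids software",
}
index = build_index(docs)
unix_but_solaris = op_but(postings(index, "unix"), postings(index, "solaris"))
```

The result is an unordered set: a document either satisfies the expression or it does not, which is why plain Boolean retrieval gives no ranking. BUT is binary, so its result is always bounded by the first operand's postings, whereas a standalone NOT would have to return nearly the whole collection.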

6 Phrase Search Supplement single terms with phrases: an exact sequence of terms. Requires an index that tracks the proximity of terms or stores both singletons and phrases. An extension is context search, where the allowed proximity between terms is stated (e.g., “adele w/2 howe” to CiteSeer).
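
One way to track proximity is a positional index that records where each term occurs in each document. The sketch below is an illustration under assumptions: whitespace tokenization, and a reading of “w/2” as "at most two positions apart, in either order", which may not match CiteSeer's exact semantics.

```python
from collections import defaultdict

def build_positional_index(docs):
    # term -> {doc_id -> [positions at which the term occurs]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    # Documents in which the terms occur consecutively, in the given order.
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    hits = set()
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            if all(start + i in index[t].get(doc_id, []) for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

def within(index, first, second, k):
    # Documents in which the two terms occur at most k positions apart.
    first, second = first.lower(), second.lower()
    if first not in index or second not in index:
        return set()
    hits = set()
    for doc_id, positions in index[first].items():
        others = index[second].get(doc_id, [])
        if any(abs(p - q) <= k for p in positions for q in others):
            hits.add(doc_id)
    return hits

docs = {
    1: "adele e howe",
    2: "adele howe papers",
    3: "adele wrote many papers with howe",
}
pindex = build_positional_index(docs)
```

With this index, `phrase_search(pindex, "adele howe")` matches only the document with the exact sequence, while `within(pindex, "adele", "howe", 2)` also admits the document with a middle initial.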

7 Query Language: Web Search Engines I Altavista Advanced Search: Form: all of these words; this exact phrase; any of these words; none of these words. Boolean Expression: AND, OR, AND NOT, NEAR. Other restrictions: Date, File Type, Location.

8 Query Language: Web Search Engines II Google Advanced Search: Find Results: with all of the words; with the exact phrase; with at least one of the words; without the words. Other restrictions: Language, File Format, Date, Occurrences, Domain, SafeSearch. Page-Specific Search: find pages similar to the page; find pages that link to the page. Topic-Specific Searches.

9 Typical Query Behavior on WWW Query term distribution obeys Zipf’s Law (quite skewed, although the skew does drift). Length is ? terms. Few users exploit the full power of query languages; most enter terms without operators and do not use advanced search interfaces. Change in behavior?

10 Natural Language Augmented Boolean approach: Treat query as document. Rank documents by how well they match the constraints of the query and return those above a certain threshold. NLP approach: Interpret semantics in a limited way to constrain query (e.g., “who” indicates a person)

11 NL Example: AskJeeves

12 Advanced Querying Pattern Matching: combinations of syntactic features, e.g., regular expressions, wild-card queries. Structural Queries: forms, hypertext, and hierarchies; these typically support iterative querying, as in guided browsing (e.g., WebGlimpse or Letizia).
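
Pattern-matching queries are commonly resolved by expanding the pattern against the index vocabulary and then looking up each matching term. The sketch below shows only the expansion step, using Python's standard `fnmatch` and `re` modules; the function names and the sample vocabulary are hypothetical.

```python
import fnmatch
import re

def wildcard_terms(pattern, vocabulary):
    # Expand a shell-style wildcard pattern (e.g. "retriev*") against the vocabulary.
    return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t, pattern))

def regex_terms(pattern, vocabulary):
    # Expand a regular expression; the whole term must match.
    compiled = re.compile(pattern)
    return sorted(t for t in vocabulary if compiled.fullmatch(t))

vocabulary = {"retrieval", "retrieve", "retrieved", "index", "indexing", "query"}
wild = wildcard_terms("retriev*", vocabulary)
rx = regex_terms(r"index(ing|ed)?", vocabulary)
```

A linear scan over the vocabulary is fine for a sketch; large systems typically use structures that support prefix or substring lookup instead.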

13 Advanced Querying: Letizia Recommends new pages based on the user’s browsing preferences. Infers interests by observing user behavior: save bookmark, follow link, spend time on page… Models documents as lists of keywords. Figure from http://lieber.www.media.mit.edu/people/lieber/Liebrary/Letizia/Letizia.html

14 Caching An astonishing number of people submit the same queries (e.g., “Harry Potter”). Just as single word usage is skewed (Zipf’s Law), so is query submission on the WWW. This can be exploited by caching results for frequently repeated queries.
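
A result cache for repeated queries can be sketched as a small least-recently-used (LRU) wrapper around the search function. The class name, the LRU eviction policy, and the query normalization are assumptions for illustration; the slide does not specify a caching strategy.

```python
from collections import OrderedDict

class QueryCache:
    """LRU cache in front of a (hypothetical) expensive search function."""

    def __init__(self, search_fn, capacity=1000):
        self.search_fn = search_fn
        self.capacity = capacity
        self.entries = OrderedDict()  # normalized query -> cached results
        self.hits = 0
        self.misses = 0

    def search(self, query):
        # Normalize case and whitespace so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        results = self.search_fn(key)
        self.entries[key] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return results
```

Because query submission is heavily skewed, even a small cache of this kind can answer a large share of traffic without touching the index.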

15 Single vs. On-Going Queries: Filtering Find new documents from an information stream that satisfy a static information need. The user profile represents interests; the threshold represents how closely documents must match. The user may provide the query, or it may be learned through relevance feedback.
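
Filtering can be sketched as scoring each incoming document against every user's profile and delivering it wherever the score clears the threshold. The profile representation (term weights) and the length-normalized scoring function below are assumptions made for illustration, and learning profiles through relevance feedback is left out.

```python
def score(profile, text):
    # Average profile weight per word in the document (0.0 for unknown words).
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(profile.get(w, 0.0) for w in words) / len(words)

def filter_stream(stream, profiles, threshold):
    # Yield (user, document) pairs for every document that matches a profile.
    for doc in stream:
        for user, profile in profiles.items():
            if score(profile, doc) >= threshold:
                yield user, doc

profiles = {
    "user1": {"retrieval": 1.0, "index": 1.0},
    "user2": {"football": 1.0},
}
stream = ["index retrieval methods", "football scores today", "weather report"]
delivered = list(filter_stream(stream, profiles, threshold=0.3))
```

Unlike ad hoc search, the profiles stay fixed while the documents keep arriving, so `filter_stream` is written as a generator that can consume an unbounded stream.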

16 Filtering Process [Figure from MIR text: a documents stream is matched against the User 1 profile and the User 2 profile, producing docs filtered for User 1 and docs filtered for User 2.]

