Query Suggestion

- A variety of automatic or semi-automatic query suggestion techniques have been developed
  - The goal is to improve effectiveness by matching related/similar terms
  - Semi-automatic techniques require user interaction to select the best suggested terms
- Query expansion is a related technique
  - Alternative queries, usually offering more terms

Query Suggestion
- Approaches are usually based on an analysis of term co-occurrence
  - Either in the entire document collection, a large collection of queries, or the top-ranked documents in a result list
  - Query-based stemming is also a suggestion technique
- Automatic suggestion based on a general thesaurus is not effective
  - Does not take context into account, e.g., “aquarium” is a good suggestion for “tank” in the query “tropical fish tank”, but not for “armor for tanks”

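Term Association Measures
- Dice’s Coefficient
  $\frac{2 \cdot n_{ab}}{n_a + n_b} \stackrel{rank}{=} \frac{n_{ab}}{n_a + n_b}$
  where $\stackrel{rank}{=}$ stands for rank equivalent
- Mutual Information Measure (MIM)
  $\log \frac{P(a, b)}{P(a)\,P(b)} \stackrel{rank}{=} \frac{n_{ab}}{n_a \cdot n_b}$
  where $N$ is the number of documents in a collection, $P(a) = n_a/N$, $P(b) = n_b/N$, $P(a, b) = n_{ab}/N$, and $n_a$, $n_b$, $n_{ab}$ count the documents containing $a$, $b$, and both
  - Measures the extent to which words occur independently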

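Term Association Measures
- Mutual Information Measure (MIM) favors low-frequency terms
- Expected Mutual Information Measure (EMIM) addresses this problem of MIM by weighting MIM using $P(a, b)$
  $P(a, b) \cdot \log \frac{P(a, b)}{P(a)\,P(b)} \stackrel{rank}{=} n_{ab} \cdot \log\left(N \cdot \frac{n_{ab}}{n_a \cdot n_b}\right)$
  - Strictly, this is only the one part of EMIM that focuses on word co-occurrence
  - EMIM, however, favors high-frequency terms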

Term Association Measures
- Pearson’s Chi-squared ($\chi^2$) measure
  $\frac{(n_{ab} - N \cdot P(a) \cdot P(b))^2}{N \cdot P(a) \cdot P(b)} \stackrel{rank}{=} \frac{(n_{ab} - \frac{1}{N} \cdot n_a \cdot n_b)^2}{n_a \cdot n_b}$
  - Compares the number of co-occurrences of two words with the expected number of co-occurrences if the two words were independent; $N \cdot P(a) \cdot P(b)$ is that expected number
  - Normalizes this comparison by the expected number
  - Also a limited form focused on word co-occurrence
  - Favors low-frequency terms

Association Measure Summary
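As a rough illustration of how the four measures compare, the sketch below computes them from raw counts. It uses the rank-equivalent forms commonly given for these measures. The function name, the counts in the example, and the convention of returning 0 for EMIM when two words never co-occur are assumptions made for illustration, not part of the slides.

```python
import math

def association_scores(n_a, n_b, n_ab, N):
    """Rank-equivalent association scores for words a and b.

    n_a, n_b: number of documents (or windows) containing a / b
    n_ab:     number of documents (or windows) containing both
    N:        total number of documents (or windows)
    """
    expected = n_a * n_b / N  # expected co-occurrences under independence
    # log(0) is undefined, so EMIM is set to 0 when the words never
    # co-occur (an illustrative convention, not from the slides).
    emim = n_ab * math.log(N * n_ab / (n_a * n_b)) if n_ab > 0 else 0.0
    return {
        "mim": n_ab / (n_a * n_b),
        "emim": emim,
        "chi2": (n_ab - expected) ** 2 / (n_a * n_b),
        "dice": n_ab / (n_a + n_b),
    }

# Hypothetical counts: a rare pair that always co-occurs, and a frequent
# pair that co-occurs in half of its documents.
rare = association_scores(n_a=1, n_b=1, n_ab=1, N=1000)
frequent = association_scores(n_a=100, n_b=100, n_ab=50, N=1000)
for name in ("mim", "emim", "chi2", "dice"):
    print(name, rare[name], frequent[name])
```

With these made-up counts, MIM scores the rare pair above the frequent one and EMIM does the opposite, which is the frequency bias described on the earlier slides.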

Association Measure Example
Most strongly associated words for “tropical” in a collection of TREC news stories. Co-occurrence counts are measured at the document level.
- Identical ranking; favor low-frequency words
- More general than MIM and $\chi^2$

Association Measure Example
Most strongly associated words for “fish”, a high-frequency term, in a collection of TREC news stories.
- Similar top-ranked words in MIM and $\chi^2$

Association Measure Example
Most strongly associated words for “fish” in a collection of TREC news stories. Co-occurrence counts are measured in windows of 5 words.
- Still favor low-frequency terms
- Most stable and reliable regardless of the window size

Association Measures
- Associated words are of little use for expanding the query “tropical fish”
- Expansion based on the whole query takes context into account
  - e.g., using Dice with the term “tropical fish” gives the following highly associated words: goldfish, reptile, aquarium, coral, frog, exotic, stripe, regent, pet, wet
- Impractical for all possible queries; other approaches are used to achieve this effect

Other Approaches
- Pseudo-relevance feedback
  - Expansion terms based on the top retrieved documents for the initial query
- Context vectors
  - Represent words by the words that co-occur with them
    - e.g., the top 35 most strongly associated words for “aquarium” (using Dice’s coefficient)
  - Rank words for a query by ranking context vectors
- Challenges (computational and accuracy): due to the huge size and variable quality of the collections
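The pseudo-relevance feedback idea can be sketched in a few lines. This is a minimal illustration and not the method of any particular search engine: the function name and parameters are hypothetical, documents are assumed to be pre-tokenized and already ranked by an initial retrieval run, and candidate terms are scored simply by how many of the top-ranked documents contain them.

```python
from collections import Counter

def prf_expansion_terms(query_terms, ranked_docs, top_k=2, num_terms=3):
    """Pick expansion terms from the top-ranked documents of an initial run.

    ranked_docs: list of token lists, ordered by the initial ranking.
    Scoring by document count in the top-k set is a simplifying
    assumption made for this sketch.
    """
    query = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        # Count each candidate term once per document, skipping query terms.
        counts.update(set(doc) - query)
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:num_terms]]

# Toy ranked result list (made up for illustration).
docs = [
    ["tropical", "fish", "aquarium", "tank"],
    ["fish", "aquarium", "coral"],
    ["car", "engine"],
]
print(prf_expansion_terms(["tropical", "fish"], docs, top_k=2, num_terms=2))
```

Only the top `top_k` documents contribute, so the third document's terms never become candidates. Real systems usually weight candidates by more than raw document counts.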

Other Approaches
- Query logs
  - Best source of information about queries and related terms
    - Short pieces of text and click data
  - e.g., most frequent words in queries containing “tropical fish” from an MSN log: stores, pictures, live, sale, types, clipart, blue, freshwater, aquarium, supplies
  - Query suggestion based on finding similar queries
    - Grouped based on click data
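A frequency count of this kind over a query log might look like the following sketch. The log entries are invented, the function name is hypothetical, and whitespace tokenization with lowercasing is an assumption; a production log would need far more normalization.

```python
from collections import Counter

def frequent_cooccurring_words(query_log, phrase, top_n=10):
    """Most frequent words in logged queries that contain `phrase`."""
    phrase_terms = phrase.lower().split()
    n = len(phrase_terms)
    counts = Counter()
    for query in query_log:
        terms = query.lower().split()
        # Require the phrase to appear as a contiguous run of terms.
        if any(terms[i:i + n] == phrase_terms
               for i in range(len(terms) - n + 1)):
            counts.update(t for t in terms if t not in phrase_terms)
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

# Invented log entries, for illustration only.
log = [
    "tropical fish stores",
    "tropical fish pictures",
    "live tropical fish stores",
    "fish tank",
]
print(frequent_cooccurring_words(log, "tropical fish"))
```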

Query Expansion
- Search engines suggest expanded/alternative queries in response to a query Q
  - Using some form of thesaurus to perform global analysis
    - For each term t in Q, Q is expanded with synonyms and related words of t from the thesaurus
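The per-term expansion step can be sketched as a dictionary lookup. The thesaurus entries below are hypothetical, and representing a thesaurus as a plain mapping from a term to its related words is an assumption made to keep the example small.

```python
def expand_query(query_terms, thesaurus):
    """Append the thesaurus entries of each query term to the query.

    thesaurus: dict mapping a term to a list of synonyms/related words.
    Original terms keep their order; duplicates are added only once.
    """
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, []):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded

# Hypothetical thesaurus entries, for illustration only.
toy_thesaurus = {
    "car": ["automobile", "vehicle"],
    "repair": ["fix", "service"],
}
print(expand_query(["car", "repair"], toy_thesaurus))
```

Because every term is looked up independently of the others, this form of expansion cannot take the context of the whole query into account.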

Query Expansion
- Methods for building a thesaurus for query expansion
  1. Use of a controlled vocabulary maintained by human editors, such as the Library of Congress Subject Headings (LCSH), e.g., the LCSH for “American Revolutionary War” is United States -- History -- Revolution
  2. An automatically derived thesaurus, constructed using word co-occurrence statistics over a collection of documents
  3. Query reformulations based on query log mining, exploring the manual query reformulations of other users to make suggestions to a user
- Thesaurus-based query expansion does not require any user input to increase recall

Query Expansion
- Automatic thesaurus generation using word co-occurrence
  - A simple approach is based on term-term similarities
    - Start with a term-document matrix $A$, where each cell $A_{t,d}$ is a weighted count $w_{t,d}$ for term $t$ and document $d$
    - Calculate $C = AA^T$, in which $C_{u,v}$ is a similarity score between terms $u$ and $v$; the larger the number, the better
    - An example of a derived thesaurus with good/bad suggestions
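The product $C = AA^T$ can be illustrated on a tiny matrix. The sketch below uses plain Python lists rather than a matrix library. The terms and weights are made up, and the helper names are hypothetical.

```python
def term_term_similarity(A):
    """Compute C = A * A^T for a term-document matrix A (one row per term)."""
    return [[sum(x * y for x, y in zip(row_u, row_v)) for row_v in A]
            for row_u in A]

def most_similar(term, terms, C, top_n=2):
    """Terms with the largest similarity score to `term`, excluding itself."""
    u = terms.index(term)
    scored = [(C[u][v], terms[v]) for v in range(len(terms)) if v != u]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:top_n]]

# Made-up weights: rows are terms, columns are three documents.
terms = ["fish", "aquarium", "engine"]
A = [
    [1, 1, 0],  # fish
    [1, 0, 0],  # aquarium
    [0, 0, 2],  # engine
]
C = term_term_similarity(A)
print(most_similar("fish", terms, C, top_n=1))
```

The scores here are unnormalized, so terms with large weights tend to receive large similarity scores. Length-normalizing the rows of $A$ first is a common variant.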

Query Expansion
- The quality of term associations is typically a problem in an automatically generated thesaurus
  - Term ambiguity easily introduces irrelevant, statistically correlated terms, e.g., “Apple” can be expanded to “Apple red fruit computer”
    - Suffers from false positives (FP) and false negatives (FN)
  - High cost to manually produce and update a thesaurus
  - Query expansion often increases recall, but may also significantly decrease precision, especially when the query contains ambiguous terms, e.g., interest rate → interest rate fascinate evaluate is unlikely to be useful