Generating Query Substitutions
Alicia Wood

What is the problem to be solved?

Problem
• Imperfect description of need
• Search engine not able to retrieve documents matching query
• Need accurate and related query substitutions

Problem (cont.)
• Given a query
• Want to generate a modified (related) query
  – Improvements (specification)
  – Neutral (spelling change, synonym)
  – Loss of original meaning (generalization)

Who cares about this problem and why?

Who cares?
• User typing the query
• Wants correct results with an imperfect query

What have others done to solve this problem and why is this inadequate?

Previous Work
• Relevance/Pseudo relevance feedback
• Query term deletion
• Substituting query terms with related terms
• Latent Semantic Indexing (LSI)

Relevance/Pseudo relevance feedback
• Submit query for initial retrieval
• Process the resulting documents
• Modify the query by expanding it with additional terms from the documents
• Perform second retrieval with modified query
• Drawbacks:
  – Can cause query drift
  – Computationally expensive
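The four feedback steps can be sketched in a few lines of Python. This is only a toy illustration, not the setup used in the work being presented: the scoring function (raw term-count overlap), the corpus, and the names `tokenize`, `score`, `retrieve`, and `expand_query` are all assumptions made for the example.

```python
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization is enough for this toy corpus.
    return text.lower().split()

def score(query_terms, doc_terms):
    # Toy relevance score: total occurrences of the distinct query terms.
    counts = Counter(doc_terms)
    return sum(counts[t] for t in set(query_terms))

def retrieve(query_terms, docs, k):
    # Return indices of the k best-scoring documents (ties keep corpus order).
    order = sorted(range(len(docs)),
                   key=lambda i: score(query_terms, docs[i]),
                   reverse=True)
    return order[:k]

def expand_query(query, corpus, top_docs=2, extra_terms=2):
    docs = [tokenize(d) for d in corpus]
    q = tokenize(query)
    top = retrieve(q, docs, top_docs)            # step 1: initial retrieval
    pool = Counter(t for i in top for t in docs[i]
                   if t not in q)                # step 2: process the results
    best = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
    return q + [t for t, _ in best[:extra_terms]]  # step 3: expand the query

corpus = [
    "jaguar xj6 car engine repair",
    "jaguar car dealer xj6 car",
    "os x jaguar apple software",
]
expanded = expand_query("jaguar xj6", corpus)
# step 4: second retrieval with the modified query
second = retrieve(expanded, [tokenize(d) for d in corpus], k=3)
print(expanded, second)
```

Because the added terms come from whatever the first retrieval returned, a poor initial result set pulls the expanded query toward the wrong topic, which is the query drift noted on the slide. The two retrieval passes are also where the extra cost comes from.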

Query term deletion
• Loss of specificity from original query

Substituting query terms
• Relies on an initial retrieval

Latent Semantic Indexing (LSI)
• Identify patterns in relationships between terms and concepts in an unstructured collection of text
• Computationally expensive

What is the proposed solution to the problem?

Solution
• Query modification based on precomputed query and phrase similarity
  – Ranking proposed queries
  – Similar queries/phrases derived from user query sessions
  – Learned models used to re-rank, based on similarity of new query to original query

Contributions
1. Identification of a new source of data to identify similar queries and phrases
2. The definition of a scheme for scoring query suggestions
3. An algorithm to combine query and phrase suggestions
  – Finds highly and broadly relevant phrases
4. Identification of features that are predictive of highly relevant query suggestions

Classes of Suggestion Relevance
1. Precise rewriting
  – Matches user's intent, preserves core meaning
  – automobile insurance -> automotive insurance
2. Approximate rewriting
  – Direct close relationship to topic, scope narrowed or broadened
  – apple music player -> ipod shuffle
3. Possible rewriting
  – Categorical relationship to initial query, complementary product but distinct
  – eye glasses -> contact lenses
4. Clear mismatch
  – No clear relationship
  – jaguar xj6 -> os x jaguar

Classes of Rewriting
• Specific Rewriting (classes 1+2: precise and approximate)
  – Closely related query
  – Highly relevant
• Broad Rewriting (classes 1+2+3: precise, approximate, and possible)
  – Query expansion
  – Relevant to user interests
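As a small illustration of how the four relevance classes collapse into the two rewriting classes, the mapping can be written as a pair of threshold checks. The enum and function names are hypothetical; only the 1 to 4 numbering and the 1+2 / 1+2+3 groupings come from the slides.

```python
from enum import IntEnum

class Relevance(IntEnum):
    # Numbering follows the four suggestion-relevance classes on the slides.
    PRECISE = 1
    APPROXIMATE = 2
    POSSIBLE = 3
    MISMATCH = 4

def is_specific_rewriting(label):
    # Specific rewriting = classes 1+2.
    return label <= Relevance.APPROXIMATE

def is_broad_rewriting(label):
    # Broad rewriting = classes 1+2+3.
    return label <= Relevance.POSSIBLE

# Example labels taken from the slide examples.
labels = {
    ("automobile insurance", "automotive insurance"): Relevance.PRECISE,
    ("apple music player", "ipod shuffle"): Relevance.APPROXIMATE,
    ("eye glasses", "contact lenses"): Relevance.POSSIBLE,
    ("jaguar xj6", "os x jaguar"): Relevance.MISMATCH,
}
for pair, label in labels.items():
    print(pair, is_specific_rewriting(label), is_broad_rewriting(label))
```

Every specific rewriting is also a broad rewriting, so a classifier trained on the broad task sees strictly more positive examples than one trained on the specific task.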

Substitutables
• Initial query -> generate relevant queries
  – Replace the query as a whole, or replace phrases
  – Segment query into phrases
  – Find query pairs where one segment has changed
    (britney spears) (mp3s) -> (britney spears) (lyrics)
• Pair independence hypothesis, tested with a likelihood ratio
  – High value = strong dependence between the two terms
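The two ideas on this slide can be sketched as follows. The first function finds the single changed segment between two already-segmented queries. The second computes a log-likelihood ratio over a 2x2 table of co-occurrence counts. The slides do not give the exact formula, so the G-squared form below (the standard Dunning-style statistic) is an assumption, as are the function names and the example counts.

```python
import math

def single_substitution(seg_a, seg_b):
    """Return (old, new) if the segmented queries differ in exactly one segment."""
    if len(seg_a) != len(seg_b):
        return None
    diffs = [(a, b) for a, b in zip(seg_a, seg_b) if a != b]
    return diffs[0] if len(diffs) == 1 else None

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: sessions where phrase A was followed by phrase B
    k12: A followed by something other than B
    k21: something other than A followed by B
    k22: neither
    Near 0 means A and B look independent; large values mean dependence.
    """
    n = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    cells = ((k11, row[0], col[0]), (k12, row[0], col[1]),
             (k21, row[1], col[0]), (k22, row[1], col[1]))
    g = 0.0
    for k, r, c in cells:
        if k > 0:  # empty cells contribute nothing (limit of k*log k is 0)
            g += k * math.log(k * n / (r * c))
    return 2.0 * g

pair = single_substitution(("britney spears", "mp3s"),
                           ("britney spears", "lyrics"))
independent = log_likelihood_ratio(10, 10, 10, 10)  # made-up counts
dependent = log_likelihood_ratio(30, 5, 5, 60)      # made-up counts
print(pair, independent, dependent)
```

Candidate pairs such as `("mp3s", "lyrics")` would be counted across many sessions, and only pairs whose ratio exceeds some threshold would be kept as substitutables. The threshold itself is not stated on the slides.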

Validation
• 1000 initial queries
  – Generate a single suggestion (q_j) for each
• Evaluate accuracy of approaches
• Train machine-learned classifier
• Evaluate ability to produce higher quality suggestions
  – Word distance, normalized edit distance, number of substitutions
• Suggestion criteria:
  – Keep some words from the initial query
  – Modifications shouldn't be made at the start of the query
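A rough sketch of how such features might be computed for a (query, suggestion) pair is given below. The exact definitions used in the evaluation are not given on the slides, so everything here is an assumption: edit distance is plain Levenshtein distance normalized by the longer string, word distance is Levenshtein distance over word tokens, and the two boolean features mirror the suggestion criteria.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance; works on strings or lists.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def suggestion_features(query, suggestion):
    qw, sw = query.split(), suggestion.split()
    return {
        # Character edits divided by the longer string (assumed normalization).
        "norm_edit_distance": levenshtein(query, suggestion)
                              / max(len(query), len(suggestion), 1),
        # Edit distance counted in whole words (assumed definition).
        "word_distance": levenshtein(qw, sw),
        # Criterion: suggestion keeps some words from the initial query.
        "shares_word": bool(set(qw) & set(sw)),
        # Criterion: the start of the query is left unmodified.
        "first_word_kept": qw[:1] == sw[:1],
    }

feats = suggestion_features("automobile insurance", "automotive insurance")
print(feats)
```

A feature dictionary like this, computed for each labeled pair, is the kind of input a learned classifier could use to re-rank candidate suggestions.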

Future Work
• Build semantic classifier
  – Predict semantic class of rewriting
• Take inspiration from machine translation techniques
• Introduce language model
  – Avoid producing nonsensical queries