Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.


1 Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics

2 Introduction Create a system that learns to generate query rewrites from a large volume of user query logs. Evaluate the rewritten queries through query expansion in Web search. For a set of randomly selected queries, n-best rewrites are produced. Expansion terms are extracted from the changes the rewrites introduce and added to the query as alternate forms.

3 Example In a query such as "herbs for chronic constipation", the query terms are combined with the AND operator, and expansion terms are added with the OR operator. For this query, "remedies", "medicine", or "supplement" are appropriate expansion terms for "herbs", but "spices" is not. For "herbs for mexican cooking", by contrast, only "spices" is a good alternative.
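As a rough illustration of the AND/OR structure described on this slide, the sketch below builds a Boolean query string. The function name `expand_query`, the parenthesized output syntax, and the choice to drop the stop word "for" are assumptions made for illustration; the slides do not specify the search engine's actual query syntax.

```python
def expand_query(terms, expansions):
    """Join query terms with AND; attach alternates to a term with OR.

    `expansions` maps a query term to a list of alternate forms
    (a hypothetical structure, not taken from the slides).
    """
    clauses = []
    for term in terms:
        alternates = expansions.get(term, [])
        if alternates:
            clauses.append("(" + " OR ".join([term] + alternates) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


# The stop word "for" is dropped here purely to keep the output short.
medical = expand_query(
    ["herbs", "chronic", "constipation"],
    {"herbs": ["remedies", "medicine", "supplement"]},
)
cooking = expand_query(
    ["herbs", "mexican", "cooking"],
    {"herbs": ["spices"]},
)
print(medical)
print(cooking)
```

The same source term "herbs" receives different alternates in the two queries, which is the context dependence the example is meant to show.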

4 Goal Use the translation model and the language model to expand query terms in context. The translation model proposes expansion candidates; the query language model selects among them in the context of the surrounding query terms. SMT is readily applicable to this task: it is applied to large parallel data with queries on the source side and snippets of clicked search results on the target side. Snippets introduce noise because they are not complete sentences. TREC Data.
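The division of labor between the two models can be sketched with toy numbers. Everything below is hypothetical: the probabilities in `TRANSLATION_TABLE` and `TRIGRAM_LM`, the `FLOOR` value for unseen trigrams, and the simple sum of log-probabilities are illustrative assumptions, not the trained models or the scoring used in the presented system.

```python
import math

# Hypothetical translation table: P(candidate | source term).
TRANSLATION_TABLE = {"herbs": {"remedies": 0.3, "medicine": 0.2, "spices": 0.3}}

# Hypothetical query trigram model: P(w3 | w1, w2), stored by full trigram.
TRIGRAM_LM = {
    ("remedies", "for", "chronic"): 0.3,
    ("medicine", "for", "chronic"): 0.2,
    ("spices", "for", "mexican"): 0.4,
}
FLOOR = 1e-4  # assumed probability for trigrams missing from the toy model


def lm_log_prob(words):
    """Sum the log-probabilities of all trigrams in the word sequence."""
    total = 0.0
    for i in range(2, len(words)):
        trigram = (words[i - 2], words[i - 1], words[i])
        total += math.log(TRIGRAM_LM.get(trigram, FLOOR))
    return total


def rank_expansions(query, position):
    """Rank candidates for the term at `position`, best first."""
    words = query.split()
    scored = []
    for candidate, tm_prob in TRANSLATION_TABLE.get(words[position], {}).items():
        rewritten = words[:position] + [candidate] + words[position + 1:]
        # Translation model proposes; language model scores the rewrite in context.
        score = math.log(tm_prob) + lm_log_prob(rewritten)
        scored.append((score, candidate))
    return [candidate for _, candidate in sorted(scored, reverse=True)]


medical_ranking = rank_expansions("herbs for chronic constipation", 0)
cooking_ranking = rank_expansions("herbs for mexican cooking", 0)
print(medical_ranking)
print(cooking_ranking)
```

With these toy values, "spices" ranks last for the constipation query and first for the cooking query, even though its translation probability is the same in both cases.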

5 Review: Query Expansion by Q-D Term Correlation A session links query terms with a document. Aggregating clicks over sessions reflects the preferences of multiple users, giving a probability distribution of document words given query words that is estimated from counts over clicked documents D across sessions. The final scoring formula treats the query as a cohesive unit.
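The formulas on this slide are not present in the transcript. The sketch below shows one plausible formulation, stated here as an assumption: P(w_d | w_q) is computed as the sum over clicked documents D of P(w_d | D) times P(D | w_q), and the query-level score sums ln(P(w_d | w_q) + 1) over the query words. The session data, function names, and this exact formula are illustrative and may differ from what the slide showed.

```python
import math
from collections import Counter, defaultdict

# Hypothetical click sessions: (query, text of the clicked document).
SESSIONS = [
    ("herbs constipation", "natural remedies for constipation relief"),
    ("herbs constipation", "herbal remedies and supplement guide"),
    ("herbs cooking", "spices for cooking"),
]


def term_correlations(sessions):
    """Estimate P(w_d | w_q) = sum over D of P(w_d | D) * P(D | w_q)."""
    clicks = defaultdict(Counter)  # query word -> clicked document -> clicks
    for query, doc in sessions:
        for w_q in query.split():
            clicks[w_q][doc] += 1
    probs = defaultdict(lambda: defaultdict(float))
    for w_q, doc_counts in clicks.items():
        total_clicks = sum(doc_counts.values())
        for doc, n_clicks in doc_counts.items():
            doc_words = doc.split()
            for w_d, count in Counter(doc_words).items():
                p_word_given_doc = count / len(doc_words)
                p_doc_given_qword = n_clicks / total_clicks
                probs[w_q][w_d] += p_word_given_doc * p_doc_given_qword
    return probs


def cohesion_score(probs, query, w_d):
    """Score a document word against the whole query (assumed formula)."""
    return sum(math.log(probs[w_q].get(w_d, 0.0) + 1.0) for w_q in query.split())


probs = term_correlations(SESSIONS)
print(cohesion_score(probs, "herbs constipation", "remedies"))
print(cohesion_score(probs, "herbs constipation", "spices"))
```

Because the score aggregates over every query word, a document word that co-occurs with only one query term ("spices" with "herbs") scores lower than one that correlates with the full query.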

6 Review: Machine Translation 1/2 Linear Model for SMT: find the English string e that is a translation of a foreign string f using a linear combination of feature functions h_m(e, f) and weights λ_m. Word Alignment: the translation model and the alignment model for a source language string f and a target string e are related via a hidden variable describing an alignment mapping from source position j to target position a_j.
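The two relations described on this slide are commonly written as follows. This is the standard textbook form, given here as an assumption about what the slide displayed; M (the number of feature functions) is introduced for illustration.

```latex
% Linear model: choose the target string with the highest weighted feature score.
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

% Word alignment: the translation model marginalizes over hidden alignments a,
% where a maps each source position j to a target position a_j.
P(f \mid e) = \sum_{a} P(f, a \mid e)
```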

7 Review: Machine Translation 2/2 "Sentence-aligned" parallel training data are prepared by pairing user queries with snippets of clicked search results for the respective queries. Phrase Extraction: model parameters are fit by maximum-likelihood estimation on the sentence-aligned strings, and the alignment with the highest probability is then selected.
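The two steps named on this slide are usually written as below. This is the standard formulation, offered as an assumption about the missing formulas; the index s over the S training pairs and the parameter symbol θ are introduced for illustration.

```latex
% Maximum-likelihood estimation over S sentence-aligned (query, snippet) pairs.
\hat{\theta} = \arg\max_{\theta} \prod_{s=1}^{S} \sum_{a} p_{\theta}(f_s, a \mid e_s)

% Alignment with the highest probability under the estimated parameters.
\hat{a} = \arg\max_{a} \; p_{\hat{\theta}}(f, a \mid e)
```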

8 Language Model n-gram language modeling, with smoothing to address sparse data problems. The ultimate task is to pick appropriate phrase translations in the context of the original query for query expansion.
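To make the role of smoothing concrete, here is a minimal sketch of a bigram model with add-one smoothing. The slides do not say which smoothing method was used, and the system described uses trigrams; the bigram order, the add-one scheme, and the toy queries are all assumptions chosen to keep the example short.

```python
from collections import Counter


def train_bigram_lm(queries):
    """Return an add-one smoothed bigram probability function and its vocabulary."""
    bigrams, history, vocab = Counter(), Counter(), set()
    for query in queries:
        words = query.split()
        vocab.update(words)
        padded = ["<s>"] + words  # sentence-start marker
        for prev, word in zip(padded, padded[1:]):
            bigrams[(prev, word)] += 1
            history[prev] += 1

    def prob(prev, word):
        # Add-one smoothing gives unseen bigrams a small nonzero probability.
        return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))

    return prob, vocab


prob, vocab = train_bigram_lm(
    ["herbs for cooking", "spices for cooking", "remedies for constipation"]
)
print(prob("for", "cooking"), prob("for", "constipation"), prob("for", "herbs"))
```

Without smoothing, the unseen bigram ("for", "herbs") would receive probability zero and rule out any rewrite containing it, which is the sparse data problem the slide refers to.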

9 Data Training data for the translation model and the correlation-based model consist of pairs of queries and snippets for clicked results taken from query logs. 3 billion query-snippet pairs, from which a phrase table of 700 million query-snippet phrase translations is extracted. A trigram language model is trained on English queries in user logs, with n-gram cutoffs at a minimum frequency of 4. Queries had an average length of 2.6 words; snippets had an average length of 8.3 words.
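The frequency cutoff mentioned on this slide can be sketched as a simple count filter. The function name, the toy log, and applying the cutoff to trigrams only are assumptions for illustration; only the threshold of 4 comes from the slide.

```python
from collections import Counter

MIN_FREQUENCY = 4  # cutoff stated on the slide


def trigram_counts(queries, min_frequency=MIN_FREQUENCY):
    """Count trigrams in the queries and keep those at or above the cutoff."""
    counts = Counter()
    for query in queries:
        words = query.split()
        counts.update(zip(words, words[1:], words[2:]))
    return {ngram: c for ngram, c in counts.items() if c >= min_frequency}


# Hypothetical toy log: one trigram seen 4 times, another seen 3 times.
toy_log = ["herbs for cooking"] * 4 + ["herbs for constipation"] * 3
kept = trigram_counts(toy_log)
print(kept)
```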

10 Query Expansion Use Google, the SMT-based system, the correlation-based system, and the correlation-based system with a language model as a filter. Expansion terms: 150,000 randomly extracted queries of three or more words are rewritten by each of the systems. For each system, expansion terms are extracted from the 5-best rewrites and stored in a table that maps source phrases to target phrases in the context of the full query.
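One way to picture the extraction step is to diff each rewrite against the original query and record the replaced spans. The sketch below uses Python's `difflib.SequenceMatcher` for the diff; the function name, the table layout keyed by (full query, source phrase), and the example rewrites are assumptions, and the actual system may extract expansions differently (for example, from the decoder's phrase alignments).

```python
from collections import defaultdict
from difflib import SequenceMatcher


def extract_expansions(query, rewrites):
    """Map (full query, source phrase) to the set of target phrases seen in rewrites."""
    source = query.split()
    table = defaultdict(set)
    for rewrite in rewrites:
        target = rewrite.split()
        matcher = SequenceMatcher(a=source, b=target, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":  # a source span was swapped for a target span
                key = (query, " ".join(source[i1:i2]))
                table[key].add(" ".join(target[j1:j2]))
    return table


# Hypothetical n-best rewrites; the last one is identical to the query.
table = extract_expansions(
    "herbs for chronic constipation",
    [
        "remedies for chronic constipation",
        "medicine for chronic constipation",
        "herbs for chronic constipation",
    ],
)
print(dict(table))
```

Keying the table by the full query as well as the source phrase keeps each expansion tied to the context in which it was proposed.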

11 Evaluation 1/2 Three independent raters were presented with queries and the 10-best search results from two systems, and rated them on a 7-point Likert scale.

12 Evaluation 2/2

13 Conclusion The SMT model is flexible enough to capture the peculiarities of query-snippet translation. Future work: apply SMT to query suggestion.

