1 Cross-Lingual Query Suggestion Using Query Logs of Different Languages SIGIR 07.


2 Abstract
Query suggestion
–Suggests relevant queries for a given query
–Helps users better specify their information needs
Cross-Lingual Query Suggestion (CLQS)
–For a query in one language, suggests similar or relevant queries in other languages
–Applications: cross-lingual keyword bidding (search engines) and cross-language information retrieval (CLIR)

3 Introduction
CLQS vs. Cross-Lingual Query Expansion
–CLQS suggests full queries that were formulated by users in another language
Users of search engines
–show similar interests during the same period of time
–issue queries on similar topics in different languages
Key point
–How to learn a similarity measure between two queries
–MLQS (monolingual query suggestion): term co-occurrence based mutual information (MI) and χ²
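The χ² statistic mentioned for monolingual query suggestion can be computed from query-log co-occurrence counts arranged as a 2×2 contingency table. The slide does not show how MLQS applies it, so the sketch below only illustrates the standard statistic; the function name `chi_square` and its count arguments are hypothetical.

```python
def chi_square(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Chi-square statistic for the co-occurrence of terms x and y in a query log.

    n_xy: queries containing both x and y
    n_x:  queries containing x
    n_y:  queries containing y
    n:    total number of queries in the log
    """
    # Cells of the 2x2 contingency table
    a = n_xy                  # x and y
    b = n_x - n_xy            # x without y
    c = n_y - n_xy            # y without x
    d = n - n_x - n_y + n_xy  # neither x nor y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table (a term in every query or in none)
    return n * (a * d - b * c) ** 2 / denom
```

Two terms that always appear together give the maximum value n, while independent terms give 0.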

4 Estimating Cross-Lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity
Monolingual Query Similarity Measure Based on Click-through Information
Features Used for Learning Cross-Lingual Query Similarity Measure
–Bilingual Dictionary
–Parallel Corpora
–Online Mining for Related Queries
–Monolingual Query Suggestion
Estimating Cross-Lingual Query Similarity

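5 Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2
–q_f: a source-language query
–q_e: a target-language query
–sim_ML: monolingual query similarity
–sim_CL: cross-lingual query similarity
–T_qf: translation of q_f in the target language
–The cross-lingual similarity is the monolingual similarity between the translation of q_f and q_e: $\mathrm{sim}_{CL}(q_f, q_e) = \mathrm{sim}_{ML}(T_{q_f}, q_e)$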

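6 Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2
Learning: LIBSVM regression algorithm, $\mathrm{sim}_{CL}(q_f, q_e) = w \cdot \phi(f(q_f, q_e))$
–f: feature functions
–φ: mapping from the feature space onto the kernel space
–w: weight vector in the kernel space
Why regression: relevance is graded rather than binary (relevant vs. irrelevant)
–Training pairs are labeled strongly relevant, weakly relevant, or irrelevant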
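To show what regression on graded labels looks like, here is a minimal sketch of a linear ε-insensitive support vector regression. It is not LIBSVM: it assumes a linear kernel (φ is the identity) and swaps LIBSVM's solver for plain full-batch subgradient descent. The feature vectors and the label values (1.0 strongly relevant, 0.5 weakly relevant, 0.0 irrelevant) are hypothetical.

```python
def predict(w, b, x):
    """Linear score w . x + b (phi is the identity, i.e. a linear kernel)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.1, C=1.0, lr=0.01, epochs=2000):
    """Linear epsilon-SVR trained by full-batch subgradient descent.

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, |y_i - (w . x_i + b)| - epsilon).
    """
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = yi - predict(w, b, xi)
            if abs(residual) > epsilon:  # point lies outside the epsilon tube
                sign = 1.0 if residual > 0 else -1.0
                for j in range(dim):
                    grad_w[j] -= C * sign * xi[j]
                grad_b -= C * sign
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical two-feature vectors with graded relevance labels
X = [[1.0, 0.9], [0.5, 0.4], [0.0, 0.1]]
y = [1.0, 0.5, 0.0]
w, b = train_linear_svr(X, y)
scores = [predict(w, b, x) for x in X]
```

Because the targets are real-valued, the learned score preserves the strongly/weakly/irrelevant ordering instead of collapsing it into two classes.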

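8 Monolingual Query Similarity Measure Based on Click-through Information
Combines keyword overlap with click-through information in query logs [26]:
$\mathrm{sim}_{ML}(p, q) = \alpha \frac{KN(p, q)}{\max(KN(p), KN(q))} + \beta \frac{RD(p, q)}{\max(RD(p), RD(q))}$
–KN(x): number of keywords in query x; KN(p, q): number of keywords shared by p and q
–RD(x): number of clicked URLs for query x; RD(p, q): number of clicked URLs shared by p and q
–α = 0.4, β = 0.6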
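The click-through measure can be sketched directly from keyword sets and clicked-URL sets. This assumes both overlaps are normalized by the larger of the two queries' counts; the function name `sim_ml` and the sample queries and URLs are hypothetical.

```python
def sim_ml(keywords_p, keywords_q, clicks_p, clicks_q, alpha=0.4, beta=0.6):
    """Monolingual similarity from shared keywords and shared clicked URLs."""
    kw_p, kw_q = set(keywords_p), set(keywords_q)
    url_p, url_q = set(clicks_p), set(clicks_q)
    max_kn = max(len(kw_p), len(kw_q))
    max_rd = max(len(url_p), len(url_q))
    # Guard against queries with no keywords or no recorded clicks
    content = len(kw_p & kw_q) / max_kn if max_kn else 0.0
    click = len(url_p & url_q) / max_rd if max_rd else 0.0
    return alpha * content + beta * click


s = sim_ml(["cheap", "flights"], ["cheap", "airline", "tickets"],
           ["u1", "u2"], ["u2", "u3"])
```

With β > α, shared clicks weigh more than shared words, so queries with different wording but the same clicked pages still score as similar.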

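10 1. Bilingual Dictionary – 1/2
–120,000 unique entries (built in-house)
–Given an input query q_f = {w_f1, w_f2, …, w_fn} in the source language
–The bilingual dictionary D gives the translations of each word: D(w_fi) = {t_i1, t_i2, …, t_im}
–The cohesion between translations of two query terms is measured by mutual information: $MI(t_{ij}, t_{kl}) = P(t_{ij}, t_{kl}) \log \frac{P(t_{ij}, t_{kl})}{P(t_{ij}) P(t_{kl})}$, with $P(x, y) = C(x, y)/N$ and $P(x) = C(x)/N$
–C(x, y): number of queries in the log containing both x and y
–C(x): number of queries in the log containing x
–N: total number of queries in the log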
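The counts C(x, y), C(x), and N turn into a mutual-information cohesion score as follows. This sketch assumes the weighted form P(x, y) · log[P(x, y) / (P(x) P(y))] with probabilities estimated as count/N, and it treats pairs that never co-occur as contributing 0. The function name is hypothetical.

```python
import math


def mutual_information(c_xy: int, c_x: int, c_y: int, n: int) -> float:
    """MI(x, y) = P(x, y) * log(P(x, y) / (P(x) * P(y))) from query-log counts."""
    if c_xy == 0:
        return 0.0  # translations never co-occur: no cohesion evidence
    p_xy = c_xy / n
    p_x = c_x / n
    p_y = c_y / n
    return p_xy * math.log(p_xy / (p_x * p_y))
```

Translation pairs that co-occur more often than chance get a positive score. Statistically independent pairs score about 0.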

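11 1. Bilingual Dictionary – 2/2
–Candidate query translations are ranked by the summed cohesion of their word translations: $S_{dict}(T_{q_f}) = \sum_{i \neq k} MI(t_{ij}, t_{kl})$
–The set of top-4 query translations is denoted S(T_qf)
–∀ T ∈ S(T_qf): retrieve all target-language queries containing T and assign S_dict(T) as their feature value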
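The ranking and retrieval steps can be sketched as follows. Every combination of word translations is scored by summing a pairwise cohesion function over word pairs, the top 4 are kept, and log queries containing all words of a kept translation receive its score. The cohesion function is passed in (for example the MI score), "containing T" is assumed to mean containing every word of T, and all names and sample data are hypothetical.

```python
from itertools import combinations, product


def rank_translations(translation_options, cohesion, top_k=4):
    """Rank full query translations by summed pairwise cohesion.

    translation_options: one list of candidate translations per source word.
    cohesion: function (t1, t2) -> float, e.g. mutual information.
    """
    scored = []
    for translation in product(*translation_options):
        score = sum(cohesion(a, b) for a, b in combinations(translation, 2))
        scored.append((translation, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def retrieve_candidates(top_translations, target_log):
    """Assign S_dict(T) to every log query containing all words of T."""
    candidates = {}
    for translation, score in top_translations:
        for query in target_log:
            if set(translation) <= set(query.split()):
                # Keep the best score if several translations match one query
                candidates[query] = max(score, candidates.get(query, float("-inf")))
    return candidates


table = {("cheap", "flights"): 0.5, ("inexpensive", "flights"): 0.3, ("cheap", "theft"): 0.01}


def toy_cohesion(a, b):
    return table.get((a, b), table.get((b, a), 0.0))


top = rank_translations([["cheap", "inexpensive"], ["flights", "theft"]], toy_cohesion, top_k=2)
cands = retrieve_candidates(top, ["cheap flights paris", "inexpensive flights", "cheap hotels"])
```

Summing over unordered pairs only halves a symmetric sum over ordered pairs, so the ranking is the same either way.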

12 2. Parallel Corpora
–Given a pair of queries: q_f in the source language and q_e in the target language
–Bi-directional translation score: IBM Model 1 translation probabilities in both directions, trained with the GIZA++ tool
–P(y_j | x_i): word-to-word translation probability
–The 10 target queries {q_e} from the query log with the highest scores against q_f are selected
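IBM Model 1 scores a word sequence y given x as 1/(l+1)^m · ∏_j Σ_i P(y_j | x_i), where a NULL word is added to x. The sketch below omits the length-probability constant and assumes that the two directions are combined by multiplication. The translation tables stand in for GIZA++ output and are hypothetical.

```python
import heapq


def ibm1_prob(target_words, source_words, t_table):
    """p_IBM1(target | source) under IBM Model 1 (length constant omitted).

    t_table maps (target_word, source_word) -> p(target_word | source_word);
    "NULL" is prepended to the source so a target word may align to nothing.
    """
    source = ["NULL"] + list(source_words)
    l, m = len(source_words), len(target_words)
    prob = 1.0 / (l + 1) ** m
    for y in target_words:
        prob *= sum(t_table.get((y, x), 0.0) for x in source)
    return prob


def bidirectional_score(q_f, q_e, t_fe, t_ef):
    """Assumed combination: product of p(q_f | q_e) and p(q_e | q_f)."""
    return ibm1_prob(q_f, q_e, t_fe) * ibm1_prob(q_e, q_f, t_ef)


def top_target_queries(q_f, target_log, t_fe, t_ef, k=10):
    """The k log queries with the highest bi-directional score against q_f."""
    return heapq.nlargest(
        k, target_log, key=lambda q_e: bidirectional_score(q_f, q_e.split(), t_fe, t_ef)
    )


t_fe = {("vols", "flights"): 0.8}  # p(french | english), hypothetical
t_ef = {("flights", "vols"): 0.7}  # p(english | french), hypothetical
```

Scoring both directions penalizes a candidate that explains the source query well but also carries many words the source query cannot produce.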

13 3. Online Mining for Related Queries – 1/3
Out-of-vocabulary (OOV) words are a major knowledge bottleneck for query translation and CLIR
Assumption
–If a target-language query co-occurs with the source query in many web pages, the two are probably semantically related
–However, web co-occurrence also brings in a considerable amount of noise

14 3. Online Mining for Related Queries – 2/3
–Frequency in the snippets
For example:
–Given a query q = abc in the source language
–Dictionary translations: a → {a1, a2, a3}, b → {b1, b2}, c → {c1}
–Web query in the target language: q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1)
–From 700 returned snippets, the 10 most frequent target queries are selected
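The two steps on this slide can be sketched as follows: build the Boolean web query, then count known target-language queries in the returned snippets. The `AND`/`OR` syntax is an assumption, since real search engines differ. Each query is counted once per snippet that contains it, using naive case-insensitive substring matching. All names and sample strings are hypothetical.

```python
from collections import Counter


def build_web_query(source_query, translations):
    """Source query AND one OR-group of dictionary translations per source word."""
    groups = ["(" + " OR ".join(options) + ")" for options in translations]
    return " AND ".join([source_query] + groups)


def most_frequent_targets(snippets, target_queries, top_n=10):
    """Rank known target-language queries by the number of snippets containing them."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for query in target_queries:
            if query.lower() in text:
                counts[query] += 1
    return [query for query, _ in counts.most_common(top_n)]


web_query = build_web_query("abc", [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]])
```

Requiring the source query itself in the web query favors pages that mention it next to its translations.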

15 3. Online Mining for Related Queries – 3/3
–Each query q_e mined from the web is given a feature value from the co-occurrence double-check (CODC) measure, S_CODC(q_f, q_e)
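The co-occurrence double-check idea: a pair scores 0 unless each query appears in the other's search snippets, and otherwise the two normalized directional frequencies are combined. The exact normalizers and the reading of the exponent as α · log(…) are assumptions in this sketch. α is a tunable parameter, and the values in the example are arbitrary.

```python
import math


def codc(f_e_at_f, f_f, f_f_at_e, f_e, alpha):
    """Co-occurrence double-check score between q_f and q_e (assumed form).

    f_e_at_f: occurrences of q_e in the top snippets retrieved for q_f
    f_f_at_e: occurrences of q_f in the top snippets retrieved for q_e
    f_f, f_e: normalizing frequencies of q_f and q_e (assumption)
    """
    # The double check: co-occurrence must be observed in both directions
    if f_e_at_f == 0 or f_f_at_e == 0:
        return 0.0
    ratio = (f_e_at_f / f_f) * (f_f_at_e / f_e)
    # exp(alpha * log(ratio)) is ratio ** alpha
    return math.exp(alpha * math.log(ratio))
```

Checking both directions suppresses noisy pairs in which one query is a popular term that shows up in many unrelated snippets.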

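16 4. Monolingual Query Suggestion
–Q_0: the candidate target-language queries retrieved by the three methods above, used as seeds for monolingual query suggestion
–For each suggested target query q_e, SQ_ML(q_e) is its monolingual source query: the seed in Q_0 with the highest monolingual similarity to q_e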
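Finding the monolingual source query is an argmax over the seed set Q_0. This sketch takes any monolingual similarity function as a parameter. The word-overlap Jaccard function is a hypothetical stand-in for the click-through measure sim_ML, used only to keep the example self-contained.

```python
def monolingual_source_query(q_e, seeds, sim_ml):
    """SQ_ML(q_e): the seed in Q_0 with the highest monolingual similarity to q_e."""
    best = max(seeds, key=lambda seed: sim_ml(q_e, seed))
    return best, sim_ml(q_e, best)


def word_jaccard(p, q):
    """Toy stand-in for sim_ML: word-set Jaccard overlap (hypothetical)."""
    a, b = set(p.split()), set(q.split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

The returned similarity can then serve as the feature that links a monolingually suggested query back to its seed.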

18 Estimating Cross-Lingual Query Similarity
Four categories of features are used to learn the cross-lingual query similarity score, $\mathrm{sim}_{CL}(q_f, q_e) = w \cdot \phi(f(q_f, q_e))$
–Learning: LIBSVM regression algorithm
–f: feature functions
–φ: mapping from the feature space onto the kernel space
–w: weight vector in the kernel space

19 Performance Evaluation – Log Data
Data resources: MSN search engine, French (source language) vs. English (target language)
–A one-month English query log
–7 million unique English queries, each with occurrence frequency greater than 5
–5,000 French queries
–4,171 of them have their translations among the English queries
–70% used to train the LIBSVM weights, 10% as development data, 20% for testing
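The slide gives the split percentages but not how they were applied or rounded. This sketch assumes a seeded random shuffle of the 4,171 queries, integer truncation for the training and development sizes, and the remainder (about 20%) for testing. The function name and seed are hypothetical.

```python
import random


def split_queries(queries, seed=0, train_pct=70, dev_pct=10):
    """Shuffle and split into train / development / test portions."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_dev = len(shuffled) * dev_pct // 100
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder, roughly 20%
    return train, dev, test


train, dev, test = split_queries(range(4171))
```

With 4,171 queries this gives 2,919 for training, 417 for development, and 835 for testing.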

20 Performance Evaluation – CLIR
Data resources:
–TREC6 CLIR data (AP88-90 newswire, 750 MB)
–25 short French-English query pairs (CL1-CL25), average length 3.3 words, with matches in the web query logs used to train CLQS
[Figure: source-language query q_f → CLQS → target-language queries {q_e} → BM25 retrieval (CLIR)]
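In the pipeline above, the suggested target queries {q_e} feed a BM25 retrieval step. This sketch assumes the suggestions are merged into one bag of words. It uses common default parameters (k1 = 1.2, b = 0.75) and the non-negative idf variant log(1 + (N − df + 0.5)/(df + 0.5)); none of these settings come from the slides, and the function names are hypothetical.

```python
import math
from collections import Counter


def combine_suggestions(suggestions):
    """Merge suggested target queries into one bag-of-words query (assumption)."""
    return [word for query in suggestions for word in query.split()]


def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a bag-of-words query."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        freq = tf[term]
        norm = freq + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * freq * (k1 + 1) / norm
    return score


docs = [["cheap", "flights", "paris"], ["paris", "hotels"], ["car", "rental"]]
doc_freq = Counter(t for d in docs for t in set(d))
query = combine_suggestions(["cheap flights"])
```

Repeating a word that occurs in several suggestions adds its contribution once per occurrence, which weights words the suggestions agree on more heavily.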

21 CLQS

23 CLIR

24 Conclusion
Cross-lingual query suggestion
–Learned from query logs, French to English
–Evaluated on the TREC6 French-to-English CLIR task
–CLQS demonstrates high suggestion quality
