1 Cross-Lingual Query Suggestion Using Query Logs of Different Languages (SIGIR '07)
2 Abstract
Query suggestion
– To suggest relevant queries for a given query
– To help users better specify their information needs
Cross-Lingual Query Suggestion (CLQS)
– For a query in one language, suggest similar or relevant queries in other languages
Applications
– Cross-lingual keyword bidding (search engines)
– Cross-language information retrieval (CLIR)
3 Introduction
CLQS vs. cross-lingual query expansion
– CLQS suggests full queries that have actually been formulated by users in another language
Users of search engines
– share similar interests in the same period of time
– issue queries on similar topics in different languages
Key point
– How to learn a similarity measure between two queries in different languages
– MLQS (monolingual query suggestion) relies on term co-occurrence measures such as mutual information (MI) and χ² (a sketch follows)
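As a concrete illustration of such term co-occurrence measures, here is a minimal Python sketch (not from the paper) that computes pointwise mutual information and χ² association between query terms over a toy query log; the `query_log` input and example queries are hypothetical.

```python
# Minimal sketch: MI and chi-square association between terms in a query log.
import math
from collections import Counter
from itertools import combinations

def cooccurrence_scores(query_log):
    n = len(query_log)                      # total number of queries
    term_count = Counter()                  # C(x): queries containing term x
    pair_count = Counter()                  # C(x, y): queries containing both x and y
    for q in query_log:
        terms = set(q.lower().split())
        term_count.update(terms)
        pair_count.update(combinations(sorted(terms), 2))

    def mi(x, y):
        cxy = pair_count[tuple(sorted((x, y)))]
        cx, cy = term_count[x], term_count[y]
        if cxy == 0:
            return 0.0
        # pointwise mutual information: log( P(x,y) / (P(x) P(y)) )
        return math.log((cxy / n) / ((cx / n) * (cy / n)))

    def chi_square(x, y):
        cxy = pair_count[tuple(sorted((x, y)))]
        cx, cy = term_count[x], term_count[y]
        a, b, c = cxy, cx - cxy, cy - cxy   # 2x2 contingency table cells
        d = n - cx - cy + cxy
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

    return mi, chi_square

mi, chi2 = cooccurrence_scores(["harry potter movie", "harry potter book", "star wars movie"])
print(mi("harry", "potter"), chi2("harry", "potter"))
```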
4 Estimating Cross-Lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity
Monolingual Query Similarity Measure Based on Click-through Information
Features Used for Learning Cross-Lingual Query Similarity Measure
– Bilingual Dictionary
– Parallel Corpora
– Online Mining for Related Queries
– Monolingual Query Suggestion
Estimating Cross-lingual Query Similarity
5 Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2
Principle: sim_CL(q_f, q_e) = sim_ML(T_qf, q_e)
– q_f : a source-language query
– q_e : a target-language query
– sim_ML : monolingual query similarity
– sim_CL : cross-lingual query similarity
– T_qf : the translation of q_f in the target language
6 Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2
Model: sim_CL(q_f, q_e) = w · φ(f(q_f, q_e))
– f : feature functions
– φ : mapping from the feature space onto the kernel space
– w : weight vector in the kernel space
Learning: support vector regression with LIBSVM
– Regression rather than binary classification (relevant vs. irrelevant): query pairs may be strongly relevant, weakly relevant, or irrelevant, so a graded similarity score is learned
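A minimal sketch of how such a regression model could be trained, assuming the feature vectors f(q_f, q_e) have already been computed; scikit-learn's SVR (which wraps LIBSVM) stands in for the LIBSVM regression tool named on the slide, and the feature values and training targets below are made-up placeholders, not the paper's data.

```python
# Minimal sketch: fit sim_CL(q_f, q_e) ≈ w · φ(f(q_f, q_e)) with support vector regression.
import numpy as np
from sklearn.svm import SVR

# f(q_f, q_e): one row of feature values per (source query, target query) pair,
# e.g. [dictionary score, parallel-corpus score, web co-occurrence, MLQS score]
X_train = np.array([
    [0.8, 0.6, 0.4, 0.7],
    [0.1, 0.0, 0.2, 0.1],
    [0.5, 0.3, 0.6, 0.4],
])
# Training target: monolingual similarity sim_ML(T_{q_f}, q_e) of each pair
y_train = np.array([0.75, 0.05, 0.45])

model = SVR(kernel="rbf", C=1.0, epsilon=0.01)   # the kernel defines φ implicitly
model.fit(X_train, y_train)

# Score a new candidate pair by its feature vector
candidate = np.array([[0.6, 0.5, 0.3, 0.5]])
print("estimated sim_CL:", model.predict(candidate)[0])
```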
7 Estimating Cross-Lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity
Monolingual Query Similarity Measure Based on Click-through Information
Features Used for Learning Cross-Lingual Query Similarity Measure
– Bilingual Dictionary
– Parallel Corpora
– Online Mining for Related Queries
– Monolingual Query Suggestion
Estimating Cross-lingual Query Similarity
8 Monolingual Query Similarity Measure Based on Click-through Information
Combines query content and click-through information from query logs [26]:
sim_ML(q1, q2) = α · KN(q1, q2) / max(KN(q1), KN(q2)) + β · RD(q1, q2) / max(RD(q1), RD(q2))
– KN(x) : number of keywords in query x; KN(x, y) : number of keywords shared by x and y
– RD(x) : number of clicked URLs for query x; RD(x, y) : number of clicked URLs shared by x and y
– α = 0.4, β = 0.6
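A minimal sketch of this measure in Python, using the weighted-sum form above with α = 0.4 and β = 0.6; the toy click-through data is hypothetical.

```python
# Minimal sketch of click-through-based monolingual query similarity.
ALPHA, BETA = 0.4, 0.6

def sim_ml(q1, q2, clicked_urls):
    """clicked_urls maps a query string to the set of URLs users clicked for it."""
    kw1, kw2 = set(q1.lower().split()), set(q2.lower().split())
    urls1, urls2 = clicked_urls.get(q1, set()), clicked_urls.get(q2, set())

    keyword_sim = len(kw1 & kw2) / max(len(kw1), len(kw2))
    click_sim = (len(urls1 & urls2) / max(len(urls1), len(urls2))
                 if urls1 and urls2 else 0.0)
    return ALPHA * keyword_sim + BETA * click_sim

clicks = {
    "harry potter movie": {"imdb.com/hp", "warnerbros.com/hp"},
    "harry potter film":  {"imdb.com/hp"},
}
print(sim_ml("harry potter movie", "harry potter film", clicks))
```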
9 Estimating Cross-Lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity
Monolingual Query Similarity Measure Based on Click-through Information
Features Used for Learning Cross-Lingual Query Similarity Measure
– Bilingual Dictionary
– Parallel Corpora
– Online Mining for Related Queries
– Monolingual Query Suggestion
Estimating Cross-lingual Query Similarity
10 1. Bilingual Dictionary – 1/2
– A built-in-house bilingual dictionary with 120,000 unique entries
– Given an input query q_f = {w_f1, w_f2, …, w_fn} in the source language
– The bilingual dictionary D gives, for each word, the translation set D(w_fi) = {t_i1, t_i2, …, t_im}
– Translation candidates are scored with co-occurrence statistics from the target-language query log:
  C(x, y) : the number of queries in the log containing both x and y
  C(x) : the number of queries in the log containing x
  N : the total number of queries in the log
11 1. Bilingual Dictionary – 2/2
– Candidate query translations are ranked by a co-occurrence-based score S_dict(T)
– The set of top-4 query translations is denoted S(T_qf)
– For each T ∈ S(T_qf): retrieve all target-language queries containing T and assign S_dict(T) as their feature value (a sketch follows)
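A minimal sketch of this dictionary feature under stated assumptions: translation candidates are generated from a toy dictionary and scored with a PMI-style cohesion over target-log co-occurrence counts (the paper's exact S_dict formula is not reproduced here), and the top-4 candidates pass their scores to matching target queries.

```python
# Minimal sketch of the dictionary-based feature with an assumed PMI-style score.
import math
from itertools import product

def dict_feature(q_f, dictionary, target_log, top_k=4):
    n = len(target_log)
    words = q_f.split()
    # All translation combinations of the query words (one candidate sense per word)
    candidates = [" ".join(c) for c in product(*(dictionary.get(w, [w]) for w in words))]

    def count(*terms):  # number of target-log queries containing all given terms
        return sum(all(t in q.split() for t in terms) for q in target_log)

    def cohesion(translation):  # average pairwise PMI between translated words
        ts = translation.split()
        pairs = [(x, y) for i, x in enumerate(ts) for y in ts[i + 1:]]
        if not pairs:
            return count(ts[0]) / n
        scores = []
        for x, y in pairs:
            cxy, cx, cy = count(x, y), count(x), count(y)
            scores.append(math.log(cxy * n / (cx * cy)) if cxy and cx and cy else 0.0)
        return sum(scores) / len(pairs)

    top = sorted(candidates, key=cohesion, reverse=True)[:top_k]
    # Each target-log query containing all words of a top translation gets its score
    return {q: cohesion(t) for t in top for q in target_log
            if set(t.split()) <= set(q.split())}

dictionary = {"pomme": ["apple"], "verte": ["green", "unripe"]}
log = ["green apple pie", "green apple recipe", "unripe banana"]
print(dict_feature("pomme verte", dictionary, log))
```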
12 2. Parallel Corpora
– Given a pair of queries: q_f in the source language and q_e in the target language
– Bi-directional translation score: S_para(q_f, q_e) = P_IBM1(q_e | q_f) × P_IBM1(q_f | q_e)
  IBM model 1 trained with the GIZA++ tool; P(y_j | x_i) is the word-to-word translation probability
– For each q_f, the top-10 target queries {q_e} from the query log are kept as candidates
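A minimal sketch of the bi-directional translation score, assuming word translation tables of the kind GIZA++ produces are available as nested dictionaries; the tables below are toy values, and the IBM model 1 computation omits the NULL word and length normalization constants.

```python
# Minimal sketch of a bi-directional IBM model 1 score over two translation tables.
def ibm1_prob(target_words, source_words, t_table):
    """P(target | source) under IBM model 1 (NULL word and length prior omitted)."""
    prob = 1.0
    for y in target_words:
        # Each target word is generated by summing over all source-word alignments
        prob *= sum(t_table.get(x, {}).get(y, 1e-9) for x in source_words) / len(source_words)
    return prob

def s_para(q_f, q_e, t_f2e, t_e2f):
    f, e = q_f.split(), q_e.split()
    return ibm1_prob(e, f, t_f2e) * ibm1_prob(f, e, t_e2f)

t_f2e = {"pomme": {"apple": 0.9}, "verte": {"green": 0.8, "unripe": 0.1}}
t_e2f = {"apple": {"pomme": 0.9}, "green": {"verte": 0.8}}
print(s_para("pomme verte", "green apple", t_f2e, t_e2f))
```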
13 3. Online Mining for Related Queries – 1/3
OOV terms are a major knowledge bottleneck for query translation and CLIR
Assumption
– If a target-language query co-occurs with the source query in many web pages, the two are probably semantically related
– However, web co-occurrence also introduces a significant amount of noise
14 3. Online Mining for Related Queries – 2/3
– Feature: frequency of the candidate target query in the retrieved snippets
For example:
– Given a source-language query q = abc
– The dictionary gives a = {a1, a2, a3}, b = {b1, b2}, c = {c1}
– Submit the Boolean web query q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1), restricted to target-language pages
– Collect 700 snippets and keep the 10 most frequent target queries (a sketch of the query construction follows)
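A minimal sketch of the Boolean query construction and snippet frequency counting described above; `fetch_snippets` and the candidate list are hypothetical placeholders, since the actual search API and candidate extraction are not specified on the slide.

```python
# Minimal sketch: build the Boolean web query for online mining and count how
# often candidate target queries appear in the returned snippets.
from collections import Counter

def build_boolean_query(source_query, dictionary):
    """q AND (a1 OR a2 ...) AND (b1 OR b2 ...) AND ... in the target language."""
    clauses = [f'"{source_query}"']
    for word in source_query.split():
        translations = dictionary.get(word, [])
        if translations:
            clauses.append("(" + " OR ".join(translations) + ")")
    return " AND ".join(clauses)

def snippet_frequencies(snippets, candidate_queries):
    """Count occurrences of each candidate target query in the snippet texts."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for cand in candidate_queries:
            counts[cand] += text.count(cand.lower())
    return counts.most_common(10)   # keep the 10 most frequent target queries

dictionary = {"pomme": ["apple"], "verte": ["green", "unripe"]}
print(build_boolean_query("pomme verte", dictionary))
# With real snippets (e.g. ~700 from a search API), the call would look like:
# top10 = snippet_frequencies(fetch_snippets(build_boolean_query(...)), candidates)
```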
15 3. Online Mining for Related Queries – 3/3
– Any target query q_e mined from the web is also associated with a Co-Occurrence Double-Check (CODC) feature, S_CODC(q_f, q_e)
16 4. Monolingual Query Suggestion
– Q^0 : the set of candidate target-language queries collected by the three feature categories above
– A monolingual query suggestion system expands Q^0 with further related target queries
– For each suggested target query q_e, SQ_ML(q_e) denotes its monolingual source query, i.e. the query in Q^0 it was suggested from; q_e's cross-lingual features are derived from SQ_ML(q_e)
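A minimal sketch of one possible reading of this step (an assumption, not the paper's exact recipe): the candidate set Q^0 is expanded with a monolingual query suggestion function, and each new suggestion is tied back to its most similar query in Q^0 as SQ_ML(q_e).

```python
# Minimal sketch (assumed reading): expand Q0 via monolingual suggestion and
# link each new suggestion to its monolingual source query in Q0.
def expand_candidates(q0_features, suggest_monolingual, sim_ml):
    """
    q0_features: dict mapping each candidate target query in Q0 to its feature dict
    suggest_monolingual: function(query) -> list of related target-language queries
    sim_ml: function(q1, q2) -> monolingual similarity score
    """
    expanded = dict(q0_features)
    for q0 in q0_features:
        for q_e in suggest_monolingual(q0):
            if q_e in expanded:
                continue
            # SQ_ML(q_e): the Q0 query most similar to the new suggestion
            sq_ml = max(q0_features, key=lambda cand: sim_ml(q_e, cand))
            expanded[q_e] = {
                **q0_features[sq_ml],                  # inherit features of SQ_ML(q_e)
                "mlqs_similarity": sim_ml(q_e, sq_ml)  # plus the monolingual similarity
            }
    return expanded
```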
17 Estimating Cross-Lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity
Monolingual Query Similarity Measure Based on Click-through Information
Features Used for Learning Cross-Lingual Query Similarity Measure
– Bilingual Dictionary
– Parallel Corpora
– Online Mining for Related Queries
– Monolingual Query Suggestion
Estimating Cross-lingual Query Similarity
18 Estimating Cross-lingual Query Similarity
The four categories of features above are combined to estimate the cross-lingual query similarity score:
sim_CL(q_f, q_e) = w · φ(f(q_f, q_e))
– Learning: support vector regression with LIBSVM
– f : feature functions
– φ : mapping from the feature space onto the kernel space
– w : weight vector in the kernel space
19 Performance Evaluation – Log Data
Data resources: MSN Search Engine query logs, with French as the source language and English as the target language
– A one-month English query log: 7 million unique English queries with an occurrence frequency of more than 5
– 5,000 French queries; 4,171 of them have translations among the English queries
– Split: 70% for training the LIBSVM weights, 10% for development, 20% for testing
20 Performance Evaluation – CLIR
Data resources:
– TREC6 CLIR data (AP88-90 newswire, 750 MB)
– 25 short French-English query pairs (CL1-CL25), 3.3 words long on average, matching the style of the web queries in the logs used to train CLQS
[System diagram: the source-language query q_f is fed to CLQS; the suggested target-language queries {q_e} are then used for CLIR retrieval with BM25]
21 CLQS
23 CLIR
24 Conclusion
– Cross-lingual query suggestion using query logs of different languages, applied to French-to-English suggestion
– Evaluated on the TREC6 French-to-English CLIR task, where CLQS demonstrates high-quality suggestions