Advisor: Koh Jia-Ling Nonhlanhla Shongwe 2010-09-28 EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y SIGIR’09.


1 Advisor: Koh Jia-Ling Nonhlanhla Shongwe 2010-09-28 EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y SIGIR’09

2 Preview
- Introduction
- AdSearch
  - Bid phrase clustering
  - Index structure for efficient ad search
  - Query processing
- Experimental evaluation
- Conclusion

3 Introduction
- The Web has become an important venue for advertising, e.g. Google, Yahoo
- There are mainly two kinds of advertising channels
  - Contextual advertising
  - Sponsored advertising
- Ranking is derived from
  - relevance to the user query
  - page content

4 Introduction (cont.)
- Ads are characterized by bid phrases
  - keywords the advertisers choose for their ads
- Syntactic approaches suffer from low recall
- Example
  - Query: “job training”
  - Ad: career college
  - The ad has no syntactic match with the query, so it is not proposed

5 Introduction (cont.)
- The problem is even worse because of
  - the short length of ads
  - the sparsity of the bid phrases
- Propose an efficient ad search solution
  - Tackle these issues with query expansion

6 AdSearch Overview

7 AdSearch (cont.)
- Bid phrase clustering
  - Bipartite Graph Construction for Bid Phrase and Ads
  - Agglomerative Iterative Clustering

8 Bipartite Graph Construction for Bid Phrase and Ads
- Corpus data C
  - A = {Ad0, Ad3}
  - B = {Ad1, Ad2, Ad3}
  - C = {Ad2, Ad3, Ad4}
- 1. B = {A, B, C} (bid phrases)
- 2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (ads)
- 3. G = {v_ba, v_bb, v_bc} (bid-phrase vertices)
- 4. G = {v_a0, v_a1, v_a2, v_a3, v_a4} (ad vertices)
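The construction on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `corpus` and `build_bipartite` are hypothetical, and the graph is stored as a plain edge list plus an ad-to-phrase map rather than whatever structure the authors used.

```python
from collections import defaultdict

# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def build_bipartite(corpus):
    """Return (edges, ad_to_phrases) for the bid-phrase/ad bipartite graph."""
    # One edge per (bid phrase, ad) pair; sorted only to make the output stable.
    edges = sorted((phrase, ad) for phrase, ads in corpus.items() for ad in ads)
    # Invert the mapping so each ad vertex knows its neighbouring bid phrases.
    ad_to_phrases = defaultdict(set)
    for phrase, ad in edges:
        ad_to_phrases[ad].add(phrase)
    return edges, dict(ad_to_phrases)

edges, ad_to_phrases = build_bipartite(corpus)
```

Inverting the mapping yields the ad-side view used later in the clustering example (for instance, Ad3 is connected to A, B and C).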

9 Agglomerative Iterative Clustering
- Jaccard Similarity
- Corpus data C
  - A = {Ad0, Ad3}
  - B = {Ad1, Ad2, Ad3}
  - C = {Ad2, Ad3, Ad4}
- (A, B) = 1/4
- (B, C) = 2/4
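As a quick check of the two values on this slide, Jaccard similarity is the size of the intersection divided by the size of the union of the two ad sets. The helper name `jaccard` is an assumption made for this sketch, and exact fractions are used so the results can be compared with 1/4 and 2/4 directly.

```python
from fractions import Fraction

# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y| as an exact fraction."""
    union = x | y
    if not union:
        return Fraction(0)  # convention chosen here for two empty sets
    return Fraction(len(x & y), len(union))

sim_ab = jaccard(corpus["A"], corpus["B"])  # shares Ad3 out of 4 distinct ads
sim_bc = jaccard(corpus["B"], corpus["C"])  # shares Ad2, Ad3 out of 4 distinct ads
```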

10 Agglomerative Iterative Clustering (cont.)
- Corpus data C
  - A = {Ad0, Ad3}
  - B = {Ad1, Ad2, Ad3}
  - C = {Ad2, Ad3, Ad4}
- Bid-phrases: A, B, C
- Ads: Ad0, Ad1, Ad2, Ad3, Ad4

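11 Agglomerative Iterative Clustering (cont.)
- Corpus data C
  - A = {Ad0, Ad3}
  - B = {Ad1, Ad2, Ad3}
  - C = {Ad2, Ad3, Ad4}
- Bipartite graph
  - Bid-phrases: A, B, C
  - Ads: Ad0, Ad1, Ad2, Ad3, Ad4
- Bid-phrase similarities
  - (A, B) = 0.25
  - (A, C) = 0.25
  - (B, C) = 0.5
- Bid phrases per ad
  - Ad0 = {A}
  - Ad1 = {B}
  - Ad2 = {B, C}
  - Ad3 = {A, B, C}
  - Ad4 = {C}
- Ad similarities
  - (Ad0, Ad1) = 0
  - (Ad0, Ad2) = 0
  - (Ad0, Ad3) = 0.33
  - (Ad0, Ad4) = 0
  - (Ad1, Ad2) = 0.5
  - (Ad1, Ad3) = 0.33
  - (Ad1, Ad4) = 0
  - (Ad2, Ad3) = 0.66
  - (Ad2, Ad4) = 0.5
  - (Ad3, Ad4) = 0.33
- Merge (ads): (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)
- Merge (bid phrases): B with C, then A
- Resulting groups
  - Bid phrases: {A}, {B, C}
  - Ads: {Ad0}, {Ad1, Ad4}, {Ad2, Ad3}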
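The bid-phrase merge order on this slide (B with C first, then A) can be reproduced with a simplified greedy sketch. It is not the authors' algorithm: it clusters only the bid-phrase side, treats a merged cluster's ads as the union of its members' ads, and has no stopping threshold, all of which are assumptions made for illustration. The function name `agglomerate` is hypothetical.

```python
from itertools import combinations

corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def agglomerate(corpus):
    """Greedily merge the most similar pair of clusters until one remains."""
    # Each cluster is keyed by a tuple of its bid phrases.
    clusters = {(phrase,): set(ads) for phrase, ads in corpus.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the pair with the highest Jaccard similarity (first one on ties).
        left, right = max(
            combinations(sorted(clusters), 2),
            key=lambda pair: jaccard(clusters[pair[0]], clusters[pair[1]]),
        )
        score = jaccard(clusters[left], clusters[right])
        # Assumed linkage: the merged cluster owns the union of both ad sets.
        clusters[left + right] = clusters.pop(left) | clusters.pop(right)
        merges.append((left, right, score))
    return merges

merges = agglomerate(corpus)
```

With the slide's corpus, the first merge joins B and C at similarity 0.5, and A joins afterwards.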

12 AdSearch (cont.)
- Index structure for efficient ad search
  - Mapping clusters of Bid Phrases to Index Terms
  - Block-based Index Structure
  - Dictionaries

13 Mapping clusters of Bid Phrases to Index Terms Clusters B A C D E

14 Block-based Index Structure
- 3 inverted lists, containing:
  - Index = bid phrase
  - List = ads
- 1 inverted list, containing:
  - Index = 3 bid phrases
  - List = ads and bid phrases
- Query = B

15 Block-based Index Structure (cont.)
- Advantages over the traditional method
  - Similar bid phrases and their corresponding ads are placed together
  - Fewer merge operations are needed, and some can be avoided entirely
  - Expanding phrase B with phrases A and C is not efficient in the traditional method

16 Dictionaries
- Dictionary D
  - used to record the mapping from each bid phrase to its corresponding artificial words
  - used to locate the block corresponding to a bid phrase
- Bid phrase   Artificial words (path)
  A            6:0
  B            6_5:1
  C            6_5:2
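A lookup in dictionary D might look like the sketch below. The slide does not explain the artificial-word format, so the reading used here is an assumption: the part before `:` is a cluster path with levels joined by `_`, and the number after `:` identifies the bid phrase within that cluster. The names `dictionary_d`, `parse_artificial_word` and `locate` are hypothetical.

```python
# Dictionary D as shown on the slide: bid phrase -> artificial word.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

def parse_artificial_word(word):
    """Split 'path:id' into (list of cluster levels, local id)."""
    path, _, local_id = word.partition(":")
    return path.split("_"), int(local_id)

def locate(phrase):
    """Return the parsed artificial word for a bid phrase, or None if unknown."""
    word = dictionary_d.get(phrase)
    return None if word is None else parse_artificial_word(word)

location_of_b = locate("B")
```

Under this reading, B and C share the path 6_5, while A sits directly under cluster 6; a phrase missing from the dictionary simply returns `None`.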

17 Dictionaries (cont.)
- Dictionary C (counter dictionary)
  - used to record the number of distinct ads per cluster
- Corpus data C
  - A = {Ad0, Ad3}
  - B = {Ad1, Ad2, Ad3}
  - C = {Ad2, Ad3, Ad4}
- Cluster path   Distinct ads
  6              |Ad0, Ad3| = 2
  6_5            |Ad1, Ad2, Ad3, Ad4| = 4
- Entries (cluster path, number of distinct ads): (6, 2), (6_5, 4)
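The two counter entries on this slide can be derived from dictionary D and the corpus data. The sketch assumes that a path's count covers only the bid phrases stored at exactly that path, which is what reproduces (6, 2) and (6_5, 4); `build_counter_dictionary` is a hypothetical name.

```python
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

def build_counter_dictionary(dictionary_d, corpus):
    """Map each cluster path to the number of distinct ads stored under it."""
    ads_per_path = {}
    for phrase, word in dictionary_d.items():
        path = word.split(":", 1)[0]  # drop the per-phrase id after ':'
        ads_per_path.setdefault(path, set()).update(corpus[phrase])
    # Sets remove duplicates, so Ad2 and Ad3 are counted once for path 6_5.
    return {path: len(ads) for path, ads in ads_per_path.items()}

counter_dictionary = build_counter_dictionary(dictionary_d, corpus)
```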

18 AdSearch (cont.)
- Query processing
  - Finding Related Bid phrases with Corresponding Ads
  - Ranking Top-k Relevant Ads

19 Finding Related Bid phrases with Corresponding Ads
- The process to find related bid phrases
  - Input: user queries
  - Look up dictionary D to get the corresponding artificial words
  - Find the minimum clusters that contain enough ads
- Query: ABD
- e.g. for the top 2 ads, M = 1.5 * 2 = 3
- Bid phrase   Artificial words (path)
  A            6:0
  B            6_5:1
  C            6_5:2
- Cluster path   Distinct ads
  6              |Ad0, Ad3| = 2
  6_5            |Ad1, Ad2, Ad3, Ad4| = 4
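The "minimum cluster with enough ads" step can be sketched as follows. The factor 1.5 and the example M = 3 come from the slide; everything else is an assumption, in particular the idea of walking from a phrase's own path towards the top-level cluster until the counter reaches M, and falling back to the top-level cluster when nothing is large enough. The slide does not show how counts aggregate up the hierarchy, so `minimum_cluster` is only a rough illustration.

```python
import math

# Counter dictionary from the slide: cluster path -> number of distinct ads.
counter_dictionary = {"6": 2, "6_5": 4}

def required_ads(k, factor=1.5):
    """Number of candidate ads M needed before ranking the top k."""
    return math.ceil(factor * k)

def minimum_cluster(path, counter, m):
    """Return the smallest enclosing cluster path holding at least m ads."""
    parts = path.split("_")
    while parts:
        candidate = "_".join(parts)
        if counter.get(candidate, 0) >= m:
            return candidate
        parts.pop()  # move one level up the cluster path
    return path.split("_")[0]  # assumed fallback: the top-level cluster

m = required_ads(2)                                   # M = 1.5 * 2 = 3
cluster_for_b = minimum_cluster("6_5", counter_dictionary, m)
cluster_for_a = minimum_cluster("6", counter_dictionary, m)
```

For bid phrase B, cluster 6_5 already holds 4 distinct ads, which satisfies M = 3, so no wider cluster is needed.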

20 Finding Related Bid phrases with Corresponding Ads (cont.)
- The process to find related bid phrases
  - Return clusters; those containing at least one bid are stored in one group
  - Perform a multi-way merge operation to get the final results
- Ad            Ad1   Ad2    Ad3      Ad4
  Bid phrases   A     B,C    A,B,C    C
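A multi-way merge of per-phrase ad lists can be sketched with Python's standard `heapq.merge`. The posting lists below reuse the corpus data from the earlier slides and are assumed to be sorted by ad identifier; the names `postings` and `multiway_merge` are hypothetical, and the real system merges index blocks rather than in-memory lists.

```python
import heapq
from itertools import groupby

# Sorted ad lists per bid phrase (corpus data from the earlier slides).
postings = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def multiway_merge(postings):
    """Merge sorted posting lists into ad -> list of matching bid phrases."""
    # Tag every ad with its bid phrase so the origin survives the merge.
    streams = [
        [(ad, phrase) for ad in ads] for phrase, ads in sorted(postings.items())
    ]
    merged = heapq.merge(*streams)  # lazily yields pairs in sorted order
    # Consecutive pairs with the same ad id belong to the same result row.
    return {
        ad: [phrase for _, phrase in group]
        for ad, group in groupby(merged, key=lambda pair: pair[0])
    }

ads_with_phrases = multiway_merge(postings)
```

Because every input list is already sorted, the merge touches each posting once, which is the property the block-based index tries to exploit by keeping related bid phrases together.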

21 Ranking Top-k Relevant Ads
- A procedure to expand the user query with related bid phrases and get a list of ads
- To get the top k
  - Use a scoring function
- Notation
  - Q: query
  - B(x): set of related bid phrases
  - Similarity between x and y
  - tfidf(y, ad): term frequency and inverse document frequency
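The transcript does not preserve the scoring formula itself, so the sketch below is an assumption and not the authors' function: it scores an ad by summing, over each query phrase x and each related phrase y in B(x), the similarity of x and y times tfidf(y, ad). The similarity values reuse the Jaccard numbers from the clustering slides, self-similarity is taken as 1.0, and `tfidf` is a 0/1 stand-in for a real TF-IDF weight. All names are hypothetical.

```python
import heapq

corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

# B(x) with similarity weights for the query phrase "B" (assumed values).
related = {"B": {"B": 1.0, "C": 0.5, "A": 0.25}}

def tfidf(phrase, ad):
    """Stand-in weight: 1.0 if the ad bids on the phrase, else 0.0."""
    return 1.0 if ad in corpus[phrase] else 0.0

def score(query, ad):
    """Assumed scoring: similarity-weighted sum over expanded bid phrases."""
    return sum(
        sim * tfidf(y, ad)
        for x in query
        for y, sim in related.get(x, {}).items()
    )

def top_k(query, ads, k):
    """Return the k highest-scoring ads without sorting the whole list."""
    return heapq.nlargest(k, ads, key=lambda ad: score(query, ad))

all_ads = sorted(set().union(*corpus.values()))
best_two = top_k(["B"], all_ads, 2)
```

Using `heapq.nlargest` keeps only k candidates at a time, which suits the top-k setting better than a full sort when the candidate list is long.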

22 Experimental evaluation
- Both Chinese and English

23 Experimental evaluation (cont.)
- Name: Description
  - CQS1 (Chinese) or EQS1 (English): 100 randomly sampled bid phrases, each associated with a few distinct ads
  - CQS2 (Chinese) or EQS2 (English): 100 selected pairs of bid phrases; each pair returns ads associated with both of its bid phrases
  - CQS3 (Chinese) or EQS3 (English): constructed similarly, with queries composed of 3 to 4 bid phrases
  - CQF (Chinese Frequent Query Set) and EQF (English Frequent Query Set): built from 100 popular bid phrases

24 Experimental evaluation (cont.)
- Evaluation of the clustering step

25 Experimental evaluation (cont.)
- Efficiency evaluation
  - AdSearch was implemented with fixed and unfixed block sizes
  - The block size is defined as the fraction of distinct ads in the block relative to all ads
  - AdSearch(0.001) fixes the number of distinct ads in each block; for example, for the Chinese data set, 524,868 * 0.001 ≈ 525
  - Inv = query expansion performed on top of the traditional inverted index
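The block-size arithmetic on this slide is small enough to check directly. The sketch assumes that 524,868 is the number of distinct ads in the Chinese data set and that the product is rounded to the nearest integer; `ads_per_block` is a hypothetical helper name.

```python
def ads_per_block(total_distinct_ads, fraction):
    """Number of distinct ads in one block for a given block-size fraction."""
    # Rounding to the nearest integer is assumed; the slide only shows 525.
    return round(total_distinct_ads * fraction)

chinese_block_size = ads_per_block(524_868, 0.001)  # 524.868 rounds to 525
```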

26 Experimental evaluation (cont.)
- Effectiveness evaluation
  - 50 queries were randomly selected
  - 10 people were invited to evaluate the ads returned by AdSearch and Baidu

27 Experimental evaluation (cont.)
- Effectiveness evaluation

28 Conclusion
- Introduced the AdSearch system, which consists of
  - Bid phrase clustering
    - Constructs a bipartite graph over the bid phrases and ads
    - Uses agglomerative iterative clustering to cluster similar ads
  - Index structure for efficient ad search
    - Uses a block-based index structure to index all ads and bid phrases
    - Uses the dictionary to record mappings between bid phrases and ads
  - Query processing
    - Explained how ads are retrieved and ranked to get the top-k results

29 THANK YOU

30 Introduction (cont.)
- Q = “job training”
- All Docs
- Relevant Docs (R)
- Relevant Ads
- Relevant Ads in the Ads set (Ra)

