EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH
WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y, SIGIR’09
Advisor: Koh Jia-Ling
Nonhlanhla Shongwe
2010-09-28

Preview
 Introduction
 AdSearch
 Bid phrase clustering
 Index structure for efficient ad search
 Query processing
 Experimental evaluation
 Conclusion

Introduction
 The Web has become an important venue for advertising, e.g. Google, Yahoo
 There are mainly two kinds of advertising channels
   Contextual advertising
   Sponsored advertising
 Ranking is derived from
   relevance to the user query
   page content

Introduction (cont'd)
 Ads are characterized by bid phrases
   keywords the advertisers choose for their ads
 Syntactic approaches suffer from low recall
 Example
   Query: “job training”
   Ad: career college
   The ad has no syntactic match with the query, so it is not proposed

Introduction (cont'd)
 The problem is even worse because of
   the shorter length of ads
   the sparsity of the bid phrases
 The paper proposes an efficient ad search solution
 It tackles these issues with query expansion

AdSearch Overview

AdSearch (cont'd)
 Bid phrase clustering
   Bipartite Graph Construction for Bid Phrases and Ads
   Agglomerative Iterative Clustering

Bipartite Graph Construction for Bid Phrases and Ads
Corpus data C: A = {Ad0, Ad3}, B = {Ad1, Ad2, Ad3}, C = {Ad2, Ad3, Ad4}
1. B = {A, B, C} (bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (ads)
3. G = {v_ba, v_bb, v_bc} (bid-phrase vertices)
4. G = {v_a0, v_a1, v_a2, v_a3, v_a4} (ad vertices)
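The construction on this slide can be sketched as follows. This is a minimal illustration, assuming the corpus is available as a map from each bid phrase to its ads; the names `corpus`, `build_bipartite`, and the edge-list representation are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Corpus data C from the slide: bid phrase -> ads that use it.
corpus = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def build_bipartite(corpus):
    """Return both sides of the bipartite graph plus its edge list."""
    phrase_to_ads = {phrase: set(ads) for phrase, ads in corpus.items()}
    ad_to_phrases = defaultdict(set)
    edges = []
    for phrase, ads in corpus.items():
        for ad in ads:
            ad_to_phrases[ad].add(phrase)   # invert the mapping for the ad side
            edges.append((phrase, ad))      # one edge per (bid phrase, ad) pair
    return phrase_to_ads, dict(ad_to_phrases), edges

phrase_to_ads, ad_to_phrases, edges = build_bipartite(corpus)
```

Inverting the map once gives the per-ad phrase sets (for instance Ad3 is linked to A, B, and C), which is what the ad-side similarity computation needs.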

Agglomerative Iterative Clustering
 Jaccard similarity: J(X, Y) = |X ∩ Y| / |X ∪ Y|
Corpus data C: A = {Ad0, Ad3}, B = {Ad1, Ad2, Ad3}, C = {Ad2, Ad3, Ad4}
(A, B) = 1/4
(B, C) = 2/4
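The two values on this slide can be checked with a few lines of code. This is a small sketch of the standard Jaccard similarity applied to the slide's corpus data; the function and variable names are illustrative.

```python
def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y| of two collections."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Ads associated with each bid phrase (corpus data C from the slide).
ads_of = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

sim_ab = jaccard(ads_of["A"], ads_of["B"])  # shared: Ad3; union has 4 ads
sim_bc = jaccard(ads_of["B"], ads_of["C"])  # shared: Ad2, Ad3; union has 4 ads
sim_ac = jaccard(ads_of["A"], ads_of["C"])  # shared: Ad3; union has 4 ads
```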

Agglomerative Iterative Clustering (cont'd)
Corpus data C: A = {Ad0, Ad3}, B = {Ad1, Ad2, Ad3}, C = {Ad2, Ad3, Ad4}
[Figure: bipartite graph linking bid phrases (A, B, C) to ads (Ad0, Ad1, Ad2, Ad3, Ad4)]

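Corpus data C: A = {Ad0, Ad3}, B = {Ad1, Ad2, Ad3}, C = {Ad2, Ad3, Ad4}
[Figure: bipartite graph linking bid phrases (A, B, C) to ads (Ad0, Ad1, Ad2, Ad3, Ad4)]
Bid-phrase similarities: (A, B) = 0.25, (A, C) = 0.25, (B, C) = 0.5
Bid phrases per ad: Ad0 = {A}, Ad1 = {B}, Ad2 = {B, C}, Ad3 = {B, A, C}, Ad4 = {C}
Ad similarities:
(Ad0, Ad1) = 0
(Ad0, Ad2) = 0
(Ad0, Ad3) = 0.33
(Ad0, Ad4) = 0
(Ad1, Ad2) = 0.5
(Ad1, Ad3) = 0.33
(Ad1, Ad4) = 0
(Ad2, Ad3) = 0.66
(Ad2, Ad4) = 0.5
(Ad3, Ad4) = 0.33
Merge (ads): (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)
Merge (bid phrases): B to C, then A
Resulting bid-phrase clusters: {A}, {B, C}
Resulting ad clusters: {Ad0}, {Ad1, Ad4}, {Ad2, Ad3}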
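The ad-side similarities listed here can be reproduced by applying Jaccard similarity to the set of bid phrases of each ad and ranking the pairs. The sketch below only ranks candidate pairs; it is not the paper's agglomerative iterative clustering procedure, and ties may be ordered differently from the slide's merge list. All names are illustrative.

```python
from itertools import combinations

# Bid phrases attached to each ad, as listed on the slide.
phrases_of = {
    "Ad0": {"A"},
    "Ad1": {"B"},
    "Ad2": {"B", "C"},
    "Ad3": {"A", "B", "C"},
    "Ad4": {"C"},
}

def jaccard(x, y):
    """Jaccard similarity of two sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Similarity of every unordered ad pair.
pair_sims = {
    (a, b): jaccard(phrases_of[a], phrases_of[b])
    for a, b in combinations(sorted(phrases_of), 2)
}

# Most similar pairs first; these are the natural first merge candidates.
ranked = sorted(pair_sims.items(), key=lambda item: -item[1])
```

The top-ranked pair is (Ad2, Ad3) with similarity 2/3, which the slide shows truncated to 0.66.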

AdSearch (cont'd)
Index structure for efficient ad search
 Mapping clusters of Bid Phrases to Index Terms
 Block-based Index Structure
 Dictionaries

Mapping Clusters of Bid Phrases to Index Terms
[Figure: clusters over bid phrases B, A, C, D, E]

Block-based Index Structure
Query = B
 Traditional index: 3 inverted lists
   Index = bid phrase
   List = ads
 Block-based index: 1 inverted list
   Index = 3 bid phrases
   List = ad and bid phrase

Block-based Index Structure (cont'd)
 Advantages over the traditional method
   Similar bid phrases and their corresponding ads are placed together
   Fewer merge operations are needed, and some can be avoided entirely
   In the traditional method, expanding phrase B with phrases A and C is not efficient
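The contrast between the two layouts can be illustrated with a toy comparison. This is a hedged sketch, assuming a block is stored as one list of (ad, bid phrase) entries; the data structures and function names are hypothetical and much simpler than the paper's actual index.

```python
# Traditional inverted index: one posting list per bid phrase.
traditional = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

# Block-based index: a single list for the whole cluster; each entry keeps
# the ad together with the bid phrase it came from.
block_key = ("A", "B", "C")
block_index = {
    block_key: sorted((ad, p) for p, ads in traditional.items() for ad in ads)
}

def expand_traditional(index, phrases):
    """Expand a query by reading and merging one posting list per phrase."""
    hits, lists_read = {}, 0
    for phrase in phrases:
        lists_read += 1
        for ad in index.get(phrase, []):
            hits.setdefault(ad, set()).add(phrase)
    return hits, lists_read

def expand_block(index, key):
    """Expand a query by scanning the single list of one block."""
    hits = {}
    for ad, phrase in index[key]:
        hits.setdefault(ad, set()).add(phrase)
    return hits, 1

hits_t, reads_t = expand_traditional(traditional, ["B", "A", "C"])
hits_b, reads_b = expand_block(block_index, block_key)
```

Both calls return the same ad-to-phrases mapping, but the traditional layout touches three lists while the block layout touches one.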

Dictionaries
 Dictionary D
   records the mapping from each bid phrase to its corresponding artificial words
   is used to locate the block corresponding to a bid phrase
Bid phrase   Artificial words (path)
A            6:0
B            6_5:1
C            6_5:2

Dictionaries (cont'd)
 Dictionary C (counter dictionary)
   records the number of distinct ads per cluster
Corpus data C: A = {Ad0, Ad3}, B = {Ad1, Ad2, Ad3}, C = {Ad2, Ad3, Ad4}
Cluster path   Number of distinct ads
6              |{Ad0, Ad3}| = 2
6_5            |{Ad1, Ad2, Ad3, Ad4}| = 4
Entries: (6, 2), (6_5, 4)
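The counter dictionary entries can be derived from dictionary D and the corpus data. The sketch below assumes each artificial word has the form `path:number` and groups bid phrases by the part before the colon; the slide does not say what the number after the colon means, so the code ignores it. Function and variable names are hypothetical.

```python
# Dictionary D from the slides: bid phrase -> artificial word.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

# Corpus data C: bid phrase -> ads.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def build_counter(dictionary_d, corpus):
    """Count distinct ads per cluster path (assumed: text before the colon)."""
    ads_by_path = {}
    for phrase, word in dictionary_d.items():
        path = word.partition(":")[0]
        ads_by_path.setdefault(path, set()).update(corpus[phrase])
    return {path: len(ads) for path, ads in ads_by_path.items()}

counter_c = build_counter(dictionary_d, corpus)
```

Using sets means an ad shared by B and C, such as Ad2 or Ad3, is counted once for path 6_5.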

AdSearch (cont'd)
Query processing
 Finding Related Bid Phrases with Corresponding Ads
 Ranking Top-k Relevant Ads

Finding Related Bid Phrases with Corresponding Ads
 The process to find related bid phrases
   Input: user queries
   Look up dictionary D to get the corresponding artificial words
   Find the minimum clusters that contain enough ads
Query: ABD
Example: for the top 2 ads, M = 1.5 * 2 = 3
Bid phrase   Artificial words (path)
A            6:0
B            6_5:1
C            6_5:2
Cluster path   Distinct ads
6              |{Ad0, Ad3}| = 2
6_5            |{Ad1, Ad2, Ad3, Ad4}| = 4
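The lookup and threshold check from this slide can be sketched as below. This covers only the dictionary lookup and the comparison against M = 1.5 * k; how AdSearch widens to a larger cluster when a block has too few ads is not shown on the slide and is left out. The function name, the handling of unknown phrases, and the `path:number` parsing are assumptions.

```python
# Dictionary D and counter dictionary C as shown on the slides.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
counter_c = {"6": 2, "6_5": 4}

def lookup(query_phrases, k, factor=1.5):
    """For each known phrase, report its cluster path and whether the
    cluster holds at least M = factor * k distinct ads."""
    m = factor * k
    results = []
    for phrase in query_phrases:
        word = dictionary_d.get(phrase)
        if word is None:
            continue  # phrase absent from the dictionary excerpt (e.g. D)
        path = word.split(":")[0]
        count = counter_c[path]
        results.append((phrase, path, count, count >= m))
    return results

# Query "ABD" with k = 2, so M = 3.
results = lookup(["A", "B", "D"], k=2)
```

With the slide's numbers, the cluster of B (path 6_5, 4 ads) already meets the threshold, while the cluster of A (path 6, 2 ads) does not.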

Finding Related Bid Phrases with Corresponding Ads (cont'd)
 The process to find related bid phrases
   Among the returned clusters, those containing at least one bid phrase are stored in one group
   Perform a multi-way merge operation to get the final results
Ad            Ad1   Ad2   Ad3     Ad4
Bid phrases   A     B,C   A,B,C   C
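A multi-way merge of this kind can be sketched with the standard library. The example assumes each group yields a posting list of (ad, bid phrase) entries already sorted by ad; the lists below are built from the corpus data of the earlier slides rather than from the table on this slide, and all names are illustrative.

```python
import heapq
from itertools import groupby

# One sorted posting list per group of clusters; entries are (ad, bid phrase).
list_a = [("Ad0", "A"), ("Ad3", "A")]
list_bc = [
    ("Ad1", "B"), ("Ad2", "B"), ("Ad2", "C"),
    ("Ad3", "B"), ("Ad3", "C"), ("Ad4", "C"),
]

def multiway_merge(*posting_lists):
    """Merge sorted posting lists and collect the bid phrases of each ad."""
    merged = heapq.merge(*posting_lists)  # inputs must already be sorted
    return {
        ad: sorted({phrase for _, phrase in entries})
        for ad, entries in groupby(merged, key=lambda entry: entry[0])
    }

ads_with_phrases = multiway_merge(list_a, list_bc)
```

`heapq.merge` streams the inputs lazily, so the merge never has to hold more than one entry per list in its heap.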

Ranking Top-k Relevant Ads
 A procedure to expand the user query with related bid phrases and get a list of ads
 To get the top k, use a scoring function defined over
   Q: the query
   B(x): the set of bid phrases related to x
   the similarity between x and y
   tfidf(y, ad): term frequency and inverse document frequency
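The scoring formula itself did not survive in this transcript, so the sketch below shows only one plausible way to combine the listed ingredients: a similarity-weighted sum of tfidf values over the related bid phrases of each query phrase. That combination is an assumption, not the paper's confirmed formula, and the tfidf stand-in (1.0 if the ad uses the phrase, else 0.0) is a toy placeholder.

```python
import heapq

# Toy data: bid phrases used by each ad (from the earlier slides).
phrases_of = {
    "Ad0": {"A"}, "Ad1": {"B"}, "Ad2": {"B", "C"},
    "Ad3": {"A", "B", "C"}, "Ad4": {"C"},
}
# B(x): related bid phrases of x, with the Jaccard values from the slides.
related = {"B": ["B", "C", "A"]}
sim_table = {("B", "B"): 1.0, ("B", "C"): 0.5, ("B", "A"): 0.25}

def tfidf(phrase, ad):
    """Placeholder weight: 1.0 if the ad uses the phrase, else 0.0."""
    return 1.0 if phrase in phrases_of[ad] else 0.0

def score(query, ad):
    """ASSUMED form: sum over x in Q, y in B(x) of sim(x, y) * tfidf(y, ad)."""
    return sum(
        sim_table[(x, y)] * tfidf(y, ad)
        for x in query
        for y in related.get(x, [])
    )

def top_k(query, ads, k):
    """Return the k highest-scoring ads for the expanded query."""
    return heapq.nlargest(k, ads, key=lambda ad: score(query, ad))

best = top_k(["B"], sorted(phrases_of), k=2)
```

Under these toy weights, an ad that matches only a related phrase (Ad4 via C) still receives a nonzero score, which is the point of expanding the query.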

Experimental evaluation
 Conducted on both Chinese and English data

Experimental evaluation (cont'd)
Name: Description
CQS1 (Chinese) or EQS1 (English): 100 randomly sampled bid phrases, each associated with a few distinct ads
CQS2 (Chinese) or EQS2 (English): 100 selected pairs of bid phrases, where each pair could return ads associated with both of its bid phrases
CQS3 (Chinese) or EQS3 (English): constructed similarly, with queries composed of 3 to 4 bid phrases
CQF (Chinese Frequent Query Set) and EQF (English Frequent Query Set): 100 popular bid phrases used to build the CQF and EQF

Experimental evaluation (cont'd)
 Evaluation of the clustering step

Experimental evaluation (cont'd)
 Efficiency evaluation
   AdSearch was implemented with fixed and unfixed block sizes
   The block size is defined as the fraction of distinct ads in a block relative to all ads
   AdSearch(0.001): the fraction 0.001 determines the number of distinct ads in each block
   Example, Chinese data set: 524,868 * 0.001 ≈ 525 distinct ads per block
   Inv: query expansion performed on top of the traditional inverted index
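The block-size arithmetic in the example can be written out as a one-line helper. The rounding rule is an assumption inferred from the slide's 525; the function name is hypothetical.

```python
def ads_per_block(total_ads, fraction):
    """Distinct ads per block for a given block-size fraction (rounded)."""
    return round(total_ads * fraction)

# Chinese data set from the slide: 524,868 ads with AdSearch(0.001).
chinese_block = ads_per_block(524_868, 0.001)
```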

Experimental evaluation (cont'd)
 Effectiveness evaluation
   50 queries were randomly selected
   10 people were invited to evaluate the ads returned by AdSearch and Baidu

Experimental evaluation (cont'd)
 Effectiveness evaluation

Conclusion
 Introduced the AdSearch system, which consists of
 Bid phrase clustering
   Constructs a bipartite graph from the bid phrases and ads
   Uses agglomerative iterative clustering to cluster similar ads
 Index structure for efficient ad search
   Uses a block-based index structure to index all ads and bid phrases
   Uses the dictionary to record mappings between bid phrases and ads
 Query processing
   Explained how ads were retrieved and ranked to get the top-k results

THANK YOU

Introduction (cont'd)
[Figure for Q = “job training”: All Docs, Relevant Docs (R), Relevant Ads, Relevant Ads in the Ads set (Ra)]
