Online Expansion of Rare Queries for Sponsored Search Defended by Mykell Miller.

Summary: The Short Version
This paper describes and evaluates a method for determining which ads to display on a search engine result page. Users input varied queries, so it is beneficial to show ads pertaining not only to the query itself but to related queries as well. However, previous methods of finding these related queries and turning them into ads take a long time and are therefore run offline. This paper describes a method that allows some of the work to be done on the fly without too much overhead.

Why it’s good: The Short Version
- Useful
  - Ads fund search engines
  - If ads were more relevant, Jared might actually click on them
  - The method shows statistically significant improvement in making ads more relevant, at a low overhead
- Interesting
  - Interestingness is subjective, but this is MY defense
- Well-written
  - Well-organized
  - I could actually understand the math because they very clearly told me what all the variables meant
  - They defined all the relevant terms and summarized all the references so I didn’t have to read 32 other papers.
- Time Travel
  - This paper is only three weeks old
  - A paper that was published in April cited it

Now for the long version…

What this paper is about
- Broad matching is where an ad is displayed when its bid phrase is similar to, but not exactly the same as, the query the user entered.

What this paper is about
- Sponsored Search
  - A.K.A. Paid search advertising
  - On Search Engine Result Pages
  - All major web search engines do this
- Context Match
  - A.K.A. Contextual Advertising
  - On other websites
  - What we looked at last Wednesday

More on Sponsored Search
- The authors assume a pay-per-click model
  - Google, Yahoo, and Microsoft all use this model
- Bid Phrases
  - This is the query that will result in showing this ad.
- Bidding system
  - An advertiser pays the search company whatever it wants to associate its ad with a bid phrase
  - If an advertiser pays more, its ad gets a higher ranking.
- Example:
  - High Bidders pays $1,000,000,000,000,000,000,000 for the bid phrase “Dummy Query”
  - Low Bidders pays $1 for the bid phrase “Dummy Query”
  - When I search for “Dummy Query” I see High Bidders’ ad first, then Low Bidders’ ad.

More on Sponsored Search
System
- An Advertiser
  - An Account
    - An Ad Campaign
      - An Ad Group
        - Creative
        - Bid Phrases
      - More Ad Groups
    - More Ad Campaigns
  - More Accounts
- Other Advertisers

Why Do This Paper?
- 30-40% of search engine result pages have no ads on them because Google, Yahoo, etc. don’t know what queries are similar to the bid phrase
- Previous work has developed systems that are far too inefficient to use in real life

My Own Experiment
- Query: Banana Bread
- Query: Nut-Free Banana Bread
- Query: Vegan Banana Bread

Why do tail queries have so few ads?
- They are often harder to interpret than more common (head and torso) queries
- There are rarely exact matches for bid queries
- There is little historical click data
- Search engines don’t like posting irrelevant ads

What does this paper accomplish?
- Online query expansion for tail queries
- New way to index query expansions for fast computation of query similarity
- A way to go from pre-expanded queries to expanding related queries on the fly
- A ranking and scoring method

The Architecture of their system

Query Feature Extraction
- Unigrams
  - Process them via
    - Stemming
      - Taking words like “Extraction” and “Extracting” and stemming them to “Extract”
    - Stop words
      - Ignoring words you don’t like
- Phrases
  - Multi-word phrases are from a dictionary of ~10 million phrases gathered from query logs and web pages
- Semantic Classes
  - Developed a hierarchical taxonomy of 6000 semantic classes
  - Annotate each query with the 5 most likely semantic classes
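
The unigram step above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list and the suffix-stripping `stem` function are hypothetical stand-ins for whatever stemmer and stop list the real system uses, and phrase and semantic-class features are left out.

```python
import re

# Hypothetical, tiny stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Crude suffix stripper standing in for a real stemmer (an assumption).
SUFFIXES = ("ion", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unigram_features(query):
    """Lowercase, tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


print(unigram_features("Extraction of the features"))
```

With this toy stemmer, both "Extraction" and "Extracting" reduce to "extract", which is the behavior the slide describes.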

Related Query Retrieval
- Now we have a pseudo-query made up of features.
- Compare this pseudo-query to our inverted index and pull out related pseudo-queries
- Runs a system that pulls out key words then calculates the similarity using a dot product
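
A small sketch of the retrieval idea, assuming each stored query is a dictionary of feature weights. The function names, the sample queries, and the weights are all made up for illustration; the paper's actual index layout and weighting are not shown on the slide.

```python
from collections import defaultdict


def build_inverted_index(queries):
    """Map each feature to the (query id, weight) pairs that contain it."""
    index = defaultdict(list)
    for query_id, features in queries.items():
        for feature, weight in features.items():
            index[feature].append((query_id, weight))
    return index


def related_queries(pseudo_query, index, top_k=3):
    """Score indexed queries by dot product with the pseudo-query."""
    scores = defaultdict(float)
    for feature, weight in pseudo_query.items():
        # Only queries sharing at least one feature are ever touched.
        for query_id, other_weight in index.get(feature, []):
            scores[query_id] += weight * other_weight
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical stored queries with unit weights.
stored = {
    "banana bread recipe": {"banana": 1.0, "bread": 1.0, "recipe": 1.0},
    "vegan cake": {"vegan": 1.0, "cake": 1.0},
    "bread machine": {"bread": 1.0, "machine": 1.0},
}
index = build_inverted_index(stored)
print(related_queries({"vegan": 1.0, "banana": 1.0, "bread": 1.0}, index))
```

The inverted index is what keeps this cheap: the dot product is accumulated only over features the pseudo-query actually contains.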

Query Expansion
- Q* is the set of features describing the original query and its related queries
- The weight of a given feature in Q* is a linear combination of its weight in the original and related queries
- This expansion is efficient because you’re only looking at the features in related queries
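
One way to picture the linear combination is the sketch below. The mixing parameter `alpha` and the choice to average the related queries equally are assumptions made for illustration; the slide does not give the actual coefficients.

```python
def expand_query(original, related, alpha=0.5):
    """Blend original feature weights with the mean weights of related queries.

    alpha is a hypothetical mixing parameter: higher values trust the
    original query more, lower values trust the related queries more.
    """
    expanded = {feature: alpha * weight for feature, weight in original.items()}
    for query in related:
        for feature, weight in query.items():
            # Each related query contributes an equal share (an assumption).
            share = (1 - alpha) * weight / len(related)
            expanded[feature] = expanded.get(feature, 0.0) + share
    return expanded


original = {"banana": 1.0, "bread": 1.0}
related = [{"bread": 1.0, "recipe": 1.0}, {"bread": 1.0, "loaf": 1.0}]
print(expand_query(original, related))
```

Only features that appear in the original or in a related query are ever visited, which matches the efficiency point on the slide.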

Ad Feature Weighting
- Extract the same features from the bid phrases of ad groups as from queries (unigrams, phrases, semantic classes)
- Since the weighting from the queries would unfairly benefit short ad groups, use the BM25 weighting scheme.
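
For reference, a common textbook form of the BM25 term weight looks like the sketch below. The parameter defaults (`k1=1.2`, `b=0.75`) and the smoothed IDF variant are conventional choices, not values taken from the paper, and treating an ad group as the "document" whose length is normalized is an assumption.

```python
import math


def bm25_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 weight of one feature in one document (here, one ad group).

    tf: how often the feature occurs in the document
    doc_len / avg_doc_len: document length relative to the collection average
    n_docs / doc_freq: collection size and number of documents with the feature
    """
    # Smoothed IDF that never goes negative (one of several common variants).
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b controls how strongly length is corrected for.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


# Term frequency saturates: doubling tf gives less than double the weight.
once = bm25_weight(1, 10, 10, 1000, 50)
twice = bm25_weight(2, 10, 10, 1000, 50)
print(once, twice)
```

The two properties that matter here are that the weight saturates as term frequency grows and that it is adjusted for document length, so raw counts alone do not decide the ranking.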

Title Match Boosting
- Increases the score of ads whose titles match the original query very well

Scoring Function
- The end result of all this
- A weighted sum of dot products between features and the title match boost
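
A rough sketch of such a score, with one dot product per feature type. The per-type weights, the feature-type names, and the decision to add the title boost on top (rather than, say, multiply by it) are all assumptions; the slide only says the score combines weighted dot products with the title match boost.

```python
def dot(a, b):
    """Sparse dot product between two feature-weight dictionaries."""
    return sum(weight * b[feature] for feature, weight in a.items() if feature in b)


def score_ad(query_feats, ad_feats, type_weights, title_boost=0.0):
    """Weighted sum of per-type dot products, plus an additive title boost.

    type_weights maps a feature type (e.g. "unigrams") to its hypothetical
    importance; adding title_boost at the end is an assumption.
    """
    total = 0.0
    for feature_type, weight in type_weights.items():
        query_vec = query_feats.get(feature_type, {})
        ad_vec = ad_feats.get(feature_type, {})
        total += weight * dot(query_vec, ad_vec)
    return total + title_boost


query = {"unigrams": {"banana": 1.0, "bread": 1.0}, "classes": {"food": 1.0}}
ad = {"unigrams": {"bread": 2.0}, "classes": {"food": 0.5}}
weights = {"unigrams": 1.0, "phrases": 1.0, "classes": 2.0}
print(score_ad(query, ad, weights, title_boost=0.5))
```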

Now on to the results!

Test Set
- Test set: 400 random rare queries from Yahoo
  - 121 were in the lookup table, 279 were not
  - Eliminated the 10% of rare queries that were foreign
- Human editors judged the top 3 ads.
  - 3556 judgments
- The system was built off of every ad Yahoo has and 100 million queries based off of U.S. Yahoo

Metrics
- Discounted Cumulative Gain (DCG)
  - “a measure of effectiveness of a Web search engine algorithm or related applications, often used in information retrieval. Using a graded relevance scale of documents in a search engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated cumulatively from the top of the result list to the bottom with the gain of each result discounted at lower ranks.” –Wikipedia
  - DCG is a number; higher numbers are better
- Precision-Recall Curves
  - Precision: Fraction of results returned that are relevant
  - Recall: Fraction of relevant results that are returned
  - A way to visualize it; higher is better
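
Both metrics are short enough to write out. The sketch below uses one common DCG formulation (relevance divided by log2 of rank + 1); other variants exist, and the slide does not say which one the paper uses. The relevance grades and result labels are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for relevance grades listed in rank order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def precision_recall(returned, relevant):
    """Precision and recall for a set of returned results."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The same three grades score higher when the best ad is ranked first.
print(dcg([3, 2, 1]), dcg([1, 2, 3]))
print(precision_recall(["a", "b", "c"], ["a", "c", "d", "e"]))
```

A precision-recall curve is then just these two numbers recomputed as the cutoff for what counts as "returned" is loosened.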

Ad Matching Algorithms Tested
- Baseline
  - The original, unexpanded version of the query vector
- Offline Expansion
  - Expands the original query by pre-processing offline only
- Online Expansion
  - Expands the original query by processing online only
- Online + Offline Expansion
  - Expands the original query using both offline and online expansion algorithms

Test Results: Queries not found in lookup table
- Tested the baseline vs online expansion
- The online expansion gave statistically significant improvements

Test Results: Queries found in lookup table
- Tested all 4 algorithms
- Best: offline expansion
- Second best: online + offline expansion
- Difference between the two was not statistically significant

Test results: full set
- Tested on all four algorithms
- Best: online + offline expansion
- Online expansion also offers statistically significant improvement
- Even better: hybrid

Efficiency
- The table lookup takes only 1 ms
- Least efficient when a query is not in the lookup table
- When a query is not in the lookup table, there is a 50% overhead
  - This is bad
- But given the small proportion of queries not in the lookup table, the estimated average is 12.5% overhead
  - This is good
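
The two overhead figures fit together as a simple weighted average. In the sketch below, the 25% miss fraction is not stated on the slide; it is the value backed out from the 50% and 12.5% figures, under the assumption that queries found in the lookup table add negligible overhead.

```python
def average_overhead(miss_fraction, miss_overhead, hit_overhead=0.0):
    """Expected overhead across all queries, weighted by how often each case occurs."""
    return miss_fraction * miss_overhead + (1 - miss_fraction) * hit_overhead


# Assumed: a quarter of queries miss the lookup table, each costing 50% extra.
print(average_overhead(0.25, 0.50))
```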
