# Catching the Drift: Learning Broad Matches from Clickthrough Data Sonal Gupta, Mikhail Bilenko, Matthew Richardson University of Texas at Austin, Microsoft.


Catching the Drift: Learning Broad Matches from Clickthrough Data

Sonal Gupta, Mikhail Bilenko, Matthew Richardson (University of Texas at Austin, Microsoft Research)

Identifying Broad Matches

- Good keyword mappings retrieve relevant ads that users click
- How to measure what is relevant and likely to be clicked?
  - Human judgments: expensive, hard to scale
  - Past user clicks: provide click data for kw → kw' when the user was shown ad(kw') in the context of kw
    - Highly available, less trustworthy
- What similarity functions may indicate relevance of kw → kw'?
  - Syntactic (edit distance, TF-IDF cosine, string kernels, …)
  - Co-occurrence (in documents, query sessions, bid campaigns, …)
  - Expanded representation (search result snippets, category bags, …)

Approach

- Task: train a learner to estimate p(click | kw → kw') for any kw → kw'
- Data
  - (kw, kw', click) triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings
- Features
  - Convert each pair to a feature vector capturing similarities etc.: (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw', or both
  - For each triple, create an instance: (ϕ(kw, kw'), click)
- Learner: max-margin averaged perceptron (strong theory, very efficient)
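The feature-vector step above can be sketched in a few lines of Python. This is only an illustration: the two syntactic feature functions (`token_jaccard`, `edit_similarity`) and the helper names `phi` and `make_instance` are assumptions chosen for the example, not the features used in the talk.

```python
from typing import Callable, List, Tuple


def token_jaccard(kw: str, kw2: str) -> float:
    """Jaccard overlap between the word sets of the two keywords."""
    a, b = set(kw.lower().split()), set(kw2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(kw: str, kw2: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    longest = max(len(kw), len(kw2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(kw, kw2) / longest


# Illustrative feature set (an assumption); any function of kw, kw' could be added.
FEATURES: List[Callable[[str, str], float]] = [token_jaccard, edit_similarity]


def phi(kw: str, kw2: str) -> List[float]:
    """Map a (kw, kw') pair to its feature vector."""
    return [f(kw, kw2) for f in FEATURES]


def make_instance(kw: str, kw2: str, clicked: bool) -> Tuple[List[float], int]:
    """Turn one (kw, kw', click) triple into a training instance."""
    return phi(kw, kw2), int(clicked)
```

For example, `make_instance("digital slr", "canon rebel", True)` yields a two-element feature vector paired with the label 1.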

Example: Creating an Instance

Historical broad match clickthrough data:

| kw | kw' | ad(kw') | Click event |
|---|---|---|---|
| digital slr | canon rebel | Canon Rebel Kit for \$499 | click |
| seattle baseball | mariners tickets | Mariners season tickets | no click |

Feature functions:

| Original kw | Broad match kw' | ϕ1 | ϕ2 | ϕ3 |
|---|---|---|---|---|
| digital slr | canon rebel | 0.78 | 0.001 | 0.9 |
| seattle baseball | mariners tickets | 0.05 | 0.02 | 0.2 |

Instances:

- [0.78 0.001 0.9], 1
- [0.05 0.02 0.2], 0

Experiments

- Data
  - 2 months of previous broad match ads from Microsoft Content Ads logs
  - 1 month for training, 1 month for testing
  - 68 features (syntactic, co-occurrence based, etc.); greedy feature selection
- Metrics
  - LogLoss
  - LogLoss Lift: difference between the obtained LogLoss and that of an oracle with access to the empirical p(click | kw → kw') in the test set
  - CTR and revenue results in a live test with users
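The two offline metrics can be sketched as follows. The LogLoss formula itself did not survive in this transcript, so the code assumes the standard mean negative log-likelihood with natural logarithms; the function names and the clipping constant `eps` are likewise assumptions.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def log_loss(preds: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of binary click labels (natural log assumed)."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def oracle_predictions(pairs: Sequence[Hashable], labels: Sequence[int]) -> List[float]:
    """Predict, for each test example, the empirical CTR of its (kw, kw') pair."""
    clicks = defaultdict(int)
    shows = defaultdict(int)
    for pair, y in zip(pairs, labels):
        clicks[pair] += y
        shows[pair] += 1
    return [clicks[pair] / shows[pair] for pair in pairs]


def log_loss_lift(preds: Sequence[float], pairs: Sequence[Hashable],
                  labels: Sequence[int]) -> float:
    """Obtained LogLoss minus the oracle's LogLoss on the same test set."""
    return log_loss(preds, labels) - log_loss(oracle_predictions(pairs, labels), labels)
```

Under this reading, lower lift is better and the oracle's own lift is zero. The results table reported in this deck is consistent with it: LogLoss minus LogL Lift equals 0.5348 in every row.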

Results

Live Test Results

- Use CTR prediction to maximize expected revenue
- Re-rank mappings to incorporate revenue
- +18% revenue, -2% CTR
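The slide does not say how revenue enters the ranking. As a minimal sketch, assume expected revenue is the predicted click probability multiplied by the advertiser's bid; the `Mapping` class, its `bid` field, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mapping:
    kw: str          # original keyword
    kw_prime: str    # broad match keyword
    p_click: float   # predicted p(click | kw -> kw')
    bid: float       # hypothetical revenue per click for ad(kw')


def expected_revenue(m: Mapping) -> float:
    """Assumed objective: predicted CTR times revenue per click."""
    return m.p_click * m.bid


def rerank_by_expected_revenue(candidates: List[Mapping]) -> List[Mapping]:
    """Order candidate mappings from highest to lowest expected revenue."""
    return sorted(candidates, key=expected_revenue, reverse=True)


candidates = [
    Mapping("digital slr", "canon rebel", p_click=0.040, bid=0.50),
    Mapping("digital slr", "camera lenses", p_click=0.030, bid=1.00),
]
ranked = rerank_by_expected_revenue(candidates)
```

Here the second mapping ranks first despite its lower predicted CTR, which is one way a revenue gain could coexist with a small CTR drop.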

Online Learning with Amnesia

- Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift
  - Recent data is more informative
- Goal: utilize older data while capturing changes in distributions
- Averaged Perceptron doesn't capture drift
- Solution: Amnesiac Averaged Perceptron
  - Exponential weight decay when averaging hypotheses
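One way to read "exponential weight decay when averaging hypotheses" is to replace the uniform average of perceptron hypotheses with an exponential moving average. The sketch below follows that reading; the exact update rule is not given in this transcript, so the class, the `decay`, `margin`, and `lr` parameters, and the ±1 label encoding are all assumptions.

```python
from typing import List, Sequence


class AmnesiacAveragedPerceptron:
    """Perceptron whose averaged hypothesis exponentially forgets old weights."""

    def __init__(self, n_features: int, decay: float = 0.1,
                 margin: float = 1.0, lr: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features      # current hypothesis
        self.w_avg: List[float] = [0.0] * n_features  # decayed average, used to predict
        self.decay = decay
        self.margin = margin
        self.lr = lr

    @staticmethod
    def _dot(w: Sequence[float], x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x))

    def score(self, x: Sequence[float]) -> float:
        """Signed score from the averaged hypothesis; positive favours a click."""
        return self._dot(self.w_avg, x)

    def update(self, x: Sequence[float], clicked: bool) -> None:
        y = 1.0 if clicked else -1.0
        # Margin-based perceptron step on the current hypothesis.
        if y * self._dot(self.w, x) <= self.margin:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
        # Exponential moving average: older hypotheses fade by (1 - decay) per step.
        self.w_avg = [(1.0 - self.decay) * a + self.decay * wi
                      for a, wi in zip(self.w_avg, self.w)]
```

With `decay` close to 0 the average changes slowly and behaves more like a long-memory average; larger values track recent data more aggressively.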

Results

| Model | LogLoss | LogL Lift |
|---|---|---|
| Prior | 0.6572 | 0.1224 |
| Feature Selection + Online Learning + Amnesia | 0.5709 | 0.0361 |
| Online + Feature Selection, No Amnesia | 0.6033 | 0.0685 |
| Online + Amnesia, No Feature Selection | 0.6563 | 0.1215 |
| Feature Selection + Amnesia, Weekly Batch | 0.5948 | 0.0600 |

Contributions and Conclusions

- Learning broad matches from implicit feedback
  - Combining arbitrary similarity measures/features
  - Using clickthrough logs as implicit feedback
- Amnesiac Averaged Perceptron
  - Exponentially weighted averaging: distant examples “fade out”
  - Online learning adapts to market dynamics

Thank You!

Features and Feature Selection

- Co-occurrence feature examples:
  - User search sessions: keywords searched within 10 mins
  - Advertiser campaigns: keywords co-bidded by the same advertiser
- Past clickthrough rates of original and broad matched keywords
- Various syntactic similarities
- Various existing broad matching lists
- and so on…
- Feature Selection:
  - A total of 68 features
  - Greedy feature selection
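The slide says only that feature selection was greedy. A common variant is forward selection, sketched below under that assumption; `evaluate` is a hypothetical callback that trains on a feature subset and returns a validation loss such as LogLoss.

```python
from typing import Callable, List, Tuple


def greedy_forward_selection(
    n_features: int,
    evaluate: Callable[[List[int]], float],
    min_gain: float = 0.0,
) -> Tuple[List[int], float]:
    """Repeatedly add the single feature that lowers the loss the most.

    Stops when no remaining feature improves the loss by more than `min_gain`.
    """
    selected: List[int] = []
    best_loss = evaluate(selected)
    remaining = list(range(n_features))
    while remaining:
        # Score every candidate extension of the current subset.
        losses = {f: evaluate(selected + [f]) for f in remaining}
        f_best = min(remaining, key=lambda f: losses[f])
        if best_loss - losses[f_best] <= min_gain:
            break  # no candidate helps enough
        selected.append(f_best)
        remaining.remove(f_best)
        best_loss = losses[f_best]
    return selected, best_loss


def toy_loss(subset: List[int]) -> float:
    """Toy loss for illustration: features 0 and 2 help, feature 1 hurts."""
    return 1.0 - 0.3 * (0 in subset) - 0.2 * (2 in subset) + 0.05 * (1 in subset)


chosen, final_loss = greedy_forward_selection(3, toy_loss)
```

Each round costs one evaluation per remaining feature, so with 68 features the first round alone needs 68 training runs; that cost is the usual trade-off of greedy wrappers against cheaper filter methods.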

Additional Information

- Estimation of the expected value of a click over all the ads shown for a broad match mapping: E(p(click(ad(kw))|q))
- Query Expansion vs. Broad Matching
  - Our broad matching algorithm can be extended for query expansion
  - But broad matching is for a fixed set of bidded keywords
- Forgetron vs. Amnesiac Averaged Perceptron
  - Forgetron maintains a budgeted set of support vectors: it stores examples explicitly and does not take all the data into account
  - AAP: weighted average over all the examples, no need to store examples explicitly

Results

| Model | LogLoss | LogL Lift |
|---|---|---|
| Prior | 0.6572 | 0.1224 |
| Feature Selection + Online Learning + Amnesia | 0.5709 | 0.0361 |
| Online + Amnesia, No Feature Selection | 0.6563 | 0.1215 |
| Feature Selection + Amnesia, Weekly Batch | 0.5948 | 0.0600 |
| Online + Feature Selection, No Amnesia | 0.6033 | 0.0685 |

Amnesiac Averaged Perceptron
