Simrank++: Query Rewriting through link analysis of the click graph Ioannis Antonellis Hector Garcia-Molina

1 Simrank++: Query Rewriting through link analysis of the click graph. Ioannis Antonellis (antonell@cs.stanford.edu), Hector Garcia-Molina (hector@cs.stanford.edu), Chi-Chao Chang (chichao@yahoo-inc.com)

2 Sponsored Search Model [diagram: queries, advertisers, bids] Stanford Infolab

3 Auction Model [diagram: for a query, ads compete on relevance and bid amount]

4 Motivating Example [screenshot: results for the query "addicting games", including www.addictinggames.com] No ads!

5 Motivating Example [screenshot: results for the query "free online games"]

6 Modified Sponsored Search Model: Advertisers bid on queries. For each query, the search engine runs an auction based on ad relevance and bid amount, and the top 5-10 ads are displayed along with the regular search results. Extra: advertisers are charged a default amount when their ads are displayed for queries they did not bid on.
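
A minimal sketch of the auction on slide 6, for illustration only. The scoring rule (relevance × price), the `DEFAULT_CHARGE` value, and treating that default charge as the effective bid for ads shown on queries the advertiser did not bid on are all assumptions, not the search engine's actual mechanism; `Ad` and `run_auction` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical default amount charged when an ad is shown for a query its advertiser did not bid on.
DEFAULT_CHARGE = 0.05


@dataclass
class Ad:
    ad_id: str
    bids: dict = field(default_factory=dict)  # query -> bid amount


def run_auction(query, ads, relevance, k=5):
    """Return the top-k (ad_id, price) pairs, ranked by relevance * price (assumed scoring rule)."""
    ranked = []
    for ad in ads:
        # The advertiser pays its bid on queries it bid on, otherwise the default charge.
        price = ad.bids.get(query, DEFAULT_CHARGE)
        ranked.append((relevance(ad, query) * price, ad.ad_id, price))
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return [(ad_id, price) for _, ad_id, price in ranked[:k]]


if __name__ == "__main__":
    ads = [Ad("a1", {"free online games": 1.0}), Ad("a2", {"addicting games": 0.5}), Ad("a3")]
    scores = {"a1": 0.9, "a2": 0.8, "a3": 0.2}  # hypothetical relevance scores
    print(run_auction("free online games", ads, lambda ad, q: scores[ad.ad_id], k=2))
```

Here "a2" still makes the top two for a query it never bid on, at the default charge, which is the situation the "Extra" rule covers.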

7 Outline: Sponsored Search Model; Motivating Example; Query Rewriting using the click graph; Simrank; Evidence-based Simrank; Weighted Simrank; Evaluation

8 Sponsored search system [diagram: a query q goes into the sponsored search system, which uses history, ads and bids data and returns ads]

9 Query Rewriting [diagram: the front end receives query q and returns ads, using history, ads and bids data; a back end supplies q and the rewrites for q to the front end]
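
The front end/back end split on slide 9 can be sketched as a lookup: the back end precomputes rewrites offline, and at query time the front end gathers ads for q and for its top rewrites. `REWRITES`, `BIDS`, `candidate_ads` and the "online games" rewrite are hypothetical, chosen to echo the motivating example.

```python
# Hypothetical back-end output: query -> ranked rewrites (e.g. computed offline by Simrank++).
REWRITES = {"addicting games": ["free online games", "online games"]}
# Hypothetical bid index: query -> ads whose advertisers bid on exactly that query.
BIDS = {"free online games": ["ad1", "ad2"], "online games": ["ad2", "ad3"]}


def candidate_ads(q, rewrites=REWRITES, bids=BIDS, max_rewrites=5):
    """Front end: collect ads for q and its top rewrites, in order and without duplicates."""
    ads = []
    for query in [q] + rewrites.get(q, [])[:max_rewrites]:
        for ad in bids.get(query, []):
            if ad not in ads:
                ads.append(ad)
    return ads


if __name__ == "__main__":
    print(candidate_ads("addicting games"))  # no direct bids, but its rewrites have ads
```

The candidates would then go through the auction as usual.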

10 Click Graph from sponsored search [figure: bipartite click graph linking the queries pc, camera, digital camera, tv and flower to the ads hp.com, bestbuy.com, teleflora.com and orchids.com; edges carry click counts (10, 20, 5, 30, 7, 15, 16, 15). A "Similar Queries" panel groups pc, camera, digital camera and tv.]
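
A small sketch of how such a graph could be assembled from a click log: each (query, ad) click event adds one to that edge's weight, and query pairs that share a clicked ad are the most direct rewrite candidates. The event list is hypothetical. Its edges follow our reading of the figure, but its counts do not reproduce the slide's click numbers, since the transcript does not say which count belongs to which edge.

```python
from collections import defaultdict
from itertools import combinations


def build_click_graph(click_events):
    """Aggregate (query, ad) click events into weighted bipartite adjacency maps."""
    query_ads = defaultdict(lambda: defaultdict(int))  # E(q) with click counts
    ad_queries = defaultdict(lambda: defaultdict(int))  # queries per ad with click counts
    for query, ad in click_events:
        query_ads[query][ad] += 1
        ad_queries[ad][query] += 1
    return query_ads, ad_queries


def co_clicked_pairs(query_ads):
    """Query pairs that share at least one clicked ad."""
    return sorted(
        (q1, q2)
        for q1, q2 in combinations(sorted(query_ads), 2)
        if query_ads[q1].keys() & query_ads[q2].keys()
    )


# Hypothetical click log, one tuple per click.
EVENTS = [("pc", "hp.com"), ("camera", "hp.com"), ("camera", "bestbuy.com"),
          ("digital camera", "hp.com"), ("digital camera", "bestbuy.com"),
          ("tv", "bestbuy.com"), ("flower", "teleflora.com"), ("flower", "orchids.com"),
          ("camera", "bestbuy.com")]

if __name__ == "__main__":
    query_ads, _ = build_click_graph(EVENTS)
    print(co_clicked_pairs(query_ads))
```

Shared ads only link direct neighbours (pc and tv never pair up here), which is part of the motivation for the propagation-based Simrank that follows.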

11 Simrank [JW 2003] Intuition: "Two queries are similar if they are connected to similar ads"; "Two ads are similar if they are connected to similar queries". Iterative procedure: at each iteration, similarity propagates through the graph.

12 Simrank [JW 2003] N(q): number of ads connected to q; E(q): set of ads connected to q; sim_k(q,q'): similarity of q and q' at the k-th iteration. Initially sim(q,q) = 1 and sim(q,q') = 0 for q ≠ q'; likewise sim(a,a) = 1 and sim(a,a') = 0 for ads a ≠ a'. Time: O(n^4)
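
The update rule itself is not in the transcript. The standard bipartite form of Simrank from [JW 2003], written in the slide's notation, is given below. N(a) and E(a) (the number and set of queries connected to ad a) are our extension of the slide's definitions, and the slides use a single decay factor C for both sides.

```latex
\begin{aligned}
\mathrm{sim}_k(q,q') &= \frac{C}{N(q)\,N(q')} \sum_{i \in E(q)} \sum_{j \in E(q')} \mathrm{sim}_{k-1}(i,j), && q \neq q' \\
\mathrm{sim}_k(a,a') &= \frac{C}{N(a)\,N(a')} \sum_{i \in E(a)} \sum_{j \in E(a')} \mathrm{sim}_{k-1}(i,j), && a \neq a' \\
\mathrm{sim}_k(q,q) &= \mathrm{sim}_k(a,a) = 1
\end{aligned}
```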

13 Simrank [figure: example click graph] Two random surfers model
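
Following Jeh and Widom's random-surfer interpretation, the Simrank score of two nodes equals the expected value of C^τ, where τ is the first step at which two surfers, started at the two nodes and each moving to a uniformly random neighbour at every step, land on the same node. A minimal Monte Carlo sketch of this, assuming the edge set below is a correct reading of the slide's figure; `adjacency` and `surfer_simrank` are hypothetical names.

```python
import random

# Example click graph as read from the slides' figure (assumed edge set).
EDGES = [("pc", "hp.com"), ("camera", "hp.com"), ("camera", "bestbuy.com"),
         ("digital camera", "hp.com"), ("digital camera", "bestbuy.com"),
         ("tv", "bestbuy.com"), ("flower", "teleflora.com"), ("flower", "orchids.com")]


def adjacency(edges):
    """Undirected neighbour lists for the bipartite click graph."""
    adj = {}
    for q, a in edges:
        adj.setdefault(q, []).append(a)
        adj.setdefault(a, []).append(q)
    return adj


def surfer_simrank(u, v, adj, C=0.8, walks=10000, max_steps=30, seed=0):
    """Monte Carlo estimate of E[C**tau], tau = first step at which two random surfers meet."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x, y = u, v
        for t in range(max_steps + 1):
            if x == y:
                total += C ** t
                break
            # Both surfers move at once, each to a uniformly random neighbour.
            x, y = rng.choice(adj[x]), rng.choice(adj[y])
    return total / walks  # walks that never meet within max_steps contribute 0


if __name__ == "__main__":
    adj = adjacency(EDGES)
    print(surfer_simrank("pc", "camera", adj), surfer_simrank("pc", "flower", adj))
```

For pc and camera the estimate lands near 0.62, which by our calculation is the value the slide 12 recurrence converges to on this graph; pc and flower sit in different components, so their surfers never meet and the score is 0.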

14 Simrank in matrix notation
Input: transition matrix P, decay factor C, number of iterations k
Output: similarity matrix S
S = I
for i = 1:k
    temp = C P^T S P
    S = temp + I - Diag(diag(temp))
end
Worst-case running time: O(n^3); see also the next talk.
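
A minimal pure-Python sketch of the slide 14 iteration, applied to the example click graph. Two assumptions: the edge set below is our reading of the figure, and P is the row-normalized adjacency matrix, P[i][j] = 1/deg(i) for each edge i-j. With those choices the first two iterations give the values shown on slides 15 and 16 (for example, 0.0889 and then 0.1244 for pc/camera). The helper names are our own.

```python
# Example click graph as read from the slides' figure (assumed edge set).
QUERIES = ["pc", "camera", "digital camera", "tv", "flower"]
ADS = ["hp.com", "bestbuy.com", "teleflora.com", "orchids.com"]
EDGES = [("pc", "hp.com"), ("camera", "hp.com"), ("camera", "bestbuy.com"),
         ("digital camera", "hp.com"), ("digital camera", "bestbuy.com"),
         ("tv", "bestbuy.com"), ("flower", "teleflora.com"), ("flower", "orchids.com")]


def transition_matrix(nodes, edges):
    """Row-normalized adjacency: P[i][j] = 1/deg(i) when i and j share an edge."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for q, a in edges:
        adj[idx[q]][idx[a]] = adj[idx[a]][idx[q]] = 1.0
    P = []
    for row in adj:
        deg = sum(row)
        P.append([x / deg if deg else 0.0 for x in row])
    return P


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def simrank_matrix(P, C=0.8, k=12):
    """Slide 14, starting from S = I."""
    n = len(P)
    Pt = [list(col) for col in zip(*P)]
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        temp = matmul(matmul(Pt, S), P)  # P^T S P; the slide's temp is C times this
        # Off the diagonal keep C * temp; reset the diagonal to 1 (+ I - Diag(diag(temp))).
        S = [[1.0 if i == j else C * temp[i][j] for j in range(n)] for i in range(n)]
    return S


if __name__ == "__main__":
    nodes = QUERIES + ADS
    pos = {v: i for i, v in enumerate(nodes)}
    S1 = simrank_matrix(transition_matrix(nodes, EDGES), k=1)
    print(round(S1[pos["pc"]][pos["camera"]], 4))
```

Normalizing by the other endpoint instead, P[i][j] = 1/deg(j), turns each off-diagonal entry of C P^T S P into exactly the per-pair sum of slide 12, so the two conventions can give different numbers on the same graph.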

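15 Simrank [figure: example click graph] Query-query similarities after the 1st iteration (C = 0.8): sim(pc, camera) = 0.0889, sim(pc, digital camera) = 0.0889, sim(camera, digital camera) = 0.1778, sim(pc, tv) = 0, sim(camera, tv) = 0.0889, sim(digital camera, tv) = 0.0889; flower has similarity 0 with every other query, and each query has similarity 1 with itself.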

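16 Simrank [figure: example click graph] Query-query similarities after the 2nd iteration (C = 0.8): sim(pc, camera) = 0.1244, sim(pc, digital camera) = 0.1244, sim(camera, digital camera) = 0.2489, sim(pc, tv) = 0.0356, sim(camera, tv) = 0.1244, sim(digital camera, tv) = 0.1244; flower has similarity 0 with every other query, and each query has similarity 1 with itself.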

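17 Simrank [figure: example click graph] Query-query similarities after the 12th iteration (C = 0.8): sim(pc, camera) = 0.1650, sim(pc, digital camera) = 0.1650, sim(camera, digital camera) = 0.33, sim(pc, tv) = 0.0761, sim(camera, tv) = 0.1650, sim(digital camera, tv) = 0.1650; flower has similarity 0 with every other query, and each query has similarity 1 with itself.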

19 Evidence-based Simrank Problem: Simrank scores in complete bipartite graphs are counter-intuitive. The theorems are in the paper; the examples here build intuition.

20 Evidence-based Simrank [figure: two click graphs: pc and camera both connected to hp.com; camera and digital camera both connected to hp.com and bestbuy.com] Simrank scores with C = 0.8, by iteration (camera/digital camera, pc/camera): 1: 0.4, 0.8; 2: 0.56, 0.8; 3: 0.624, 0.8; 4: 0.6496, 0.8; 5: 0.65984, 0.8; 6: 0.663936, 0.8. The pair that shares two ads scores lower than the pair that shares one.
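
Why the complete bipartite pair scores lower can be worked out by hand, assuming the standard per-pair recurrence. In the second graph, camera and digital camera both connect to hp.com and bestbuy.com, so by symmetry the query pair and the ad pair share the same score s_k at every iteration, and the recurrence collapses to one line:

```latex
\begin{aligned}
s_k &= \mathrm{sim}_k(\text{camera}, \text{digital camera}) = \mathrm{sim}_k(\text{hp.com}, \text{bestbuy.com}) \\
s_k &= \frac{C}{2 \cdot 2}\bigl(1 + 1 + s_{k-1} + s_{k-1}\bigr) = \frac{C}{2}\,(1 + s_{k-1}), \qquad s_0 = 0 \\
&\Rightarrow\; s_1 = 0.4,\; s_2 = 0.56,\; s_3 = 0.624,\; \dots \qquad (C = 0.8) \\
s_\infty &= \frac{C}{2}\,(1 + s_\infty) \;\Rightarrow\; s_\infty = \frac{C}{2 - C} = \frac{2}{3} \\
\mathrm{sim}_k(\text{pc}, \text{camera}) &= \frac{C}{1 \cdot 1}\,\mathrm{sim}_{k-1}(\text{hp.com}, \text{hp.com}) = C = 0.8
\end{aligned}
```

No number of iterations lets the pair with two shared ads catch up: 2/3 < 0.8.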

21 Evidence-based Simrank [figure: the same two click graphs] Evidence-based scores with C = 0.8, by iteration (camera/digital camera, pc/camera): 1: 0.3, 0.4; 2: 0.42, 0.4; 3: 0.468, 0.4; 4: 0.4872, 0.4; 5: 0.49488, 0.4; 6: 0.497952, 0.4. From the 2nd iteration on, the pair that shares two ads scores higher.
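
A minimal sketch that recomputes slides 20 and 21 with the per-pair recurrence of slide 12, updating query and ad similarities together at each iteration. The evidence factor is an assumption on our part: evidence(q, q') = sum of 1/2^i for i = 1..n, where n is the number of ads q and q' share. It gives 0.5 for one shared ad and 0.75 for two, which matches the slide 21 numbers (0.4 = 0.5 × 0.8 and 0.3 = 0.75 × 0.4). Function and graph names are hypothetical.

```python
def bipartite_simrank(edges, C=0.8, k=6):
    """Per-pair Simrank on a bipartite graph; returns query-pair scores and E(q)."""
    E_q, E_a = {}, {}
    for q, a in edges:
        E_q.setdefault(q, set()).add(a)
        E_a.setdefault(a, set()).add(q)

    def step(E, other):
        # sim(x, y) = C / (N(x) N(y)) * sum of other-side similarities; self-similarity stays 1.
        return {(x, y): 1.0 if x == y else
                C * sum(other[(i, j)] for i in E[x] for j in E[y]) / (len(E[x]) * len(E[y]))
                for x in E for y in E}

    sim_q = {(x, y): float(x == y) for x in E_q for y in E_q}
    sim_a = {(x, y): float(x == y) for x in E_a for y in E_a}
    for _ in range(k):
        sim_q, sim_a = step(E_q, sim_a), step(E_a, sim_q)
    return sim_q, E_q


def evidence(x, y, E_q):
    """Assumed evidence factor: sum of 1/2^i for i = 1 .. |E(x) & E(y)|."""
    common = len(E_q[x] & E_q[y])
    return sum(0.5 ** i for i in range(1, common + 1))


GRAPH_A = [("pc", "hp.com"), ("camera", "hp.com")]
GRAPH_B = [("camera", "hp.com"), ("camera", "bestbuy.com"),
           ("digital camera", "hp.com"), ("digital camera", "bestbuy.com")]

if __name__ == "__main__":
    for k in range(1, 7):
        sim_a, e_a = bipartite_simrank(GRAPH_A, k=k)
        sim_b, e_b = bipartite_simrank(GRAPH_B, k=k)
        s_cd, s_pc = sim_b[("camera", "digital camera")], sim_a[("pc", "camera")]
        print(k, round(s_cd, 6), round(s_pc, 6),
              round(evidence("camera", "digital camera", e_b) * s_cd, 6),
              round(evidence("pc", "camera", e_a) * s_pc, 6))
```

The evidence factor rewards pairs that share many ads, which is what lets the complete bipartite pair overtake the single-ad pair.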

23 Weighted Simrank [figure: two versions of a small graph in which the queries flower and orchids both connect to the ad Teleflora.com; edge weights shown: 1000 and 1] Variance on weights matters

24 Weighted Simrank [figure: the same small graph in two versions; edge weights shown: 1000, 1 and 1] Absolute value of weights matters

25 Weighted Simrank [figure: example click graph]

26 Simrank++
Input: transition matrix P', evidence matrix V, decay factor C, number of iterations k
Output: similarity matrix S'
S' = I
for i = 1:k
    temp = C P'^T S' P'
    S' = temp + I - Diag(diag(temp))
end
S' = V .* S'   (element-wise product)
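
A per-pair sketch of the slide 26 algorithm, under explicit assumptions. P' is taken to be each node's click weights divided by their total. The paper's weighted transition matrix also accounts for the variance of the weights (slide 23), which this sketch omits. V holds the geometric evidence factor, the sum of 1/2^i over the n ads two queries share, and self-similarity stays at 1. The name `simrank_pp` is hypothetical.

```python
def simrank_pp(click_weights, C=0.8, k=6):
    """Per-pair sketch of slide 26: weighted Simrank iterations, then S' = V .* S'."""
    q_nbrs, a_nbrs = {}, {}
    for (q, a), w in click_weights.items():
        q_nbrs.setdefault(q, {})[a] = w
        a_nbrs.setdefault(a, {})[q] = w

    def normalize(nbrs):
        # Assumed P': each node's edge weights divided by their total (no variance/spread term).
        return {x: {y: w / sum(ws.values()) for y, w in ws.items()} for x, ws in nbrs.items()}

    Wq, Wa = normalize(q_nbrs), normalize(a_nbrs)

    def step(W, other):
        # sim(x, y) = C * sum_i sum_j W[x][i] * W[y][j] * other(i, j); self-similarity stays 1.
        return {(x, y): 1.0 if x == y else
                C * sum(W[x][i] * W[y][j] * other[(i, j)] for i in W[x] for j in W[y])
                for x in W for y in W}

    sim_q = {(x, y): float(x == y) for x in Wq for y in Wq}
    sim_a = {(x, y): float(x == y) for x in Wa for y in Wa}
    for _ in range(k):
        sim_q, sim_a = step(Wq, sim_a), step(Wa, sim_q)

    def evidence(x, y):
        # Assumed V entry: sum of 1/2^i over the n ads that queries x and y share.
        n = len(q_nbrs[x].keys() & q_nbrs[y].keys())
        return sum(0.5 ** i for i in range(1, n + 1))

    return {(x, y): s if x == y else evidence(x, y) * s for (x, y), s in sim_q.items()}


if __name__ == "__main__":
    skewed = {("camera", "hp.com"): 9, ("camera", "bestbuy.com"): 1,
              ("digital camera", "hp.com"): 1, ("digital camera", "bestbuy.com"): 9}
    print(simrank_pp(skewed, k=1)[("camera", "digital camera")])
```

With unit weights the sketch reduces to the evidence-based scores of slide 21 (0.3 after one iteration for camera/digital camera). Skewing camera's clicks toward hp.com and digital camera's toward bestbuy.com (9:1 and 1:9) lowers that first-iteration score to 0.108, in the spirit of "variance on weights matters".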

28 Evaluation Dataset: two weeks of the Yahoo! click graph, with 15 million queries, 14 million ads and 28 million edges. We extracted the largest connected component and further decomposed it into 5 subgraphs (details in the paper). Edge weights: clicks-over-impressions rate, adjusted to account for position bias. Evaluation set: 120 queries sampled from search engine traffic.

29 Evaluation Comparison with: Pearson similarity, Jaccard similarity, cosine similarity
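
The slides do not say how the baselines were set up. A minimal sketch, assuming each query is represented by its click vector over ads (a dict from ad to clicks): Jaccard compares the sets of clicked ads, while cosine and Pearson compare the weighted vectors.

```python
import math


def jaccard(a, b):
    """|E(q) & E(q')| / |E(q) | E(q')| over the sets of clicked ads."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine of the angle between two sparse click vectors (dict ad -> clicks)."""
    dot = sum(w * b.get(ad, 0.0) for ad, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pearson(a, b, ads):
    """Pearson correlation of two click vectors over a fixed list of ads (missing = 0)."""
    xs = [a.get(ad, 0.0) for ad in ads]
    ys = [b.get(ad, 0.0) for ad in ads]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


if __name__ == "__main__":
    camera = {"hp.com": 3, "bestbuy.com": 1}  # hypothetical click counts
    pc = {"hp.com": 1}
    print(jaccard(camera, pc), cosine(camera, pc),
          pearson(camera, pc, ["hp.com", "bestbuy.com", "teleflora.com"]))
```

All three only see direct overlap: queries with no ad in common score 0 under Jaccard and cosine, unlike the propagated Simrank scores.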

30 Metrics Precision/recall (manual evaluation): Precision(q) = number of relevant rewrites of q returned by the method / number of rewrites of q returned by the method; Recall(q) = number of relevant rewrites of q returned by the method / number of relevant rewrites of q found across all methods. Query coverage: number of queries for which the method gives at least one rewrite. Query rewriting depth: total number of rewrites the method gives for a query.
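
The metrics can be sketched as follows, assuming each method's output is a dict from query to its list of rewrites and the manual judgments are pooled into one set of relevant rewrites per query. Returning None for an undefined ratio (a method with no rewrites for q, or a query with no relevant rewrites) is our choice, not the paper's.

```python
def precision(returned, relevant):
    """Share of a method's rewrites for q that were judged relevant; None if it returned none."""
    if not returned:
        return None
    return len(set(returned) & relevant) / len(set(returned))


def recall(returned, relevant):
    """Share of all relevant rewrites for q (pooled over every method) that this method returned."""
    if not relevant:
        return None
    return len(set(returned) & relevant) / len(relevant)


def coverage(rewrites_by_query):
    """Number of queries for which the method gives at least one rewrite."""
    return sum(1 for rewrites in rewrites_by_query.values() if rewrites)


def depth(rewrites_by_query, q):
    """Total number of rewrites the method gives for query q."""
    return len(rewrites_by_query.get(q, []))


if __name__ == "__main__":
    method = {"q1": ["a", "b", "c"], "q2": []}  # hypothetical method output
    judged = {"q1": {"a", "d"}, "q2": {"e"}}  # hypothetical pooled relevance judgments
    print(precision(method["q1"], judged["q1"]), recall(method["q1"], judged["q1"]),
          coverage(method), depth(method, "q1"))
```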

31 Evaluation

32 Evaluation

33 Evaluation

34 Evaluation

35 Conclusions/Open issues Proposed the use of Simrank for query rewriting, with two extensions: evidence-based and weighted Simrank. Simrank++ was the best method overall. Open issues: ad selection models; blending with semantic text-similarity methods; incremental computation of Simrank++ values; applications to recommendation systems.

36 Thank You! http://infoblog.stanford.edu