Published by Loreen Carpenter. Modified over 9 years ago.
1
Graph Query Reformulation with Diversity
Davide Mottin, University of Trento
Francesco Bonchi, Yahoo Labs
Francesco Gullo, Yahoo Labs
2
Issues with Pattern Search
[Figure: a molecular query graph with 510 matches]
- Too many matches
- Results are not grouped
3
Solution: Discovering Specializations
[Figure: the molecular query (510 matches) and its specializations, with 448, 46, 114, 382, and 46 matches]
4
Applications
- Finding groups of molecules that share a particular reagent
- Analyzing a set of proteins to find diseases
- Workflow optimization
- Anomaly detection in a network
- Searching for 3D shapes with similar properties
5
Dealing with Specializations in Web and Relational Data
- Faceted search: presents aspects of the results [Roy08]
- Query reformulation: modifies some of the query conditions
  - In structured databases [Mishra09]
  - In web search [Dang10]
This is the first study of the problem on graphs.
6
Graph Query Reformulation
[Figure: a query, its results, and its specializations]
- Specializations: query supergraphs
- The number of specializations is exponential
7
Challenges
- The number of reformulations is exponential
- Quantifying the interestingness of a reformulation
- Finding query specializations is NP-complete
8
A Naïve Approach: k Most Frequent Super-Patterns
[Figure: a query and its super-patterns, with match counts 480, 450, 100, 30, and 420]
Until k patterns are found:
- Retrieve the most frequent super-pattern
Frequent ≠ interesting!
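The naive strategy can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the pattern names P1 to P3 and their result sets are hypothetical toy data, and the frequency of a pattern is assumed to be the number of results it matches.

```python
def k_most_frequent(matches, k):
    """Return the k super-patterns with the largest number of matches."""
    return sorted(matches, key=lambda p: len(matches[p]), reverse=True)[:k]

# Hypothetical toy data: pattern name -> ids of the results it matches.
matches = {
    "P1": set(range(0, 9)),   # 9 matches
    "P2": set(range(0, 8)),   # 8 matches, all shared with P1
    "P3": {9, 10, 11},        # 3 matches, none shared with P1 or P2
}

top2 = k_most_frequent(matches, 2)
covered_by_top2 = set().union(*(matches[p] for p in top2))
covered_by_p1_p3 = matches["P1"] | matches["P3"]

print(top2)                   # ['P1', 'P2']
print(len(covered_by_top2))   # 9
print(len(covered_by_p1_p3))  # 12
```

In this toy data the two most frequent patterns return almost the same results, while P1 and P3 together span all twelve. That is the sense in which frequent is not the same as interesting.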
9
Our Approach: Graph Query Reformulation with Diversity
- Finds k meaningful specializations efficiently
10
Finding Meaningful Specializations
[Figure: a query and its results; labels: Results, Query, Diversity]
Find k meaningful specializations that:
1. Span all the results
2. Present different aspects of the results
11
Diversity Matters
[Figure: a query, its results, and candidate specializations Q1', Q2', Q3', Q4']
Objective function f(Q), with λ = 1:
- Non-optimal: f({Q1', Q2'}) = 7
- Optimal: f({Q3', Q4'}) = 8
12
Problem: Graph Query Reformulation with Diversity
Theorem (NP-hardness): the MAX-SUM Diversification problem reduces to this problem, so it is NP-hard.
13
Solution: Greedy Algorithm
Greedy: while fewer than k specializations have been found:
1. Find the specialization leading to the maximum increment of the objective function (marginal gain)
2. Add the specialization to the results
Theorem: the algorithm is a ½-approximation.
Finding the maximum gain is #P-complete [Valiant79].
Solution: Fast_MMPG, a branch-and-bound algorithm that efficiently finds the specialization with the maximum marginal gain.
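The greedy loop can be sketched as follows. The slides do not spell out the objective function, so the one used here is an assumption made only for illustration: coverage is the number of results a specialization matches, diversity between two specializations is the size of the symmetric difference of their result sets, and lam plays the role of λ. The candidate names A, B, C and their result sets are hypothetical, and the ½-approximation stated above refers to the authors' objective, not necessarily to this stand-in.

```python
from itertools import combinations

def objective(chosen, matches, lam):
    """Assumed stand-in objective: total coverage + lam * pairwise diversity."""
    coverage = sum(len(matches[q]) for q in chosen)
    diversity = sum(len(matches[a] ^ matches[b]) for a, b in combinations(chosen, 2))
    return coverage + lam * diversity

def greedy(candidates, matches, k, lam):
    """Repeatedly add the candidate with the largest marginal gain."""
    chosen = []
    while len(chosen) < k and len(chosen) < len(candidates):
        current = objective(chosen, matches, lam)
        remaining = [c for c in candidates if c not in chosen]
        # Marginal gain = objective after adding c minus objective before.
        best = max(remaining, key=lambda c: objective(chosen + [c], matches, lam) - current)
        chosen.append(best)
    return chosen

# Hypothetical toy data: specialization name -> ids of the results it matches.
matches = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}}
candidates = ["A", "B", "C"]

print(greedy(candidates, matches, k=2, lam=1))  # ['A', 'C']
print(greedy(candidates, matches, k=2, lam=0))  # ['A', 'B']
```

With lam = 0 the sketch degenerates to picking the most frequent candidates, while lam = 1 prefers C over B because C adds results that A does not already return. The sketch scans an explicit candidate list at every step, which is feasible only for toy inputs.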
14
The Multiplicity Vector
[Figure: the results and the output set of specializations, with multiplicity values 0000011000221102222023331]
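One plausible reading of the multiplicity vector, assumed here rather than taken from the slides, is that it stores for each result the number of specializations in the output set that match it. A minimal sketch under that assumption, with hypothetical specializations A and B:

```python
def multiplicity_vector(results, chosen, matches):
    """For each result, count how many chosen specializations match it."""
    return [sum(1 for q in chosen if r in matches[q]) for r in results]

# Hypothetical toy data: five results and two specializations.
results = [1, 2, 3, 4, 5]
matches = {"A": {1, 2}, "B": {1, 2, 3, 4}}

print(multiplicity_vector(results, [], matches))          # [0, 0, 0, 0, 0]
print(multiplicity_vector(results, ["A"], matches))       # [1, 1, 0, 0, 0]
print(multiplicity_vector(results, ["A", "B"], matches))  # [2, 2, 1, 1, 0]
```

Under this reading the vector can be updated incrementally each time a specialization joins the output set, by adding one to the entries of the results it matches.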
15
Upper Bound on the Marginal Gain
Lemma: the marginal gain increases if the multiplicity of the considered item is [formula missing], where |Q| is the number of reformulations in the reformulated set constructed so far.
Theorem (upper bound): [symbol missing] is the value of the objective function considering only results with multiplicity [formula missing].
16
Upper Bound
[Figure: the results and the output set of specializations, with multiplicity values 0000012111 and 12111]
17
The Fast_MMPG Algorithm
Until the reformulation with the maximum upper bound and marginal gain is found:
1. Expand the reformulation with the maximum upper bound
2. Prune reformulations with marginal gain smaller than the upper bound so far
[Figure labels: upper bound, marginal gain]
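The general idea of a best-first branch-and-bound search can be sketched as below. This is not Fast_MMPG itself. It assumes that the marginal gain of a candidate is a sum of per-result weights (hypothetical numbers here, positive when matching a result helps and negative when it hurts), that every child in the search tree is a supergraph of its parent and therefore matches a subset of the parent's results, and it uses the standard pruning rule of discarding a candidate whose upper bound cannot beat the best gain found so far. All names and data are illustrative.

```python
import heapq

def gain(results, weight):
    """Marginal gain of a candidate: sum of the weights of the results it matches."""
    return sum(weight[r] for r in results)

def upper_bound(results, weight):
    """Bound valid for a candidate and all of its descendants.

    A descendant matches a subset of `results`, so it can do no better than
    keeping every positively weighted result and dropping the rest.
    """
    return sum(max(weight[r], 0) for r in results)

def best_specialization(root, children, matches, weight):
    best_name, best_gain, expanded = None, float("-inf"), 0
    # Max-heap on the upper bound (heapq is a min-heap, hence the negation).
    frontier = [(-upper_bound(matches[c], weight), c) for c in children.get(root, [])]
    heapq.heapify(frontier)
    while frontier:
        neg_bound, name = heapq.heappop(frontier)
        if -neg_bound <= best_gain:
            break  # no remaining candidate or descendant can beat the best gain
        expanded += 1
        g = gain(matches[name], weight)
        if g > best_gain:
            best_name, best_gain = name, g
        for child in children.get(name, []):
            heapq.heappush(frontier, (-upper_bound(matches[child], weight), child))
    return best_name, best_gain, expanded

# Hypothetical toy data.
weight = {1: 2, 2: 2, 3: -1, 4: 1, 5: -3}
children = {"Q": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
matches = {"A": {1, 2, 3}, "A1": {1, 2}, "A2": {3}, "B": {4, 5}, "B1": {4}}

print(best_specialization("Q", children, matches, weight))  # ('A1', 4, 2)
```

In this toy run only two of the five candidates are evaluated: once A1 reaches a gain of 4, the bound of 1 on B rules out B and everything below it.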
18
Experimental Setup
Datasets:
- AIDS: 10k chemical compounds
- Financial: 17k transaction workflows
- Web: 13k interactions with a recommender system
Baseline algorithms:
- k-freq: returns the top-k frequent supergraphs of a query
- LIndex: informative patterns index
Experiments:
- Time and objective function value, varying k, query size, and λ
- Anecdotal
- Scalability
19
Time Comparison
[Figure: running time versus number of reformulations (k) and versus query size]
Number of specializations:
1. k-freq runs only slightly faster
2. Time increases linearly in k
3. Fast_MMPG has real-time performance
Query size:
1. Fast_MMPG is comparable to k-freq
2. Time decreases with query size (fewer reformulations)
20
Objective Function Gain
Analysis:
1. λ correctly moves the objective function towards diversity
2. k-freq only captures coverage
21
Qualitative Evaluation
[Figure: a molecular query and the specializations returned by k-freq and by Fast_MMPG]
Analysis:
- k-freq finds specializations of the same superquery
- Fast_MMPG returns reformulations with more diversified structures
22
Conclusions
- First study of the problem of query reformulation in graph databases
- Principled objective function optimizing coverage and diversity
- Algorithmic solutions with quality guarantees and real-time responses
23
Questions? Thank you!