Diversified Top-k Graph Pattern Matching. Yinghui Wu (UC Santa Barbara), Wenfei Fan (University of Edinburgh), Xin Wang (Southwest Jiaotong University).

Presentation transcript:


Graph pattern matching in social search
Graph pattern matching in social networks
◦Applications: social relationship search, social role analysis, expert search, etc.
◦Social graphs are typically large, with billions of nodes and edges.
Challenges
◦Matching is costly over large social networks;
◦Matching algorithms return too many results;
◦Social network queries have a “query focus”.
These motivate us to find the best matches of a specific pattern node via graph pattern matching. However, the problems are challenging!

Hardness of the problems
Top-k graph pattern matching problem
◦Complexity: O(|G||Q| + |G|^2) time, with early termination.
Diversified top-k graph pattern matching problem
◦NP-complete;
◦2-approximable in O(|Q||G| + |V|(|V|+|E|)) time;
◦“Early termination” heuristic algorithm in O(|Q||G| + |V|(|V|+|E|)) time.
Approximating diversification
2-approximation algorithm
◦Idea: round down the diversification function and reduce to maximum dispersion.
Early termination heuristic
◦Idea: greedily select new matches that maximize the difference from the matches already selected.
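The greedy idea behind the heuristic can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name `greedy_diversified`, the trade-off weight `lam`, the gain formula and the toy scores are all assumptions made for the example.

```python
def greedy_diversified(candidates, k, relevance, distance, lam=0.5):
    """Pick k items, each time taking the one with the best marginal gain.

    Assumed gain of v: (1 - lam) * relevance(v) + lam * sum of distances
    from v to the items already selected.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def gain(v):
            spread = sum(distance(v, s) for s in selected)
            return (1 - lam) * relevance(v) + lam * spread
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy scores: "a" and "b" are relevant but nearly identical.
toy_relevance = {"a": 0.9, "b": 0.8, "c": 0.5}
toy_distance = {frozenset("ab"): 0.1, frozenset("ac"): 1.0, frozenset("bc"): 1.0}

picked = greedy_diversified(
    ["a", "b", "c"], 2,
    relevance=toy_relevance.get,
    distance=lambda x, y: toy_distance[frozenset((x, y))],
)
```

With these toy numbers the second pick is "c" rather than the more relevant "b", because "b" adds almost no difference to "a".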

Finding best candidates
Query: find good PM (project manager) candidates who have collaborated with PRG (programmer), DB (database developer) and ST (software tester).
Pattern graph Q: Project Manager* (the “query focus”), Programmer, DB manager, Tester
Collaboration network G: PM 1, PM 2, PM 3, PM 4, BA, PRG 1, PRG 2, PRG 3, PRG 4, DB 1, DB 2, DB 3, UD 1, UD 2, ST 1, ST 2, ST 3, ST 4
Complete matching relation:
◦(project manager, PM 1), (project manager, PM 2), (project manager, PM 3), (project manager, PM 4)
◦(programmer, PRG 1), (programmer, PRG 2), (programmer, PRG 3), (programmer, PRG 4)
◦(DBmanager, DB 1), (DBmanager, DB 2), (DBmanager, DB 3)
◦(tester, ST 1), (tester, ST 2), (tester, ST 3), (tester, ST 4)
When graph pattern matching is defined in terms of subgraph isomorphism, no match of Q can be identified in G, since it is too restrictive to define matches as isomorphic subgraphs. We instead find matches using graph simulation, which computes a binary relation between the pattern nodes in Q and their matches in G.

Problem formalization
Graph pattern matching using simulation (VLDB 10)
◦a graph G matches a pattern P if there exists a matching relation S;
◦for each pair (u, v) in S, v is a node in G that matches u in P;
◦for each edge (u, u’) in P, there exists an edge (v, v’) in G such that (u’, v’) is in S.
Graph pattern matching revised
◦extend a pattern with a designated output node u_o
◦matches Q(G): the matches of u_o
◦readily extends to multiple output nodes
Problem: we want to find (diversified) top-k matches for graph pattern matching with a designated output node.
Example: the output node is Project Manager* in the pattern (Project Manager*, Programmer, DB manager, Tester); its matches in the example are PM 1 to PM 4.
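The simulation condition above can be computed by a simple fixpoint refinement. The sketch below is a naive version written for illustration, assuming that nodes match when their labels are equal; the function name `graph_simulation` and the toy pattern and graph are hypothetical and are not taken from the slides.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum matching relation as {pattern node: set of graph nodes}.

    An empty dict means that G does not match the pattern.
    """
    # Successor sets of the data graph.
    g_succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)

    # Start from all label-compatible candidates (assumed matching criterion).
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # v remains a match of u only if some successor of v matches u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # Every pattern node needs at least one match.
    if any(not matches for matches in sim.values()):
        return {}
    return sim


# Hypothetical toy input: pm2 works with a programmer who has no DB contact.
q_labels = {"PM": "PM", "PRG": "PRG", "DB": "DB"}
q_edges = [("PM", "PRG"), ("PRG", "DB"), ("DB", "PRG")]
g_labels = {"pm1": "PM", "pm2": "PM", "prg1": "PRG", "prg2": "PRG", "db1": "DB"}
g_edges = [("pm1", "prg1"), ("prg1", "db1"), ("db1", "prg1"), ("pm2", "prg2")]

S = graph_simulation(q_labels, q_edges, g_labels, g_edges)
output_matches = S.get("PM", set())  # matches of the designated output node
```

Here prg2 is dropped first because it has no DB successor, which in turn removes pm2 from the matches of the output node.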

Top-k matching problem
Relevance
◦Relevant set R(u,v) for a match v of a query node u: all descendants of v that are matches of descendants of u
◦a unique, maximum relevant set
◦Relevance function
◦The more reachable matches, the better
Top-k matching: find the top-k match set that maximizes total relevance
[Figure: PM 2 and its relevant set DB 2, PRG 3, DB 3, PRG 4, PRG 2, ST 2, ST 3, ST 4]
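One way to read the definition of the relevant set is as a traversal over (pattern node, graph node) pairs that follows pattern edges and graph edges together. The sketch below assumes the matching relation is already available as a dict `sim`, and it takes the relevance of a match to be the size of its relevant set; the function name and the toy data are hypothetical.

```python
from collections import deque


def relevant_set(u, v, q_edges, g_edges, sim):
    """Graph nodes reachable from v that match pattern descendants of u."""
    q_succ, g_succ = {}, {}
    for a, b in q_edges:
        q_succ.setdefault(a, set()).add(b)
    for a, b in g_edges:
        g_succ.setdefault(a, set()).add(b)

    seen = {(u, v)}
    queue = deque([(u, v)])
    relevant = set()
    while queue:
        a, x = queue.popleft()
        for b in q_succ.get(a, ()):
            for y in g_succ.get(x, ()):
                # Follow the edge only if y is a match of pattern node b.
                if y in sim.get(b, ()) and (b, y) not in seen:
                    seen.add((b, y))
                    relevant.add(y)
                    queue.append((b, y))
    return relevant


# Hypothetical toy input; node "x" is reachable from pm but matches nothing.
toy_q_edges = [("PM", "PRG"), ("PRG", "ST")]
toy_g_edges = [("pm", "prg1"), ("pm", "prg2"), ("pm", "x"),
               ("prg1", "st1"), ("prg2", "st1")]
toy_sim = {"PM": {"pm"}, "PRG": {"prg1", "prg2"}, "ST": {"st1"}}

R = relevant_set("PM", "pm", toy_q_edges, toy_g_edges, toy_sim)
relevance = len(R)  # assumed relevance: number of reachable matches
```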

Match Diversification
Match diversity
◦Diversity function: set difference of the relevant sets
Diversification: a bi-criteria combination of both relevance and diversity
◦relevance: common neighbors, Jaccard coefficient…
◦diversity: neighborhood diversity, distance-based diversity
Diversified Top-k Matching: find a set S of matches for the output node s.t

Finding Top-k Matches (for Acyclic Patterns)
Finding top-k matches for acyclic patterns
◦Initializes a heap S, and a vector for each candidate v
◦Computes a set of matches for some query nodes (those whose matches can be determined without the following steps)
◦Iteratively updates the vectors of other candidates by propagating the partial answers
◦Termination condition: (1) each v in S is a match of u_o, and (2) min_{v ∈ S} l(u_o, v) ≥ max_{v′ ∈ can(u_o)\S} h(u_o, v′), where l(u_o, v) and h(u_o, v) denote a lower bound and an upper bound of r(u_o, v).
Vector of a candidate v: X_v (is v a match?); v.R (relevant set); v.lower, v.upper (relevance bounds)
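The termination condition can be checked directly from the bounds kept in the vectors. The following sketch assumes the bounds are stored in plain dicts; the function name `can_terminate` and the numbers in the example are illustrative assumptions, not values taken from the slides.

```python
def can_terminate(S, candidates, verified, lower, upper):
    """Check the early termination condition for a current top-k set S.

    (1) every v in S is a verified match of the output node, and
    (2) the smallest lower bound in S is at least the largest upper bound
        among the remaining candidates.
    """
    if not S or any(v not in verified for v in S):
        return False
    rest = [v for v in candidates if v not in S]
    if not rest:
        return True
    return min(lower[v] for v in S) >= max(upper[v] for v in rest)


# Illustrative bounds (assumed values).
cands = ["PM1", "PM2", "PM3", "PM4"]
lower = {"PM1": 0, "PM2": 7, "PM3": 6, "PM4": 0}
upper = {"PM1": 4, "PM2": 8, "PM3": 6, "PM4": 6}

done = can_terminate(["PM2", "PM3"], cands, {"PM2", "PM3"}, lower, upper)
```

Under these assumed bounds no remaining candidate can exceed the weakest member of S, so the search can stop without verifying PM1 or PM4.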

Finding Top-k Matches (for Acyclic Patterns)
[Figure: pattern graph (Project Manager*, Programmer, DB manager) and collaboration network G with nodes PM 1 to PM 4, BA, PRG 1 to PRG 4, DB 1 to DB 3, UD 1, UD 2, ST 1 to ST 4]
After initialization, the vectors of some of the nodes are as below.
[Table: v and v.T for PM 1, PM 2, PM 3, PM 4, PRG 1, PRG j (j ∈ [3,4]), DB k (k ∈ [1,3])]
Propagation starts from DB 2; after propagation, parts of the vectors are as below.
[Table: v and v.T for PM 1, PM 2, PM 3, PM 4, PRG 1, PRG j (j ∈ [3,4]), DB 2, DB k (k ∈ [1,3])]
PM 2 is verified to be a valid match, and its relevant set includes {DB 2, PRG 4, PRG 3}, which is the largest relevant set among the PMs. The early termination condition is met.

Finding Top-k Matches (for Cyclic Patterns)
Finding top-k matches for cyclic patterns
◦Computes the topological rank r(u) of the query nodes u in Q;
◦Iteratively updates the vectors of candidates by propagating the partial answers if the corresponding u_scc contains only one node;
◦Otherwise, employs Procedure SccProcess to verify matches.
[Figure: pattern graph (Project Manager*, Programmer, DB manager, Tester) with r(PM) = 2, r(u_scc) = 1, r(ST) = 0]
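The ranks shown on the slide can be reproduced by collapsing each strongly connected component and ranking the resulting DAG from the leaves upward. The sketch below makes two assumptions that the slide does not state: that a leaf component has rank 0 and every other component has rank one more than its highest-ranked child, and that the pattern edges are the ones listed in `pattern_edges`.

```python
def topological_ranks(nodes, edges):
    """Rank each node by the position of its strongly connected component."""
    succ = {u: [] for u in nodes}
    for a, b in edges:
        succ[a].append(b)

    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()
    counter = [0]

    def visit(u):  # Tarjan's algorithm, recursive (fine for small patterns)
        index[u] = low[u] = counter[0]
        counter[0] += 1
        stack.append(u)
        on_stack.add(u)
        for w in succ[u]:
            if w not in index:
                visit(w)
                low[u] = min(low[u], low[w])
            elif w in on_stack:
                low[u] = min(low[u], index[w])
        if low[u] == index[u]:
            # u is the root of a component: pop its members off the stack.
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = u
                if w == u:
                    break

    for u in nodes:
        if u not in index:
            visit(u)

    rank = {}

    def comp_rank(c):
        if c not in rank:
            children = {comp[w] for u in nodes if comp[u] == c
                        for w in succ[u] if comp[w] != c}
            rank[c] = 1 + max(map(comp_rank, children)) if children else 0
        return rank[c]

    return {u: comp_rank(comp[u]) for u in nodes}


# Assumed pattern edges: Programmer and DB manager form a cycle.
pattern_nodes = ["PM", "PRG", "DB", "ST"]
pattern_edges = [("PM", "PRG"), ("PM", "DB"), ("PRG", "DB"),
                 ("DB", "PRG"), ("PRG", "ST"), ("DB", "ST")]

ranks = topological_ranks(pattern_nodes, pattern_edges)
```

With the assumed edges, PRG and DB share one component, which is the case where the slide's SccProcess step would apply instead of plain propagation.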

Finding Top-k Matches (for Cyclic Patterns)
[Figure: pattern graph (Project Manager*, Programmer, DB manager, Tester) and collaboration network G with nodes PM 1 to PM 4, BA, PRG 1 to PRG 4, DB 1 to DB 3, UD 1, UD 2, ST 1 to ST 4]
[Table: v and v.T for PM 1 (relevant set Ф, bounds 0 and 4), PM 2, PM 3, PM 4, PRG 2, PRG 3, PRG 4, DB 2, DB 3]
Propagation starts from ST 3 and ST 4.
Verified: X_DB3 = true, X_PRG2 = true, X_DB2 = true, X_PRG3 = true, X_PRG4 = true, X_PM2 = true, X_PM3 = true
PM 2 and PM 3 are the top-2 matches, since we can determine that their relevant sets are the two largest. The algorithm can terminate early, although PM 2 has another descendant ST 2, which is also a true match of ST, and PM 1 is not verified at all.

Finding Top-k Diversified Matches
[Table: F() for each pair among PM 1, PM 2, PM 3, PM 4]
Relevant sets and relevance:
v     R(u_o, v)                                               δr()
PM 1  {PRG 1, DB 1, ST 1, ST 2}                               4
PM 2  {PRG 4, PRG 3, PRG 2, DB 2, DB 3, ST 2, ST 3, ST 4}     8
PM 3  {PRG 3, PRG 2, DB 2, DB 3, ST 3, ST 4}                  6
PM 4  {PRG 3, PRG 2, DB 2, DB 3, ST 3, ST 4}                  6
Diversity δd():
      PM 1   PM 2   PM 3   PM 4
PM 1  0      10/11  1      1
PM 2  10/11  0      1/4    1/4
PM 3  1      1/4    0      0
PM 4  1      1/4    0      0
PM 1 and PM 3 are picked by TopKDiv as top-2 diversified matches.
F’(PM 1, PM 3) = 0.5 * (4/11 + 6/11) + 1 ≈ 1.45
PM 1 and PM 3 have no descendant matches in common, and influence a large part of the matches.
[Figure: PM 1 and PM 3 with their descendants PRG 1, DB 1, DB 2, PRG 3, DB 3, PRG 2, ST 1, ST 2, ST 3, ST 4]
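The numbers on this slide can be rechecked with exact fractions. The sketch below rests on assumptions inferred from the slide's arithmetic rather than stated on it: relevance is the relevant-set size divided by 11, diversity is the Jaccard distance between relevant sets, the objective for a set S of size k is (1 - λ) times the summed relevance plus 2λ/(k - 1) times the summed pairwise diversity, and λ = 0.5.

```python
from fractions import Fraction
from itertools import combinations

# Relevant sets as listed on the slide.
relevant = {
    "PM1": {"PRG1", "DB1", "ST1", "ST2"},
    "PM2": {"PRG4", "PRG3", "PRG2", "DB2", "DB3", "ST2", "ST3", "ST4"},
    "PM3": {"PRG3", "PRG2", "DB2", "DB3", "ST3", "ST4"},
    "PM4": {"PRG3", "PRG2", "DB2", "DB3", "ST3", "ST4"},
}
NORMALIZER = 11  # assumption: the denominator used in 4/11 and 6/11


def relevance(v):
    return Fraction(len(relevant[v]), NORMALIZER)


def diversity(v1, v2):
    """Jaccard distance between the two relevant sets (assumed definition)."""
    union = relevant[v1] | relevant[v2]
    if not union:
        return Fraction(0)
    return 1 - Fraction(len(relevant[v1] & relevant[v2]), len(union))


def objective(S, lam):
    """Assumed bi-criteria objective for a set S with at least two matches."""
    k = len(S)
    rel = sum(relevance(v) for v in S)
    div = sum(diversity(a, b) for a, b in combinations(S, 2))
    return (1 - lam) * rel + 2 * lam / (k - 1) * div


lam = Fraction(1, 2)
f_pm1_pm3 = objective(("PM1", "PM3"), lam)
best = max(objective(pair, lam) for pair in combinations(sorted(relevant), 2))
```

Under these assumptions F’(PM 1, PM 3) is 16/11, about 1.45, and no other pair scores higher. The pairs (PM 1, PM 2) and (PM 1, PM 4) reach the same value, so which of them is returned would depend on tie-breaking.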

Finding Top-k Diversified Matches
[Figure: collaboration network G with nodes PM 1 to PM 4, BA, PRG 1 to PRG 4, DB 1 to DB 3, UD 1, UD 2, ST 1 to ST 4]
[Table: v and v.T for PM 1 (relevant set Ф, bounds 0 and 4), PM 2, PM 3, PM 4]
PM 2, PM 3 and PM 4 are verified true matches, and the termination condition is satisfied.
PM 2 and PM 3 are picked by TopKDH as top-2 diversified matches.
F’’(PM 2, PM 3) = (1 - 0.1) * (7/11 + 6/11) + 2 * 0.1/(2 - 1) * 1/7 ≈ 1.1

Experimental evaluation
Dataset
◦Real-life graphs
◦Synthetic graphs
Setup: Amazon EC2 instance with 3.75GB memory and 2 EC2 compute units.
Algorithms
◦Top-k matching (with/without optimization)
◦Brute force algorithm
◦Diversified algorithms: approximation, and heuristic with early termination
Graphs                         |V|         |E|
Amazon co-purchasing network   548,552     1,788,725
Citation                       1,397,240   3,021,489
Youtube                        1,609,969   4,509,826

Experimental evaluation (continued)
[Figures: varying |Q| on Amazon; varying |Q| on Youtube]

Conclusion and future work
Conclusion
◦revised graph patterns by supporting a designated output node;
◦defined functions to measure match relevance and diversity, as well as a bi-criteria objective function based on both;
◦developed algorithms for computing top-k matches and for finding diversified top-k matches, with properties such as constant approximation ratios and early termination;
◦verified the effectiveness of our methods.
Future work
◦Optimization techniques to further reduce the number of matches examined by our algorithms;
◦Distributed top-k matching algorithms on graphs that are partitioned, distributed and possibly compressed.

Thanks!