Research Directions for Big Data Graph Analytics John A. Miller, Lakshmish Ramaswamy, Krys J. Kochut and Arash Fard Department of Computer Science University.

Slides:



Advertisements
Similar presentations
Bellman-Ford algorithm
Advertisements

The Primal-Dual Method: Steiner Forest TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A A AA A A A AA A A.
Chapter 5: Tree Constructions
Chapter 4 Partition I. Covering and Dominating.
Oracle Labs Graph Analytics Research Hassan Chafi Sr. Research Manager Oracle Labs Graph-TA 2/21/2014.
Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, Tianyu Wo Capturing Topology in Graph Pattern Matching University of Edinburgh.
New Models for Graph Pattern Matching Shuai Ma ( 马 帅 )
The IEEE International Conference on Big Data 2013 Arash Fard M. Usman Nisar Lakshmish Ramaswamy John A. Miller Matthew Saltz Computer Science Department.
Design and Analysis of Algorithms Single-source shortest paths, all-pairs shortest paths Haidong Xue Summer 2012, at GSU.
Towards Efficient Query Processing on Massive Evolving Graphs (C-Big2012) Arash Fard, Amir Abdolrashidi, Lakshmish Ramaswamy and John A. Miller UGA Presentation.
Graphs Chapter 12. Chapter Objectives  To become familiar with graph terminology and the different types of graphs  To study a Graph ADT and different.
Zheng Zhang, Exact Matchingslide 1 Exact (Graph) Matching Zheng Zhang e Presentation for VO Structural Pattern Recognition.
Applied Discrete Mathematics Week 12: Trees
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2001 Lectures 3 Tuesday, 9/25/01 Graph Algorithms: Part 1 Shortest.
CSC 213 Lecture 23: Shortest Path Algorithms. Weighted Graphs Each edge in weighted graph has numerical weight Weights can be distances, building costs,
Graph Algorithms: Part 1
Greedy Algorithms Reading Material: Chapter 8 (Except Section 8.5)
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
A scalable multilevel algorithm for community structure detection
Fall 2007CS 2251 Graphs Chapter 12. Fall 2007CS 2252 Chapter Objectives To become familiar with graph terminology and the different types of graphs To.
Greedy Algorithms Like dynamic programming algorithms, greedy algorithms are usually designed to solve optimization problems Unlike dynamic programming.
1 Shortest Path Calculations in Graphs Prof. S. M. Lee Department of Computer Science.
Social Media Mining Graph Essentials.
FAST FREQUENT FREE TREE MINING IN GRAPH DATABASES Marko Lazić 3335/2011 Department of Computer Engineering and Computer Science,
Making Pattern Queries Bounded in Big Graphs 11 Yang Cao 1,2 Wenfei Fan 1,2 Jinpeng Huai 2 Ruizhe Huang 1 1 University of Edinburgh 2 Beihang University.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
SUPPORTING A MODELING CONTINUUM IN SCALATION John A. Miller Michael E. Cotterell Stephen J. Buckley University of Georgia IBM Thomas J. Watson Research.
CSCI-455/552 Introduction to High Performance Computing Lecture 18.
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs Sherif SakrSameh ElniketyYuxiong He NICTA & UNSW Sydney, Australia Microsoft Research Redmond,
Trees and Distance. 2.1 Basic properties Acyclic : a graph with no cycle Forest : acyclic graph Tree : connected acyclic graph Leaf : a vertex of degree.
Graph Algorithms. Definitions and Representation An undirected graph G is a pair (V,E), where V is a finite set of points called vertices and E is a finite.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
TCP Traffic and Congestion Control in ATM Networks
Graph pattern matching Graph GGraph Qisomorphism f(a) = 1 f(b) = 6 f(c) = 8 f(d) = 3 f(g) = 5 f(h) = 2 f(i) = 4 f(j) = 7.
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Shortest Path Problems Dijkstra’s Algorithm. Introduction Many problems can be modeled using graphs with weights assigned to their edges: Airline flight.
The all-pairs shortest path problem (APSP) input: a directed graph G = (V, E) with edge weights goal: find a minimum weight (shortest) path between every.
Answering pattern queries using views Yinghui Wu UC Santa Barbara Wenfei Fan University of EdinburghSouthwest Jiaotong University Xin Wang.
Partitioning the Labeled Spanning Trees of an Arbitrary Graph into Isomorphism Classes Austin Mohr.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Distributed Graph Simulation: Impossibility and Possibility 1 Yinghui Wu Washington State University Wenfei Fan University of Edinburgh Southwest Jiaotong.
The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?
Data Structures and Algorithms in Parallel Computing Lecture 3.
Research Directions for Big Data Graph Analytics John A. Miller, Lakshmish Ramaswamy, Krys J. Kochut and Arash Fard.
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Graph Theory. undirected graph node: a, b, c, d, e, f edge: (a, b), (a, c), (b, c), (b, e), (c, d), (c, f), (d, e), (d, f), (e, f) subgraph.
Introduction to Algorithms All-Pairs Shortest Paths My T. UF.
Suppose G = (V, E) is a directed network. Each edge (i,j) in E has an associated ‘length’ c ij (cost, time, distance, …). Determine a path of shortest.
Introduction to Graph Theory By: Arun Kumar (Asst. Professor) (Asst. Professor)
Chapter 20: Graphs. Objectives In this chapter, you will: – Learn about graphs – Become familiar with the basic terminology of graph theory – Discover.
Indian Institute of Technology Kharagpur PALLAB DASGUPTA Graph Theory: Trees Pallab Dasgupta, Professor, Dept. of Computer Sc. and Engineering, IIT
Outline  Introduction  Subgraph Pattern Matching  Types of Subgraph Pattern Matching  Models of Computation  Distributed Algorithms  Performance.
Yinghui Wu, SIGMOD Incremental Graph Pattern Matching Wenfei Fan Xin Wang Yinghui Wu University of Edinburgh Jianzhong Li Jizhou Luo Harbin Institute.
Data Structures and Algorithm Analysis Graph Algorithms Lecturer: Jing Liu Homepage:
Construction of Optimal Data Aggregation Trees for Wireless Sensor Networks Deying Li, Jiannong Cao, Ming Liu, and Yuan Zheng Computer Communications and.
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
Lecture 20. Graphs and network models 1. Recap Binary search tree is a special binary tree which is designed to make the search of elements or keys in.
Pagerank and Betweenness centrality on Big Taxi Trajectory Graph
Parallel Programming By J. H. Wang May 2, 2017.
Probabilistic Data Management
Data Structures and Algorithms in Parallel Computing
Directed Graphs Directed Graphs 1 Shortest Path Shortest Path
CS223 Advanced Data Structures and Algorithms
On Efficient Graph Substructure Selection
Simulation based approach Shang Zechao
Efficient Subgraph Similarity All-Matching
CS 584 Project Write up Poster session for final Due on day of final
Graphs G = (V, E) V are the vertices; E are the edges.
Chapter 16 1 – Graphs Graph Categories Strong Components
Introduction to Graph Theory
Presentation transcript:

Research Directions for Big Data Graph Analytics John A. Miller, Lakshmish Ramaswamy, Krys J. Kochut and Arash Fard Department of Computer Science University of Georgia Athens, GA, USA

Outline ● Introduction ● Graph Analytics ● Problems in Graph Analytics ● Path Problems ● Pattern Problems ● Graph Simulations ● Graph Morphisms ● Comparison of Pattern Matching Models ● Applications ● Computational Models and Frameworks ● Conclusions and Future Work

Introduction ● Big Data Analytics ● Find patterns or relationships in large volumes of data y 1, y 2, …, y k = f (x 1, x 2, …, x k ) ● Input x to f can be stored in an m-by-n matrix and output y can be stored in an m-by-k matrix. ● Numerous techniques from Regression to Neural Networks may be applied to analyze the data, e.g., find correlations, make predictions.

Graph Analytics ● Another way to look at data focuses on how n data items are connected f(u, v) – where f is a function u and v are data items ● Indicating for example relatedness, similarity, distance, etc. ● Again an n-by-n matrix can be used to tabulate the function f ● In large data sets, many of the data items will not be directly connected, so it is more efficient to use some form of graph

Graph Analytics ● Vertex-labeled Digraph – directed graph with vertex labels ● Fully-labeled Multi-Digraph – adds edge labels – allows multiple edges between 2 vertices, if labels are distinct – G(V, E, L, l) – vertices, labeled edges, label set, vertex labeling

Problems in Graph Analytics ● There are many problems in graph analytics. ● Several involve the three “P”s ● Path Problems – Reachability – Shortest Path ● Pattern Problems – Graph Simulation – Graph Morphisms ● Partition Problems

Path Problems ● Reachability – Given a graph G and two vertices, u, w ∈ G.V, find a path (sequence of edges) connecting them path(u, w) = uv 1 l i1, v 1 v 2 l i2,..., v n w l in+1 ∈ G.E – Reachability is simply reach(u, w) = ∃ path(u, w) ● Shortest Path – Given k vertices, find a minimum distance path that includes all k vertices. – For k = 2, the two vertices will be endpoints – Algorithms: Dijkstra, Bellman-Ford

Pattern Problems ● Pattern Matching Model – Given a query graph Q, match its labeled vertices to corresponding labeled vertices in a data graph G Φ : Q.V → 2 G.V – All the pattern matching models start with matching vertex labels – For each vertex u in Q.V, require ∀ u' ∈ Φ (u), l(u') = l(u) ● Graph Simulation – For each label matching pair (u, u') in Q.V × G.V, require matching children ∀ v ∈ child Q (u), ∃ v' ∈ Φ(v) such that u'v' ∈ G.E

Example Query (reduced G)

Pattern Problems ● Dual Simulation – For each vertex pair (u, u') in Q.V × G.V, also require matching parents ∀ w ∈ parent Q (u), ∃ w' ∈ Φ(w) such that w'u' ∈ G.E ● Strong Simulation – Matching patterns in G must be contained in some ball B radius(B) = diameter(Q) – Locality makes the pattern in G look more like query Q

Pattern Problems ● Strict Simulation – only balls containing vertices in an initial dual match ● Tight Simulation – only balls with centers corresponding to a central vertex in Q radius(B) = radius(Q) ● CAR-tight Simulation – child and parent match must satisfy ● Type – e.g., child labels A and B ● Cardinality – e.g., 2 A's and 3 B's (new restriction)

Pattern Problems ● Graph Homomorphism – Mapping function f: Q.V → G.V such that ∀ u ∈ Q, u' = f(u), l(u') = l(u) uv ∈ Q.E implies u'v' ∈ G.E ● Subgraph Isomorphism – Bijective function f: Q.V → G'.V (G' subgraph of G) such that ∀ u ∈ Q, u' = f(u), l(u') = l(u) uv ∈ Q.E iff u'v' = f(u)f(v) ∈ G'.E

ModelSubgraph Results# Graph Sim Φ(1, 2, 3, 4) → ( {1, 6, 8, 12, 16, 19, 20, 24, 27, 30}, {2, 7, 13, 15, 17, 21, 23, 26, 29}, {3, 4, 5, 9, 11, 14, 18, 22, 25, 28}, {3, 4, 5, 9, 11, 14, 18, 22, 25, 28} ) 29 Dual Sim Φ(1, 2, 3, 4) → ( {1, 6, 8, 12, 16, 19, 20, 24, 27, 30}, {2, 7, 13, 15, 17, 21, 23, 26, 29}, {3, 4, 5, 9, 14, 18, 22, 25, 28}, {3, 4, 5, 9, 14, 18, 22, 25, 28} ) 28 Strong Sim Φ(1, 2, 3, 4) → ( {1, 6, 8}, {2, 7}, {3, 4, 5, 9}, {3, 4, 5, 9}), ( 12, 13, 14, 14), ( {16, 19, 20}, {15, 17, 21}, {14, 18, 22}, {14, 18, 22}) 20 Strict Sim Φ(1, 2, 3, 4) → ({1, 6, 8}, {2, 7}, {3, 4, 5, 9}, {3, 4, 5, 9}), (12, 13, 14, 14) 12 Tight Sim Φ(1, 2, 3, 4) → (1, 2, {3, 4, 5}, {3, 4, 5}), (12, 13, 14, 14) 8 car-Tight Sim Φ(1, 2, 3, 4) → (1, 2, {3, 4, 5}, {3, 4, 5}) 5 Graph Homomorph f(1, 2, 3, 4) → (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 2, 3, 3), (1, 2, 4, 4), (1, 2, 5, 5), (12, 13, 14, 14) 8 Subgraph Isomorph f(1, 2, 3, 4) → (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5) 5 Comparison of Matching Models

ModelComplexity Class SourceResults Contained In Graph SimQuadraticHenzinger et al Dual SimCubicMa et al. 2011Graph Sim Strong SimCubicMa et al. 2011Dual Sim Strict SimCubicFard et al. 2013Strong Sim Tight SimCubicFard et al. 2014aStrict Sim Car-Tight SimCubicFard et al. 2014bTight Sim Graph Homeomorph NP-hardFortune et al Graph Homomorph NP-hardHell and Nesetril, 1990Graph Homeomorph and Tight Sim Subgraph Isomorph NP-hardGarey and Johnson, 1979Graph Homomorph and Car-Tight Sim Comparison of Matching Models

Applications ● Graph Databases – Neo4j, OrientDB and Titan ● Semantic Web – SPARQL Queries – (subject, predicate, object) subject -edge-> object ● Social Media – Facebook, Twitter, LinkedIn, Amazon – Find patterns in graphs

Computational Models and Frameworks ● MapReduce/Hadoop – May not be ideal for highly iterative algorithms ● Apache Storm – More efficient stream processing ● Apache Spark – Can maintain intermediate results in main memory ● Bulk Synchronous Parallel (BSP) – Vertex-Centric Model – Testing of Graph through Tight Simulation showed good scalability

Conclusions and Future Work ● Research in Graph Pattern Matching – Improved Sequential Algorithms ● Recent ones for Subgraph Isomorphism – DualIso [Saltz et al. 2014] and TurboIso [Han et al. 2013] – much faster than prior generation Ullmann and VF2 – Parallel and Distributed Algorithms/Implementations ● More Vertex-Centric Programming ● Asynchronous Programming – harder, faster – Extensions ● Adding Edge Labels ● Practical Solutions for Graph Database and SPARQL Query Engines

Full Example