Oracle Labs Graph Analytics Research Hassan Chafi Sr. Research Manager Oracle Labs Graph-TA 2/21/2014.


1 Oracle Labs Graph Analytics Research
Hassan Chafi, Sr. Research Manager, Oracle Labs
Graph-TA, 2/21/2014

2 Copyright © 2013-2014, Oracle and/or its affiliates. All rights reserved. Confidential – Oracle Restricted

The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.

3 Green-Marl: A DSL for Graph Analysis
- Green-Marl
  - A DSL for graph algorithms
  - Started as a Stanford project (2011)
- Approach
  - Users program graph algorithms in an intuitive way (productivity)
  - The compiler creates an efficient implementation (performance)
  - For multiple different environments (portability)

4 Technical Challenges for Graph Processing
Different Challenges for Different Concerns
[Diagram labels: User, Raw Data, Data Management, Execution, Flexibility]
- How to persist the graph data?
- How to construct the graph representation?
- How to specify pattern-matching queries?
- How to specify graph algorithms?
- How to find patterns efficiently?
- How to run algorithms fast?
- How to handle very large graphs?
- Which graph data model to use?
- How to visualize data and results?

5 Competitive Landscape
[Diagram axes: Data Management, Execution, Flexibility]
- Property Graph Database (+ other PG DBs)
  - Recent camp of so-called graph databases
  - Adopt the property graph data model
  - Major focus on data management
  - But the product group (OSG) is trying to enter these sectors as well
- (Distributed) Analytic Engines (HDFS, Hadoop/Giraph)
  - Engines built (only) for execution of graph algorithms
  - May consider distributed execution for very large graphs
  - Programming can be challenging
- RDF Database (Pattern Matching) (+ other RDF DBs)
  - RDF: a more traditional, standardized graph data model
  - One big focus is on pattern-matching applications
  - Oracle already has expertise in this area

6 Our Approach
- We provide powerful graph analytic engines that are integrated with existing or developing Oracle technologies
[Diagram axes: Data Management, Execution, Flexibility]
[Diagram components: RDF, PG, GMQL, In-Memory Pattern Matching, In-Memory Graph Analysis, Green-Marl DSL + Compiler, Property Graph Database, Big Graph Analysis, RDF Database (Pattern Matching), Distributed Graph Analysis, BDA]
Callouts:
- In-memory graph analytic engine for Oracle PG
- Distributed graph analytic engine for Oracle PG
- DSL that generates programs for both environments
- In-memory pattern-matching accelerator
- Pattern-matching QL for PG

7 Major Milestones Achieved So Far
[Timeline: CY2011 (3Q), CY2012, CY2013]
[Tracks: Green-Marl DSL; In-Memory Graph Analytic Engine (PG); Distributed Graph Analytic (BDA); In-Memory Pattern Matching (RDF)]
Timeline entries, in the order they appear on the slide:
- Spec & Initial Compiler
- Parallel C++ Runtime (Standalone)
- An Open-Source Distributed Engine (Giraph)
- Compiler for multiple back-ends
- Tech-Transfer Planned (OSG)
- Showed: our in-memory analysis runs 10~100x faster than a popular PG database (Neo4J)
- In-Memory Engine Design
- Basic Feature Implementation
- Integration with Oracle PG Database
- Compiler Optimization
- Language Extension
- Showed: we can compile into very different environments
- Enables: 30+ built-in algorithms for the analytic engine
- Tech-Transfer Discussion (BDA)
- Handles: multiple clients, snapshot consistency, sharing instances …
- Design Exploration
- Basic Feature Implementation (On-going)
- Showed: Giraph has critical, innate performance and compatibility issues
- Compilation to Java
- Algorithms Implementation
- Started as University Project
- Showed: we can exploit network BW very efficiently
- *First target is to use the in-memory engine
- Algorithm Exploration
- Initial Implementation
- Can we apply the same parallel, in-memory approach to pattern matching?
- Will be a part of Oracle Property Graph Option
- Tech-Transfer Discussion (OSG/RDF)
- Showed: x200 faster than SQL-based solutions

8 Algorithm Implementation
- Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s
- Ranking and Walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
- Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar
- Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
- Link Prediction: SALSA (Twitter’s Who-to-follow)
- Other Classics: Vertex Cover, Minimum Spanning Tree (Prim’s)
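As an illustration of one of the simpler entries in this list, Hop-Distance (BFS), here is a minimal Python sketch. It is not the engine's implementation and not Green-Marl code; the function name `hop_distance` and the adjacency-dictionary input format are assumptions made for this example.

```python
from collections import deque


def hop_distance(adj, source):
    """Return {vertex: number of hops from source} for all reachable vertices.

    `adj` is a hypothetical adjacency dictionary: vertex -> iterable of
    out-neighbors. Vertices that never appear as keys are treated as
    having no outgoing edges.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:          # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 5: [0]}
    print(hop_distance(graph, 0))
```

Vertices that cannot be reached from the source (vertex 5 in the usage example) are simply absent from the result.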

9 Algorithm: Triangle Counting
Experimental Results
[Chart legend]
- Our implementation running on two different architectures (1 machine)
- GraphLab’s implementation running on 31 machines
- Hadoop implementation running on 1000+ machines
- *Preprocessing time included
Observations:
- Our single-machine implementation outperforms the distributed systems
- Hadoop takes a lot of execution time
- SPARC provides additional performance benefits
- Hadoop numbers are excerpted from a WWW’11 paper
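For readers unfamiliar with the benchmark, triangle counting asks how many sets of three mutually connected vertices an undirected graph contains. The sketch below is a generic single-threaded Python illustration, not the implementation measured on this slide. It uses a common degree-ordering technique, chosen here as an assumption: each edge is oriented from its lower-ranked to its higher-ranked endpoint so that every triangle is counted exactly once. The name `count_triangles` and the edge-list input are hypothetical.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Vertex ids must be mutually comparable (e.g. all ints), because ties
    in degree are broken by id.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue                      # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Rank vertices by (degree, id); keep only neighbors of higher rank.
    rank = {v: (len(ns), v) for v, ns in adj.items()}
    higher = {v: {w for w in ns if rank[w] > rank[v]} for v, ns in adj.items()}

    total = 0
    for u, hs in higher.items():
        for v in hs:
            # Common higher-ranked neighbors of u and v close a triangle
            # whose lowest-ranked vertex is u.
            total += len(hs & higher[v])
    return total


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))
```

Orienting edges by degree keeps the per-vertex candidate sets small on skewed graphs, which is one reason this style of algorithm is popular for in-memory implementations.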

10 Subgraph Isomorphism Problem
[Figure: a query graph Q, a data graph G with labeled vertices, and the subgraphs of G that are isomorphic to Q]
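The problem on this slide can be made concrete with a small backtracking search. The following Python sketch is an illustration only, not the in-memory pattern-matching engine discussed in the talk. It assumes undirected graphs with vertex labels and uses the edge-preserving (non-induced) notion of a match: every query edge must map to a data edge, and distinct query vertices map to distinct data vertices. The function name `find_matches` and the input format are hypothetical.

```python
def find_matches(q_labels, q_edges, g_labels, g_edges):
    """Return all mappings {query vertex: data vertex} that embed Q in G.

    q_labels / g_labels: dict vertex -> label
    q_edges / g_edges:   iterable of undirected (u, v) pairs
    """
    q_adj = {v: set() for v in q_labels}
    for u, v in q_edges:
        q_adj[u].add(v)
        q_adj[v].add(u)
    g_adj = {v: set() for v in g_labels}
    for u, v in g_edges:
        g_adj[u].add(v)
        g_adj[v].add(u)

    order = list(q_labels)                # fixed matching order (no heuristics)
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        qv = order[len(mapping)]
        for gv in g_labels:
            if g_labels[gv] != q_labels[qv] or gv in mapping.values():
                continue                  # label mismatch or vertex already used
            # Every already-mapped query neighbor must be adjacent in G.
            if all(mapping[qn] in g_adj[gv] for qn in q_adj[qv] if qn in mapping):
                mapping[qv] = gv
                extend(mapping)
                del mapping[qv]

    extend({})
    return results


if __name__ == "__main__":
    q_labels = {0: "A", 1: "B", 2: "C"}
    q_edges = [(0, 1), (1, 2)]
    g_labels = {10: "A", 11: "B", 12: "C", 13: "B", 14: "C"}
    g_edges = [(10, 11), (11, 12), (10, 13), (13, 14), (11, 14)]
    for match in find_matches(q_labels, q_edges, g_labels, g_edges):
        print(match)
```

This exhaustive search is exponential in the worst case; practical engines add candidate filtering and matching-order heuristics on top of the same basic idea.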

11 Experimental Results: Comparison against DB
LUBM 8K and 25K on x86 and SPARC
[Chart: speedup labels, in slide order: 221x, 174x, 100x, 206x, 162x, 92x, 80x, 103x]

