# STAR: Steiner-Tree Approximation in Relationship Graphs

STAR: Steiner-Tree Approximation in Relationship Graphs Max-Planck Institute for Informatics, Database and Information Systems, Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum

Introduction Entity-Relationship Graphs – Another way of representing relational data – Consist of labeled nodes and edges – Node labels correspond to entities – Edge labels represent relations between entities – Edge weights express the strength of the relation between entities – Taxonomic relations (subClassOf, type)
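A minimal sketch of how such a graph could be held in memory, assuming an undirected adjacency map. The facts, relation labels other than `type` and `subClassOf`, the weights, and the names `EDGES`, `TAXONOMIC`, and `build_graph` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy facts: (source entity, relation label, target entity, weight).
EDGES = [
    ("Angela Merkel", "type", "Politician", 0.95),
    ("Politician", "subClassOf", "Person", 0.90),
    ("Angela Merkel", "chancellorOf", "Germany", 0.96),
]

# The two taxonomic relations named on the slide.
TAXONOMIC = {"type", "subClassOf"}


def build_graph(edges):
    """Return node -> {neighbor: (edge label, edge weight)}, stored in both directions."""
    graph = defaultdict(dict)
    for source, label, target, weight in edges:
        graph[source][target] = (label, weight)
        graph[target][source] = (label, weight)
    return graph


graph = build_graph(EDGES)

# Neighbors of one entity that are reachable over taxonomic edges only.
taxonomic_neighbors = sorted(
    n for n, (label, _) in graph["Angela Merkel"].items() if label in TAXONOMIC
)
```

Keeping the label next to the weight lets a traversal restrict itself to taxonomic edges without a second data structure.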

Introduction Example of an Entity-Relationship Graph (figure: example graph with specialization and generalization directions along the taxonomic edges)

Introduction Querying E-R Graphs – The relationship search query class: Given a set of two, three, or more entities (nodes), find their closest relationships (edges or paths) that connect the entities in the strongest possible way. "Strongest" is related to informativeness – A relationship search query example. Query: “How are Germany’s chancellor Angela Merkel, the mathematician Richard Courant, Turing-Award winner Jim Gray, and the Dalai Lama related?” Informative answer: All have a doctoral degree from a German university – How are Angela Merkel, Arnold Schwarzenegger, Max Planck, and Germany related?

Motivation and Problem Information discovery as opposed to lookup. The nature of the answer – Can be a tree embedded in the original graph – Input nodes (query) must be connected by the tree – How good is the answer? A scoring model can exploit node and edge weights. The formal definition of the problem: – Compute the k lowest-cost Steiner trees:

Motivation and Problem What is a Steiner Tree Problem? Steiner Tree Examples: Steiner tree for three terminals V’ = {A, B, C} Note the Steiner Point S. Steiner tree for four terminals V’ = {A, B, C, D} Note the Steiner Points S1, S2.

Motivation and Problem Steiner Tree Problem Complexity – NP-hard (for the optimal solution) – Approximate solution algorithms – Approximation ratio: measures the quality of an approximation algorithm as weight of the approximate output tree / weight of the optimal output tree. Benefits of reducing the approximation ratio – Viable runtimes (efficiency) – Better tree quality (informativeness), near-optimal
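The ratio described above can be written out as follows. The symbols ρ, w, T_approx, and T_opt are notation assumed here for illustration, and the numbers in the example are made up.

```latex
\rho \;=\; \frac{w(T_{\mathrm{approx}})}{w(T_{\mathrm{opt}})} \;\ge\; 1,
\qquad \text{e.g. } w(T_{\mathrm{approx}}) = 12,\; w(T_{\mathrm{opt}}) = 10
\;\Rightarrow\; \rho = \frac{12}{10} = 1.2 .
```

A ratio of 1 means the returned tree is optimal; an algorithm with a guaranteed ratio bounds ρ from above for every input.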

Paper Contributions Presents STAR, a new efficient algorithm – Computes near-optimal Steiner trees – Exploits taxonomic schema (when available) – Viable runtimes over large graphs. STAR approximation ratio proofs: – O(log n), for n given query entities (worst case). Improvement over other approximation ratios –, or – STAR practically is better than a - approximation algorithm. STAR top-k tree capability. STAR outperforms state-of-the-art algorithms by an order of magnitude. Can be applied either to main-memory datasets or to large disk-resident graphs. Evaluation via comparison with other cutting-edge algorithms

The Star Algorithm Introduction First Phase Second Phase Examples

The Star Algorithm – Introduction Problem definition – As stated in the introduction – Further, we are interested in finding the top-k result trees in increasing order of cost. Exploitation of taxonomic backbones – Node labels as entities – Edge labels as weights or relations – Taxonomic information is not compulsory. Runs in 2 phases. Phase 1: Uses taxonomic information (when available) – Builds a quick tree by pruning the original graph – Interconnects all given nodes. Phase 2: Iteratively improves the tree from Phase 1

The Star Algorithm - First Phase Prunes the original graph. Runs an iterator from each terminal. Iterators run in a round-robin manner. Iterators follow only taxonomic edges: – subClassOf, type
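The first phase can be sketched roughly as below: one breadth-first iterator per terminal, advanced in round-robin order over taxonomic edges only, with a connecting path recorded whenever two iterators from different groups meet. This is a simplified illustration, not the authors' implementation: iterators here keep running after they meet, and the toy taxonomy, `adjacency`, and `initial_tree` are assumed names and data.

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency lists built from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def initial_tree(tax_graph, terminals):
    """Return a set of tree edges connecting all terminals, or None if impossible."""
    owner = {t: t for t in terminals}       # node -> terminal whose iterator found it
    parent = {t: None for t in terminals}   # BFS predecessor inside that iterator
    queues = {t: deque([t]) for t in terminals}
    group = {t: t for t in terminals}       # union-find over already connected terminals

    def find(t):
        while group[t] != t:
            t = group[t]
        return t

    def path_to_root(v):
        edges = set()
        while parent[v] is not None:
            edges.add(frozenset((v, parent[v])))
            v = parent[v]
        return edges

    tree_edges, components = set(), len(terminals)
    while components > 1 and any(queues.values()):
        for t in terminals:                 # round robin: one expansion per iterator
            if components == 1 or not queues[t]:
                continue
            v = queues[t].popleft()
            for w in tax_graph.get(v, ()):
                if w not in owner:          # undiscovered: claim it and enqueue
                    owner[w], parent[w] = t, v
                    queues[t].append(w)
                elif find(owner[w]) != find(t):   # two iterators meet
                    tree_edges |= path_to_root(v) | path_to_root(w)
                    tree_edges.add(frozenset((v, w)))
                    group[find(owner[w])] = find(t)
                    components -= 1
    return tree_edges if components == 1 else None


# Hypothetical taxonomy fragment around the three terminals of the running example.
TAX = adjacency([
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"), ("Scientist", "Person"),
    ("Arnold Schwarzenegger", "Politician"), ("Politician", "Person"),
    ("Germany", "State"), ("State", "Entity"), ("Person", "Entity"),
])
TERMINALS = ["Max Planck", "Arnold Schwarzenegger", "Germany"]
example_tree = initial_tree(TAX, TERMINALS)
```

Because every node is claimed by exactly one iterator and each merge joins two previously separate groups, the recorded edges cannot form a cycle.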

Single Breadth-First-Search Iterator Pruning Example

Breadth First Search (animation frames; observe the taxonomic structure). The search starts at s with distance 0 over the nodes s, 2, 3, 4, 5, 6, 7, 8, 9; each node is undiscovered, discovered, or finished, and the node at the top of the queue is expanded next. Nodes enter the queue in the order s, 2, 3, 5, 4, 6, 8, 7, 9; a neighbor that is already discovered (node 5, when it is reached a second time) is not enqueued again. Resulting level graph (shortest-path distance from s): s at level 0; nodes 2, 3, 5 at level 1; nodes 4, 6 at level 2; nodes 8, 7, 9 at level 3.
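The breadth-first traversal shown in these frames can be reproduced with a few lines of code. The edge set in `ADJ` is an assumption chosen to be consistent with the queue order and levels in the frames, since the drawing itself is not part of the transcript; `bfs_levels` is a hypothetical helper name.

```python
from collections import deque


def bfs_levels(adj, start):
    """Return node -> level (number of edges on a shortest path from start)."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()              # node at the top of the queue
        for w in adj[v]:
            if w not in level:           # undiscovered: record level and enqueue
                level[w] = level[v] + 1
                queue.append(w)
            # already discovered: do not enqueue again
    return level


# Assumed edges, consistent with the animation's queue order.
ADJ = {
    "s": ["2", "3", "5"],
    "2": ["s", "4", "5"],
    "3": ["s", "6"],
    "5": ["s", "2"],
    "4": ["2", "8"],
    "6": ["3", "7", "9"],
    "8": ["4"],
    "7": ["6"],
    "9": ["6"],
}

levels = bfs_levels(ADJ, "s")
discovery_order = list(levels)           # dicts preserve insertion order
```

The dictionary doubles as the "discovered" marker, so no separate visited set is needed.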

First Phase Example (simple breadth-first-search iterator from each terminal) V’ = {Max Planck, Arnold Schwarzenegger, Germany}

Breadth First Search Iterators from Each Terminal (animation frames). One iterator with its own queue is started from each of the terminals T1, T2, T3; nodes are marked undiscovered, discovered, or finished. As soon as iterators meet, a result is constructed. An entity already discovered in the T2 iterator is not enqueued again. When the T3 iterator reaches an entity already discovered in the T2 iterator, the T2 and T3 iterators have met and the T3 iterator stops; when the T1 iterator reaches an entity already discovered in the T2 iterator, the T1 and T2 iterators have met and the T1 iterator stops.

The Star Algorithm – Second Phase Aims to improve the tree from Phase 1. Follows an iterative improvement procedure – Certain paths are replaced in each iteration – New path weights are lower. Some definitions: Terminal node: – Any node v ∈ V’. Degree of a node v, deg(v): – The number of edges connected to the node. Fixed node: – Any node v with deg(v) ≥ 3 – Any terminal node

The Star Algorithm – Second Phase Loose path: – A path p in T is a loose path if it has minimal length and its end nodes are fixed nodes. Fixed nodes should not be removed during improvement. It follows that every intermediate node v in a loose path must be a Steiner node with deg(v) = 2. A loose path is a path that can be replaced during the improvement process. A minimal Steiner tree with respect to V’ is a tree in which all loose paths represent shortest paths between fixed nodes.
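The definitions of fixed nodes and loose paths translate directly into code. The sketch below assumes the tree is given as an adjacency map and that every leaf is a terminal; `fixed_nodes`, `loose_paths`, and the toy tree are hypothetical names and data.

```python
def fixed_nodes(tree, terminals):
    """Terminals plus every node of degree >= 3."""
    return {v for v, nbrs in tree.items() if v in terminals or len(nbrs) >= 3}


def loose_paths(tree, terminals):
    """List every loose path once, as a node list running between two fixed nodes."""
    fixed = fixed_nodes(tree, terminals)
    paths = []
    for start in sorted(fixed):
        for first in sorted(tree[start]):
            path, prev, cur = [start, first], start, first
            while cur not in fixed:
                # an intermediate node has degree 2, so exactly one way onward
                nxt = next(n for n in tree[cur] if n != prev)
                path.append(nxt)
                prev, cur = cur, nxt
            if path[0] < path[-1]:       # each path is found from both ends; keep one
                paths.append(path)
    return paths


# Toy tree: terminals A, B, C; S is a Steiner node of degree 3; x has degree 2.
TREE = {
    "A": {"x"},
    "x": {"A", "S"},
    "S": {"x", "B", "C"},
    "B": {"S"},
    "C": {"S"},
}
TERMS = {"A", "B", "C"}
paths = loose_paths(TREE, TERMS)
```

For this tree the count of loose paths is 3, which lies between |V’| - 1 = 2 and 2|V’| - 3 = 3.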

The Star Algorithm – Second Phase Observations: Removing a loose path (LP) → T splits into two subtrees T1 and T2. Replacing any LP by a shorter path – Compute the shortest path from any node of T1 to any node of T2. Removing and inserting LPs → affects which nodes are fixed and which are unfixed

The Star Algorithm – Second Phase Finding an approximate Steiner tree 1. Remove a LP 2. Decompose T into T1 and T2 3. Connect T1 and T2 by a path shorter than the LP

The Star Algorithm – Second Phase The tree-improving algorithm. The difficult Steiner tree problem is reduced – Find shortest paths between node subsets. In each iteration the loose path (lp) with maximum weight is removed (heuristic)

The Star Algorithm – Second Phase The method replace(lp, T): Removes the loose path from T. T is split into subgraphs T1 and T2. The shortest path connecting any node of T1 to any node of T2 is determined – replace(lp, T) calls findShortestPath(VT1, VT2, lp) – findShortestPath(VT1, VT2, lp) returns the shortest path
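One replacement step can be sketched as follows. This is an illustration under assumptions rather than the paper's code: the tree is a set of undirected edges, the graph is passed explicitly as a third argument, the search is a plain multi-source Dijkstra started from V(T1), and the helper names `component` and `shortest_connection` as well as the toy graph are hypothetical.

```python
import heapq


def component(edges, start):
    """Nodes reachable from start over the given undirected edges."""
    adj = {}
    for e in edges:
        a, b = tuple(e)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def shortest_connection(graph, vt1, vt2):
    """Multi-source Dijkstra from V(T1); stops at the first settled node of V(T2)."""
    dist = {v: 0.0 for v in vt1}
    pred = {v: None for v in vt1}
    heap = sorted((0.0, v) for v in vt1)      # a sorted list is a valid heap
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                          # stale queue entry
        if v in vt2:
            path = [v]
            while pred[path[-1]] is not None:
                path.append(pred[path[-1]])
            return d, path[::-1]
        for w, weight in graph[v].items():
            if d + weight < dist.get(w, float("inf")):
                dist[w], pred[w] = d + weight, v
                heapq.heappush(heap, (d + weight, w))
    return None


def replace(lp, tree_edges, graph):
    """Swap loose path lp for a strictly lighter T1-T2 connection, if one exists."""
    lp_edges = {frozenset(e) for e in zip(lp, lp[1:])}
    lp_weight = sum(graph[a][b] for a, b in zip(lp, lp[1:]))
    rest = tree_edges - lp_edges              # removing lp splits T into T1 and T2
    vt1, vt2 = component(rest, lp[0]), component(rest, lp[-1])
    found = shortest_connection(graph, vt1, vt2)
    if found is None or found[0] >= lp_weight:
        return tree_edges                     # no improvement: keep the old tree
    return rest | {frozenset(e) for e in zip(found[1], found[1][1:])}


# Toy graph: the detour over u (weight 4) can be replaced by the path over v (weight 2).
G = {
    "A": {"u": 2.0, "v": 1.0},
    "u": {"A": 2.0, "B": 2.0},
    "v": {"A": 1.0, "B": 1.0},
    "B": {"u": 2.0, "v": 1.0, "C": 1.0},
    "C": {"B": 1.0},
}
T = {frozenset(("A", "u")), frozenset(("u", "B")), frozenset(("B", "C"))}
improved = replace(["A", "u", "B"], T, G)
```

Repeating this step for the heaviest remaining loose path until no replacement succeeds gives the iterative improvement loop described on the slides.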

Steiner Tree Approximation - Phase 2 (figures: the overall graph G and the current tree, with the subtree T2 marked)

Phase 2 – Shortest Path Algorithm All pruned vertices are needed. Runs “one single-source shortest-path iterator from V(T1) and V(T2)”, i.e. an iterator that finds the shortest paths from a source vertex v to all other vertices in the graph.

Phase 2– shortest Path Algorithm Vertex distance d(v) initialization Assign TWO distances (d1, d2) to each vertex Assign d1 = 0 to all vertices of V(T1) Assign d2 = 0 to all vertices of V(T2) Assign d1= ∞ to all vertices of V(T2) Assign d2= ∞ to all vertices of V(T1) Assign d1= d2 = ∞ to all pruned or not queried vertices

Phase 2 – Shortest Path Algorithm T1 is considered a single node at distance 0 from itself and distance ∞ from T2; T2 accordingly. Other nodes that are not members of T1 or T2 have infinite distances from both T1 and T2
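The initialization above, with two distances and two queues, can be sketched as a pair of Dijkstra iterators where the one with the smaller fringe is expanded next. This is a hedged illustration, not the authors' findShortestPath: the termination test (an iterator settles a node of the other subtree) is a simplification, a missing dictionary entry stands for distance ∞, and the edges inside T1 with weight 0.9 are invented; only the weights 0.95, 0.96, and 0.99 come from the slides.

```python
import heapq
from math import inf


def find_shortest_path(graph, vt1, vt2, bound=inf):
    """Return (weight, path from T1 to T2), or None if nothing lighter than bound exists."""
    sources = (set(vt1), set(vt2))
    # d1 = 0 on V(T1), d2 = 0 on V(T2); absent keys mean distance infinity.
    dist = ({v: 0.0 for v in sources[0]}, {v: 0.0 for v in sources[1]})
    pred = ({v: None for v in sources[0]}, {v: None for v in sources[1]})
    queues = (sorted((0.0, v) for v in sources[0]),
              sorted((0.0, v) for v in sources[1]))
    while queues[0] and queues[1]:
        # "current" is the iterator whose fringe node has the smaller distance
        cur = 0 if queues[0][0][0] <= queues[1][0][0] else 1
        oth = 1 - cur
        d, v = heapq.heappop(queues[cur])
        if d > dist[cur][v]:
            continue                      # stale queue entry
        if d >= bound:
            return None                   # cannot beat the removed loose path
        if v in sources[oth]:             # reached the other subtree: done
            path = [v]
            while pred[cur][path[-1]] is not None:
                path.append(pred[cur][path[-1]])
            return d, (path[::-1] if cur == 0 else path)
        for w, weight in graph[v].items():
            if d + weight < dist[cur].get(w, inf):
                dist[cur][w], pred[cur][w] = d + weight, v
                heapq.heappush(queues[cur], (d + weight, w))
    return None


def undirected(edges):
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


G = undirected([
    ("Germany", "State", 0.95), ("Germany", "Angela Merkel", 0.96),
    ("Angela Merkel", "Physicist", 0.95), ("Angela Merkel", "Politician", 0.95),
    ("Physicist", "Scientist", 0.99),
    # assumed edges inside T1 (not given in the transcript)
    ("Max Planck", "Physicist", 0.9), ("Arnold Schwarzenegger", "Politician", 0.9),
    ("Scientist", "Person", 0.9), ("Politician", "Person", 0.9),
])
T1 = {"Arnold Schwarzenegger", "Politician", "Max Planck", "Physicist", "Scientist", "Person"}
T2 = {"Germany"}
weight, path = find_shortest_path(G, T1, T2)
```

Passing the weight of the removed loose path as `bound` lets the search stop early once neither iterator can produce a lighter connection.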

Phase 2 – Shortest Path Algorithm Trace table columns: Itr, Cur, Oth, V, V’. Initial state: Cur = 1, Oth = 2. Q1 (d1): Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0). Q2 (d2): Ger(0). Current points to the iterator with the minimal fringe node, i.e. the one that is currently expanded.

Phase 2 – Shortest Path Algorithm Fringe(Q2) < Fringe(Q1): swap(Current, Other), so Cur = 2, Oth = 1. Dequeue Germany from Q2. Q1 (d1) is unchanged.

Phase 2 – Shortest Path Algorithm d2(State) = 0 + 0.95 = 0.95. Enqueue State in Q2. Q2 (d2): Sta(0.95).

Phase 2 – Shortest Path Algorithm d2(Angela Merkel) = 0 + 0.96 = 0.96. Enqueue Angela Merkel in Q2. Q2 (d2): Ang(0.96), Sta(0.95).

Phase 2 – Shortest Path Algorithm Dequeue Angela Merkel from Q2. Q2 (d2): Sta(0.95).

Phase 2 – Shortest Path Algorithm d2(Physicist) = 0.96 + 0.95 = 1.91. Enqueue Physicist in Q2. Q2 (d2): Phy(1.91), Sta(0.95).

Phase 2 – Shortest Path Algorithm d2(Politician) = 0.96 + 0.95 = 1.91. Enqueue Politician in Q2. Q2 (d2): Phy(1.91), Pol(1.91), Sta(0.95).

Phase 2 – Shortest Path Algorithm Dequeue Physicist from Q2. Q2 (d2): Pol(1.91), Sta(0.95).

Phase 2 – Shortest Path Algorithm d2(Scientist) = 1.91 + 0.99 = 2.9. Enqueue Scientist in Q2. Q2 (d2): Sci(2.9), Pol(1.91), Sta(0.95).

Phase 2 – Shortest Path Algorithm Stop since Physicist ∈ V(T1).

Phase 2 – Shortest Path Algorithm Final Q2 (d2): Per(3.8), Pol(1.91), Sta(0.95); Q1 (d1) still holds Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0). Trace table rows (Itr, Cur, Oth, V, V’): (1, 2, 1, Ger, Sta), (1, 2, 1, Ger, Ang), (2, 2, 1, Ang, Phy), (2, 2, 1, Ang, Pol), (3, 2, 1, Phy, Sci). Return the vertices in vector V: V = {Germany, Angela Merkel, Physicist}. Edge weights shown in the figure: 0.95, 0.96, 0.95, 0.99.

Phase 2– shortest Path Algorithm First Iteration Result:

Phase 2 – Shortest Path Algorithm Second iteration: remove a LP and apply the algorithm again to find the shortest path between T1 and T2. Stop here, since no loose paths can be improved

Approximation Guarantee Lemmas and Theorems Lemma 1 – A tree T with terminal set V’, |V’| ≥ 2, has at least |V’| - 1 and at most 2|V’| - 3 loose paths. The approximation ratio for the cost of the tree returned by STAR is independent of the first-phase result.

Approximation Guarantee Lemmas and Theorems Lemma 2 – Let T_A be the Steiner tree yielded by the STAR algorithm. Let L(T_A) be the set of loose paths in T_A. For any circular ordering u_1, …, u_N of the terminals in T_A there is a mapping μ: L(T_A) → V’ × V’, such that: 1. μ is defined for all loose paths in T_A. 2. For each loose path P with end points u and v, let T_1 and T_2 be the two trees obtained by removing from T_A all nodes in P (and their edges), except u and v; then μ(P) = {u_i, u_{i+1}} for some i = 1, …, N, and one of the nodes u_i, u_{i+1} belongs to T_1, while the other one belongs to T_2. 3. For each pair of terminals {u_i, u_{i+1}} there are at most 2⌈log N⌉ + 2 loose paths mapped to {u_i, u_{i+1}}.

Approximation Guarantee Lemmas and Theorems Theorem 1 (approximation order) – The STAR algorithm is a (4⌈log N⌉ + 4)-approximation algorithm for the Steiner Tree Problem. – Therefore:

Approximation Guarantee Lemmas and Theorems Improvement Guarantee Rule – STAR might have exponential running time. – The cost reduction at each iteration can be infinitesimally small. – An improvement guarantee rule solves this: – Replace loose path P if and only if: – Where P’ is the path to be replaced by STAR, given that ε > 0

Approximation Guarantee Lemmas and Theorems Lemma 3 (time complexity) – Given ε > 0, the STAR algorithm with the improvement-guarantee rule is guaranteed to terminate in – steps – Where m is the number of edges – is the ratio of the maximum and minimum cost of the edges in the input graph.

Approximation Guarantee Lemmas and Theorems Theorem 2 – Given ε > 0, the STAR algorithm with the improvement-guarantee rule is a - approximation algorithm for the Steiner tree problem. Its running time is Where n, m, N denote the number of vertices, edges, and terminals of the input graph.

Approximate Top-K Interconnections Observation: the loose path weight is an upper bound for the weights of new interconnecting paths. No loose paths in the final tree T can be improved further. Top-k interconnections are computed starting from the final tree T returned by the original STAR

Approximate Top-K Interconnections Lines 1-3 compute the original tree T. T is enqueued in the priority queue Q. New trees are generated by artificially relaxing the loose paths of the current tree (Lines 4-9)

Approximate Top-K Interconnections Relax(T, ε) – Tunable value ε > 0 used to artificially create loose path weights – New weights are used as upper bounds. – Artificial upper bounds for new interconnecting paths between subtrees

Approximate Top-K Interconnections improveTree’(T’, V’) – replace(lp, T) calls findShortestPath(V(T1), V(T2), lp) – findShortestPath(V(T1), V(T2), lp) uses the higher artificial weights – The new interconnecting paths are not the same as before but are still the shortest between T1 and T2. – Only new interconnecting paths that are node-disjoint from the loose path are considered. – This gives the result diversity required for top-k (figure: original algorithm)

Approximate Top-K Interconnections reweight(T’) – Re-weights the result of improveTree(T, V’) by: – Acting on loose paths of T’ (also loose paths of T) – Setting back W(T’) to its initial value before relaxation.

Evaluation STAR is compared to the best-known Steiner tree approximation algorithms: – DNH, DPBF, BLINKS, BANKS (both versions). Compared in terms of quality (avg. weight) and performance (avg. runtime). Semantic quality or user-perceived relevance is not considered. Earlier work by the authors showed that: – A Steiner-tree-based scoring function contributes to high relevance from a user's viewpoint

Evaluation - Algorithms in Comparison DNH (Distance Network Heuristic) – 2-approximation algorithm DPBF – Dynamic programming approach – Optimal Tree Can be computed (not an approximation) – Best on small number of terminals (Queries) BLINKS – Newest – Experimentally BEST in the field Banks I & II – Keyword proximity search on relational data

Evaluation Types of Comparisons Performed Top-1 comparison of STAR, DNH DPBF, BANKS I & II Top-k comparison of STAR, BANKS I & II, BLINKS External Storage Comparison of STAR and BANKS

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) Worst-case theoretical properties of the algorithms (goal: a good approximation ratio on a given G, V’): – DNH, approximation ratio: 2(1 - 1/n), n = |V’| – STAR, approximation ratio: 4 log n + 4 – BANKS I & II, approximation ratio: O(n) – DPBF, approximation ratio: none (computes the optimal Steiner tree); used as the reference for comparing all others to optimal tree weights
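To make the worst-case bounds concrete, the small calculation below evaluates them for the query sizes 3, 5, and 7 used in the experiments. The logarithm base is not stated on the slides; base 2 and the ceiling from Theorem 1 are assumptions here, and `dnh_ratio` and `star_ratio` are hypothetical helper names.

```python
import math


def dnh_ratio(n):
    """Worst-case bound quoted for DNH: 2(1 - 1/n)."""
    return 2 * (1 - 1 / n)


def star_ratio(n):
    """Worst-case bound quoted for STAR, assuming base-2 logarithm with ceiling."""
    return 4 * math.ceil(math.log2(n)) + 4


bounds = {n: (round(dnh_ratio(n), 3), star_ratio(n)) for n in (3, 5, 7)}
for n, (dnh, star) in bounds.items():
    print(f"n={n}: DNH bound {dnh}, STAR bound {star}")
```

Under these assumptions the DNH bound stays below 2 while the STAR bound is 12 to 16, which is why the measured weights, not the worst-case guarantees, carry the comparison in the evaluation.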

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) Datasets – View DBLP and IMDB as graphs. Nodes → entities (author, publication, conference, actor, movie, year, etc.). Edges → relations (cited by, author of, acted in, etc.). – Dataset DBLP: subgraph of 15,000 nodes and 150,000 edges, due to DNH and DPBF constraints (they work in main memory only) – Dataset IMDB: subgraph of 30,000 nodes and 80,000 edges. – Two different datasets are needed to cover different topologies – No edge weights are present in either dataset → randomly assigned – No taxonomic information is present in either dataset (not a problem for STAR; handled in the first phase)

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) Queries – Query sets with 3, 5, and 7 terminals – Each set contains 60 queries – All queries in a set have the same number of terminals – Terminals are chosen at random

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) Metrics – Reference: optimal scores returned by DPBF – Compare the tree weights of STAR to those of all others – Compare the running times of all algorithms

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) Results (table of weight and runtime) – Observe DPBF performance for all numbers of terminals – Observe DPBF performance for 7 terminals

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) DBLP Results – For all numbers of terminals: STAR weight is better than all the others; STAR runtime outperforms all others, even though DNH has a better approximation ratio

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II) IMDB Results – STAR weight is slightly worse than DNH's – A hypothesis: DBLP has a higher edge-to-node ratio – BANKS II performance improved relative to its competitors – Still outperformed by STAR

Top-k Comparisons (STAR, BANKS I & II, BLINKS) DNH cannot compute top-k results. BLINKS – Uses indexing for query-time speedup – Requires the entire graph in main memory – The same datasets are used again – Uses a partitioning strategy (block sizes in nodes) – Initially tuned for better results. DBLP: block size of 100 nodes. IMDB: block size of 5 nodes

Top-k Comparisons (STAR, BANKS I & II, BLINKS) Metrics – BLINKS avg. weight is not applicable Returns only Root nodes of result trees at output Queries – Comparison for k=10, k=50, k=100 – DBLP & IMDB: 5 terminals Random queries 60 queries

Top-k Comparisons (STAR, BANKS I & II, BLINKS) Results – Index construction time of BLINKS is excluded – BLINKS still has the worst runtime – BANKS II and BLINKS runtimes are worse on the denser DBLP graph

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

External Storage Comparison of STAR and BANKS STAR and BANKS are directly applicable to graphs that do NOT fit into main memory. Simulation of such a scenario: – Disk-resident datasets. Dataset: – YAGO knowledge base (nodes: 1.7 million, edges: 14 million). Edge weights supported. Type and subclass taxonomy (STAR first phase) supported – Graph stored in a relational database with schema: EDGE(source, target, weight) – Edge exploration: one database access for each edge – Database call overhead is treated uniformly for both STAR and BANKS through edge loading.
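Edge loading from such a table could look roughly like the sketch below, which uses Python's built-in `sqlite3` module as a stand-in for the relational database. Only the schema EDGE(source, target, weight) comes from the slides; the rows, the indexes, the treatment of edges as undirected, and the helper `neighbors` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EDGE (source TEXT, target TEXT, weight REAL)")
# Indexes on both endpoints so each exploration step is a cheap lookup.
conn.execute("CREATE INDEX edge_source ON EDGE (source)")
conn.execute("CREATE INDEX edge_target ON EDGE (target)")
conn.executemany(
    "INSERT INTO EDGE VALUES (?, ?, ?)",
    [
        ("Germany", "State", 0.95),                  # hypothetical rows
        ("Angela Merkel", "Germany", 0.96),
        ("Angela Merkel", "Physicist", 0.95),
    ],
)


def neighbors(conn, node):
    """Load the edges incident to node on demand, treating edges as undirected."""
    rows = conn.execute(
        "SELECT target, weight FROM EDGE WHERE source = ? "
        "UNION ALL "
        "SELECT source, weight FROM EDGE WHERE target = ?",
        (node, node),
    ).fetchall()
    return sorted(rows)


germany_edges = neighbors(conn, "Germany")
```

Counting calls to a function like `neighbors` gives a storage-independent measure of how many edges an algorithm touches.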

External Storage Comparison of STAR and BANKS Queries: – 2 sets, 3 and 6 Terminals – Top-1, Top-3, Top-6 results – Terminal nodes randomly chosen – 30 queries made Metrics: – Average Weight (quality of output Trees) – Efficiency (running times) – Number of edges accessed

External Storage Comparison of STAR and BANKS Results: – BANKS I & II sometimes need 30 min to return results; those runs are excluded from the evaluation – STAR outperforms: an order of magnitude faster – STAR accesses an order of magnitude fewer edges. Gain from the taxonomic structure (first phase)

Results Summary Fairness by giving all algorithms the same inputs. Diversity of algorithms – DNH only handles graphs in main memory – BLINKS: indexing, different metric, lack of approximation guarantee – Not Steiner-tree-like query methods. Reasons for STAR's outstanding performance: – 1) Exploits the graph's taxonomic structure when possible – 2) The number of iterators needed per improvement step is independent of the number of terminals – 3) Tight upper bounds and path pruning

Conclusion The graph query problem on E-R-style data is addressed. Inherent taxonomic structure is exploited. STAR does not depend ONLY on taxonomic information – The second phase uses a fast “findShortestPath” algorithm. DNH contradiction: – Better approximation ratio, yet results similar to STAR's. STAR achieves a good approximation, O(log n), of the optimal Steiner tree