STAR: Steiner-Tree Approximation in Relationship Graphs Max-Planck Institute for Informatics, Database and Information Systems, Gjergji Kasneci, Maya Ramanath,


STAR: Steiner-Tree Approximation in Relationship Graphs Max-Planck Institute for Informatics, Database and Information Systems, Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum

Introduction: Entity-Relationship Graphs – Another way of representing relational data – Consist of labeled nodes and edges – Node labels correspond to entities – Edge labels represent relations between entities – Edge weights represent the strength of the relation between entities – Taxonomic relations (subClassOf, type)
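The graph model on this slide can be sketched as a small adjacency structure. This is only an illustration: the class name `RelationshipGraph`, the `bornIn` relation, and all weights are assumptions made for the example, and the sketch treats a weight as a cost (lower means a stronger relation).

```python
from collections import defaultdict

class RelationshipGraph:
    """Undirected graph with labeled, weighted edges (hypothetical helper class)."""

    def __init__(self):
        # node -> list of (neighbor, edge label, edge weight)
        self.adj = defaultdict(list)

    def add_edge(self, u, v, label, weight):
        self.adj[u].append((v, label, weight))
        self.adj[v].append((u, label, weight))

    def neighbors(self, u, labels=None):
        """Yield (neighbor, label, weight), optionally restricted to some edge labels."""
        for v, label, w in self.adj[u]:
            if labels is None or label in labels:
                yield v, label, w

g = RelationshipGraph()
g.add_edge("Max Planck", "Physicist", "type", 0.95)        # taxonomic edge
g.add_edge("Physicist", "Scientist", "subClassOf", 0.99)   # taxonomic edge
g.add_edge("Max Planck", "Germany", "bornIn", 0.90)        # illustrative non-taxonomic edge

# Follow only taxonomic edges out of one entity.
taxonomic = sorted(v for v, _, _ in g.neighbors("Max Planck", labels={"type", "subClassOf"}))
```

Filtering by label is what lets a search follow only the taxonomic backbone when one is available.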

Introduction: Example of an Entity-Relationship Graph (figure; arrows labeled Specialization and Generalization)

Introduction: Querying E-R Graphs – The relationship search query class: given a set of two, three, or more entities (nodes), find their closest relationships (edges or paths) that connect the entities in the strongest possible way. "Strongest" is related to informativeness. – A relationship search query example. Query: “How are Germany’s chancellor Angela Merkel, the mathematician Richard Courant, Turing-Award winner Jim Gray, and the Dalai Lama related?” Informative answer: all have a doctoral degree from a German university. – How are Angela Merkel, Arnold Schwarzenegger, Max Planck, and Germany related?

Motivation and Problem: Information discovery as opposed to lookup. The nature of the answer – Can be a tree embedded in the original graph – The input nodes (the query) must be connected by the tree – How good is the answer? A scoring model can exploit node and edge weights. The formal definition of the problem – Compute the k lowest-cost Steiner trees.
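The slide's formula is not preserved in the transcript. As a hedged reconstruction, the objective can be written in the standard Steiner-tree form; the symbols G, w, V', and T below are assumed notation, with edge weights acting as costs.

```latex
\text{Given } G = (V, E),\quad w : E \to \mathbb{R}_{>0},\quad V' \subseteq V:
\qquad
\min_{\substack{T = (V_T, E_T)\ \text{a tree in } G \\ V' \subseteq V_T}} \; w(T),
\qquad
w(T) = \sum_{e \in E_T} w(e).

\text{Top-}k:\ \text{return trees } T_1, \dots, T_k \text{ with } w(T_1) \le w(T_2) \le \dots \le w(T_k).
```

Nodes of the tree that are not in V' are the Steiner nodes.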

Motivation and Problem: What is a Steiner Tree Problem? Steiner tree examples: a Steiner tree for three terminals V’ = {A, B, C}; note the Steiner point S. A Steiner tree for four terminals V’ = {A, B, C, D}; note the Steiner points S1, S2.

Motivation and Problem: Steiner Tree Problem Complexity – Computing an optimal Steiner tree is NP-hard (the decision version is NP-complete) – Hence approximate solution algorithms – Approximation ratio: measures the quality of an approximation algorithm as the weight of the approximate output tree divided by the weight of the optimal output tree. Benefits of reducing the approximation ratio – Viable runtimes (efficiency) – Better, near-optimal result quality (informativeness)

Paper Contributions: Presents STAR, a new efficient algorithm – Computes near-optimal Steiner trees – Exploits a taxonomic schema (when available) – Viable runtimes over large graphs. STAR approximation ratio proofs: – O(log n) for n given query entities (worst case) – An improvement over other approximation ratios [ratios not preserved in the transcript] – In practice, STAR does better than a [ratio not preserved]-approximation algorithm. STAR has top-k tree capability. STAR outperforms state-of-the-art algorithms by an order of magnitude. It can be applied to main-memory datasets or to large disk-resident graphs. Evaluation via comparison with other cutting-edge algorithms.

The Star Algorithm (outline): Introduction, First Phase, Second Phase, Examples

The Star Algorithm – Introduction: Problem definition – As stated in the introduction – In addition, we are interested in finding the top-k result trees in increasing order of weight. Exploitation of taxonomic backbones – Node labels as entities – Edge labels as weights or relations – A taxonomy is not required. Runs in two phases. Phase 1: uses taxonomic information (when available) – Builds a quick tree by pruning the original graph – Interconnects all given nodes. Phase 2: iteratively improves the tree from Phase 1.

The Star Algorithm – First Phase: Prunes the original graph. Runs an iterator from each terminal. The iterators run in a round-robin manner. The iterators follow only taxonomic edges: – subClassOf, type

Single Breadth-First-Search Iterator: Pruning Example

[Animation, slides 15–43: breadth-first search from a source node s over the taxonomic structure. Nodes are marked undiscovered, discovered, or finished; each frame shows the queue and its top; node labels give the shortest-path distance from s; a node that is already discovered is not enqueued again. The final frame shows the resulting level graph.]
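The animation can be summarized by a plain breadth-first search that records each node's level. This is a minimal sketch on a made-up five-node graph; the node names and the function name `bfs_levels` are assumptions, not taken from the slides.

```python
from collections import deque

def bfs_levels(adj, source):
    """Return the BFS level (hop distance from source) of every reachable node."""
    level = {source: 0}          # discovered nodes and their levels
    queue = deque([source])      # discovered but not yet finished
    while queue:
        u = queue.popleft()      # "top of queue" in the animation
        for v in adj.get(u, ()):
            if v not in level:   # already discovered: don't enqueue
                level[v] = level[u] + 1
                queue.append(v)
    return level

# Illustrative graph (hypothetical node names).
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
levels = bfs_levels(adj, "s")
```

Grouping nodes by their level gives the level graph shown in the last frame.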

First-Phase Example (simple breadth-first-search iterator from each terminal): V’ = {Max Planck, Arnold Schwarzenegger, Germany}

[Animation, slides 45–68: breadth-first-search iterators run from each terminal (T1, T2, T3), each with its own queue; nodes are marked undiscovered, discovered, or finished. As soon as iterators meet, a result is constructed. An entity already discovered by the T2 iterator is not enqueued again. When the T2 and T3 iterators meet, the T3 iterator stops; when the T1 and T2 iterators meet, the T1 iterator stops.]
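The first phase as described on these slides can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the union-find bookkeeping, the function name `first_phase`, the small example graph, and the `hasChancellor` label are all hypothetical. Each terminal gets a BFS iterator that follows only taxonomic edges; iterators advance round-robin, and an iterator stops once it reaches a node discovered by an iterator of another component.

```python
from collections import deque

TAXONOMIC = {"type", "subClassOf"}

def first_phase(adj, terminals):
    """adj: node -> list of (neighbor, label). Returns a set of tree edges, or None."""
    owner = {t: i for i, t in enumerate(terminals)}  # node -> iterator that discovered it
    parent = {t: None for t in terminals}            # BFS pointers back to the owning terminal
    queues = [deque([t]) for t in terminals]
    comp = list(range(len(terminals)))               # union-find over iterators
    active = set(range(len(terminals)))
    edges = set()

    def find(i):
        while comp[i] != i:
            comp[i] = comp[comp[i]]
            i = comp[i]
        return i

    def add_path(v):                                 # add the BFS path from v to its terminal
        while parent[v] is not None:
            edges.add(frozenset((v, parent[v])))
            v = parent[v]

    while len(active) > 1:
        progressed = False
        for i in sorted(active):                     # round-robin over running iterators
            if i not in active or not queues[i]:
                continue
            progressed = True
            u = queues[i].popleft()
            for v, label in adj.get(u, ()):
                if label not in TAXONOMIC:
                    continue                         # follow only taxonomic edges
                if v not in owner:
                    owner[v], parent[v] = i, u
                    queues[i].append(v)
                elif find(owner[v]) != find(i):      # iterators met: connect and stop i
                    add_path(u)
                    add_path(v)
                    edges.add(frozenset((u, v)))
                    comp[find(i)] = find(owner[v])
                    active.discard(i)
                    break
        if not progressed:
            return None                              # terminals not connected taxonomically
    return edges

def build(edge_list):
    adj = {}
    for u, v, label in edge_list:
        adj.setdefault(u, []).append((v, label))
        adj.setdefault(v, []).append((u, label))
    return adj

adj = build([
    ("Max Planck", "Physicist", "type"),
    ("Physicist", "Scientist", "subClassOf"),
    ("Scientist", "Person", "subClassOf"),
    ("Arnold Schwarzenegger", "Politician", "type"),
    ("Politician", "Person", "subClassOf"),
    ("Person", "Entity", "subClassOf"),
    ("Germany", "State", "type"),
    ("State", "Entity", "subClassOf"),
    ("Germany", "Angela Merkel", "hasChancellor"),   # hypothetical non-taxonomic edge
])
tree = first_phase(adj, ["Max Planck", "Arnold Schwarzenegger", "Germany"])
```

On this toy graph the result connects all three terminals through Person and never touches the non-taxonomic edge, which is the pruning effect the slides describe.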

The Star Algorithm – Second Phase: Aims to improve the tree from Phase 1. Follows an iterative improvement procedure – Certain paths are replaced in each iteration – The new paths have lower weights. Some definitions: Terminal node – any node v ∈ V’. Degree of a node v, deg(v) – the number of edges connected to the node. Fixed node – any node v with deg(v) ≥ 3 – any terminal node.

The Star Algorithm – Second Phase: Loose path – A path p in T is a loose path if it has minimal length and its end nodes are fixed nodes. Fixed nodes should not be removed during improvement. It follows that every intermediate node v in a loose path must be a Steiner node with deg(v) = 2. A loose path is a path that can be replaced during the improvement process. A minimal Steiner tree with respect to V’ is a tree in which all loose paths represent shortest paths between fixed nodes.
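The definitions of fixed nodes and loose paths translate directly into a short routine. The sketch below is illustrative: the function names and the example tree are assumptions, and it presumes every leaf of the tree is a terminal.

```python
def to_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def fixed_nodes(tree_adj, terminals):
    """Terminals plus every node of degree >= 3."""
    return {v for v in tree_adj if v in terminals or len(tree_adj[v]) >= 3}

def loose_paths(tree_adj, terminals):
    """Paths between fixed nodes whose interior nodes all have degree 2."""
    fixed = fixed_nodes(tree_adj, terminals)
    found = {}
    for start in fixed:
        for nxt in tree_adj[start]:
            path, prev, cur = [start, nxt], start, nxt
            while cur not in fixed:
                # An interior Steiner node has exactly one way forward.
                (step,) = (x for x in tree_adj[cur] if x != prev)
                path.append(step)
                prev, cur = cur, step
            # Each path is reached from both ends; keep one copy per endpoint pair.
            found.setdefault(frozenset((start, cur)), path)
    return list(found.values())

# Hypothetical tree connecting the three example terminals.
tree_adj = to_adj([
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"), ("Scientist", "Person"),
    ("Arnold Schwarzenegger", "Politician"), ("Politician", "Person"),
    ("Germany", "State"), ("State", "Entity"), ("Entity", "Person"),
])
terminals = {"Max Planck", "Arnold Schwarzenegger", "Germany"}
paths = loose_paths(tree_adj, terminals)
endpoints = {frozenset((p[0], p[-1])) for p in paths}
```

Here Person is fixed because its degree is 3, so the tree has three loose paths, one from Person to each terminal.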

The Star Algorithm – Second Phase: Observations. Removing a loose path (LP) splits T into two subtrees, T1 and T2. Replacing any LP by a shorter path – Compute the shortest path from any node of T1 to any node of T2. Removing and inserting LPs can change which nodes are fixed and which are unfixed.

The Star Algorithm – Second Phase: Finding an approximate Steiner tree. 1. Remove an LP. 2. T decomposes into T1 and T2. 3. Connect T1 and T2 by a path shorter than the LP.

The Star Algorithm – Second Phase Finding an approximate Steiner Tree

The Star Algorithm – Second Phase: The tree-improving algorithm. The difficult Steiner tree problem is reduced to an easier one – Find shortest paths between node subsets. In each iteration the loose path with maximum weight is removed (heuristic).

The Star Algorithm – Second Phase: The method replace(lp, T). Removes the loose path from T. T is split into subgraphs T1 and T2. The shortest path connecting any node of T1 to any node of T2 is determined – replace(lp, T) calls findShortestPath(V(T1), V(T2), lp) – findShortestPath(V(T1), V(T2), lp) returns the shortest path.
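The splitting step of replace(lp, T) can be sketched as below. It is a hedged illustration: `split_on_loose_path` and the example tree are assumptions; only the idea (drop the loose path, collect the two remaining components) comes from the slide.

```python
def to_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def split_on_loose_path(tree_adj, path):
    """Remove the path's edges and interior nodes; return (V(T1), V(T2))."""
    interior = set(path[1:-1])
    removed = {frozenset(e) for e in zip(path, path[1:])}

    def component(root):
        seen, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for v in tree_adj[u]:
                if v in interior or v in seen or frozenset((u, v)) in removed:
                    continue
                seen.add(v)
                stack.append(v)
        return seen

    # The two end nodes are fixed nodes, so they stay in the subtrees.
    return component(path[0]), component(path[-1])

# Hypothetical tree connecting the three example terminals.
tree_adj = to_adj([
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"), ("Scientist", "Person"),
    ("Arnold Schwarzenegger", "Politician"), ("Politician", "Person"),
    ("Germany", "State"), ("State", "Entity"), ("Entity", "Person"),
])
t1, t2 = split_on_loose_path(tree_adj, ["Person", "Entity", "State", "Germany"])
```

The two node sets are exactly what a findShortestPath(V(T1), V(T2), lp) call would take as input.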

Steiner Tree Approximation - Phase 2 The overall Graph G

Steiner Tree Approximation - Phase 2 (figure; the subtree T2 is labeled)

Phase 2 – Shortest Path Algorithm: All vertices pruned in Phase 1 are needed again. Runs one single-source shortest-path iterator from each of V(T1) and V(T2), i.e., each iterator finds shortest paths from its source to the other vertices of the graph.

Phase 2 – Shortest Path Algorithm: Vertex distance d(v) initialization. Assign two distances (d1, d2) to each vertex. Assign d1 = 0 to all vertices of V(T1). Assign d2 = 0 to all vertices of V(T2). Assign d1 = ∞ to all vertices of V(T2). Assign d2 = ∞ to all vertices of V(T1). Assign d1 = d2 = ∞ to all pruned or not queried vertices.

Phase 2 – Shortest Path Algorithm: T1 is considered a single node at distance 0 from itself and distance ∞ from T2; T2 accordingly. Other nodes, which are members of neither T1 nor T2, have infinite distances from both T1 and T2.
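Treating a whole subtree as one source at distance 0 is what a multi-source Dijkstra search does. The sketch below is a simplification, not the slides' method: it runs a single iterator from one side only, whereas the slides alternate between two iterators (Q1 and Q2). The function name, the `bound` parameter, and every edge weight marked hypothetical are assumptions; the other weights are the ones that appear in the worked trace.

```python
import heapq

def find_shortest_path(adj, sources, targets, bound=float("inf")):
    """Multi-source Dijkstra: cheapest path from any source to any target below bound."""
    dist = {s: 0.0 for s in sources}       # every source starts at distance 0
    parent = {s: None for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale queue entry
        if d >= bound:
            return None                    # nothing cheaper than the removed loose path
        if u in targets:                   # first settled target is the closest one
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], d
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None

def build(edge_list):
    adj = {}
    for u, v, w in edge_list:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

adj = build([
    ("Germany", "State", 0.95),
    ("Germany", "Angela Merkel", 0.96),
    ("Angela Merkel", "Physicist", 0.95),
    ("Angela Merkel", "Politician", 0.95),
    ("Physicist", "Scientist", 0.99),
    ("State", "Entity", 0.99),             # hypothetical weight
    ("Entity", "Person", 0.99),            # hypothetical weight
])
t1 = {"Arnold Schwarzenegger", "Politician", "Max Planck", "Physicist", "Scientist", "Person"}
t2 = {"Germany"}
old_loose_path_weight = 0.95 + 0.99 + 0.99  # Germany-State-Entity-Person, hypothetical
result = find_shortest_path(adj, t2, t1, bound=old_loose_path_weight)
```

Passing the removed loose path's weight as `bound` reflects the idea that only a strictly cheaper connection is of interest.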

Phase 2 – Shortest Path Algorithm (worked trace; the per-slide tables with columns Itr, Cur, Oth, V, V’ and the figure's edge labels 0.95, 0.96, 0.95, 0.99 are flattened in the transcript). Q1 (d1) holds Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0) throughout; Q2 (d2) starts with Ger(0). Current points to the iterator with the minimal fringe node, which is the one currently expanded.
1. Fringe(Q2) < Fringe(Q1): swap(current, other); dequeue Germany from Q2.
2. d2(State) = 0 + 0.95; enqueue State in Q2. Q2: Sta(0.95).
3. d2(Angela Merkel) = 0 + 0.96; enqueue Angela Merkel in Q2. Q2: Ang(0.96), Sta(0.95).
4. Dequeue Angela Merkel from Q2. Q2: Sta(0.95).
5. d2(Physicist) = 0.96 + 0.95 = 1.91; enqueue Physicist in Q2. Q2: Phy(1.91), Sta(0.95).
6. d2(Politician) = 0.96 + 0.95 = 1.91; enqueue Politician in Q2. Q2: Phy(1.91), Pol(1.91), Sta(0.95).
7. Dequeue Physicist from Q2. Q2: Pol(1.91), Sta(0.95).
8. d2(Scientist) = 1.91 + 0.99 = 2.9; enqueue Scientist in Q2. Q2: Sci(2.9), Pol(1.91), Sta(0.95).
9. Stop, since Physicist ∈ V(T1).
10. Return the vertices in vector V: V = {Germany, Angela Merkel, Physicist}. Q2 at this point: Per(3.8), Pol(1.91), Sta(0.95).

Phase 2– shortest Path Algorithm First Iteration Result:

Phase 2 – Shortest Path Algorithm: Second iteration. Remove an LP and apply the algorithm again to find the shortest path between T1 and T2. Stop here, since no loose path can be improved.

Approximation Guarantee – Lemmas and Theorems: Lemma 1 – A tree T with terminal set V’, |V’| ≥ 2, has at least |V’| − 1 and at most 2|V’| − 3 loose paths. The approximation ratio for the cost of the tree returned by STAR is independent of the first-phase result.

Approximation Guarantee – Lemmas and Theorems: Lemma 2 – Let T_A be the Steiner tree yielded by the STAR algorithm. Let L(T_A) be the set of loose paths in T_A. For any circular ordering u_1, …, u_N of the terminals in T_A there is a mapping μ: L(T_A) → V’ × V’ such that: 1. μ is defined for all loose paths in T_A; 2. For each loose path P with end points u and v, let T_1 and T_2 be the two trees obtained by removing from T_A all nodes in P (and their edges), except u and v; then μ(P) = {u_i, u_{i+1}} for some i = 1, …, N, and one of the nodes u_i, u_{i+1} belongs to T_1, while the other one belongs to T_2; 3. For each pair of terminals {u_i, u_{i+1}} there are at most 2⌈log N⌉ + 2 loose paths mapped to {u_i, u_{i+1}}.

Approximation Guarantee – Lemmas and Theorems: Theorem 1 (approximation order) – The STAR algorithm is a (4⌈log N⌉ + 4)-approximation algorithm for the Steiner tree problem. – Therefore, its approximation ratio is O(log N).

Approximation Guarantee – Lemmas and Theorems: Improvement-guarantee rule – STAR might have exponential running time – The cost reduction in an iteration can be infinitesimally small – An improvement-guarantee rule solves this: – Replace loose path P if and only if: [inequality not preserved in the transcript] – where P’ is the replacement path found by STAR and ε > 0.
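The exact inequality of the rule is missing from the transcript. Purely as an illustration, the sketch below assumes a multiplicative form, accepting a replacement P’ only when w(P’)·(1 + ε) ≤ w(P). That form, the function name, and the sample weights are assumptions, not the slide's statement. It shows how such a rule rejects negligible improvements, which is what bounds the number of iterations.

```python
def accept_replacement(old_weight, new_weight, eps):
    """Assumed rule: accept only if the new path is cheaper by a factor of (1 + eps)."""
    return new_weight * (1.0 + eps) <= old_weight

eps = 0.1
current = 2.93                       # weight of the current loose path (illustrative)
accepted = []
for candidate in (2.92, 1.91, 1.90): # candidate replacement weights (illustrative)
    if accept_replacement(current, candidate, eps):
        accepted.append(candidate)
        current = candidate          # the replacement becomes the new loose path
```

With these numbers only the drop from 2.93 to 1.91 is accepted; the marginal improvements 2.92 and 1.90 are skipped.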

Approximation Guarantee – Lemmas and Theorems: Lemma 3 (time complexity) – Given ε > 0, the STAR algorithm with the improvement-guarantee rule is guaranteed to terminate in [bound not preserved in the transcript] steps – where m is the number of edges – and [symbol not preserved] is the ratio of the maximum and minimum cost of the edges in the input graph.

Approximation Guarantee – Lemmas and Theorems: Theorem 2 – Given ε > 0, the STAR algorithm with the improvement-guarantee rule is a [ratio not preserved]-approximation algorithm for the Steiner tree problem. Its running time is [bound not preserved], where n, m, and N denote the number of vertices, edges, and terminals of the input graph.

Approximate Top-k Interconnections: Observation – the weight of a loose path is an upper bound for the weight of a new interconnecting path. After the improvements, no loose path in the final tree T can be improved further. Top-k interconnections are computed starting from the final tree T returned by the original STAR algorithm.

Approximate Top-k Interconnections: Lines 1-3 compute the original tree T. T is enqueued in the priority queue Q. New trees are generated by artificially relaxing the loose paths of the current tree (lines 4-9).

Approximate Top-k Interconnections: Relax(T, ε) – A tunable value ε > 0 is used to create artificial (higher) loose-path weights – The new weights are used as upper bounds – Artificial upper bounds for new interconnecting paths between subtrees

Approximate Top-k Interconnections: improveTree’(T’, V’) – replace(lp, T) calls findShortestPath(V(T1), V(T2), lp) – findShortestPath(V(T1), V(T2), lp) uses the higher artificial weights – The new interconnecting paths are not the same as the loose paths but are still the shortest between T1 and T2 – Only new interconnecting paths that are node-disjoint from the loose path are considered – This gives the result diversity required for top-k [Figure: original algorithm]

Approximate Top-k Interconnections: reweight(T’) – Re-weights the result of improveTree(T, V’) by: – Acting on the loose paths of T’ (which are also loose paths of T) – Setting W(T’) back to its initial value before relaxation.

Evaluation: STAR is compared with the best-known Steiner tree approximation algorithms: – DNH, DPBF, BLINKS, BANKS (both versions). Compared in terms of quality (avg. weight) and performance (avg. runtime). Semantic quality or user-perceived relevance is not considered. An earlier work by the authors showed that: – A Steiner-tree-based scoring function contributes to high relevance from a user's viewpoint.

Evaluation – Algorithms in Comparison: DNH (Distance Network Heuristic) – 2-approximation algorithm. DPBF – Dynamic programming approach – The optimal tree can be computed (not an approximation) – Best for a small number of terminals (queries). BLINKS – Newest – Experimentally the best in the field. BANKS I & II – Keyword proximity search on relational data.

Evaluation – Types of Comparisons Performed: Top-1 comparison of STAR, DNH, DPBF, BANKS I & II. Top-k comparison of STAR, BANKS I & II, BLINKS. External-storage comparison of STAR and BANKS.

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): Worst-case theoretical properties of the algorithms (goal: a good approximation ratio on given G, V’): – DNH, approximation ratio: 2(1 − 1/n), n = |V’| – STAR, approximation ratio: 4⌈log n⌉ + 4 – BANKS I & II, approximation ratio: O(n) – DPBF: has no approximation ratio (computes the optimal Steiner tree); used to compare all the others against optimal tree weights.
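To make the worst-case bounds concrete, the short calculation below evaluates them for the query sizes used in the evaluation (3, 5, and 7 terminals). It assumes the logarithm is base 2 and uses the ceiling form from Theorem 1; both are assumptions about how to read the slide's "4logn + 4".

```python
import math

def dnh_ratio(n):
    """DNH worst-case approximation ratio, 2(1 - 1/n)."""
    return 2 * (1 - 1 / n)

def star_ratio(n):
    """STAR worst-case bound, 4*ceil(log n) + 4 (base-2 log assumed)."""
    return 4 * math.ceil(math.log2(n)) + 4

# (number of terminals, DNH bound, STAR bound)
rows = [(n, round(dnh_ratio(n), 3), star_ratio(n)) for n in (3, 5, 7)]
for n, dnh, star in rows:
    print(f"n={n}: DNH <= {dnh}, STAR <= {star}")
```

These are worst-case guarantees only; they say nothing about the average tree weights measured in the experiments.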

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): Datasets – View DBLP and IMDB as graphs. Nodes → entities (author, publication, conference, actor, movie, year, etc.). Edges → relations (cited by, author of, acted in, etc.). – Dataset DBLP: subgraph of 15,000 nodes and 150,000 edges, due to DNH and DPBF constraints (they run in main memory only). – Dataset IMDB: subgraph of 30,000 nodes and 80,000 edges. – Two different datasets are needed to cover different topologies. – No edge weights are present in either dataset, so weights were randomly assigned. – No taxonomic information is present in either dataset (not a problem for STAR; handled in the first phase).

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): Queries – Query sets with 3, 5, and 7 terminals – Each set has 60 queries – All queries in a set have the same number of terminals – Terminals chosen at random

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): Metrics – Reference: optimal scores returned by DPBF – Compare the weights of STAR's trees with those of all the others – Compare the running times of all algorithms

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): Results – Observe DPBF performance for all numbers of terminals (weight, runtime) – Observe DPBF performance for 7 terminals ????

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): DBLP results – For all numbers of terminals: STAR's weight is better than all the others; STAR's runtime outperforms all others, even though DNH has a better approximation ratio.

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II): IMDB results – STAR's weight is slightly worse than DNH's – A hypothesis: DBLP has a higher edge-to-node ratio – BANKS II's performance improved relative to its competitors (?) – Still outperformed by STAR

Top-k Comparisons (STAR, BANKS I & II, BLINKS): DNH cannot compute top-k results. BLINKS – Uses indexing for query-time speedup – Requires the entire graph in main memory – The same datasets are used again – Uses a partitioning strategy (block sizes in nodes) – Initially tuned for better results: DBLP: block size of 100 nodes; IMDB: block size of 5 nodes.

Top-k Comparisons (STAR, BANKS I & II, BLINKS): Metrics – Average weight is not applicable to BLINKS, which returns only the root nodes of its result trees. Queries – Comparison for k = 10, k = 50, k = 100 – DBLP & IMDB: 5 terminals, random queries, 60 queries.

Top-k Comparisons (STAR, BANKS I & II, BLINKS): Results – Index construction time for BLINKS is excluded – BLINKS still has the worst runtime – BANKS II and BLINKS runtimes are worse on the denser DBLP graph

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

External Storage Comparison of STAR and BANKS: STAR and BANKS are directly applicable to graphs that do not fit into main memory. Simulation of such a scenario: – Disk-resident datasets. Dataset: – YAGO knowledge base (nodes: 1.7 million, edges: 14 million); edge weights supported; type and subclass taxonomy (STAR first phase) supported – Graph stored in a relational database with schema EDGE(source, target, weight) – Edge exploration: a database access for each edge – The database-call overhead is treated uniformly for STAR and BANKS by edge loading.
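The disk-resident setup can be sketched with an in-memory SQLite database standing in for the relational store. Only the EDGE(source, target, weight) schema comes from the slide; the index, the sample rows, the `neighbors` helper, and the access counter are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for a disk-resident database
conn.execute("CREATE TABLE EDGE (source TEXT, target TEXT, weight REAL)")
conn.execute("CREATE INDEX edge_source ON EDGE (source)")  # assumed index for lookups
conn.executemany(
    "INSERT INTO EDGE VALUES (?, ?, ?)",
    [
        ("Germany", "State", 0.95),          # illustrative rows
        ("Germany", "Angela Merkel", 0.96),
        ("Angela Merkel", "Physicist", 0.95),
    ],
)

stats = {"edges_loaded": 0}          # counts edges fetched from the database

def neighbors(node):
    """Load the outgoing edges of one node with a single query."""
    rows = conn.execute(
        "SELECT target, weight FROM EDGE WHERE source = ? ORDER BY weight",
        (node,),
    ).fetchall()
    stats["edges_loaded"] += len(rows)
    return rows

germany_edges = neighbors("Germany")
```

Counting loaded edges this way mirrors the "number of edges accessed" metric used to compare the algorithms on external storage.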

External Storage Comparison of STAR and BANKS: Queries: – 2 sets, with 3 and 6 terminals – Top-1, top-3, top-6 results – Terminal nodes randomly chosen – 30 queries. Metrics: – Average weight (quality of output trees) – Efficiency (running times) – Number of edges accessed

External Storage Comparison of STAR and BANKS: Results: – BANKS I & II sometimes took 30 minutes to return results; such cases were excluded from the evaluation – STAR outperforms: an order of magnitude faster – STAR accesses an order of magnitude fewer edges, a gain from the taxonomic structure (first phase)

Results Summary: Fairness by giving all algorithms the same inputs. Diversity of algorithms – DNH only handles graphs in main memory – BLINKS: indexing, a different metric, lack of an approximation guarantee – Query methods that are not Steiner-tree-like. STAR's outstanding performance comes from: – 1) Exploiting the graph's taxonomic structure when possible – 2) The number of iterators needed per improvement step being independent of the number of terminals – 3) Tight upper bounds and path pruning

Conclusion: The query problem on E-R-style data graphs is addressed. Inherent taxonomic structure is exploited. STAR does not depend only on taxonomic information – The second phase uses a fast findShortestPath algorithm. DNH contradiction: – A better approximation ratio, yet results similar to STAR's. STAR achieves a good approximation, O(log n), of the optimal Steiner tree.

Thank You For Your Attention