# Finding Skyline Nodes in Large Networks

## Presentation transcript

Finding Skyline Nodes in Large Networks

### Motivation

Evaluation metrics:
- Distance from the query node (e.g., John).
- Coverage of the query topics (e.g., Big Data, Cloud Computing, MapReduce).

### Homogeneous Approach?

$$\text{Score} = \lambda \cdot \text{Distance} + (1 - \lambda) \cdot \text{Coverage}$$

How should $\lambda$ be chosen?

### Weighted Set Cover?

- Find the nodes with the smallest aggregate distance from the query node such that, together, they cover all query topics.
- This ignores some interesting nodes.
- The results cannot be ranked.

[Figure: example network with query node u0 = q, nodes u1 to u8 annotated with their topics, and query Q = {a, b, c}]
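
To make the contrast concrete, here is a minimal sketch of the weighted-set-cover alternative. It uses the textbook greedy heuristic, which repeatedly picks the node with the lowest distance per newly covered topic. The slides do not say how the cover would be computed, so the greedy rule, the name `greedy_weighted_set_cover`, and the toy candidates are all assumptions made for illustration.

```python
def greedy_weighted_set_cover(query, candidates):
    """Greedy approximation for weighted set cover (assumed heuristic).

    query: set of query topics.
    candidates: dict node -> (distance_from_q, set_of_topics).
    Returns the chosen nodes in the order they were picked.
    """
    uncovered = set(query)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for node, (dist, topics) in candidates.items():
            gain = len(topics & uncovered)  # query topics this node would newly cover
            if node in chosen or gain == 0:
                continue
            ratio = dist / gain  # distance paid per newly covered topic
            if ratio < best_ratio:
                best, best_ratio = node, ratio
        if best is None:
            raise ValueError("the candidates cannot cover every query topic")
        chosen.append(best)
        uncovered -= candidates[best][1]
    return chosen


# Hypothetical toy data: node -> (distance from q, topics)
candidates = {
    "u1": (1, {"a"}),
    "u2": (1, {"b"}),
    "u3": (1, {"c"}),
    "u4": (2, {"a", "b", "c"}),
}
print(greedy_weighted_set_cover({"a", "b", "c"}, candidates))  # ['u4']
```

On this hypothetical input the cover is just u4. Nodes u1, u2, and u3 are closer to q but are never reported, and the output is a set with no ranking. These are the two drawbacks listed on the slide.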

### Graph Skyline

- Dominance on coverage, $u >_c v$: the set of query topics covered by u is a proper superset of the set covered by v ($u \ge_c v$ also allows the two sets to be equal).
- Dominance on distance, $u >_d v$: u is closer to q than v is ($u \ge_d v$ also allows equal distance).
- Dominance, $u > v$: either (1) $u >_c v$ and $u \ge_d v$, or (2) $u \ge_c v$ and $u >_d v$.

The skyline nodes are the nodes that no other node dominates.

[Figure: the example network, Q = {a, b, c}, u0 = q]
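
The three dominance definitions translate directly into a predicate. This sketch assumes each node is represented as a pair (coverage, distance): coverage is a `frozenset` of the query topics the node covers, and distance is its hop count from q. The name `dominates` is hypothetical. Python's `>` and `>=` on sets test for proper superset and superset respectively, which correspond to $>_c$ and $\ge_c$.

```python
def dominates(u, v):
    """Return True if node u dominates node v.

    Each node is an assumed (coverage, distance) pair: coverage is the frozenset
    of query topics it covers, distance is its hop distance from q.
    """
    cov_u, dist_u = u
    cov_v, dist_v = v
    better_coverage = cov_u > cov_v        # proper superset:   u >_c v
    at_least_coverage = cov_u >= cov_v     # superset or equal: u >=_c v
    better_distance = dist_u < dist_v      # strictly closer:   u >_d v
    at_least_distance = dist_u <= dist_v   # no farther:        u >=_d v
    return (better_coverage and at_least_distance) or (at_least_coverage and better_distance)


abc, a = frozenset("abc"), frozenset("a")
print(dominates((abc, 2), (a, 2)))  # True: more topics at the same distance
print(dominates((a, 1), (abc, 2)))  # False: closer, but covers fewer topics
print(dominates((a, 1), (a, 1)))    # False: identical nodes do not dominate each other
```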

### Ranking of Skyline Nodes

- There are too many skyline nodes, so they must be ranked.
- Dominance count (DC): the number of nodes dominated by a skyline node [Lin et al., ICDE '07].
- A higher dominance count means more nodes are pruned from the candidate set.

[Figure: the example network, Q = {a, b, c}, u0 = q]

Ranking in the example:
1. $\mathrm{DC}(u_4) = 3$ (dominates u5, u6, u7)
2. $\mathrm{DC}(u_1) = 1$ (dominates u5)
3. $\mathrm{DC}(u_2) = 0$ (dominates no node)
4. $\mathrm{DC}(u_3) = 0$ (dominates no node)
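
A brute-force way to obtain and rank the skyline is to compare every pair of nodes. This quadratic pairwise check is only a baseline sketch, not the algorithm in the slides. The function `rank_skyline` and the toy network are hypothetical. The toy data were chosen so that the dominance counts match the example ranking above, but this is not the network shown in the slide figure.

```python
def dominates(u, v):
    # u, v are assumed (frozenset of covered query topics, distance from q) pairs
    (cov_u, d_u), (cov_v, d_v) = u, v
    return (cov_u > cov_v and d_u <= d_v) or (cov_u >= cov_v and d_u < d_v)


def rank_skyline(nodes):
    """Brute-force skyline ranked by dominance count (O(n^2) pairwise checks).

    nodes: dict name -> (frozenset of covered query topics, distance from q).
    Returns [(name, dominance_count), ...], highest count first.
    """
    skyline = [s for s in nodes
               if not any(dominates(nodes[o], nodes[s]) for o in nodes if o != s)]
    counts = {s: sum(dominates(nodes[s], nodes[o]) for o in nodes if o != s)
              for s in skyline}
    # sorted() is stable, so ties keep their input order
    return sorted(((s, counts[s]) for s in skyline), key=lambda p: -p[1])


# Hypothetical toy network: name -> (covered query topics, distance from q)
nodes = {
    "u1": (frozenset("a"), 1), "u2": (frozenset("b"), 1), "u3": (frozenset("c"), 1),
    "u4": (frozenset("abc"), 2), "u5": (frozenset("a"), 2),
    "u6": (frozenset("ac"), 3), "u7": (frozenset("abc"), 3),
}
print(rank_skyline(nodes))  # [('u4', 3), ('u1', 1), ('u2', 0), ('u3', 0)]
```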

### Algorithm

- Construct a query DAG over the query topics.
- Each DAG node carries three variables: Count (C), Dominance (D), and Traversal (T).
- Naive complexity: $O(n \cdot 2^r)$; complexity with preprocessing: $O(n r^2)$. Here n is the number of network nodes and r is the number of query topics.

[Figure: Input Network (left) and Query DAG (right) with DAG nodes abc; ab, ac, bc; a, b, c. Initial values: C = 2 for abc, C = 0 for ab, ac, and bc, and two of the single-topic nodes have C = 2 while one has C = 1. D and T are not yet computed (-).]
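
Below is a minimal sketch of the query DAG's nodes. It assumes each DAG node is a non-empty subset of Q stored as a bitmask, where bit i set means query topic i is included. That gives $2^r - 1$ DAG nodes for r query topics. `DagNode`, `build_query_dag`, `label`, and the topic map are hypothetical. Only C is filled in; D and T remain `None`, mirroring the "-" entries on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DagNode:
    mask: int                        # subset of Q: bit i set = query topic i included
    count: int = 0                   # C: network nodes whose coverage is exactly this subset
    dominance: Optional[int] = None  # D: not computed yet ("-" on the slide)
    traversal: Optional[int] = None  # T: not computed yet ("-" on the slide)


def build_query_dag(query, topics_of):
    """One DagNode per non-empty subset of the query topics, with C filled in."""
    bit = {t: 1 << i for i, t in enumerate(query)}
    dag = {mask: DagNode(mask) for mask in range(1, 1 << len(query))}
    for topics in topics_of.values():
        mask = 0
        for t in topics:
            mask |= bit.get(t, 0)    # topics outside Q contribute nothing
        if mask:                     # nodes covering no query topic are skipped
            dag[mask].count += 1
    return dag


def label(mask, query):
    """Readable name of a DAG node, e.g. 0b101 -> 'ac'."""
    return "".join(t for i, t in enumerate(query) if mask >> i & 1)


query = ["a", "b", "c"]
topics_of = {  # hypothetical node -> topics map
    "u1": {"a"}, "u2": {"b"}, "u3": {"c"}, "u4": {"a", "b", "c"},
    "u5": {"a", "d"}, "u6": {"a", "c", "e"}, "u7": {"a", "b", "c", "d"}, "u8": {"d", "e"},
}
dag = build_query_dag(query, topics_of)
print({label(m, query): node.count for m, node in dag.items()})
```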

### Query DAG Construction

[Figure: Input Network and the partially built Query DAG, Q = {a, b, c}. DAG nodes ab, a, b, and c have been created, with the network nodes attached to them.]

### Query DAG Construction (cont.)

[Figure: the DAG node abc is added to the Query DAG.]

### Query DAG Construction (cont.)

[Figure: the DAG nodes ac and bc are added, completing the Query DAG.]
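
The construction slides add DAG nodes incrementally: first ab, a, b, and c; then abc; then ac and bc. As a simplification, this sketch assumes all subsets are present. It links each DAG node to the subsets obtained by dropping one topic, which yields the layered shape of the final query DAG (abc, then the two-topic nodes, then the single-topic nodes). `query_dag_edges`, `topological_order`, and `label` are hypothetical helpers. `topological_order` lists parents before children.

```python
def query_dag_edges(r):
    """Children of each DAG node: the subsets obtained by dropping exactly one topic.

    DAG nodes are assumed to be non-empty bitmasks over r query topics, so
    single-topic nodes have no children.
    """
    edges = {}
    for mask in range(1, 1 << r):
        edges[mask] = [mask & ~(1 << i) for i in range(r)
                       if mask >> i & 1 and mask != 1 << i]
    return edges


def topological_order(r):
    """Parents before children: DAG nodes sorted by decreasing number of topics."""
    return sorted(range(1, 1 << r), key=lambda m: bin(m).count("1"), reverse=True)


def label(mask, query):
    return "".join(t for i, t in enumerate(query) if mask >> i & 1)


query = ["a", "b", "c"]
for parent, children in query_dag_edges(len(query)).items():
    print(label(parent, query), "->", [label(c, query) for c in children])
```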

### Find Dominance Variable

- Traverse the DAG nodes in topological order to evaluate the Dominance variable (D) of each DAG node.
- D is the number of network nodes dominated by (or equal to) the DAG node on coverage, i.e., the nodes whose covered query topics form a subset of the DAG node's topics.
- Naive complexity: $O(n \cdot 2^r)$; complexity with topological ordering: $O(3^r)$.

[Figure: Input Network and Query DAG after this step. D = 7 for abc; D is 3, 3, and 4 for the two-topic nodes; D = C for each single-topic node. T is not yet computed.]
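
The slide computes D in topological order at cost $O(3^r)$. This sketch reaches the same values by a different route with the same bound. For every DAG node it sums C over all non-empty subsets of that node, using the standard submask-enumeration trick. Summing $2^{|S|}$ over all subsets S of Q gives $3^r$, so the total work is at most $3^r$ steps. This is not necessarily the authors' procedure, and the C values below are hypothetical.

```python
def dominance_values(count):
    """D(S) = sum of C(S') over all non-empty subsets S' of S.

    count: dict mask -> C for every non-empty mask over the query topics.
    Enumerating the submasks of every mask touches at most 3^r pairs in total.
    """
    dominance = {}
    for mask in count:
        total, sub = 0, mask
        while sub:                    # visit every non-empty submask of mask
            total += count[sub]
            sub = (sub - 1) & mask
        dominance[mask] = total
    return dominance


# Hypothetical C values for Q = [a, b, c] (bit 0 = a, bit 1 = b, bit 2 = c)
count = {0b001: 2, 0b010: 1, 0b011: 0, 0b100: 1, 0b101: 1, 0b110: 0, 0b111: 2}
print(dominance_values(count))  # {1: 2, 2: 1, 3: 3, 4: 1, 5: 4, 6: 2, 7: 7}
```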

### Find Traversal Variable

- Perform a breadth-first search (BFS) starting from the query node.
- T is the number of nodes not dominated on distance.
- Complexity of the BFS: $O(n + e)$, where e is the number of edges.

[Figure: Input Network and Query DAG at BFS depth h = 2. T = 1 for abc, T = 0 for the two-topic nodes, and T = C for each single-topic node.]
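
Here is a sketch of the BFS step. It assumes that T for a DAG node counts the network nodes with exactly that coverage which the BFS has reached within depth h, excluding q itself. The network, the `mask_of` coverage map, and the helper names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (standard BFS, O(n + e))."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def traversal_counts(dist, mask_of, h):
    """Assumed T per DAG node: nodes reached within h hops of q (q itself excluded)."""
    counts = {}
    for node, d in dist.items():
        mask = mask_of.get(node, 0)
        if 0 < d <= h and mask:
            counts[mask] = counts.get(mask, 0) + 1
    return counts


adj = {  # hypothetical undirected network; u0 is the query node q
    "u0": ["u1", "u2", "u3"], "u1": ["u0", "u4", "u5"], "u2": ["u0"],
    "u3": ["u0", "u8"], "u4": ["u1", "u6", "u7"], "u5": ["u1"],
    "u6": ["u4"], "u7": ["u4"], "u8": ["u3"],
}
# Hypothetical coverage bitmasks (a = 1, b = 2, c = 4); u8 covers no query topic
mask_of = {"u1": 0b001, "u2": 0b010, "u3": 0b100, "u4": 0b111,
           "u5": 0b001, "u6": 0b101, "u7": 0b111, "u8": 0}
dist = bfs_distances(adj, "u0")
print(traversal_counts(dist, mask_of, h=2))  # {1: 2, 2: 1, 4: 1, 7: 1}
```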

### Find Skyline Nodes

- Store the DAG nodes in a lookup table with a skyline bit for each DAG node.
- The skyline bit makes it possible to prune non-skyline nodes directly.

Lookup table at BFS depth h = 1 (shown alongside the Input Network and Query DAG):

| DAG node | abc | ab | ac | bc | a | b | c |
|---|---|---|---|---|---|---|---|
| Skyline bit | 0 | 0 | 0 | 0 | 1 | 1 | 1 |

### Find Skyline Nodes (cont.)

Lookup table at BFS depth h = 2:

| DAG node | abc | ab | ac | bc | a | b | c |
|---|---|---|---|---|---|---|---|
| Skyline bit | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
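
The lookup-table idea can be sketched as a level-by-level scan. The sketch assumes that the skyline bit of a DAG node S is set once a skyline node whose coverage contains S has been found at a smaller BFS depth. Any later node mapped to S is then dominated and is pruned with one lookup. Nodes at the same depth are still compared with each other, because a node with strictly more topics at equal distance also dominates. The distances, coverage masks, and function name are hypothetical, and nodes covering no query topic are ignored.

```python
def skyline_with_lookup(dist, mask_of, r):
    """Level-by-level skyline search with one skyline bit per DAG node.

    Assumption: bit[S] is set once a skyline node whose coverage contains S
    has been found at a smaller BFS depth, so a later node mapped to S is
    dominated and can be pruned with a single lookup.
    """
    bit = [False] * (1 << r)          # lookup table indexed by DAG-node bitmask
    levels = {}
    for node, d in dist.items():
        if d > 0 and mask_of.get(node, 0):   # skip q and nodes covering no query topic
            levels.setdefault(d, []).append(node)
    skyline, snapshots = [], {}
    for h in sorted(levels):
        # Prune nodes whose DAG node already has its skyline bit set.
        cands = [u for u in levels[h] if not bit[mask_of[u]]]
        # At equal depth, a node is dominated by a candidate covering a strict superset.
        found = [u for u in cands
                 if not any(mask_of[v] != mask_of[u]
                            and (mask_of[v] & mask_of[u]) == mask_of[u]
                            for v in cands)]
        for u in found:               # set the bit of every non-empty subset of u's coverage
            sub = mask_of[u]
            while sub:
                bit[sub] = True
                sub = (sub - 1) & mask_of[u]
        skyline.extend(found)
        snapshots[h] = bit[:]         # copy of the lookup table after depth h
    return skyline, snapshots


# Hypothetical BFS distances from q = u0 and coverage bitmasks (a = 1, b = 2, c = 4)
dist = {"u0": 0, "u1": 1, "u2": 1, "u3": 1, "u4": 2, "u5": 2, "u8": 2, "u6": 3, "u7": 3}
mask_of = {"u1": 0b001, "u2": 0b010, "u3": 0b100, "u4": 0b111,
           "u5": 0b001, "u6": 0b101, "u7": 0b111, "u8": 0}
skyline, snapshots = skyline_with_lookup(dist, mask_of, r=3)
print(skyline)  # ['u1', 'u2', 'u3', 'u4']
```

On this toy input, after depth 1 the bits are set for a, b, and c only, and after depth 2 they are set for every DAG node. This is the same pattern as in the two lookup tables on the slides.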

### Dominance Count of Skyline Nodes

[Figure: Input Network, Query DAG, and lookup table at h = 2 (all skyline bits 1). T = 0 for abc and for the two-topic nodes, T = 1 for each single-topic node; C and D are unchanged.]

- $\mathrm{DC}(u_4) = D(abc) - T(abc) - T(ab) - T(ac) - T(bc) - T(a) - T(b) - T(c) - 1 = 7 - 3 - 1 = 3$
- A top-k buffer stores the top-k skyline nodes.
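
The dominance-count formula can be sketched as follows, with a bounded min-heap standing in for the top-k buffer. Several assumptions are not stated on the slides. T(S) is taken as the number of nodes with coverage S that are strictly closer to q than u, and the trailing −1 removes u itself. The sketch also assumes no other node has exactly u's coverage at u's distance; such a node would be counted even though it is not dominated. The D values, distances, and helper names are hypothetical.

```python
import heapq


def submasks(mask):
    """Yield every non-empty subset of a bitmask."""
    sub = mask
    while sub:
        yield sub
        sub = (sub - 1) & mask


def dominance_count(u, dominance, dist, mask_of):
    """DC(u) = D(cov(u)) - sum of T(S) over DAG nodes S within cov(u) - 1.

    Assumptions: T(S) counts nodes with coverage S strictly closer to q than u,
    and no other node has exactly u's coverage at u's distance.
    """
    traversal = {}
    for v, d in dist.items():
        mask = mask_of.get(v, 0)
        if mask and 0 < d < dist[u]:
            traversal[mask] = traversal.get(mask, 0) + 1
    cov = mask_of[u]
    closer = sum(traversal.get(s, 0) for s in submasks(cov))
    return dominance[cov] - closer - 1        # the -1 removes u itself


def top_k(skyline, k, dominance, dist, mask_of):
    """Top-k buffer as a min-heap of (DC, node) holding at most k entries."""
    buffer = []
    for u in skyline:
        heapq.heappush(buffer, (dominance_count(u, dominance, dist, mask_of), u))
        if len(buffer) > k:
            heapq.heappop(buffer)             # evict the smallest dominance count
    return sorted(buffer, reverse=True)


# Hypothetical inputs: D per DAG node (a = 1, b = 2, c = 4), distances, coverage masks
dominance = {1: 2, 2: 1, 3: 3, 4: 1, 5: 4, 6: 2, 7: 7}
dist = {"u0": 0, "u1": 1, "u2": 1, "u3": 1, "u4": 2, "u5": 2, "u8": 2, "u6": 3, "u7": 3}
mask_of = {"u1": 1, "u2": 2, "u3": 4, "u4": 7, "u5": 1, "u6": 5, "u7": 7, "u8": 0}
print(dominance_count("u4", dominance, dist, mask_of))               # 3
print(top_k(["u1", "u2", "u3", "u4"], 2, dominance, dist, mask_of))  # [(3, 'u4'), (1, 'u1')]
```

For u4 in this toy data the formula evaluates to 7 − 3 − 1 = 3.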

### Pruning and Early Termination

### Experimental Results

### Efficiency

### Conclusion and Future Work

- An efficient algorithm to find the top-k skyline nodes in a large attributed network.
- Experimental evaluation on real and synthetic datasets is still required.
- The time complexity is linear in the number of nodes and edges of the network; distance-based indexing might improve efficiency further.
- Returning a top-k skyline set, instead of the top-k skyline nodes, might be more effective.

### Questions

Thank you!
