Download presentation

Presentation is loading. Please wait.

Published byGerman Perryman Modified over 2 years ago

1
Chenghui Ren, Luyi Mo, Ben Kao, Reynold Cheng, David Cheung The University of Hong Kong {chren, lymo, kao, ckcheng,

2
2 Social networks Web

3
3 Personalized PageRank Random Walk with Restart Discounted Hitting Time SALSA PageRank Measures of the importance of nodes Measures of the proximities between nodes

4
4 A common property: Computing them requires solving linear systems PR SALSA PPR DHT RWR # of nodes in the graph: n A: n x n matrix, captures the graph structure b: vector of size n, depends on the measures computed, input query vector x: vector of size n, gives the measures of the nodes in the graph

5
5 u With a probability d, transit to a neighboring node With a probability (1-d), transit to the starting node x (v) steady-state probability that we are at node v A: derived from the graph b: RWR with starting node 1 x: RWR scores

6
6 Evolving Graph Sequence (EGS) [VLDB’11] Time … measure measure measure measure … Information modeled by graph changes over time.

7
7 Wikipedia, 20,000 Wiki pages, 1000 daily snapshots Key moments: PR score changes significantly

8
8 Evolving Matrix Sequence (EMS) Objective: efficiently compute various measures over an EMS

9
9 many b’s RWR score between any two nodes n b’s many A’s Each matrix in the EMS 1 year daily snapshots 365 A’s LU decomposition

10
10 Solving LUx 1 = b 1 Solving LUx 2 = b 2 Solving LUx q = b q Much faster than LU Forward & backward substitutions LU factors

11
11 #fill-ins: 8 (fill-in: An entry that is 0 in A but becomes non-zero in L and U) More fill-ins will cause: More space to store (L, U) More time to do forward/backward substitutions in solving LUx = b More time to do LU decomposition

12
12 #fill-ins: 8 (fill-in: An entry that is 0 in A but becomes non-zero in L and U) #fill-ins: 1

13
13 Finding the optimal ordering to minimize #fill-ins is NP-complete Effective heuristic reordering strategies Markowitz AMD Degree … Most effective

14
14 LU decompositionLU decomposition for all A’s many b’s RWR score between any two nodes n b’s many A’s Each matrix in the EMS 1 year daily snapshots 365 A’s Reordering + Reordering for all A’s +

15
15 How many orderings should be computed? T orderings? 1 ordering? Others? The EMS gradually evolves over time: successive graphs in Wiki share 99% of edges Can we apply incremental methods?

16
16 best ordering quality but slow Markowitz orderings

17
17 Bennett‘s Incremental LUDE [1965’] bad ordering!

18
18 Cluster 1Cluster M Tradeoff between good ordering and fast incremental LUDE

19
19 1. Structure allocation to store LU factors 2. Numerical computation Zooming in 70% Adjacency-lists structures Bennett’s incremental LU

20
20 Universal Static Structure (Able to accommodate non-zero entries of LU factors of all matrices in a cluster) Cluster

21
21 Universal Static Structure (Able to accommodate non-zero entries of LU factors of all matrices in a cluster) Cluster

22
22 Cluster 1Cluster M No structural change overhead, better ordering quality with static structure

23
23 Datasets Two real datasets (which derive two EMS’s) ▪ Wiki (pages and their hyperlinks) default ▪ DBLP (authors and their co-authorships) Synthetic EMSs Settings Java, Linux, CPU: 3.4GHz Octo- Core, Memory: 16G Dataset#snapshots|V||E 1 ||E last | Wiki100020,00056,181138,072 DBLP100097,931387,960547,164

24
24 Ordering quality Quality-loss of an ordering O of A: Efficiency Speedup over BF’s execution time O * : Markowitz ordering of A # of extra fill-ins

25
25 INC applies Markowitz ordering of A1 to all matrices in the whole EMS Snapshot number Snapshot #

26
26 CINC applies Markowitz ordering of A1 to all matrices in the cluster CLUDE applies Markowitz ordering of A U to all matrices in the cluster

27
27 Reasons of the big gap between CLUDE and CINC: (1) CLUDE gives better ordering quality (2) CLUDE uses static data structures for storing the matrices’ LU factors

28
28 General observation: CLUDE gives the best ordering quality, at the same time is much faster than INC and CINC

29
29 EGS processing Computation of shortest path distance between two nodes across a graph sequence Computation of various measures (PR/SALSA/PPR/DHT/RWR) on single graphs Approximation methods (power iteration, Monte Carlo) ▪ Two order of magnitude faster if A is decomposed Sparse matrix decomposition Maintaining measures incrementally Approximation methods ▪ An order of magnitude faster Graph streams How to detect sub-graphs that change rapidly over small window of the stream Graphs that arrive in the stream are not archived

30
30 We studied the LUDEM problem Interesting structural analyses on a sequence of evolving graphs can be carried out efficiently We designed CLUDE for the LUDEM problem based on matrix ordering and incremental LU decomposition CLUDE outperformed others in terms of both ordering quality and speed

31
31 Thank you! Contact Info: Luyi Mo University of Hong Kong

32
32 LU decompositionLU decomposition for all A’s many b’smany A’s BF: T orderings (1 ordering for 1 matrix) best ordering, slow INC: 1 ordering (for all matrices) bad ordering, slow CINC: cluster-based good ordering, fast CLUDE: cluster-based, static structure good ordering, fastest

33
33 Translating the web page Publicizing the web site through newsletters Providing a rich site summary … How to evaluate the effectiveness of these actions? Actions taken Changes to PR score Google official guide

34
34 Segmentation clustering algorithm: A cluster consists of successive snapshots A cluster satisfies: EMS

35
35 Distributed algorithms Key moment detection Key moment of a measure over an EGS: the moment at which the measure score changes dramatically

36
36 It can be easily computed for symmetric matrices

37
37 Key: Control the size of the cluster The smaller the cluster is, the higher the chance the CINC or CLUDE satisfy the quality constraint Beta-clustering algorithms are thus proposed

38
38

39
39 In 1992, IBM and HARRIS announced their alliance to share technology HARRIS’s stock price hit a closing high shortly after the announcement

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google