
1 Chenghui Ren, Luyi Mo, Ben Kao, Reynold Cheng, David Cheung The University of Hong Kong {chren, lymo, kao, ckcheng,

2 Social networks and the Web

3 PageRank, SALSA, Personalized PageRank (PPR), Random Walk with Restart (RWR), Discounted Hitting Time (DHT). PageRank and SALSA measure the importance of nodes; PPR, RWR, and DHT measure the proximities between nodes.

4 A common property: computing them (PR, SALSA, PPR, DHT, RWR) requires solving linear systems A x = b, where n is the number of nodes in the graph, A is an n x n matrix that captures the graph structure, b is a vector of size n that depends on the measure computed (the input query vector), and x is a vector of size n that gives the measures of the nodes in the graph.
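The common A x = b form can be made concrete. The sketch below solves PageRank as a linear system and cross-checks it against power iteration; the 4-node transition matrix is made up for illustration, and this is the textbook formulation rather than the paper's code.

```python
import numpy as np

# PageRank as the linear system A x = b, with A = I - d * M^T
# (M: row-stochastic transition matrix) and b = ((1 - d) / n) * 1.
d = 0.85
M = np.array([              # illustrative 4-node graph; row i = out-transitions of node i
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
n = M.shape[0]
A = np.eye(n) - d * M.T
b = (1.0 - d) / n * np.ones(n)
x = np.linalg.solve(A, b)   # PageRank scores of the n nodes

# Cross-check: power iteration converges to the same vector
p = np.full(n, 1.0 / n)
for _ in range(200):
    p = d * M.T @ p + (1.0 - d) / n
assert np.allclose(x, p, atol=1e-8)
```

Because M is row-stochastic, the scores sum to 1, which makes a handy sanity check.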

5 Random Walk with Restart from node u: with probability d, transit to a neighboring node; with probability (1 - d), transit back to the starting node. x(v) is the steady-state probability that the walk is at node v. A is derived from the graph; b encodes RWR with starting node 1; x gives the RWR scores.
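A minimal sketch of the RWR system, assuming the standard formulation x = d * W^T x + (1 - d) * e_u, i.e. A = I - d * W^T and b = (1 - d) * e_u; the transition matrix W is made up for illustration. Note that changing the starting node only changes b, not A, which is why one factorization of A can serve many queries.

```python
import numpy as np

d = 0.85
W = np.array([              # illustrative row-stochastic transition matrix
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
n = W.shape[0]
A = np.eye(n) - d * W.T
u = 0                            # starting node of the walk
b = (1.0 - d) * np.eye(n)[u]     # restart vector e_u scaled by (1 - d)
x = np.linalg.solve(A, b)        # x[v]: steady-state probability of being at v
assert abs(x.sum() - 1.0) < 1e-10  # probabilities sum to 1
```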

6 Evolving Graph Sequence (EGS) [VLDB'11]: the information modeled by a graph changes over time, and a measure is computed on each snapshot in the sequence.

7 Wikipedia: 20,000 Wiki pages, 1,000 daily snapshots. Key moments: when a PR score changes significantly.

8 Evolving Matrix Sequence (EMS): the sequence of matrices derived from the graph snapshots. Objective: efficiently compute various measures over an EMS.

9 Many b's: RWR scores between any two nodes require n b's. Many A's: each matrix in the EMS is one A; one year of daily snapshots gives 365 A's. Approach: LU decomposition.

10 Compute the LU factors once; then solving LU x1 = b1, LU x2 = b2, ..., LU xq = bq takes only forward and backward substitutions, which are much faster than the LU decomposition itself.
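The factor-once, solve-many pattern can be sketched with SciPy's standard LU routines (illustrative random matrix, not the paper's implementation): the O(n^3) decomposition is paid once, and each right-hand side costs only O(n^2) substitutions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # well-conditioned example

lu, piv = lu_factor(A)          # O(n^3) LU decomposition, done once
for _ in range(5):              # many right-hand sides b_1 ... b_q
    b = rng.standard_normal(n)
    x = lu_solve((lu, piv), b)  # forward + backward substitution, O(n^2)
    assert np.allclose(A @ x, b)
```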

11 #fill-ins: 8. (Fill-in: an entry that is 0 in A but becomes non-zero in L or U.) More fill-ins cause: more space to store (L, U); more time for the forward/backward substitutions in solving LU x = b; more time to do the LU decomposition.

12 With a different ordering, the #fill-ins drops from 8 to 1. (Fill-in: an entry that is 0 in A but becomes non-zero in L or U.)

13 Finding the optimal ordering that minimizes the #fill-ins is NP-complete. Effective heuristic reordering strategies: Markowitz (the most effective), AMD, Degree, ...
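The effect of a fill-reducing ordering is easy to demonstrate with SuperLU via SciPy. SciPy does not expose Markowitz or AMD directly, so COLAMD stands in for them here; the arrow-shaped matrix is an illustrative classic worst case for the natural ordering, since eliminating its dense first row/column first fills in the whole matrix.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 50
A = sp.lil_matrix((n, n))
A.setdiag(4.0)
A[0, :] = 1.0           # dense first row
A[:, 0] = 1.0           # dense first column
A[0, 0] = 4.0 * n       # keep the matrix diagonally dominant
A = A.tocsc()

def fill(perm):
    f = splu(A, permc_spec=perm)
    return f.L.nnz + f.U.nnz   # nonzeros in the LU factors

# A fill-reducing ordering eliminates the dense row/column last and
# leaves the factors nearly as sparse as A itself.
assert fill("COLAMD") < fill("NATURAL")
```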

14 Many b's (RWR scores between any two nodes require n b's) and many A's (each matrix in the EMS; one year of daily snapshots gives 365 A's): naively, reordering + LU decomposition must be done for every A.

15 How many orderings should be computed? T orderings? 1 ordering? Something in between? The EMS gradually evolves over time: successive graphs in Wiki share 99% of their edges. Can we apply incremental methods?

16 Computing a Markowitz ordering for every matrix gives the best ordering quality but is slow.

17 Bennett's incremental LU decomposition [1965] updates the LU factors as the matrix changes, but reusing one ordering for the whole sequence leads to bad ordering quality!
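A Bennett-style update can be sketched in a few lines: given A = L D U, it revises the factors in place so that they factor A + alpha * x * y^T, in O(n^2) instead of refactoring from scratch. The recurrences below are derived from the Schur-complement argument and are an illustration, not a transcription of Bennett's 1965 paper.

```python
import numpy as np

def bennett_update(L, d, U, x, y, alpha=1.0):
    """Rank-one update of an LDU factorization (L, U unit-triangular)."""
    n = len(d)
    x, y = x.copy(), y.copy()
    for k in range(n):
        dk_new = d[k] + alpha * x[k] * y[k]
        for i in range(k + 1, n):
            Lik = L[i, k]
            L[i, k] = (Lik * d[k] + alpha * x[i] * y[k]) / dk_new
            x[i] -= x[k] * Lik            # fold the eliminated entry into x
        for j in range(k + 1, n):
            Ukj = U[k, j]
            U[k, j] = (d[k] * Ukj + alpha * x[k] * y[j]) / dk_new
            y[j] -= y[k] * Ukj            # fold the eliminated entry into y
        alpha *= d[k] / dk_new            # update coefficient for the Schur complement
        d[k] = dk_new

# Check against a direct reconstruction on a tiny example
L = np.array([[1, 0, 0], [0.5, 1, 0], [0.2, 0.3, 1]])
U = np.array([[1, 0.4, 0.1], [0, 1, 0.2], [0, 0, 1]])
d = np.array([2.0, 3.0, 4.0])
A = L @ np.diag(d) @ U
x = np.array([1.0, 0.5, -1.0])
y = np.array([2.0, 1.0, 0.5])
bennett_update(L, d, U, x, y)
assert np.allclose(L @ np.diag(d) @ U, A + np.outer(x, y))
```

The update keeps the original elimination ordering, which is exactly why its quality degrades as the matrices drift away from the one the ordering was computed for.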

18 Cluster 1 ... Cluster M: partition the EMS into clusters to trade off between good ordering quality and fast incremental LU decomposition.

19 Zooming in on Bennett's incremental LU: (1) structure allocation to store the LU factors (adjacency-list structures), which takes about 70% of the time, and (2) numerical computation.

20 Universal static structure: able to accommodate the non-zero entries of the LU factors of all matrices in a cluster.

21 Universal static structure (continued): able to accommodate the non-zero entries of the LU factors of all matrices in a cluster.
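The idea behind the universal static structure can be sketched as taking the union of sparsity patterns over a cluster, so that incremental updates never need to grow the data structure. The toy matrices below are illustrative only, and for brevity the sketch unions the input patterns rather than the predicted LU fill patterns that the real structure must accommodate.

```python
import numpy as np
import scipy.sparse as sp

# Three tiny matrices standing in for successive snapshots of one cluster
cluster = [
    sp.csc_matrix(np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]])),
    sp.csc_matrix(np.array([[1.0, 0.0, 0.0], [4.0, 1.0, 0.0], [0.0, 0.0, 1.0]])),
    sp.csc_matrix(np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 6.0, 1.0]])),
]

# Union of the nonzero patterns: accumulate magnitudes, then threshold
acc = abs(cluster[0])
for A in cluster[1:]:
    acc = acc + abs(A)
union = acc != 0            # boolean pattern big enough for every snapshot

# The static structure covers each individual matrix's pattern
assert all(union.nnz >= A.nnz for A in cluster)
```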

22 Cluster 1 ... Cluster M: with the static structure there is no structural-change overhead, and the ordering quality is better.

23 Datasets:
  Two real datasets (each deriving an EMS):
    Wiki (pages and their hyperlinks), the default
    DBLP (authors and their co-authorships)
  Synthetic EMSs
Settings: Java, Linux, CPU: 3.4 GHz Octo-Core, Memory: 16 GB

Dataset  #snapshots  |V|     |E_1|    |E_last|
Wiki     1000        20,000  56,181   138,072
DBLP     1000        97,931  387,960  547,164

24 Ordering quality: the quality-loss of an ordering O of A, measured by the number of extra fill-ins relative to O*, the Markowitz ordering of A. Efficiency: the speedup over BF's execution time.

25 INC applies the Markowitz ordering of A1 to all matrices in the whole EMS. (Plotted against snapshot number.)

26 CINC applies the Markowitz ordering of A1 to all matrices in the cluster; CLUDE applies the Markowitz ordering of A_U to all matrices in the cluster.

27 Reasons for the big gap between CLUDE and CINC: (1) CLUDE gives better ordering quality; (2) CLUDE uses static data structures for storing the matrices' LU factors.

28 General observation: CLUDE gives the best ordering quality and, at the same time, is much faster than INC and CINC.

29 Related work:
  EGS processing: computation of shortest-path distances between two nodes across a graph sequence
  Computation of various measures (PR/SALSA/PPR/DHT/RWR) on single graphs: approximation methods (power iteration, Monte Carlo); two orders of magnitude faster if A is decomposed
  Sparse matrix decomposition
  Maintaining measures incrementally: approximation methods, an order of magnitude faster
  Graph streams: detecting sub-graphs that change rapidly over a small window of the stream; graphs that arrive in the stream are not archived

30 We studied the LUDEM problem, through which interesting structural analyses on a sequence of evolving graphs can be carried out efficiently. We designed CLUDE for the LUDEM problem, based on matrix ordering and incremental LU decomposition. CLUDE outperformed the alternatives in terms of both ordering quality and speed.

31 Thank you! Contact info: Luyi Mo, The University of Hong Kong

32 BF: T orderings (one ordering per matrix): best ordering, slow.
INC: 1 ordering (for all matrices): bad ordering, slow.
CINC: cluster-based: good ordering, fast.
CLUDE: cluster-based with static structure: good ordering, fastest.

33 Google's official guide suggests actions such as translating the web page, publicizing the web site through newsletters, and providing a rich site summary. How to evaluate the effectiveness of these actions? Relate the actions taken to changes in PR score.

34 Segmentation clustering algorithm: a cluster consists of successive snapshots of the EMS, and each cluster must satisfy a quality constraint.
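The segmentation idea can be sketched as a greedy pass: grow a cluster of successive snapshots while a quality predicate holds, then start a new one. The predicate below (a bounded edge difference from the cluster's first snapshot) is a made-up stand-in for the paper's actual quality constraint.

```python
def segment(snapshots, ok):
    """Greedily partition a sequence of snapshots into clusters of
    successive snapshots, each satisfying the predicate `ok`."""
    clusters, current = [], [snapshots[0]]
    for s in snapshots[1:]:
        if ok(current[0], s):      # still close enough to the cluster's start?
            current.append(s)
        else:
            clusters.append(current)
            current = [s]
    clusters.append(current)
    return clusters

# Snapshots as edge sets; stay in the cluster while at least half of the
# first snapshot's edges survive (illustrative data and threshold).
snaps = [{1, 2, 3}, {1, 2, 4}, {1, 5, 6}, {5, 6, 7}, {5, 6, 8}]
ok = lambda a, b: len(a & b) / len(a) >= 0.5
clusters = segment(snaps, ok)
```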

35 Distributed algorithms. Key moment detection: the key moment of a measure over an EGS is the moment at which the measure score changes dramatically.

36 36 It can be easily computed for symmetric matrices

37 Key: control the size of the cluster. The smaller the cluster, the higher the chance that CINC or CLUDE satisfies the quality constraint. Beta-clustering algorithms are thus proposed.

38

39 In 1992, IBM and HARRIS announced their alliance to share technology. HARRIS's stock price hit a closing high shortly after the announcement.

