Towards Efficient SimRank Computation over Large Networks
Weiren Yu (1,2), Xuemin Lin (1), Wenjie Zhang (1)
(1) University of New South Wales; (2) NICTA, Australia



Slide 2: Outline
- SimRank overview
- Existing computing method
- Our approaches
  - Partial Sums Sharing
  - Exponential SimRank
- Empirical evaluations
- Conclusions

Slide 3: SimRank Overview
Similarity computation plays a vital role in our lives.
[Figures: citation graph, collaboration network, recommender system]

Slide 4: SimRank Overview
SimRank
- An appealing link-based similarity measure (KDD ’02)
Basic philosophy
- Two vertices are similar if they are referenced by similar vertices.
Two forms
- Original form (KDD ’02): s(a,b) = C / (|I(a)| |I(b)|) * sum over i in I(a), j in I(b) of s(i,j)
  - C: damping factor
  - I(b): in-neighbor set of node b
  - s(a,b): similarity between nodes a and b
- Matrix form (EDBT ’10)
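
The original (KDD ’02) form can be sketched as a fixed-point iteration. This is a minimal illustration, assuming the standard definition in which s(a,a) = 1 and s(a,b) = 0 whenever either node has no in-neighbors. The function name `simrank_naive`, the dict-of-lists graph encoding, and the default C = 0.6 are illustrative choices, not taken from the slides.

```python
def simrank_naive(in_nbrs, C=0.6, iterations=5):
    """Naive SimRank iteration.

    in_nbrs maps every node to the list of its in-neighbors (hypothetical encoding).
    Each round applies
        s(a,b) = C / (|I(a)| |I(b)|) * sum_{i in I(a)} sum_{j in I(b)} s(i,j)
    to all pairs a != b, starting from s(a,a) = 1 and s(a,b) = 0.
    """
    nodes = list(in_nbrs)
    s = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        nxt = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[a][b] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    nxt[a][b] = 0.0  # no in-neighbors: nothing references the node
                else:
                    total = sum(s[i][j] for i in in_nbrs[a] for j in in_nbrs[b])
                    nxt[a][b] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = nxt
    return s


# Toy graph: u references both a and b, so a and b become similar.
scores = simrank_naive({"u": [], "a": ["u"], "b": ["u"]}, C=0.6)
```

In this toy graph the only in-neighbor pair of (a, b) is (u, u), so s(a,b) settles at C after the first round.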

Slide 5: Existing Computing Methods
State-of-the-art method (VLDB J. ’10)
- Partial sum memoization: memoize each partial sum, then reuse it
Two limitations
- High computation time: O(Kdn^2)
- Low convergence rate

Slide 6: Naïve vs. VLDB J. ’10
Example: compute s(a,b), s(a,d)
[Figure: naïve computation, with duplicate computations highlighted]

Slide 7: Naïve vs. VLDB J. ’10
Example: compute s(a,b), s(a,d)
[Figure: partial sums memoization (VLDB J. ’10), where a partial sum is memoized once and reused]
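
Partial sums memoization can be sketched as follows: for a fixed node a, the inner sum over I(a) is computed once per column j and then reused for every b. This is a hedged illustration of the idea only, not the VLDB J. ’10 implementation. The name `simrank_psum` and the graph encoding are assumptions made for the example.

```python
def simrank_psum(in_nbrs, C=0.6, iterations=5):
    """SimRank with partial sums memoization (illustrative sketch).

    For each node a, Partial_a(j) = sum_{i in I(a)} s(i, j) is memoized once,
    then reused when computing s(a, b) for every b.
    """
    nodes = list(in_nbrs)
    s = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        nxt = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
        for a in nodes:
            if not in_nbrs[a]:
                continue
            # Memoize the partial sum over I(a) for every column j.
            partial = {j: sum(s[i][j] for i in in_nbrs[a]) for j in nodes}
            for b in nodes:
                if b == a or not in_nbrs[b]:
                    continue
                # Reuse the memoized values instead of re-summing over I(a).
                total = sum(partial[j] for j in in_nbrs[b])
                nxt[a][b] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = nxt
    return s


# Hypothetical graph: a and b share in-neighbors e and f; d is referenced by f only.
graph = {"e": [], "f": [], "a": ["e", "f"], "b": ["e", "f"], "d": ["f"]}
scores = simrank_psum(graph, C=0.6)
```

Here the partial sum over I(a) = {e, f} is built once and serves both s(a,b) and s(a,d).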

Slide 8: VLDB J. ’10 Limitations
Example: compute s(a,b), s(a,d), s(c,b), s(c,d)
[Figure: partial sums, with duplicate redundancy highlighted]

Slide 9: Motivation
Prior work (VLDB J. ’10)
- High time complexity: O(Kdn^2)
  - Duplicate computation among partial sums memoization
- Slow (geometric) convergence rate
  - Requires K = ⌈log_C ε⌉ iterations to guarantee accuracy ε
Our contributions
- Propose an adaptive clustering strategy to reduce the time from O(Kdn^2) to O(Kd’n^2), where d’ ≤ d.
- Introduce a new notion of SimRank to accelerate convergence from a geometric to an exponential rate.
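
The iteration bound K = ⌈log_C ε⌉ quoted above is easy to evaluate. The sketch below does so in Python. The function name and the sample values C = 0.8 and ε = 0.0001 are hypothetical inputs chosen for illustration.

```python
import math


def geometric_iterations(C, eps):
    """Number of iterations K = ceil(log_C(eps)) for accuracy eps."""
    return math.ceil(math.log(eps) / math.log(C))


# Hypothetical inputs: damping factor 0.8, target accuracy 1e-4.
k = geometric_iterations(0.8, 1e-4)
```

With these inputs the bound gives K = 42, since 0.8^41 is still above 10^-4 while 0.8^42 is below it.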

Slide 10: Eliminating Partial Sums Redundancy
Main idea
- Share common sub-summations among partial sums
Example
- Existing approach: duplicate computation (3 additions)
- Our approach: partition (only 2 additions)
Finding the “best” partition for fine-grained partial sums is NP-hard.
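
The saving from sharing a common sub-summation can be counted directly. The sketch below assumes two hypothetical in-neighbor sets, {b, g} and {b, g, d}. The slide's own figure was not preserved, so these sets are an assumption chosen to reproduce its 3-versus-2 count. It counts additions per column with and without sharing.

```python
def additions_separate(in_sets):
    """Additions when every partial sum is accumulated from scratch."""
    return sum(max(len(s) - 1, 0) for s in in_sets)


def additions_shared(in_sets):
    """Additions when the sub-sum over the common elements is computed once."""
    common = set.intersection(*in_sets)
    if not common:
        return additions_separate(in_sets)  # nothing to share
    shared = len(common) - 1                        # build the common sub-sum once
    extras = sum(len(s - common) for s in in_sets)  # one addition per leftover element
    return shared + extras


# Hypothetical in-neighbor sets of two nodes.
sets = [{"b", "g"}, {"b", "g", "d"}]
print(additions_separate(sets), additions_shared(sets))
```

The sub-sum over {b, g} is built once (1 addition) and extended by d (1 more), instead of being rebuilt for the second set.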

Slide 11: Heuristic for Finding Partitions
Find: partitions for each in-neighbor set
Minimize: the total cost of all partial sums
Main idea
- Calculate the transition cost between in-neighbor sets
- Construct a weighted digraph whose edge weights are these transition costs
- Find an MST of this digraph to minimize the total transition cost
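
The three steps above can be sketched under a simplified cost model. The assumptions are: deriving the partial sum over set B from the one over set A costs |A Δ B| operations, building B from scratch costs |B| - 1 additions, and a virtual root stands for "from scratch". Because this assumed cost is symmetric, Prim's algorithm grown from the root is used here in place of a general directed-MST routine. The slides do not give the cost function or the MST procedure, so all names below are hypothetical.

```python
def transition_cost(src, dst):
    """Assumed cost of turning the partial sum over src into the one over dst."""
    return len(src ^ dst)  # subtract elements leaving, add elements entering


def scratch_cost(dst):
    """Additions needed to build the partial sum over dst from nothing."""
    return max(len(dst) - 1, 0)


def sharing_plan(in_sets):
    """Prim's algorithm from a virtual root; returns (plan, total_cost).

    plan is a list of (parent, node, cost); parent None means "from scratch".
    """
    best = {v: (scratch_cost(in_sets[v]), None) for v in in_sets}
    plan, total = [], 0
    while best:
        # Pick the cheapest node to attach next (ties broken by name).
        v = min(best, key=lambda x: (best[x][0], x))
        cost, parent = best.pop(v)
        plan.append((parent, v, cost))
        total += cost
        # Check whether the new tree node offers a cheaper way to reach the rest.
        for u in best:
            c = transition_cost(in_sets[v], in_sets[u])
            if c < best[u][0]:
                best[u] = (c, v)
    return plan, total


# Hypothetical in-neighbor sets.
sets = {"a": {"b", "g"}, "c": {"b", "g", "d"}, "e": {"f", "i"}}
plan, total = sharing_plan(sets)
```

For these sets the plan builds the sum for a from scratch, derives c from a with one addition, and builds e from scratch. That is 3 operations in total, against 4 without sharing.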

Slide 12: Example
[Figure: steps (1) to (4)]

Slide 13: Example
[Figure: steps (1) and (2), on a graph with nodes a, b, d, e, f, g, i]

Slide 14: Outer Partial Sums Sharing
- (Inner) partial sums sharing
- Outer partial sums sharing

Slide 15: Exponential SimRank
Existing approach (VLDB J. ’10): geometric rate
- For …, to guarantee the accuracy …, there are … iterations.
Our approach: exponential rate
- For …, we need only 7 iterations.

Slide 16: Exponential SimRank
Key observation
- The effect of C^i is to reduce the contribution of long paths relative to short ones.
Main idea
- Accelerate convergence by replacing the geometric sum of SimRank with the exponential sum.
[Equations: geometric sum; exponential sum with its normalized factor; differential equation with its initial condition]
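
The equations on this slide did not survive extraction. The block below is a hedged reconstruction that matches the labels (geometric sum, exponential sum, normalized factor, differential equation, initial condition). It assumes the matrix form S = C Q S Q^T + (1 - C) I_n, where Q denotes the backward transition matrix. The symbols Q, I_n, and the hat notation are assumptions, not text from the slides.

```latex
% Assumed notation: Q = backward transition matrix, I_n = n-by-n identity.
\begin{align*}
  \text{Geometric sum:}   \quad & S = (1 - C) \sum_{i=0}^{\infty} C^{i} \, Q^{i} (Q^{\top})^{i} \\
  \text{Exponential sum:} \quad & \hat{S} = e^{-C} \sum_{i=0}^{\infty} \frac{C^{i}}{i!} \, Q^{i} (Q^{\top})^{i}
      \qquad (\text{normalized factor } e^{-C}) \\
  \text{Differential equation:} \quad & \frac{d\hat{S}(t)}{dt} = Q \, \hat{S}(t) \, Q^{\top},
      \qquad \hat{S}(0) = e^{-C} I_n, \qquad \hat{S} = \hat{S}(C)
\end{align*}
```

Differentiating the series with t in place of C term by term gives the differential equation, and the weights e^{-C} C^i / i! sum to 1 just as the geometric weights (1 - C) C^i do.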

Slide 17: SimRank Differential Equation
Recursive form for differential SimRank
Conventional computing method
- Euler iteration: set …
- Disadvantage: hard to determine the value of h
Our approach
[Equations not preserved]
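
One way to avoid choosing a step size h is to accumulate the exponential series term by term. The sketch below assumes the series form e^{-C} * sum_i (C^i / i!) Q^i (Q^T)^i, with Q a backward transition matrix given as a list of rows. The recurrence T_k = (C / k) Q T_{k-1} Q^T and all names are assumptions for illustration, since the slide's own iteration was not preserved.

```python
import math


def matmul(A, B):
    """Plain-Python product of two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def exponential_simrank(Q, C=0.6, iterations=10):
    """Truncated exponential sum: e^{-C} * sum_{k=0..iterations} T_k,
    with T_0 = I and T_k = (C / k) * Q * T_{k-1} * Q^T."""
    n = len(Q)
    Qt = [list(col) for col in zip(*Q)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T_0 = I
    total = [row[:] for row in term]
    for k in range(1, iterations + 1):
        inner = matmul(matmul(Q, term), Qt)
        term = [[C / k * x for x in row] for row in inner]  # T_k
        total = [[t + x for t, x in zip(trow, xrow)]
                 for trow, xrow in zip(total, term)]
    scale = math.exp(-C)  # normalizing factor
    return [[scale * x for x in row] for row in total]


# Toy graph with nodes ordered (u, a, b): u references a and b.
Q = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
S = exponential_simrank(Q, C=0.6, iterations=5)
```

For this toy graph every term beyond T_1 vanishes, so the score of the pair (a, b) equals C * e^{-C}.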

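Slide 18: SimRank Differential Equation
Accuracy guarantee: [equation not preserved]
Number of iterations
- Use the Lambert W function: [equation not preserved]
- Use the log function (for …): [equation not preserved]
Example (…, …): [values not preserved]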
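
The iteration count for an exponential series can also be found by direct search rather than a closed form. The sketch below assumes the sufficient condition C^K / K! ≤ ε, which follows from the Taylor remainder of the exponential series. This is an assumption, since the slide's exact bound and its Lambert W expression were not preserved. The inputs C = 0.8 and ε = 0.0001 are hypothetical.

```python
def exponential_iterations(C, eps):
    """Smallest K with C**K / K! <= eps (assumed sufficient condition)."""
    k = 0
    bound = 1.0  # C**0 / 0!
    while bound > eps:
        k += 1
        bound *= C / k  # update C**k / k! incrementally
    return k


# Hypothetical inputs: damping factor 0.8, target accuracy 1e-4.
k = exponential_iterations(0.8, 1e-4)
```

With these inputs the search stops at K = 7, because 0.8^7 / 7! is about 4.2e-5 while 0.8^6 / 6! is about 3.6e-4.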

Slide 19: Comparison
Geometric vs. Exponential SimRank
[Table: columns (Geometric) SimRank and Exponential SimRank; rows closed form, series form, iterative form, accuracy]

Slide 20: Experimental Settings
Datasets
- Real graphs: BERKSTAN, PATENT, DBLP (D02, D05, D08, D11)
- Synthetic data: SYN100K (via GTGraph generator)
Compared algorithms
- OIP-DSR: Differential SimRank + Partial Sums Sharing
- OIP-SR: Conventional SimRank + Partial Sums Sharing
- psum-SR [VLDB J. ’10]: without Partial Sums Sharing
- mtx-SR [EDBT ’10]: matrix-based SimRank via SVD
Evaluations
- Efficiency: CPU time, memory space, convergence rate
- Effectiveness: relative order preservation of OIP-DSR

Slide 21: Time Efficiency

Slide 22: Space & Convergence Rate

Slide 23: Relative Order Preservation

Slide 24: Conclusions
Two efficient methods are proposed to speed up the computation of SimRank on large graphs.
- A novel clustering approach to eliminate duplicate computations among the partial summations.
- A differential SimRank model for achieving an exponential convergence rate.
Empirical evaluations show the superiority of our methods by one order of magnitude.

Slide 25: Thank you! Q/A