
1 Sparsified Matrix Algorithms for Graph Laplacians Richard Peng Georgia Tech

2 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

3 GRAPH LAPLACIANS Matrices that correspond to undirected graphs: coordinates ↔ vertices, non-zeros ↔ edges. Example: a graph with two unit-weight edges sharing a vertex has Laplacian L = [[2, -1, -1], [-1, 1, 0], [-1, 0, 1]]. This talk: weighted, undirected graphs, and symmetric PSD matrices.
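As a concrete illustration (not from the slides), a minimal numpy sketch of building a weighted Laplacian from an edge list; the function name and example edges are assumptions:

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian L = D - W of an undirected graph.
    edges: list of (u, v, weight) with 0-based vertex indices."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w          # weighted degree on the diagonal
        L[v, v] += w
        L[u, v] -= w          # negative edge weight off the diagonal
        L[v, u] -= w
    return L

# The 3-vertex example above: two unit-weight edges sharing vertex 0.
print(laplacian(3, [(0, 1, 1.0), (0, 2, 1.0)]))
# [[ 2. -1. -1.]
#  [-1.  1.  0.]
#  [-1.  0.  1.]]
```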

4 THIS TALK Provably efficient algorithms for graph Laplacians, with a focus on solving linear systems. Why linear systems? They are a primitive in many graph algorithms, the simplest convex optimization problem, and algorithms for them often generalize.

5 THE LAPLACIAN PARADIGM Directly related: elliptic systems. Few iterations: eigenvectors, heat kernels. Many iterations / modified algorithms: graph problems, image processing.

6 SCIENTIFIC COMPUTING [BHV `04, DS `07]: PDEs and trusses reducible to SDD systems / M-matrices. [CFMNPW `14]: Helmholtz on meshes.

7 DATA ANALYSIS [ZGL `03][ZHS `05][CCLPT `15]: inference / sampling on graphical models [KMST `09, CMMP `13]: image segmentation / denoising

8 GRAPHS [Tutte `62]: planar graph embeddings in 2 solves. [KMP `09][MST `15]: random spanning trees, Õ(m^{4/3}). [DS `08, LS `13]: mincost / lossy flows, Õ(mn^{1/2}). (Õ hides factors of log^c n.)

9 GRAPHS, FASTER [CKMST `11][Sherman `13][KLOS `13][P `16]: approx. undirected maxflow, Õ(m^{4/3}) → Õ(m^{1+ε}) → Õ(m). [OSV `12]: balanced cuts, heat kernel walks, Õ(m). [Madry `13]: bipartite matching in Õ(m^{10/7}). [CMSV `16]: mincost matching and negative-length shortest paths in Õ(m^{10/7}).

10 WHY WORST CASE ANALYSIS? The Laplacian paradigm of designing graph algorithms: an optimization problem produces a sequence of (adaptively) generated linear systems, each handed to a linear system solver. Main difficulties: widely varying weights, multi-scale behavior.

11 INSTANCE: ISOTONIC REGRESSION [Kyng-Rao-Sachdeva `15]: https://github.com/sachdevasushant/Isotonic/blob/master/README.md : "…we suggest rerunning the program a few times and/or using a different solver. An alternate solver based on incomplete Cholesky factorization is provided with the code." Numbers thanks to Kevin Deweese (UCSB)

12 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

13 LINEAR SYSTEM SOLVERS [~0] Gaussian elimination: O(n^3). [Strassen `69]: O(n^2.8). [Coppersmith-Winograd `90]: O(n^2.3755). [Stothers `10]: O(n^2.3737). [Vassilevska Williams `11]: O(n^2.3727). [Hestenes-Stiefel `52] Conjugate gradient: O(nm) (?)

14 APPROACHES
                 Direct               Iterative
Unit step        Modifying an entry   Matrix-vector multiply
Main goal        Simplify system      Explore rank space
Cost per step    O(1)                 O(m)
#Steps           O(n^2.3727)          O(n)
Total            O(n^2.3727)          O(nm)
Performances are comparable on medium-sized instances: m = 10^5 takes ~1 second.

15 EXTREME INSTANCES Highly connected graphs need global steps; long paths / trees need many steps. Each is easy on its own (one suits an iterative method, the other a direct method), but solvers must handle both simultaneously, and a single graph can contain both.

16 SIMPLIFICATION Adjust/rescale so the diagonal = I; add a small amount to the diagonal to make it full rank. Then L = I – A, where A is a random walk matrix.
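A small sketch of this rescaling, assuming dense numpy matrices; the regularization constant reg is an arbitrary assumption:

```python
import numpy as np

def normalize(L, reg=1e-6):
    """Rescale a Laplacian so its diagonal is the identity:
    D^{-1/2} (L + reg*I) D^{-1/2} = I - A, and return A."""
    n = L.shape[0]
    d = np.diag(L) + reg                      # reg: small shift to make it full rank
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    A = np.eye(n) - Dinv_sqrt @ (L + reg * np.eye(n)) @ Dinv_sqrt
    return A
```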

17 ITERATIVE METHODS Division via multiplication: (1 – a)^{-1} = 1 + a + a^2 + a^3 + … If |a| ≤ ρ, then κ = (1-ρ)^{-1} terms give a good approximation to (1 – a)^{-1}. Matrix version: L^{-1} = I + A + A^2 + A^3 + … Spectral theorem: this works for symmetric PSD matrices; matrices well-approximated by their diagonal blocks are easy to solve.
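A sketch of the truncated power series as a solver, assuming numpy and that the spectral radius of A is below 1; the function name and term count are assumptions:

```python
import numpy as np

def neumann_solve(A, b, num_terms):
    """Approximate (I - A)^{-1} b by the truncated series b + Ab + A^2 b + ...
    Valid when the spectral radius of A is below 1."""
    x = np.zeros_like(b)
    term = b.copy()
    for _ in range(num_terms):
        x += term
        term = A @ term      # next power of A applied to b: A^k b
    return x
```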

18 LOWER BOUND FOR ITERATIVE METHODS There exist graphs G (e.g. the cycle) that require Ω(n) steps. Graph-theoretic interpretation: each term is one step of a walk, so the iterates b, Ab, A^2 b, … need roughly diameter-many terms before information crosses the graph. Closely related to the smoothness^{1/2} lower bound on the number of gradient steps.

19 DEGREE n → n OPERATIONS? (I – A)^{-1} = I + A + A^2 + A^3 + … = (I + A)(I + A^2)(I + A^4)… Repeated squaring: A^16 = (((A^2)^2)^2)^2, 4 operations, so O(log n) terms suffice. Similar to multi-level methods. Combinatorial view: A is one step of the random walk, and I – A^2 is the Laplacian of the 2-step random walk. Still a graph, but a dense matrix!
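A quick numerical check of the product form (I + A)(I + A^2)(I + A^4)… using exact dense squaring (the actual algorithm sparsifies each square); the test matrix is an arbitrary assumption:

```python
import numpy as np

def squaring_inverse(A, levels):
    """Approximate (I - A)^{-1} as the product (I + A)(I + A^2)(I + A^4)...,
    doubling the power by squaring A at each level."""
    n = A.shape[0]
    Z = np.eye(n)
    Ak = A.copy()
    for _ in range(levels):
        Z = Z @ (np.eye(n) + Ak)
        Ak = Ak @ Ak          # repeated squaring: A, A^2, A^4, ...
    return Z

rng = np.random.default_rng(0)
M = rng.random((5, 5))
A = 0.4 * (M + M.T) / M.sum()                       # small symmetric test matrix
print(np.allclose(squaring_inverse(A, 20), np.linalg.inv(np.eye(5) - A)))  # True
```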

20 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

21 GRAPH SPARSIFICATION Any undirected graph can be approximated by an undirected graph with [ST `04]: O(n log^{O(1)} n) edges, [BSS `09]: O(n) edges.

22 NOTION OF APPROXIMATION A ≈_ε B if both exp(ε)A – B and exp(ε)B – A are PSD. Same as a small relative condition number; reflexive; composes naturally. Necessary condition: all cuts are similar.
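A small sketch of testing this notion numerically via the two PSD conditions, assuming dense symmetric numpy matrices; the tolerance is an assumption:

```python
import numpy as np

def approx_eq(A, B, eps, tol=1e-9):
    """Check A ≈_eps B: both exp(eps)*A - B and exp(eps)*B - A must be PSD."""
    def is_psd(M):
        # symmetrize for numerical safety, then look at the smallest eigenvalue
        return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol
    return is_psd(np.exp(eps) * A - B) and is_psd(np.exp(eps) * B - A)
```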

23 HOW? Simplest explanation (so far): [SS `08] importance sampling on the edges. Keep edge e with probability p_e; rescale if kept to maintain expectation.

24 HOW TO SAMPLE? Widely used: uniform sampling. Works well when the data is uniform, e.g. the complete graph. Problem: on a long path, removing any edge changes connectivity (a single graph can contain both).

25 THE `RIGHT' PROBABILITIES Path + clique example: path edges need probability ≈ 1, clique edges only ≈ 1/n. τ: L_2 statistical leverage scores, τ_e = trace(L^+ L_e). Interpretation: effective resistance. [Rudelson, Vershynin `07], [Tropp `12]: p_e ≥ τ_e · O(log n) gives a good sparsifier.
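A dense-matrix sketch of importance sampling by leverage scores / effective resistances; the oversampling constant is an assumption, and real algorithms estimate τ_e without forming a pseudo-inverse:

```python
import numpy as np

def sparsify(n, edges, eps, rng):
    """Sample each edge with probability ~ leverage score * O(log n),
    rescaling kept edges so the Laplacian is preserved in expectation."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)
    C = 4 * np.log(n) / eps**2                     # oversampling factor (assumed constant)
    kept = []
    for u, v, w in edges:
        # leverage score = weight * effective resistance of the edge
        tau = w * (Lpinv[u, u] + Lpinv[v, v] - 2 * Lpinv[u, v])
        p = min(1.0, C * tau)
        if rng.random() < p:
            kept.append((u, v, w / p))             # rescale to keep the expectation
    return kept
```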

26 COMPUTING SAMPLING PROBABILITIES τ: leverage scores / effective resistances, τ_e = trace(M^+ M_e). [BSS `09][LS `15]: potential functions. [ST `04][OV `11]: spectral partitioning. [SS `08][CLMMPS `15]: Gaussian projections. [Koutis `14]: spanners / low diameter partitions.

27 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

28 SQUARING Sparsifiers (plus a few tricks) give, for any A, an A′ s.t. I – A′ ≈ I – A^2. Plan: build algorithms around sparsifiers and identities involving I – A and I – A^2.

29 SIMILAR TO
                    Connectivity        Parallel Solver
Iteration           A_{i+1} ≈ A_i^2, until |A_d| is small
Size reduction      Low degree          Sparse graph
Method              Derandomized        Randomized
Solution transfer   Connectivity        (I - A_i) x_i = b_i
Related: multiscale methods; NC algorithm for shortest path; logspace connectivity [Reingold `02]; deterministic squaring [RV `05].

30 APPROXIMATE INVERSE CHAIN I – A_1 ≈_ε I – A_0^2, I – A_2 ≈_ε I – A_1^2, …, I – A_i ≈_ε I – A_{i-1}^2, ending with I – A_d ≈ I. Convergence: I – A_{i+1} ≈_ε I – A_i^2 implies ‖A_{i+1}‖ < ‖A_i‖^{1.5}; once ‖A_i‖ < 0.8, we can stop at d = O(log κ).
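A dense sketch of building such a chain with exact squaring (no sparsification), stopping once ‖A_d‖ is small; the tolerance and cap on the length are assumptions:

```python
import numpy as np

def build_chain(A0, tol=0.1, max_len=64):
    """Build A_0, A_1 = A_0^2, ..., A_d with ||A_d|| small.
    The real algorithm replaces each exact square by a sparse approximation."""
    chain = [A0]
    while np.linalg.norm(chain[-1], 2) > tol and len(chain) < max_len:
        chain.append(chain[-1] @ chain[-1])   # exact squaring; sparsified in practice
    return chain
```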

31 ISSUE: ERROR AT EACH STEP We only have 1 – a_{i+1} ≈ 1 – a_i^2, so we need to invoke (1 – a)^{-1} = (1 + a)(1 + a^2)(1 + a^4)… one term at a time: (1 – a_i)^{-1} = (1 + a_i)(1 – a_i^2)^{-1} ≈ (1 + a_i)(1 – a_{i+1})^{-1}. Induction: given z_{i+1} ≈ (1 – a_{i+1})^{-1}, set z_i = (1 + a_i) z_{i+1} ≈ (1 + a_i)(1 – a_{i+1})^{-1} ≈ (1 – a_i)^{-1}, with base case z_d = (1 – a_d)^{-1} ≈ 1.

32 ISSUE: MATRIX COMPOSITION In the matrix setting, replacements by approximations need to be symmetric: Z ≈ Z′ ⇒ U^T Z U ≈ U^T Z′ U, so the terms around Z′ must be symmetric, and (I – A_i) Z is not symmetric. Solution 1 ([PS `14]): (1 – a)^{-1} = ½ (1 + (1 + a)(1 – a^2)^{-1}(1 + a)).

33 ALGORITHM Identity: (I – A)^{-1} = ½[I + (I + A)(I – A^2)^{-1}(I + A)]. Chain: (I – A′)^{-1} ≈ (I – A^2)^{-1}. Induction: Z′ ≈ (I – A′)^{-1}. Update: Z ← ½[I + (I + A) Z′ (I + A)]. Composition: Z ≈ (I – A)^{-1}. Total error = dε = O(ε log κ).

34 PSEUDOCODE x = Solve(I, A_0, …, A_d, b)
1. For i from 1 to d, set b_i = (I + A_{i-1}) b_{i-1}.
2. Set x_d = b_d.
3. For i from d – 1 downto 0, set x_i = ½[b_i + (I + A_i) x_{i+1}].
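A minimal numpy version of this pseudocode, assuming the chain A_0, …, A_d is already given as dense matrices (the function name is an assumption):

```python
import numpy as np

def chain_solve(As, b):
    """Forward/backward passes of the pseudocode above.
    As = [A_0, ..., A_d]; returns an approximation to (I - A_0)^{-1} b."""
    d = len(As) - 1
    bs = [b]
    for i in range(1, d + 1):
        bs.append(bs[-1] + As[i - 1] @ bs[-1])     # b_i = (I + A_{i-1}) b_{i-1}
    x = bs[d]                                      # x_d = b_d, since A_d ≈ 0
    for i in range(d - 1, -1, -1):
        x = 0.5 * (bs[i] + x + As[i] @ x)          # x_i = 1/2 [b_i + (I + A_i) x_{i+1}]
    return x
```

Combined with a chain built by (sparsified) squaring, each forward and backward pass costs one matrix-vector product per level of the chain.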

35 FACTORIZATION INTO PRODUCT [CCLPT `15]: alternate step for computing matrix roots (I – A)^p for some |p| < 1. (I – A)^{-1} = (I + A/2)(I – 3/4 A^2 – 1/4 A^3)^{-1}(I + A/2). Hard part: sparsifying I – 3/4 A^2 – 1/4 A^3. The 3/4 (I – A^2) part is the same as before; 1/4 (I – A^3) is a cubic power.

36 WHAT IS I – A^3 A: one step of the random walk; A^3: 3 steps of the random walk. (Part of) edge uv in I – A^3 corresponds to a length-3 path u-y-z-v in A, with weight A_{uy} A_{yz} A_{zv}.

37 PSEUDOCODE Repeat O(c m log n ε^{-2}) times:
1. Pick an integer 1 ≤ k ≤ c and an edge e = uv, both uniformly at random.
2. Perform a (k – 1)-step random walk from u.
3. Perform a (c – k)-step random walk from v.
4. Add a scaled copy of the corresponding edge to the sparsifier.
Resembles: local clustering; approximate triangle counting (c = 3).
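A rough sketch of the walk-sampling step for I – A^c, assuming A is the nonnegative random-walk matrix of an undirected graph; the rescaling of the sampled edges is omitted, and all names are assumptions:

```python
import numpy as np

def sample_walk_edges(A, c, num_samples, rng):
    """Sample endpoint pairs of length-c walks: pick a split point k and an
    edge (u, v), walk k-1 steps from u and c-k steps from v, and record the
    resulting pair (the scaling needed to preserve expectation is omitted)."""
    n = A.shape[0]
    edges = [(u, v) for u in range(n) for v in range(n) if u != v and A[u, v] > 0]
    samples = []
    for _ in range(num_samples):
        k = rng.integers(1, c + 1)                  # uniform split point 1..c
        u, v = edges[rng.integers(len(edges))]      # uniform edge of A
        x = u
        for _ in range(k - 1):                      # (k-1)-step walk from u
            x = rng.choice(n, p=A[x] / A[x].sum())
        y = v
        for _ in range(c - k):                      # (c-k)-step walk from v
            y = rng.choice(n, p=A[y] / A[y].sum())
        samples.append((x, y))                      # endpoints of a length-c walk
    return samples
```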

38 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

39 DIRECT METHODS Row reduction: eliminate a variable by subtracting equations from each other. Sparse case? Effect of reduction: creates more non-zeros in the matrix, so we quickly get dense matrices. Runtime: n steps, each O(degree^2), O(n^3) total.

40 SPARSE GAUSSIAN ELIMINATION Goal: keep the intermediate matrices (Schur complements) sparse. [George `73][LRT `79]: nested dissection, O(n log n)-size inverses for planar graphs.

41 KEY QUESTION Ways of controlling fill: eliminate in the right order (minimum degree heuristic, elimination / separator trees); drop entries (incomplete Cholesky). The Schur complement is still a graph, so it can also be sparsified.

42 SPARSE BLOCK CHOLESKY A linear system solve reduces to: 2 solves involving the top-left block, and 1 solve on the Schur complement. [KLPRS `16]: repeatedly pivot out a constant fraction of the variables, similar to matrix inversion via matrix multiplication.
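A dense numpy sketch of this block reduction: two solves on the top-left block and one on the Schur complement (formed explicitly here, whereas the algorithm sparsifies it); the partition into the first k indices is an assumption:

```python
import numpy as np

def block_solve(M, b, k):
    """Solve M x = b via the 2x2 block factorization with F = first k indices,
    C = the rest: two solves on M[F,F] plus one solve on the Schur complement
    M[C,C] - M[C,F] M[F,F]^{-1} M[F,C]."""
    F, C = slice(0, k), slice(k, None)
    MFF, MFC, MCF, MCC = M[F, F], M[F, C], M[C, F], M[C, C]
    schur = MCC - MCF @ np.linalg.solve(MFF, MFC)
    y = np.linalg.solve(MFF, b[F])                 # first solve on the top-left block
    xC = np.linalg.solve(schur, b[C] - MCF @ y)    # solve on the Schur complement
    xF = np.linalg.solve(MFF, b[F] - MFC @ xC)     # second solve on the top-left block
    return np.concatenate([xF, xC])
```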

43 TAIL RECURSION Choose the partition so the top-left block is easy to invert using iterative methods. Recurrence: T(n) = T(0.99n) + O(nnz).

44 CHOOSING SET TO ELIMINATE α-block diagonally dominant (α-BDD) subset F: each vertex has ≥ 0.1 of its total (weighted) degree going to V \ F = C. Intuition: approximately an independent set; the best case scenario is an independent set. Identical to AMG terminology: C = coarse grid, F = fine grid minus coarse.

45 ITERATIVE METHOD ON M_FF Division via multiplication: (1 – a)^{-1} = 1 + a + a^2 + a^3 + … Here M_FF = I – A with row/column sums of A < 0.9, so ‖A^{10t}‖ < e^{-t}: it goes to 0 quickly. We had to be very careful with operators when addressing this. OPEN: a random-walk-based view.

46 FINDING α-BDD SUBSETS Pick F randomly: each u with probability ½. Trim F: only keep good blocks; removing blocks from F can only decrease the inner degree of the remaining blocks. With probability 1/2, half of u's neighbors are not picked (Markov's inequality), so u is picked and good with probability ≥ 1/4. Linearity of expectation: ≥ 1/4 of all blocks are kept.
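A sketch of the pick-then-trim procedure on a weighted adjacency matrix W, treating single vertices as blocks; the threshold follows the slide's 0.1, everything else is an assumption:

```python
import numpy as np

def bdd_subset(W, rng, alpha=0.1):
    """Pick each vertex into F with probability 1/2, then trim: keep only
    vertices whose weighted degree into C = V \\ F is >= alpha of their total
    degree. Removing vertices from F only increases the remaining vertices'
    degree to C, so a single trimming pass (against the initial F) suffices."""
    n = W.shape[0]
    picked = rng.random(n) < 0.5
    deg = W.sum(axis=1)
    deg_to_C = W[:, ~picked].sum(axis=1)   # weighted degree into the complement
    good = picked & (deg_to_C >= alpha * deg)
    return np.flatnonzero(good)
```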

47 OVERALL CALL ROUTINE The 2 solves involving the top-left block cost O(nnz); the 1 solve on the Schur complement costs T(0.99n). With O(n)-sized sparse approximations: T(n) = T(0.99n) + O(n) = O(n).

48 KYNG-SACHDEVA `16 (https://arxiv.org/abs/1605.02353) Per-entry pivoting, almost identical to incomplete LU

49 ONGOING WORK Connection to multigrid / multiscale? Other low factor width matrices: Multi-commodity flows? Linear elasticity problems? General PSD Linear Systems? Extension to convex optimization?

