
1 Sparsified Matrix Algorithms for Graph Laplacians Richard Peng Georgia Tech

2 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

3 GRAPH LAPLACIANS Matrices that correspond to undirected graphs: coordinates ↔ vertices, non-zeros ↔ edges. Example: a graph with two unit-weight edges sharing a vertex has Laplacian L = [[2, -1, -1], [-1, 1, 0], [-1, 0, 1]]. This talk: weighted, undirected graphs, and symmetric PSD matrices.
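As a concrete illustration (not from the slides), a minimal numpy sketch of building a weighted Laplacian from an edge list; the function name and example edges are assumptions:

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian L = D - W of an undirected graph.
    edges: list of (u, v, weight) with 0-based vertex indices."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w          # weighted degree on the diagonal
        L[v, v] += w
        L[u, v] -= w          # negative edge weight off the diagonal
        L[v, u] -= w
    return L

# The 3-vertex example above: two unit-weight edges sharing vertex 0.
print(laplacian(3, [(0, 1, 1.0), (0, 2, 1.0)]))
# [[ 2. -1. -1.]
#  [-1.  1.  0.]
#  [-1.  0.  1.]]
```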

4 THIS TALK Provably efficient algorithms for graph Laplacians, with a focus on solving linear systems. Why linear systems? They are a primitive in many graph algorithms, the simplest convex optimization problem, and algorithms for them often generalize.

5 THE LAPLACIAN PARADIGM Directly related: elliptic systems. Few iterations: eigenvectors, heat kernels. Many iterations / modified algorithms: graph problems, image processing.

6 SCIENTIFIC COMPUTING [BHV `04, DS `07]: PDEs and trusses reducible to SDD systems / M-matrices. [CFMNPW `14]: Helmholtz on meshes.

7 DATA ANALYSIS [ZGL `03][ZHS `05][CCLPT `15]: inference / sampling on graphical models [KMST `09, CMMP `13]: image segmentation / denoising

8 GRAPHS [Tutte `62]: planar graph embeddings in 2 solves. [KMP `09][MST `15]: random spanning trees, Õ(m^{4/3}). [DS `08, LS `13]: mincost / lossy flows, Õ(mn^{1/2}). (Õ hides factors of log^c n.)

9 GRAPHS, FASTER [CKMST `11][Sherman `13][KLOS `13][P `16]: approx. undirected maxflow, Õ(m^{4/3}) → Õ(m^{1+ε}) → Õ(m). [OSV `12]: balanced cuts, heat kernel walks, Õ(m). [Madry `13]: bipartite matching in Õ(m^{10/7}). [CMSV `16]: mincost matching and negative-length shortest paths in Õ(m^{10/7}).

10 WHY WORST CASE ANALYSIS? The Laplacian paradigm of designing graph algorithms: an optimization problem produces a sequence of (adaptively) generated linear systems, each handed to a linear system solver. Main difficulties: widely varying weights, multi-scale behavior.

11 INSTANCE: ISOTONIC REGRESSION [Kyng-Rao-Sachdeva `15]: https://github.com/sachdevasushant/Isotonic/blob/master/README.md : "…we suggest rerunning the program a few times and/or using a different solver. An alternate solver based on incomplete Cholesky factorization is provided with the code." Numbers thanks to Kevin Deweese (UCSB)

12 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

13 LINEAR SYSTEM SOLVERS [~0] Gaussian elimination: O(n^3). [Strassen `69]: O(n^2.8). [Coppersmith-Winograd `90]: O(n^2.3755). [Stothers `10]: O(n^2.3737). [Vassilevska Williams `11]: O(n^2.3727). [Hestenes-Stiefel `52] Conjugate gradient: O(nm) (?)

14 APPROACHES
                 Direct               Iterative
Unit step        Modifying an entry   Matrix-vector multiply
Main goal        Simplify system      Explore rank space
Cost per step    O(1)                 O(m)
#Steps           O(n^2.3727)          O(n)
Total            O(n^2.3727)          O(nm)
Performances are comparable on medium-sized instances: m = 10^5 takes ~1 second.

15 EXTREME INSTANCES Highly connected graphs need global steps; long paths / trees need many steps. Each is easy on its own (one suits an iterative method, the other a direct method), but solvers must handle both simultaneously, and a single graph can contain both.

16 SIMPLIFICATION Adjust/rescale so the diagonal = I; add a small amount to the diagonal to make it full rank. Then L = I – A, where A is a random walk matrix.
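A small sketch of this rescaling, assuming dense numpy matrices; the regularization constant reg is an arbitrary assumption:

```python
import numpy as np

def normalize(L, reg=1e-6):
    """Rescale a Laplacian so its diagonal is the identity:
    D^{-1/2} (L + reg*I) D^{-1/2} = I - A, and return A."""
    n = L.shape[0]
    d = np.diag(L) + reg                      # reg: small shift to make it full rank
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    A = np.eye(n) - Dinv_sqrt @ (L + reg * np.eye(n)) @ Dinv_sqrt
    return A
```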

17 ITERATIVE METHODS Division via multiplication: (1 – a)^{-1} = 1 + a + a^2 + a^3 + … If |a| ≤ ρ, then κ = (1-ρ)^{-1} terms give a good approximation to (1 – a)^{-1}. Matrix version: L^{-1} = I + A + A^2 + A^3 + … Spectral theorem: this works for symmetric PSD matrices; matrices well-approximated by their diagonal blocks are easy to solve.
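A sketch of the truncated power series as a solver, assuming numpy and that the spectral radius of A is below 1; the function name and term count are assumptions:

```python
import numpy as np

def neumann_solve(A, b, num_terms):
    """Approximate (I - A)^{-1} b by the truncated series b + Ab + A^2 b + ...
    Valid when the spectral radius of A is below 1."""
    x = np.zeros_like(b)
    term = b.copy()
    for _ in range(num_terms):
        x += term
        term = A @ term      # next power of A applied to b: A^k b
    return x
```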

18 LOWER BOUND FOR ITERATIVE METHODS There exist graphs G (e.g. the cycle) that require Ω(n) steps. Graph-theoretic interpretation: each term is one step of a walk, so the iterates b, Ab, A^2 b, … need roughly diameter-many terms before information crosses the graph. Closely related to the smoothness^{1/2} lower bound on the number of gradient steps.

19 DEGREE n → n OPERATIONS? (I – A)^{-1} = I + A + A^2 + A^3 + … = (I + A)(I + A^2)(I + A^4)… Repeated squaring: A^16 = (((A^2)^2)^2)^2, 4 operations, so O(log n) terms suffice. Similar to multi-level methods. Combinatorial view: A is one step of the random walk, and I – A^2 is the Laplacian of the 2-step random walk. Still a graph, but a dense matrix!
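A quick numerical check of the product form (I + A)(I + A^2)(I + A^4)… using exact dense squaring (the actual algorithm sparsifies each square); the test matrix is an arbitrary assumption:

```python
import numpy as np

def squaring_inverse(A, levels):
    """Approximate (I - A)^{-1} as the product (I + A)(I + A^2)(I + A^4)...,
    doubling the power by squaring A at each level."""
    n = A.shape[0]
    Z = np.eye(n)
    Ak = A.copy()
    for _ in range(levels):
        Z = Z @ (np.eye(n) + Ak)
        Ak = Ak @ Ak          # repeated squaring: A, A^2, A^4, ...
    return Z

rng = np.random.default_rng(0)
M = rng.random((5, 5))
A = 0.4 * (M + M.T) / M.sum()                       # small symmetric test matrix
print(np.allclose(squaring_inverse(A, 20), np.linalg.inv(np.eye(5) - A)))  # True
```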

20 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

21 GRAPH SPARSIFICATION Any undirected graph can be approximated by an undirected graph with [ST `04]: O(n log^{O(1)} n) edges, [BSS `09]: O(n) edges.

22 NOTION OF APPROXIMATION A ≈_ε B if both exp(ε)A – B and exp(ε)B – A are PSD. Same as a small relative condition number; reflexive; composes naturally. Necessary condition: all cuts are similar.
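A small sketch of testing this notion numerically via the two PSD conditions, assuming dense symmetric numpy matrices; the tolerance is an assumption:

```python
import numpy as np

def approx_eq(A, B, eps, tol=1e-9):
    """Check A ≈_eps B: both exp(eps)*A - B and exp(eps)*B - A must be PSD."""
    def is_psd(M):
        # symmetrize for numerical safety, then look at the smallest eigenvalue
        return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol
    return is_psd(np.exp(eps) * A - B) and is_psd(np.exp(eps) * B - A)
```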

23 HOW? Simplest explanation (so far): [SS `08] importance sampling on the edges. Keep edge e with probability p_e; rescale if kept to maintain expectation.

24 HOW TO SAMPLE? Widely used: uniform sampling. Works well when the data is uniform, e.g. the complete graph. Problem: on a long path, removing any edge changes connectivity (a single graph can contain both).

25 THE `RIGHT' PROBABILITIES Path + clique example: path edges need probability ≈ 1, clique edges only ≈ 1/n. τ: L_2 statistical leverage scores, τ_e = trace(L^+ L_e). Interpretation: effective resistance. [Rudelson, Vershynin `07], [Tropp `12]: p_e ≥ τ_e · O(log n) gives a good sparsifier.
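A dense-matrix sketch of importance sampling by leverage scores / effective resistances; the oversampling constant is an assumption, and real algorithms estimate τ_e without forming a pseudo-inverse:

```python
import numpy as np

def sparsify(n, edges, eps, rng):
    """Sample each edge with probability ~ leverage score * O(log n),
    rescaling kept edges so the Laplacian is preserved in expectation."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)
    C = 4 * np.log(n) / eps**2                     # oversampling factor (assumed constant)
    kept = []
    for u, v, w in edges:
        # leverage score = weight * effective resistance of the edge
        tau = w * (Lpinv[u, u] + Lpinv[v, v] - 2 * Lpinv[u, v])
        p = min(1.0, C * tau)
        if rng.random() < p:
            kept.append((u, v, w / p))             # rescale to keep the expectation
    return kept
```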

26 COMPUTING SAMPLING PROBABILITIES τ: leverage scores / effective resistances, τ_e = trace(M^+ M_e). [BSS `09][LS `15]: potential functions. [ST `04][OV `11]: spectral partitioning. [SS `08][CLMMPS `15]: Gaussian projections. [Koutis `14]: spanners / low diameter partitions.

27 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

28 SQUARING Sparsifiers (plus a few tricks) give, for any A, an A′ s.t. I – A′ ≈ I – A^2. Plan: build algorithms around sparsifiers and identities involving I – A and I – A^2.

29 SIMILAR TO
                    Connectivity        Parallel Solver
Iteration           A_{i+1} ≈ A_i^2, until |A_d| is small
Size reduction      Low degree          Sparse graph
Method              Derandomized        Randomized
Solution transfer   Connectivity        (I - A_i) x_i = b_i
Related: multiscale methods; NC algorithm for shortest path; logspace connectivity [Reingold `02]; deterministic squaring [RV `05].

30 APPROXIMATE INVERSE CHAIN I – A_1 ≈_ε I – A_0^2, I – A_2 ≈_ε I – A_1^2, …, I – A_i ≈_ε I – A_{i-1}^2, ending with I – A_d ≈ I. Convergence: I – A_{i+1} ≈_ε I – A_i^2 implies ‖A_{i+1}‖ < ‖A_i‖^{1.5}; once ‖A_i‖ < 0.8, we can stop at d = O(log κ).
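A dense sketch of building such a chain with exact squaring (no sparsification), stopping once ‖A_d‖ is small; the tolerance and cap on the length are assumptions:

```python
import numpy as np

def build_chain(A0, tol=0.1, max_len=64):
    """Build A_0, A_1 = A_0^2, ..., A_d with ||A_d|| small.
    The real algorithm replaces each exact square by a sparse approximation."""
    chain = [A0]
    while np.linalg.norm(chain[-1], 2) > tol and len(chain) < max_len:
        chain.append(chain[-1] @ chain[-1])   # exact squaring; sparsified in practice
    return chain
```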

31 ISSUE: ERROR AT EACH STEP We only have 1 – a_{i+1} ≈ 1 – a_i^2, so we need to invoke (1 – a)^{-1} = (1 + a)(1 + a^2)(1 + a^4)… one term at a time: (1 – a_i)^{-1} = (1 + a_i)(1 – a_i^2)^{-1} ≈ (1 + a_i)(1 – a_{i+1})^{-1}. Induction: given z_{i+1} ≈ (1 – a_{i+1})^{-1}, set z_i = (1 + a_i) z_{i+1} ≈ (1 + a_i)(1 – a_{i+1})^{-1} ≈ (1 – a_i)^{-1}, with base case z_d = (1 – a_d)^{-1} ≈ 1.

32 ISSUE: MATRIX COMPOSITION In the matrix setting, replacements by approximations need to be symmetric: Z ≈ Z′ ⇒ U^T Z U ≈ U^T Z′ U, so the terms around Z′ must be symmetric, and (I – A_i) Z is not symmetric. Solution 1 ([PS `14]): (1 – a)^{-1} = ½ (1 + (1 + a)(1 – a^2)^{-1}(1 + a)).

33 ALGORITHM Identity: (I – A)^{-1} = ½[I + (I + A)(I – A^2)^{-1}(I + A)]. Chain: (I – A′)^{-1} ≈ (I – A^2)^{-1}. Induction: Z′ ≈ (I – A′)^{-1}. Update: Z ← ½[I + (I + A) Z′ (I + A)]. Composition: Z ≈ (I – A)^{-1}. Total error = dε = O(ε log κ).

34 PSEUDOCODE x = Solve(I, A_0, …, A_d, b)
1. For i from 1 to d, set b_i = (I + A_{i-1}) b_{i-1}.
2. Set x_d = b_d.
3. For i from d – 1 downto 0, set x_i = ½[b_i + (I + A_i) x_{i+1}].
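A minimal numpy version of this pseudocode, assuming the chain A_0, …, A_d is already given as dense matrices (the function name is an assumption):

```python
import numpy as np

def chain_solve(As, b):
    """Forward/backward passes of the pseudocode above.
    As = [A_0, ..., A_d]; returns an approximation to (I - A_0)^{-1} b."""
    d = len(As) - 1
    bs = [b]
    for i in range(1, d + 1):
        bs.append(bs[-1] + As[i - 1] @ bs[-1])     # b_i = (I + A_{i-1}) b_{i-1}
    x = bs[d]                                      # x_d = b_d, since A_d ≈ 0
    for i in range(d - 1, -1, -1):
        x = 0.5 * (bs[i] + x + As[i] @ x)          # x_i = 1/2 [b_i + (I + A_i) x_{i+1}]
    return x
```

Combined with a chain built by (sparsified) squaring, each forward and backward pass costs one matrix-vector product per level of the chain.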

35 FACTORIZATION INTO PRODUCT [CCLPT `15]: alternate step for computing matrix roots (I – A)^p for some |p| < 1. (I – A)^{-1} = (I + A/2)(I – 3/4 A^2 – 1/4 A^3)^{-1}(I + A/2). Hard part: sparsifying I – 3/4 A^2 – 1/4 A^3. The 3/4 (I – A^2) part is the same as before; 1/4 (I – A^3) is a cubic power.

36 WHAT IS I – A^3 A: one step of the random walk; A^3: 3 steps of the random walk. (Part of) edge uv in I – A^3 corresponds to a length-3 path u-y-z-v in A, with weight A_{uy} A_{yz} A_{zv}.

37 PSEUDOCODE Repeat O(c m log n ε^{-2}) times:
1. Pick an integer 1 ≤ k ≤ c and an edge e = uv, both uniformly at random.
2. Perform a (k – 1)-step random walk from u.
3. Perform a (c – k)-step random walk from v.
4. Add a scaled copy of the corresponding edge to the sparsifier.
Resembles: local clustering; approximate triangle counting (c = 3).
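A rough sketch of the walk-sampling step for I – A^c, assuming A is the nonnegative random-walk matrix of an undirected graph; the rescaling of the sampled edges is omitted, and all names are assumptions:

```python
import numpy as np

def sample_walk_edges(A, c, num_samples, rng):
    """Sample endpoint pairs of length-c walks: pick a split point k and an
    edge (u, v), walk k-1 steps from u and c-k steps from v, and record the
    resulting pair (the scaling needed to preserve expectation is omitted)."""
    n = A.shape[0]
    edges = [(u, v) for u in range(n) for v in range(n) if u != v and A[u, v] > 0]
    samples = []
    for _ in range(num_samples):
        k = rng.integers(1, c + 1)                  # uniform split point 1..c
        u, v = edges[rng.integers(len(edges))]      # uniform edge of A
        x = u
        for _ in range(k - 1):                      # (k-1)-step walk from u
            x = rng.choice(n, p=A[x] / A[x].sum())
        y = v
        for _ in range(c - k):                      # (c-k)-step walk from v
            y = rng.choice(n, p=A[y] / A[y].sum())
        samples.append((x, y))                      # endpoints of a length-c walk
    return samples
```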

38 OUTLINE: (Structured) Linear Systems; Iterative and Direct Methods; (Graph) Sparsification; Sparsified Squaring; Speeding up Gaussian Elimination

39 DIRECT METHODS Row reduction: eliminate a variable by subtracting equations from each other. Sparse case? Effect of reduction: creates more non-zeros in the matrix, so we quickly get dense matrices. Runtime: n steps, each O(degree^2), O(n^3) total.

40 SPARSE GAUSSIAN ELIMINATION Goal: keep the intermediate matrices (Schur complements) sparse. [George `73][LRT `79]: nested dissection, O(n log n)-size inverses for planar graphs.

41 KEY QUESTION Ways of controlling fill: eliminate in the right order (minimum degree heuristic, elimination / separator trees); drop entries (incomplete Cholesky). The Schur complement is still a graph, so it can also be sparsified.

42 SPARSE BLOCK CHOLESKY A linear system solve reduces to: 2 solves involving the top-left block, and 1 solve on the Schur complement. [KLPRS `16]: repeatedly pivot out a constant fraction of the variables, similar to matrix inversion via matrix multiplication.
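A dense numpy sketch of this block reduction: two solves on the top-left block and one on the Schur complement (formed explicitly here, whereas the algorithm sparsifies it); the partition into the first k indices is an assumption:

```python
import numpy as np

def block_solve(M, b, k):
    """Solve M x = b via the 2x2 block factorization with F = first k indices,
    C = the rest: two solves on M[F,F] plus one solve on the Schur complement
    M[C,C] - M[C,F] M[F,F]^{-1} M[F,C]."""
    F, C = slice(0, k), slice(k, None)
    MFF, MFC, MCF, MCC = M[F, F], M[F, C], M[C, F], M[C, C]
    schur = MCC - MCF @ np.linalg.solve(MFF, MFC)
    y = np.linalg.solve(MFF, b[F])                 # first solve on the top-left block
    xC = np.linalg.solve(schur, b[C] - MCF @ y)    # solve on the Schur complement
    xF = np.linalg.solve(MFF, b[F] - MFC @ xC)     # second solve on the top-left block
    return np.concatenate([xF, xC])
```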

43 TAIL RECURSION Choose the partition so the top-left block is easy to invert using iterative methods. Recurrence: T(n) = T(0.99n) + O(nnz).

44 CHOOSING SET TO ELIMINATE α-block diagonally dominant (α-BDD) subset F: each vertex has ≥ 0.1 of its total (weighted) degree going to V \ F = C. Intuition: approximately an independent set; the best case scenario is an independent set. Identical to AMG terminology: C = coarse grid, F = fine grid minus coarse.

45 ITERATIVE METHOD ON M_FF Division via multiplication: (1 – a)^{-1} = 1 + a + a^2 + a^3 + … Here M_FF = I – A with row/column sums of A < 0.9, so ‖A^{10t}‖ < e^{-t}: it goes to 0 quickly. We had to be very careful with operators when addressing this. OPEN: a random-walk-based view.

46 FINDING α-BDD SUBSETS Pick F randomly: each u with probability ½. Trim F: only keep good blocks; removing blocks from F can only decrease the inner degree of the remaining blocks. With probability 1/2, half of u's neighbors are not picked (Markov's inequality), so u is picked and good with probability ≥ 1/4. Linearity of expectation: ≥ 1/4 of all blocks are kept.
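A sketch of the pick-then-trim procedure on a weighted adjacency matrix W, treating single vertices as blocks; the threshold follows the slide's 0.1, everything else is an assumption:

```python
import numpy as np

def bdd_subset(W, rng, alpha=0.1):
    """Pick each vertex into F with probability 1/2, then trim: keep only
    vertices whose weighted degree into C = V \\ F is >= alpha of their total
    degree. Removing vertices from F only increases the remaining vertices'
    degree to C, so a single trimming pass (against the initial F) suffices."""
    n = W.shape[0]
    picked = rng.random(n) < 0.5
    deg = W.sum(axis=1)
    deg_to_C = W[:, ~picked].sum(axis=1)   # weighted degree into the complement
    good = picked & (deg_to_C >= alpha * deg)
    return np.flatnonzero(good)
```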

47 OVERALL CALL ROUTINE The 2 solves involving the top-left block cost O(nnz); the 1 solve on the Schur complement costs T(0.99n). With O(n)-sized sparse approximations: T(n) = T(0.99n) + O(n) = O(n).

48 KYNG-SACHDEVA `16 (https://arxiv.org/abs/1605.02353) Per-entry pivoting, almost identical to incomplete LU

49 ONGOING WORK Connection to multigrid / multiscale? Other low factor width matrices: Multi-commodity flows? Linear elasticity problems? General PSD Linear Systems? Extension to convex optimization?

