Presentation is loading. Please wait.

Presentation is loading. Please wait.

High Performance Linear System Solvers with Focus on Graph Laplacians

Similar presentations


Presentation on theme: "High Performance Linear System Solvers with Focus on Graph Laplacians"— Presentation transcript:

1 High Performance Linear System Solvers with Focus on Graph Laplacians
Richard Peng Georgia Tech Based on recent works joint with: Serban Stan (Yale), Haoran Xu (MIT), Shen Chen Xu (CMU), Saurabh Sawlani (GaTech) John Gilbert (UCSB), Kevin Deweese (UCSB), Gary Miller (CMU),Hui Han Chin (CMU) 1

2 OUtline Flows and Lx = b Benchmarks and evaluations
Solvers using tree data structures Numerics of combinatorial preconditioning

3 Large Scale Problems Physical Simulation Network Analysis Optimization
Need for efficiency in: Scientific computing Graph theory Machine learning

4 Combinatorial Flows (for unweighted, undirected graphs)
Given s, t, find the maximum number of disjoint s-t paths Applications: Clustering Image processing Scheduling s t Dual: separate s and t by removing fewest edges s t

5 Scientific Flows (for unweighted, undirected graphs)
Given s, t, find the flow with minimum energy 1Ω 3Ω Energy: Σe re fe2 s 2Ω t Dual: compute voltages, linear system Lx = b ? -1 1 graph Laplacian Matrix L Diagonal: degree Off-diagonal: -edge weights =

6 few iterations of Lx = b [Tutte `61]: graph drawing, embeddings
[ZGL `03], [ZHS `05]: inference on graphical models Inverse powering: eigenvectors / heat kernel: [AM `85] spectral clustering [OSV `12]: balanced cuts [SM `01][KMST `09]: image segmentation

7 Solving Lx = b Multigrid methods widely used in scientific computing
Good runtimes for systems with as many as 109 nonzeros MATLAB: pcg(L, ichol(L), b, ε) ‘works’ for 106 nonzeros [ST`04]: O(mlogcnlog(1/ε)) time 2004 – 2014: c halved every 2 years logc: 70 32 15 6 2 1 1/2 year: 2004 2006 2008 2009 2010 2011 2014

8 Many iterations of Lx = b
[Karmarkar, Ye, Renegar, Nesterov, Nemirovski …]: convex optimization via. solving O(m1/2) linear systems [DS `08]: interior point method for 1-commodity flow problems on graphs only requires solving linear systems in graph Laplacians. Best theoretical bounds for: bipartite matching, mincost flow, approx. k-commodity flow

9 OUtline Flows and Lx = b Benchmarks and evaluations
Solvers using tree data structures Numerics of combinatorial preconditioning

10 [KRS`15]: Isotonic Regression
README file we suggest rerunning the program a few times and / or using a different solver. An alternate solver based on incomplete Cholesky is provided with the code. Label vertices of a directed graph: Minimize ║s - x║p If u  v is an edge, then xu ≥ xv 1.5 2 1.5 1 3 3 [KRS`15]: use interior point methods to solve this convex program (or LP if p = 1)

11 [KRS`15] w. Different Solvers
README file we suggest rerunning the program a few times and / or using a different solver. An alternate solver based on incomplete Cholesky is provided with the code.

12 New ways of using Solvers
Sequence of (adaptively) generated linear systems: Problem Widely varying weights Multiscale behavior Difficulties of combinatorial problems on graphs Solver Usually preconditioned conjugate gradient (PCG) Preconditioner: Main focus of these solvers, [Vaidya `91]: combinatorial preconditioners via graph theory

13 New benchmarks Structured graphs Grids / cubes Cayley graphs
Graph products Hard graph problems Maxflow problems from DIMACS implementation challenges Linear systems arising from second-order optimization (IPM)

14 OUtline Flows and Lx = b Benchmarks and evaluations
Solvers using tree data structures Numerics of combinatorial preconditioning

15 Fast Tree-based Solvers
Gradually transform a tree-based solution to a solution on the entire graph Claim: these ideas lead to code that can solve any Lx = b with 109 edges in ≤ 10 seconds on ≤ 64 cores Method Cycle Toggle Ultrasparsifier Cost / Iter logn m + (m/k)2 # Iters mlog1/2nlog(1/ε) k1/2log(1/ε) Related to SGD Grad. descent Step uses Data structures Mat-Vec multiply

16 Cycle Toggling Find a good tree.
Pick one off tree edge e at a time, take cycle to minimum energy state. [KOSZ `13]: mlogn toggles, each costing O(logn) [LS `13]: CG-like acceleration to O(mlog1/2n) toggles

17 Choice of Tree Performances depends on total stretch:
length of tree path / length of edge Unweighted case: length of tree path stretch = 3 Runtime Total Stretch AKPW `91 O(m log log n) mo(1) Bartal 96, 98 O(m log n) O(log n log log n) FRT `03 O(m log3 n) O(log n) EEST `05 O(m log 2n O(log2 n log log n) ABN `08 O(log n (log log n)3) AN `12 In case of unit weights, can view as length of tree path stretch = 2

18 Moving Pieces vs Trees: MST / bottom-up / top-down / adaptive
Data structures: offline / static / dynamic Numerics: batched / local, accelerated / CG Initialization: tree solution / recursive

19 Benchmark for tree based Algos: Heavy Path Graphs
Pick a Hamiltonian path, weight all other edges so each has stretch 1 Bad case for PCG, `easy’ for tree data structures

20 Cycle Toggling vs. CG https://arxiv.org/abs/1609.02957

21 Variants of Cycle Toggling

22 OUtline Flows and Lx = b Benchmarks and evaluations
Solvers using tree data structures Numerics of combinatorial preconditioning

23 Numerics? PCG with a low-stretch tree as preconditioner
Measure residue error after i iters of PCG using C++ double (53 bits) or MPRF (1024 bits). More bits = better convergence TreePCG does better under exact arithmetic

24 WHY? Relative eigenvalues 0 < λmin ≤ … ≤ λn
Bound: for ANY k, k + (λn-k / λmin)1/2 [SW `09]: low stretch trees gives: 1 ≤ λmin Σ λi ≤ total stretch ≈ m Numerically stable bound: k = n  m1/2 Best bound: k = m1/3  m1/3 Polynomial interpolation produces large coefficients

25 One Fix: Batched Processing
Add some edges to a tree to form a `batched’ preconditioner Use exact methods on preconditioner [Vaidya `91]: MST + edges [KMP`10]: O(mlog2n/k) edges  k1/2 iters Optimize: m5/4log1/2n

26 Augtree vs TreePCG Preliminary Results
Augmented Tree PCG # iter precon time total time 2MeshUnweightedUniformStretch1e6 168 2.4043s s 38 s s 1382 19.675s s 84 s s 2MeshUniformStretch1e6 139 s s 30 s s 2MeshExpStretch1e6 1482 s s 72 s s 3MeshUnweightedUniformStretch1e6 461 6.2914s s 55 s s 3MeshUnweightedExpStretch1e6 2745 s s 137 s s 3MeshWeightedUniformStretch1e6 407 s s 44 s s 3MeshWeightedExpStretch1e6 2212 29.915s s 120 s s Not included: time to eliminate on the tree: we need a better sparse direct method

27 Ongoing / Future Works Better performances of Lx = b packages?
Can numerical precision be analyzed through the graph theoretic components? Primal-dual view of numerical precision? Repos & Papers:


Download ppt "High Performance Linear System Solvers with Focus on Graph Laplacians"

Similar presentations


Ads by Google