Multifaceted Algorithm Design Richard Peng, M.I.T.


LARGE SCALE PROBLEMS Emphasis on efficient algorithms in: scientific computing, graph theory, (randomized) numerical routines, network analysis, physical simulation, and optimization.

WELL STUDIED QUESTIONS Scientific computing: fast solvers for structured linear systems. Graphs / combinatorics: network flow problems. Randomized algorithms: subsampling matrices and optimization formulations.

MY REPRESENTATIVE RESULTS Current fastest sequential and parallel solvers for linear systems in graph Laplacian matrices. First nearly-linear time algorithm for approximate undirected maxflow. First near-optimal routine for row sampling matrices in a 1-norm preserving manner.

RECURRING IDEAS Can solve a problem by iteratively solving several similar instances. Approximations lead to better approximations. Larger problems can be approximated by smaller ones.

MY APPROACH TO ALGORITHM DESIGN Identify problems that arise at the intersection of multiple areas (numerical analysis / optimization, statistics / randomized algorithms, combinatorics / discrete algorithms) and study them from multiple angles. This talk: structure-preserving sampling.

SAMPLING Classical use in statistics: extract info from a large data set, directly output a result (estimator). Sampling from matrices, networks, and optimization problems: often compute on the sample, so we need to preserve more structure.

PRESERVING GRAPH STRUCTURES Undirected graph, n vertices, m < n^2 edges. Are n^2 edges (dense) sometimes necessary? For some information, e.g. connectivity, no: it is encoded by a spanning forest, < n edges, computable by a deterministic O(m) time algorithm.

MORE INTRICATE STRUCTURES Cut: # of edges leaving a subset of vertices. k-connectivity: # of disjoint paths between s and t (Menger's theorem / maxflow-mincut). Stronger goal: preserve the weights of all 2^n cuts in the graph. [Benczur-Karger `96]: for ANY G, can sample to get H with O(n log n) edges s.t. G ≈ H on all cuts (≈: multiplicative approximation).

HOW TO SAMPLE? Widely used: uniform sampling. Works well when the data is uniform, e.g. a complete graph. Problem: on a long path, removing any edge changes connectivity (and a single graph can contain both structures). Is there a more systematic view of sampling?

ALGEBRAIC REPRESENTATION OF GRAPHS n rows / columns O(m) non-zeros 1 1 n vertices m edges graph Laplacian Matrix L Diagonal: degree Off-diagonal: -edge weights Edge-vertex incidence matrix: B eu =-1/1 if u is endpoint of e 0 otherwise m rows n columns L is the Gram matrix of B, L = B T B

SPECTRAL SIMILARITY Numerical analysis: L_G ≈ L_H if x^T L_G x ≈ x^T L_H x for all vectors x. Since L_G = B_G^T B_G, we have x^T L_G x = ║B_G x║_2^2, so the condition is ║B_G x║_2 ≈ ║B_H x║_2 ∀ x (where ║y║_2^2 = Σ_i y_i^2). For edge e = uv, (B_e: x)^2 = (x_u - x_v)^2, so for x ∈ {0, 1}^V, ║B_G x║_2^2 = size of the cut given by x; spectral similarity therefore implies G ≈ H on all cuts.
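The cut identity above can likewise be checked on a toy graph (illustrative numpy sketch; the example graph and vertex set are mine, not the talk's):

```python
import numpy as np

# 4-cycle plus a chord; indicator vector x of S = {0, 1}
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
B = np.zeros((len(edges), n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0
L = B.T @ B

x = np.array([1.0, 1.0, 0.0, 0.0])
# count edges crossing S directly, then via the quadratic form
cut = sum(1 for (u, v) in edges if x[u] != x[v])
assert np.isclose(x @ L @ x, cut)   # both count the edges leaving S
```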

ALGEBRAIC VIEW OF SAMPLING EDGES L_2 row sampling: given B with m >> n, sample a few rows to form B' (with ≈ n rows) s.t. ║Bx║_2 ≈ ║B'x║_2 ∀ x. Note: the numerical linear algebra literature normally uses A instead of B, and n and d instead of m and n.

IMPORTANCE SAMPLING Keep a row b_i with probability p_i, rescaling it if kept to maintain expectation. Uniform sampling: p_i = 1/k for a factor-k size reduction; issue: can drop the only non-zero row. Norm sampling: p_i = (m/k) ║b_i║_2^2 / ║B║_F^2; issue: can drop a column's only entry.

THE `RIGHT' PROBABILITIES τ: L_2 statistical leverage scores, τ_i = b_i^T (B^T B)^{-1} b_i = ║b_i║^2_{L^{-1}}, where b_i is row i of B and L = B^T B. On a path plus clique, the scores are ≈ 1 on path edges and ≈ 1/n on clique edges, so they handle both the lone non-zero row and the single-entry column.

L_2 MATRIX-CHERNOFF BOUNDS [Foster `49]: Σ_i τ_i = rank ≤ n, giving O(n log n) rows. [Rudelson-Vershynin `07], [Tropp `12]: sampling with p_i ≥ τ_i · O(log n) gives B' s.t. ║Bx║_2 ≈ ║B'x║_2 ∀ x w.h.p. Near-optimal L_2 row samples of B are exactly graph sparsifiers. In practice the O(log n) factor can usually be taken ≈ 5; can also improve via derandomization.
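A minimal numpy sketch of leverage-score row sampling (my illustration, not the near-optimal routines cited above; the oversampling constant c = 10 and the dense pseudoinverse computation are simplifying assumptions):

```python
import numpy as np

def leverage_scores(B):
    # tau_i = b_i^T (B^T B)^{-1} b_i, via a pseudoinverse for simplicity
    Linv = np.linalg.pinv(B.T @ B)
    return np.einsum('ij,jk,ik->i', B, Linv, B)

def leverage_sample(B, c=10.0, seed=0):
    rng = np.random.default_rng(seed)
    tau = leverage_scores(B)
    p = np.minimum(1.0, c * np.log(B.shape[1] + 1) * tau)
    keep = rng.random(len(p)) < p
    # rescale kept rows by 1/sqrt(p_i) so E[B'^T B'] = B^T B
    return B[keep] / np.sqrt(p[keep])[:, None]

rng = np.random.default_rng(1)
B = rng.standard_normal((2000, 5))
Bp = leverage_sample(B)

# quadratic forms agree up to small constant factors on a test vector
x = rng.standard_normal(5)
ratio = (np.linalg.norm(Bp @ x) / np.linalg.norm(B @ x)) ** 2
assert 0.3 < ratio < 3.0
```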

MY APPROACH TO ALGORITHM DESIGN Extend insights gained from studying problems at the intersection of multiple areas (combinatorics / discrete algorithms, numerical analysis / optimization, statistics / randomized algorithms) back to these areas. Algorithmic extensions of structure-preserving sampling: solving linear systems, maximum flow, preserving L_1-structures.

SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling.

SOLVERS FOR LINEAR SYSTEMS INVOLVING GRAPH LAPLACIANS: Lx = b (graph Laplacian: diagonal = degree, off-diagonal = -weight). Current fastest sequential and parallel solvers for linear systems in graph Laplacians. Application: estimate all τ_i = ║b_i║^2_{L^{-1}} by solving O(log n) linear systems. Directly related to: elliptic problems; SDD, M-, and H-matrices.

ALGORITHMS FOR Lx = b Given any graph Laplacian L with n vertices and m edges, and any vector b, find a vector x s.t. Lx = b. [Vaidya `89]: use graph theory! [Spielman-Teng `04]: O(m log^c n). [P-Spielman `14]: alternate, fully parallelizable approach. (Figure: log-log plot of the exponent c over time, with previous works, my results, and open questions marked: 2006: 6, 2011: 2, 2014: 1/2.)

ITERATIVE METHODS Division using multiplication. Simplification: assume L = I - A, where A is the transition matrix of a random walk. Then I + A + A^2 + A^3 + ... = (I - A)^{-1} = L^{-1} (spectral theorem: can view the matrices as scalars). Richardson iteration: truncate to i terms, approximating x = (I - A)^{-1} b with x^(i) = (I + A + ... + A^i) b.

RICHARDSON ITERATION Evaluation via Horner's rule: (I + A + A^2) b = A(Ab + b) + b; in general x^(0) = b, x^(i+1) = Ax^(i) + b, so i terms cost i matrix-vector multiplications. Can interpret as gradient descent. The number of terms needed is lower bounded by information propagation, roughly the diameter of the graph. Highly connected graphs: few terms suffice. But in general, do we need n matrix operations?
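The recurrence above can be sketched directly (illustrative numpy code; the damped complete-graph walk matrix is my choice of an example with ║A║ < 1, so the series converges):

```python
import numpy as np

def richardson(A, b, iters):
    """x^(i+1) = A x^(i) + b: Horner evaluation of (I + A + ... + A^i) b."""
    x = b.copy()
    for _ in range(iters):
        x = A @ x + b
    return x

n = 6
W = np.full((n, n), 1.0 / n)   # random-walk matrix of the complete graph
A = 0.9 * W                     # damp so ||A||_2 = 0.9 < 1
rng = np.random.default_rng(0)
b = rng.standard_normal(n)

x = richardson(A, b, 500)       # error shrinks like 0.9^i
assert np.allclose(x, np.linalg.solve(np.eye(n) - A, b))
```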

DEGREE n IN O(log n) OPERATIONS? (I - A)^{-1} = I + A + A^2 + A^3 + ... = (I + A)(I + A^2)(I + A^4)... Repeated squaring: A^16 = ((((A^2)^2)^2)^2, 4 operations, so O(log n) terms suffice; similar to multi-level methods. Combinatorial view: A is one step of a random walk, and I - A^2 is the Laplacian of the 2-step random walk. It is a dense matrix, but still a graph Laplacian, so we can sparsify!
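The factorization can be checked numerically (a sketch under the assumption ║A║ < 1; truncating after k factors leaves error on the order of ║A║^(2^k)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, 2)     # scale so ||A||_2 = 0.5

# accumulate (I + A)(I + A^2)(I + A^4)... via repeated squaring
prod = np.eye(n)
Apow = A.copy()
for _ in range(6):                   # 6 factors: error ~ 0.5^(2^6)
    prod = prod @ (np.eye(n) + Apow)
    Apow = Apow @ Apow               # A, A^2, A^4, A^8, ...
assert np.allclose(prod, np.linalg.inv(np.eye(n) - A))
```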

REPEATED SPARSE SQUARING (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)... Combining known tools: efficiently sparsify I - A^2 without computing A^2. [P-Spielman `14]: approximate L^{-1} with O(log n) sparse matrices. Key ideas: modify the factorization to allow gradual introduction and control of error.

SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling. Solve Lx=b efficiently via sparsified squaring.

FEW ITERATIONS OF Lx = b [Tutte `61]: graph drawing, embeddings. [ZGL `03], [ZHS `05]: inference on graphical models. Inverse powering for eigenvectors / heat kernels: [AM `85]: spectral clustering; [OSV `12]: balanced cuts; [SM `01], [KMST `09]: image segmentation. [CFMNPW `14]: Helmholtz decomposition on 3D meshes.

MANY ITERATIONS OF Lx = b [Karmarkar, Ye, Renegar, Nesterov, Nemirovski, ...]: convex optimization via solving O(m^{1/2}) linear systems. [DS `08]: optimization on graphs reduces to Laplacian systems. [KM `09], [MST `14]: random spanning trees. [CKMST `11]: faster approximate maximum flow. [KMP `12]: multicommodity flow.

MAXFLOW First O(m polylog(n)) time algorithm for approximate undirected maxflow (at the intersection of combinatorics / discrete algorithms, numerical analysis / optimization, and statistics / randomized algorithms).

MAXIMUM FLOW PROBLEM (for unweighted, undirected graphs) Given s, t, find the maximum number of disjoint s-t paths. Dual: separate s and t by removing the fewest edges. Applications: clustering, image processing, scheduling.

WHAT MAKES MAXFLOW HARD Highly connected graphs: route up to n paths. Long paths: a single step may involve n vertices. Each is `easy' on its own; the goal is to handle both and do better than (many steps) × (long paths) = n^2.

ALGORITHMS FOR FLOWS Current fastest maxflow algorithms: exact (weakly polynomial): invoke Lx = b; approximate: modify algorithms for Lx = b. [P `14]: (1 - ε)-approx maxflow in O(m log^c n ε^{-2}) time. Ideas introduced: 1970s: blocking flows; 1980: dynamic trees; 1986: dual algorithms; 1989: connections to Lx = b; 2010: few calls to Lx = b; 2013: modify Lx = b.

MAXIMUM FLOW IN ALMOST LINEAR TIME Algebraic formulation of min s-t cut: minimize ║Bx║_1 subject to x_s = 0, x_t = 1 (║·║_1: 1-norm, sum of absolute values; with x integral this is exactly the cut size). [Sherman `13], [Kelner-Lee-Orecchia-Sidford `13]: can find approximate maxflow iteratively via several calls to a structure approximator. [Madry `10]: an O(m^{1+θ})-sized approximator requiring O(m^θ) calls, built in O(m^{1+θ}) time (for any θ > 0), giving O(m^{1+2θ} ε^{-2}) time. [Racke-Shah-Taubig `14]: an O(n)-sized approximator requiring O(log^c n) iterations, but built via solving maxflows on graphs of total size O(m log^c n). Can we get O(m log^c n ε^{-2}) time?

ALGORITHMIC SOLUTION Ultra-sparsifier (e.g. [Koutis-Miller-P `10]): for any k, can find H close to G but equivalent to a graph of size O(m/k). Key step: vertex reductions via edge reductions. [P `14]: build the approximator on the smaller graph; absorb the additional (small) error via more calls to the approximator; recurse on instances with smaller total size, total cost O(m log^c n). [CLMPPS `15]: extends to numerical data, with close connections to variants of Nystrom's method.

SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling. Solve Lx = b efficiently via sparsified squaring. Approximate maximum flow routines and structure approximators can be constructed recursively from each other via graph sparsifiers.

RANDOMIZED NUMERICAL LINEAR ALGEBRA L_1-preserving row sampling: first near-optimal routine for row sampling matrices in a 1-norm preserving manner (at the intersection of combinatorics / discrete algorithms, numerical analysis / optimization, and statistics / randomized algorithms).

GENERALIZATION Generalization of row sampling: given A and q, find A' s.t. ║Ax║_q ≈ ║A'x║_q ∀ x, where ║y║_q = (Σ_i |y_i|^q)^{1/q}. The 1-norm is standard for representing cuts, and is used in sparse recovery / robust regression. Applications (for general A): feature selection, low-rank approximation / PCA.

ROW SAMPLING ROUTINES (finding A' s.t. ║Ax║_q ≈ ║A'x║_q ∀ x; nnz: # of non-zeros in A; omitting corresponding empirical studies)

                                  #rows, q=2     #rows, q=1          runtime
Dasgupta et al. `09               --             n^2.5               mn^5
Magdon-Ismail `10                 n log^2 n      --                  mn^2
Sohler-Woodruff `11               --             n^3.5               mn^{ω-1+θ}
Drineas et al. `12                n log n        --                  mn log n
Clarkson et al. `12               --             n^4.5 log^1.5 n     mn log n
Clarkson-Woodruff `12             n^2 log n      n^8                 nnz
Mahoney-Meng `12                  n^2            n^3.5               nnz + n^6
Nelson-Nguyen `12                 n^{1+θ}        --                  nnz
Li et al. `13, Cohen et al. `14   n log n        n^3.66              nnz + n^{ω+θ}

[Naor `11], [Matousek `97]: on graphs, an L_2 approximation gives an L_q approximation for all 1 ≤ q ≤ 2. How special are graphs? How special is L_2?

L_1 ROW SAMPLING L_1 Lewis weights ([Lewis `78]): w s.t. w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i — a recursive definition! Sampling with p_i ≥ w_i · O(log n) gives ║Ax║_1 ≈ ║A'x║_1 ∀ x. Can check: Σ_i w_i ≤ n, giving O(n log n) rows. [Talagrand `90, "Embedding subspaces of L_1 into L^N_1"] can be analyzed as row sampling / sparsification.

[COHEN-P `14] Compute Lewis weights by updating w on the LHS with w on the RHS: w'_i ← (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2}. Converges in O(log log n) steps: analyze A^T W^{-1} A spectrally. Aside: similar to iteratively reweighted least squares. Also gives an elementary, optimization-motivated proof of w.h.p. concentration for L_1.

q          previous # of rows   new # of rows           runtime
1          n^2.5                n log n                 nnz + n^{ω+θ}
1 < q < 2  n^{q/2+2}            n log n (log log n)^2   nnz + n^{ω+θ}
2 < q      n^{q+1}              n^{q/2} log n           nnz + n^{q/2+O(1)}
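The fixed-point iteration above is short enough to sketch directly (illustrative numpy code assuming a full-rank dense A; at the fixed point the L_1 Lewis weights sum to the rank, at most the column dimension):

```python
import numpy as np

def lewis_weights_l1(A, iters=20):
    """Fixed-point iteration w_i <- (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2}."""
    m, n = A.shape
    w = np.ones(m)
    for _ in range(iters):
        M = np.linalg.inv(A.T @ (A / w[:, None]))   # (A^T W^{-1} A)^{-1}
        w = np.sqrt(np.einsum('ij,jk,ik->i', A, M, A))
    return w

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 4))
w = lewis_weights_l1(A)
# at the fixed point, sum_i w_i = trace((A^T W^{-1} A)^{-1} A^T W^{-1} A) = n
assert abs(w.sum() - 4.0) < 1e-3
```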

SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling. Solve Lx = b efficiently via sparsified squaring. Approximate maximum flow routines and cut- approximators can be constructed recursively from each other via graph sparsifiers. Wider ranges of structures can be sparsified, key statistical quantities can be computed iteratively.

I’VE ALSO WORKED ON Dynamic graph data structures Graph partitioning Parallel algorithms Image processing Anomaly / sybil detection in graphs

FUTURE WORK: LINEAR SYSTEM SOLVERS Wider classes of linear systems. Relation to optimization / learning.

FUTURE WORK: COMBINATORIAL OPTIMIZATION Faster algorithms for more classical algorithmic graph theory problems?

FUTURE WORK: RANDOMIZED NUMERICAL LINEAR ALGEBRA Other algorithmic applications of Lewis weights? Low-rank approximation in L_1? O(n)-sized L_1-preserving row samples? (These exist for L_2.)

SUMMARY Study problems at the intersection of combinatorics / discrete algorithms, numerical analysis / optimization, and statistics / randomized algorithms. Links to arXiv manuscripts and videos of more detailed talks are at: math.mit.edu/~rpeng/