Multifaceted Algorithm Design. Richard Peng, M.I.T.


1 Multifaceted Algorithm Design Richard Peng M.I.T.

2 LARGE SCALE PROBLEMS Emphasis on efficient algorithms in: scientific computing, graph theory, (randomized) numerical routines. Application areas: network analysis, physical simulation, optimization.

3 WELL STUDIED QUESTIONS Scientific computing: fast solvers for structured linear systems. Graphs / combinatorics: network flow problems. Randomized algorithms: subsampling matrices and optimization formulations.

4 MY REPRESENTATIVE RESULTS Current fastest sequential and parallel solvers for linear systems in graph Laplacian matrices. First nearly-linear time algorithm for approximate undirected maxflow. First near-optimal routine for row sampling matrices in a 1-norm preserving manner.

5 RECURRING IDEAS Can solve a problem by iteratively solving several similar instances. Approximations lead to better approximations. Larger problems can be approximated by smaller ones.

6 MY APPROACH TO ALGORITHM DESIGN Identify problems that arise at the intersection of multiple areas (combinatorics / discrete algorithms, numerical analysis / optimization, statistics / randomized algorithms) and study them from multiple angles. This talk: structure-preserving sampling.

7 SAMPLING Classical use in statistics: extract info from a large data set and directly output the result (an estimator). Sampling from matrices, networks, and optimization problems: we often compute on the sample, so we need to preserve more structure.

8 PRESERVING GRAPH STRUCTURES Undirected graph, n vertices, m < n^2 edges. Are n^2 edges (dense) sometimes necessary? For some information, e.g. connectivity, no: connectivity is encoded by a spanning forest with < n edges, computable by a deterministic O(m) time algorithm.

9 MORE INTRICATE STRUCTURES k-connectivity: # of disjoint s-t paths, equal to the minimum s-t cut by Menger's theorem / maxflow-mincut (cut: # of edges leaving a subset of vertices). [Benczur-Karger `96]: for ANY G, can sample to get H with O(n log n) edges s.t. G ≈ H on all cuts (≈: multiplicative approximation). This is stronger than preserving connectivity: it preserves the weights of all 2^n cuts in the graph.

10 HOW TO SAMPLE? Widely used: uniform sampling. Works well when the data is uniform, e.g. a complete graph. Problem: on a long path, removing any edge changes connectivity (and one graph can contain both a dense part and a long path). Is there a more systematic view of sampling?

11 ALGEBRAIC REPRESENTATION OF GRAPHS A graph with n vertices and m edges is represented by its graph Laplacian L: n rows / columns, O(m) non-zeros, diagonal = degrees, off-diagonal = negated edge weights. Equivalently, via the edge-vertex incidence matrix B (m rows, n columns): B_{e,u} = -1 / +1 if u is an endpoint of e, 0 otherwise. L is the Gram matrix of B: L = B^T B.
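As an illustration (not from the talk itself), a minimal numpy sketch of the incidence matrix B and the identity L = B^T B, on an assumed 3-vertex path graph:

```python
import numpy as np

# Example graph: 3 vertices, edges (0,1) and (1,2), unit weights.
edges = [(0, 1), (1, 2)]
n, m = 3, len(edges)

# Edge-vertex incidence matrix: B[e, u] = -1, B[e, v] = +1 for edge e = (u, v).
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

# Graph Laplacian as the Gram matrix of B.
L = B.T @ B

# Diagonal holds the degrees (1, 2, 1); off-diagonals hold negated edge weights.
assert (np.diag(L) == [1, 2, 1]).all()
```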

12 SPECTRAL SIMILARITY Numerical analysis: L_G ≈ L_H if x^T L_G x ≈ x^T L_H x for all vectors x. Since L_G = B_G^T B_G, we have x^T L_G x = ||B_G x||_2^2 (where ||y||_2^2 = Σ_i y_i^2), so this is the same as ||B_G x||_2 ≈ ||B_H x||_2 for all x. For edge e = uv, (B_e x)^2 = (x_u - x_v)^2: with x_u = 1 and x_v = 0 the edge contributes (1-0)^2 = 1, while with x_u = x_z = 1 it contributes (1-1)^2 = 0. So for x ∈ {0,1}^V, ||B_G x||_2^2 = size of the cut given by x, and spectral similarity restricted to such x recovers G ≈ H on all cuts.
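The identity x^T L x = ||Bx||_2^2 = cut size for 0/1 indicator vectors can be checked directly; a small sketch on an assumed triangle graph:

```python
import numpy as np

# Triangle graph on vertices {0, 1, 2}.
edges = [(0, 1), (1, 2), (0, 2)]
B = np.zeros((len(edges), 3))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0
L = B.T @ B

# Cut {0} vs {1, 2}: the indicator x = (0, 1, 1) crosses edges (0,1) and (0,2).
x = np.array([0.0, 1.0, 1.0])
cut = x @ L @ x                       # x^T L x
assert np.isclose(cut, 2.0)           # exactly the 2 crossing edges
assert np.isclose(cut, np.sum((B @ x) ** 2))
```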

13 ALGEBRAIC VIEW OF SAMPLING EDGES Row sampling: given B with m >> n, sample a few (≈ n) rows to form B' s.t. ||Bx||_2 ≈ ||B'x||_2 for all x. Note: the numerical linear algebra literature normally uses A instead of B, and n, d instead of m, n.

14 IMPORTANCE SAMPLING Keep a row b_i with probability p_i, rescaling it if kept to maintain expectation. Uniform sampling: p_i = 1/k for a factor-k size reduction; issue: fails when only one row is non-zero. Norm sampling: p_i = (m/k) ||b_i||_2^2 / ||B||_F^2; issue: fails on a column with a single non-zero entry.

15 THE `RIGHT' PROBABILITIES Both hard cases (only one non-zero row; a column with one entry) are handled by the L_2 statistical leverage scores: τ_i = b_i^T (B^T B)^{-1} b_i = ||b_i||^2_{L^{-1}}, where b_i is row i of B and L = B^T B. Example (path + clique graph): each path edge gets score ≈ 1, each clique edge gets score ≈ 1/n.

16 L_2 MATRIX-CHERNOFF BOUNDS Recall the L_2 statistical leverage scores τ_i = b_i^T (B^T B)^{-1} b_i = ||b_i||^2_{L^{-1}}. [Foster `49]: Σ_i τ_i = rank ≤ n, so sampling with these probabilities yields O(n log n) rows. [Rudelson, Vershynin `07], [Tropp `12]: sampling with p_i ≥ τ_i · O(log n) gives B' s.t. ||Bx||_2 ≈ ||B'x||_2 for all x w.h.p. This is near optimal, and L_2 row samples of B are exactly graph sparsifiers. In practice replacing O(log n) by 5 usually suffices; can also improve via derandomization.
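A hedged numpy sketch of leverage-score row sampling (an illustration of the τ_i formula and the rescaling convention, not the exact procedure from the talk; the oversampling constant C is an arbitrary stand-in for O(log n)):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2000, 10
B = rng.standard_normal((m, n))

# L2 leverage scores: tau_i = b_i^T (B^T B)^{-1} b_i.
inv_gram = np.linalg.inv(B.T @ B)
tau = np.einsum('ij,jk,ik->i', B, inv_gram, B)

# [Foster `49]: the scores sum to the rank (= n for this dense B).
assert np.isclose(tau.sum(), n)

# Keep row i with probability p_i = min(1, C * tau_i); rescale kept rows
# by 1 / sqrt(p_i) so that E[B'^T B'] = B^T B.
C = 20.0
p = np.minimum(1.0, C * tau)
keep = rng.random(m) < p
B_sample = B[keep] / np.sqrt(p[keep, None])

# Far fewer rows, yet the quadratic form is roughly preserved.
x = rng.standard_normal(n)
ratio = np.sum((B_sample @ x) ** 2) / np.sum((B @ x) ** 2)
```

With these parameters the expected sample size is about C * n = 200 of the 2000 rows.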

17 MY APPROACH TO ALGORITHM DESIGN Extend insights gained from studying problems at the intersection of multiple areas (combinatorics / discrete algorithms, numerical analysis / optimization, statistics / randomized algorithms) back to these areas. Algorithmic extensions of structure-preserving sampling: solving linear systems, maximum flow, preserving L_1 structures.

18 SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling.

19 Solvers for linear systems involving graph Laplacians, Lx = b (graph Laplacian: diagonal = degree, off-diagonal = negated weight). Result: current fastest sequential and parallel solvers for such systems. Application: estimate all τ_i = ||b_i||^2_{L^{-1}} by solving O(log n) linear systems. Directly related to: elliptic problems; SDD, M, and H-matrices.

20 ALGORITHMS FOR Lx = b Given any graph Laplacian L with n vertices and m edges and any vector b, find a vector x s.t. Lx = b. [Vaidya `89]: use graph theory! [Spielman-Teng `04]: O(m log^c n) time; subsequent work drove the exponent c down: 2004: 70, 2006: 32, 2009: 15, 2010: 6, then 2, 2011: 1, 2014: 1/2. [P-Spielman `14]: an alternate, fully parallelizable approach.
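As a dense baseline for contrast with the nearly-linear-time solvers above, a tiny numpy sketch of solving Lx = b directly (L is singular, with the all-ones vector in its nullspace, so b must sum to zero and we use the pseudoinverse):

```python
import numpy as np

# Laplacian of a path on 3 vertices; singular (nullspace = all-ones vector).
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
b = np.array([1.0, 0.0, -1.0])     # sums to zero, so Lx = b is solvable

x = np.linalg.pinv(L) @ b          # O(n^3) dense solve, fine for toy sizes
assert np.allclose(L @ x, b)
```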

21 ITERATIVE METHODS Simplification: assume L = I - A, where A is the transition matrix of a random walk. Division using multiplication: (I - A)^{-1} = I + A + A^2 + A^3 + … (by the spectral theorem, can view the matrices as scalars). Richardson iteration: truncate to i terms, approximating x = (I - A)^{-1} b with x^(i) = (I + A + … + A^i) b.

22 RICHARDSON ITERATION Need n matrix operations? Evaluation (Horner's rule): (I + A + A^2) b = A(Ab + b) + b. With i terms: x^(0) = b, x^(i+1) = A x^(i) + b, i.e. i matrix-vector multiplications; can interpret as gradient descent. The # of terms needed is lower bounded by information propagation: b, Ab, A^2 b, … spread one hop at a time, so the diameter of the graph many terms are required. Highly connected graphs: few terms suffice.
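A minimal sketch of the Richardson recurrence x^(i+1) = A x^(i) + b, on an assumed damped random walk (the matrix and damping factor are illustrative choices ensuring I - A is invertible):

```python
import numpy as np

# 0.5 x (transition matrix of a walk on a 4-vertex path with self-loops at
# the ends); the damping makes ||A|| < 1, so the power series converges.
A = 0.5 * np.array([[0.5, 0.5, 0.0, 0.0],
                    [0.5, 0.0, 0.5, 0.0],
                    [0.0, 0.5, 0.0, 0.5],
                    [0.0, 0.0, 0.5, 0.5]])
b = np.array([1.0, 0.0, 0.0, 0.0])

# x^(0) = b, x^(i+1) = A x^(i) + b truncates (I + A + ... + A^i) b.
x = b.copy()
for _ in range(100):
    x = A @ x + b

exact = np.linalg.solve(np.eye(4) - A, b)
assert np.allclose(x, exact)
```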

23 DEGREE n => n OPERATIONS? (I - A)^{-1} = I + A + A^2 + A^3 + … = (I + A)(I + A^2)(I + A^4)… Repeated squaring: A^16 = (((A^2)^2)^2)^2, only 4 operations, so O(log n) terms suffice; similar to multi-level methods. Combinatorial view: A is a step of a random walk, and I - A^2 is the Laplacian of the 2-step random walk: still a graph Laplacian! It is a dense matrix, but we can sparsify!
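The factorization above can be checked numerically; a small sketch (random test matrix scaled so the series converges, constants chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = 0.4 * M / np.linalg.norm(M, 2)   # spectral norm 0.4 < 1

I = np.eye(5)
# (I + A)(I + A^2)(I + A^4)...(I + A^(2^k)) telescopes to I + A + ... + A^(2^(k+1)-1).
prod = I.copy()
Ap = A.copy()
for _ in range(6):            # 6 factors cover A^0 ... A^63
    prod = prod @ (I + Ap)
    Ap = Ap @ Ap              # repeated squaring: A, A^2, A^4, ...

assert np.allclose(prod, np.linalg.inv(I - A))
```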

24 REPEATED SPARSE SQUARING Combining known tools: efficiently sparsify I - A^2 without computing A^2. (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)… [P-Spielman `14]: approximate L^{-1} with O(log n) sparse matrices. Key ideas: modify the factorization to allow gradual introduction and control of error.

25 SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling. Solve Lx=b efficiently via sparsified squaring.

26 FEW ITERATIONS OF Lx = b [Tutte `61]: graph drawing, embeddings. [ZGL `03], [ZHS `05]: inference on graphical models. Inverse powering for eigenvectors / heat kernel: [AM `85]: spectral clustering; [OSV `12]: balanced cuts; [SM `01], [KMST `09]: image segmentation; [CFMNPW `14]: Helmholtz decomposition on 3D meshes.

27 MANY ITERATIONS OF Lx = b [Karmarkar, Ye, Renegar, Nesterov, Nemirovski, …]: convex optimization via solving O(m^{1/2}) linear systems. [DS `08]: optimization on graphs reduces to Laplacian systems. [KM `09], [MST `14]: random spanning trees. [CKMST `11]: faster approximate maximum flow. [KMP `12]: multicommodity flow.

28 MAXFLOW At the intersection of combinatorics / discrete algorithms, numerical analysis / optimization, and statistics / randomized algorithms: maximum flow. First O(m polylog(n)) time algorithm for approximate undirected maxflow.

29 MAXIMUM FLOW PROBLEM (for unweighted, undirected graphs) Given s, t, find the maximum number of disjoint s-t paths. Dual: separate s and t by removing the fewest edges. Applications: clustering, image processing, scheduling.

30 WHAT MAKES MAXFLOW HARD Highly connected graphs: may route up to n paths. Long paths: a single step may involve n vertices. Each is `easy' on its own, but many steps × long paths = n^2. Goal: handle both and do better.

31 ALGORITHMS FOR FLOWS Current fastest maxflow algorithms: exact (weakly polytime) by invoking Lx = b; approximate by modifying algorithms for Lx = b. Ideas introduced over time: 1970s: blocking flows; 1980: dynamic trees; 1986: dual algorithms; 1989: connections to Lx = b; 2010: few calls to Lx = b; 2013: modify Lx = b. [P `14]: (1 - ε)-approx maxflow in O(m log^c n ε^{-2}) time.

32 MAXIMUM FLOW IN ALMOST LINEAR TIME Algebraic formulation of min s-t cut: minimize ||Bx||_2 subject to x_s = 0, x_t = 1 and x integral; relaxing to the 1-norm (||·||_1: sum of absolute values) gives: minimize ||Bx||_1 subject to x_s = 0, x_t = 1. [Sherman `13], [Kelner-Lee-Orecchia-Sidford `13]: can find an approximate maxflow iteratively via several calls to a structure approximator. [Madry `10]: an O(m^{1+θ})-sized approximator requiring O(m^θ) calls, built in O(m^{1+θ}) time (for any θ > 0), giving O(m^{1+2θ} ε^{-2}) total time. [Racke-Shah-Taubig `14]: an O(n)-sized approximator requiring only O(log^c n) iterations, but built by solving maxflows on graphs of total size O(m log^c n): a chicken-and-egg situation, since maxflow needs the approximator and the approximator needs maxflow. Can we get O(m log^c n ε^{-2}) time?

33 ALGORITHMIC SOLUTION Key step: vertex reductions via edge reductions. Ultra-sparsifiers (e.g. [Koutis-Miller-P `10]): for any k, can find H close to G but equivalent to a graph of size O(m/k). [P `14]: build the approximator on the smaller graph, absorb the additional (small) error via more calls to the approximator, and recurse on instances with smaller total size; total cost O(m log^c n). [CLMPPS `15]: extends to numerical data, with close connections to variants of Nystrom's method.

34 SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling. Solve Lx = b efficiently via sparsified squaring. Approximate maximum flow routines and structure approximators can be constructed recursively from each other via graph sparsifiers.

35 RANDOMIZED NUMERICAL LINEAR ALGEBRA L_1-preserving row sampling: first near-optimal routine for row sampling matrices in a 1-norm preserving manner.

36 GENERALIZATION Generalization of row sampling: given A and q, find A' s.t. ||Ax||_q ≈ ||A'x||_q for all x, where the q-norm is ||y||_q = (Σ_i |y_i|^q)^{1/q}. The 1-norm is standard for representing cuts and is used in sparse recovery / robust regression. Applications (for general A): feature selection, low-rank approximation / PCA.

37 ROW SAMPLING ROUTINES (omitting corresponding empirical studies; goal: A' s.t. ||Ax||_q ≈ ||A'x||_q for all x; nnz: # of non-zeros in A)

                                  #rows for q=2   #rows for q=1     Runtime
  Dasgupta et al. `09             -               n^2.5             mn^5
  Magdon-Ismail `10               n log^2 n       -                 mn^2
  Sohler-Woodruff `11             -               n^3.5             mn^(ω-1+θ)
  Drineas et al. `12              n log n         -                 mn log n
  Clarkson et al. `12             -               n^4.5 log^1.5 n   mn log n
  Clarkson-Woodruff `12           n^2 log n       n^8               nnz
  Mahoney-Meng `12                n^2             n^3.5             nnz + n^6
  Nelson-Nguyen `12               n^(1+θ)         -                 nnz
  Li et al. `13, Cohen et al. `14 n log n         n^3.66            nnz + n^(ω+θ)

[Naor `11], [Matousek `97]: on graphs, an L_2 approximation gives an L_q approximation for all 1 ≤ q ≤ 2. How special are graphs? How special is L_2?

38 L_1 ROW SAMPLING L_1 Lewis weights ([Lewis `78]): w s.t. w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i, where W = diag(w): a recursive definition! Sampling with p_i ≥ w_i · O(log n) gives ||Ax||_1 ≈ ||A'x||_1 for all x. Can check: Σ_i w_i ≤ n, so O(n log n) rows. [Talagrand `90, "Embedding subspaces of L_1 into L_1^N"]: can be analyzed as row sampling / sparsification.

39 [COHEN-P `14] Compute Lewis weights by updating the w on the LHS with the w on the RHS: w'_i ← (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2}. Converges in O(log log n) steps: analyze A^T W^{-1} A spectrally. Aside: similar to iteratively reweighted least squares. Also gives an elementary, optimization-motivated proof of w.h.p. concentration for L_1.

  q          Previous # of rows   New # of rows           Runtime
  1          n^2.5                n log n                 nnz + n^(ω+θ)
  1 < q < 2  n^(q/2+2)            n log n (log log n)^2   nnz + n^(ω+θ)
  2 < q      n^(q+1)              n^(q/2) log n           nnz + n^(q/2+O(1))
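A hedged numpy sketch of this fixed-point iteration (problem sizes and iteration count are arbitrary illustrative choices; a fixed number of iterations stands in for the O(log log n) analysis):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 500, 5
A = rng.standard_normal((m, n))

# Fixed-point iteration for L1 Lewis weights:
#   w'_i <- (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2},  W = diag(w).
w = np.ones(m)
for _ in range(30):
    gram_inv = np.linalg.inv(A.T @ (A / w[:, None]))   # (A^T W^{-1} A)^{-1}
    w = np.sqrt(np.einsum('ij,jk,ik->i', A, gram_inv, A))

# At the fixed point, w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i, and the weights
# sum to the rank of A (= n here), mirroring sum tau_i = rank for L2 scores.
gram_inv = np.linalg.inv(A.T @ (A / w[:, None]))
resid = w ** 2 - np.einsum('ij,jk,ik->i', A, gram_inv, A)
assert np.max(np.abs(resid)) < 1e-4
```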

40 SUMMARY Algorithm design approach: study problems at the intersection of areas, and extend insights back. Can sparsify objects via importance sampling. Solve Lx = b efficiently via sparsified squaring. Approximate maximum flow routines and cut- approximators can be constructed recursively from each other via graph sparsifiers. Wider ranges of structures can be sparsified, key statistical quantities can be computed iteratively.

41 I'VE ALSO WORKED ON Dynamic graph data structures, graph partitioning, parallel algorithms, image processing, anomaly / sybil detection in graphs.

42 FUTURE WORK: LINEAR SYSTEM SOLVERS Wider classes of linear systems; relation to optimization / learning.

43 FUTURE WORK: COMBINATORIAL OPTIMIZATION Faster algorithms for more classical algorithmic graph theory problems?

44 FUTURE WORK: RANDOMIZED NUMERICAL LINEAR ALGEBRA Other algorithmic applications of Lewis weights? Low-rank approximation in L_1? O(n)-sized L_1-preserving row samples? (These exist for L_2.)

45 SUMMARY Links to arXiv manuscripts and videos of more detailed talks are at: math.mit.edu/~rpeng/

