
Sampling: an Algorithmic Perspective Richard Peng M.I.T.


1 Sampling: an Algorithmic Perspective Richard Peng M.I.T.

2 OUTLINE Structure preserving sampling Sampling as a recursive ‘driver’ Sampling the inaccessible What can sampling preserve?

3 RANDOM SAMPLING Collection of many objects Pick a small subset of them Goal: Estimate quantities Small approximates Use in algorithms

4 SAMPLING CAN APPROXIMATE Point sets Matrices Graphs Gradients

5 PRESERVING GRAPH STRUCTURES Undirected graph, n vertices, m < n² edges Is n² edges (dense) sometimes necessary? For some information, e.g. connectivity: encoded by a spanning forest, < n edges Deterministic, O(m) time algorithm
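
A minimal sketch (not from the talk) of the spanning-forest point above: a single union-find pass keeps only edges that connect previously separate components, preserving connectivity with fewer than n edges in near-O(m) time.

```python
def spanning_forest(n, edges):
    """Return a subset of `edges` forming a spanning forest on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:              # path halving keeps the trees shallow
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            forest.append((u, v))
    return forest

# Example: a triangle plus a pendant vertex; 3 of the 4 edges survive.
print(spanning_forest(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))
```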

6 MORE INTRICATE STRUCTURES k-connectivity: # of disjoint paths between s and t (Menger's theorem / maxflow-mincut) Cut: # of edges leaving a subset of vertices [Benczur-Karger `96]: for ANY G, can sample to get H with O(n log n) edges s.t. G ≈ H on all cuts (≈: multiplicative approximation) Stronger: preserves the weights of all 2^n cuts in the graph

7 MORE GENERAL: ROW SAMPLING Row sampling: given an m × n matrix A with m >> n, sample a few rows to form A' s.t. ║Ax║_2 ≈ ║A'x║_2 ∀x ║Ax║_p: finite dimensional Banach space Sampling: embedding Banach spaces, e.g. [BLM `89], [Talagrand `90]

8 HOW TO SAMPLE? Widely used: uniform sampling Works well when data is uniform e.g. complete graph Problem: long path, removing any edge changes connectivity (can also have both in one graph) More systematic view of sampling?

9 SPECTRAL SPARSIFICATION VIA EFFECTIVE RESISTANCE [Spielman-Srivastava `08]: suffices to sample with probabilities at least O(log n) times weight times effective resistance Effective resistance: commute time / (2m); equals the statistical leverage score in unweighted graphs

10 L_2 MATRIX-CHERNOFF BOUNDS τ: L_2 statistical leverage scores, τ_i = b_i^T (B^T B)^{-1} b_i = ║b_i║²_{L^{-1}} where L = B^T B [Foster `49]: Σ_i τ_i = rank ≤ n ⇒ O(n log n) rows [Rudelson-Vershynin `07], [Tropp `12]: sampling with p_i ≥ τ_i · O(log n) gives B' s.t. ║Bx║_2 ≈ ║B'x║_2 ∀x w.h.p. Near optimal for: L_2 row samples of B, graph sparsifiers In practice O(log n) → 5 usually suffices; can also improve via derandomization
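
A minimal numpy sketch of this sampling rule, assuming a dense matrix small enough to form (B^T B)^+ directly; the constant C = 5 echoes the "O(log n) → 5" remark above, and the toy matrix is a hypothetical example, not from the talk.

```python
import numpy as np

def leverage_scores(B):
    """tau_i = b_i^T (B^T B)^+ b_i  (L2 statistical leverage scores)."""
    G_pinv = np.linalg.pinv(B.T @ B)
    return np.einsum('ij,jk,ik->i', B, G_pinv, B)

def leverage_score_sample(B, C=5.0, rng=np.random.default_rng(0)):
    """Keep row i with prob p_i = min(1, C*log(n)*tau_i) and rescale by 1/sqrt(p_i)."""
    m, n = B.shape
    p = np.minimum(1.0, C * np.log(n) * leverage_scores(B))
    keep = rng.random(m) < p
    return B[keep] / np.sqrt(p[keep])[:, None]

# Sanity check: ||Bx||_2 stays roughly the same after sampling.
B = np.vstack([np.random.randn(2000, 10), 100 * np.eye(10)])   # a few "heavy" rows
Bp = leverage_score_sample(B)
x = np.random.randn(10)
print(Bp.shape[0], np.linalg.norm(B @ x), np.linalg.norm(Bp @ x))
```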

11 THE `RIGHT’ PROBABILITIES Extreme examples: a matrix with only one non-zero row, or a column with a single entry: that row has τ_i = 1, while uniform sampling picks it with probability only n/m Path + clique: path edges have τ_i ≈ 1, clique edges have τ_i ≈ 1/n τ: L_2 statistical leverage scores, τ_i = b_i^T (B^T B)^{-1} b_i Any good upper bounds on τ_i lead to size reductions

12 OUTLINE Structure preserving sampling Sampling as a recursive ‘driver’ Sampling the inaccessible What can sampling preserve?

13 ALGORITHMIC TEMPLATES W-cycle: T(m) = 2T(m/2) + O(m) = O(m log m); instances: sorting, FFT, Voronoi / Delaunay V-cycle: T(m) = T(m/2) + O(m) = O(m); instances: selection, parallel independent set, routing

14 EFFICIENT GRAPH ALGORITHMS Partition via separators Difficulty: many non-separable graphs exist, and it is easy to compose hard instances

15 SIZE REDUCTION Ultra-sparsifier: for any k, can find H ≈_k G that is a tree + O(m log^c n / k) edges e.g. [Koutis-Miller-P `10]: obtain crude estimates of τ_i via a tree; H is equivalent to a graph of size O(m log^c n / k) Picking k > log^c n gives reductions

16 INSTANCE: Lx = b Input: graph Laplacian L, vector b Output: x ≈_ε L^+ b (L^+: pseudo-inverse; approximate solution, omitting log(1/ε) factors) Runtimes: [KMP `10, `11]: O(m log n) work, O(m^{1/3}) depth [CKPPR `14, CMPPX `14]: O(m log^{1/2} n) work, O(m^{1/3}) depth Size reduction + recursive Chebyshev iteration: T(m) = k^{1/2} (T(m log^c n / k) + O(m))
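
A hypothetical toy instance of the Lx = b interface, solved with off-the-shelf conjugate gradient from scipy. This only illustrates the problem being solved, not the cited O(m log^{1/2} n) algorithms, which rely on recursive preconditioning rather than plain CG.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import cg

# Toy instance: L x = b on a path graph.
n = 1000
adj = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1])   # path graph adjacency
L = laplacian(adj)                                           # graph Laplacian
b = np.zeros(n); b[0], b[-1] = 1.0, -1.0                     # b orthogonal to the all-ones nullspace

x, info = cg(L, b, maxiter=20 * n)
x -= x.mean()                                                # pick the mean-zero representative
print(info, np.linalg.norm(L @ x - b))                       # residual should be tiny
```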

17 INSTANCE: INPUT-SPARSITY TIME NUMERICAL ALGORITHMS [Li-Miller-P `13]: create a smaller approximation, recurse on it, bring the solution back Similar: Nystrom method (sample, then post-process)

18 INSTANCE: APPROX MAXFLOW [Sherman `13], [KLOS `14]: structure approximators ⇒ fast maxflow routines [Racke-Shah-Taubig `14]: good approximator by solving maxflows [P `14]: build the approximator on a smaller graph, recurse on instances with smaller total size, and absorb the additional (small) error via more calls to the approximator; total cost O(m log^c n)

19 OUTLINE Structure preserving sampling Sampling as a recursive ‘driver’ Sampling the inaccessible What can sampling preserve?

20 DENSE OBJECTS Matrix inverse Schur complement K-step random walks Cost-prohibitive to store Application of separators Directly access sparse approximates?

21 TWO STEP RANDOM WALKS A: one step of random walk A^2: 2-step random walk Still a graph, can sparsify!

22 WHAT THIS ENABLED [P-Spielman `14] use this to approximate (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)… Similar to multi-level methods Skipping: control / propagation of error Combining known tools: efficiently sparsify I - A^2 without computing A^2 [Cheng-Cheng-Liu-P-Teng `15]: sparsified Newton's method for matrix roots and Gaussian sampling
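
A quick numerical check of the identity above on a hypothetical damped random-walk matrix (damped so the spectral radius is below 1 and the product converges); repeated squaring means only logarithmically many factors are needed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = W + W.T                 # random weighted graph
P = W / W.sum(axis=1, keepdims=True)                # random-walk matrix
A = 0.9 * P                                         # damp so the spectral radius is < 1

I = np.eye(6)
prod = I.copy()
Ak = A.copy()
for _ in range(20):                                 # factors (I + A)(I + A^2)(I + A^4)...
    prod = prod @ (I + Ak)
    Ak = Ak @ Ak                                    # squaring: log(#terms) factors suffice
print(np.max(np.abs(prod - np.linalg.inv(I - A))))  # ~ machine precision
```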

23 MATRIX SQUARING Connectivity vs. the more general setting:
Iteration: A_{i+1} ≈ A_i^2 vs. I - A_{i+1} ≈ I - A_i^2, repeated until ║A_d║ is small
Size reduction: low degree vs. sparse graph
Method: derandomized vs. randomized
Solution transfer: connectivity vs. solution vectors
Related: NC algorithm for shortest path; logspace connectivity [Reingold `02]; deterministic squaring [Rozenman-Vadhan `05]

24 LONGER RANDOM WALKS A: one step of random walk A^3: 3 steps of random walk (part of) edge uv in A^3 corresponds to a length-3 path in A: u-y-z-v

25 PSEUDOCODE Repeat O(c m log n ε^{-2}) times: 1. Uniformly randomly pick 1 ≤ k ≤ c and an edge e = uv 2. Perform a (k-1)-step random walk from u 3. Perform a (c-k)-step random walk from v 4. Add a scaled copy of the resulting edge to the sparsifier Resembles: local clustering, approximate triangle counting (c = 3) [Cheng-Cheng-Liu-P-Teng `15]: combine this with repeated squaring to approximate any random walk polynomial in nearly-linear time
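
A runnable sketch of the loop above for an unweighted graph. The per-sample scaling is simplified to a plain 1/num_samples (the actual algorithm rescales by walk-probability ratios), num_samples stands in for the O(c m log n ε^{-2}) count, and all names here are illustrative.

```python
import random
from collections import defaultdict

def sample_walk_polynomial(n, edges, c, num_samples, rng=random.Random(0)):
    """Estimate the c-step walk graph A^c by stitching walks around a random edge."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def walk(u, steps):
        for _ in range(steps):
            u = rng.choice(adj[u])
        return u

    sparsifier = defaultdict(float)
    for _ in range(num_samples):
        k = rng.randint(1, c)                     # position of edge e inside the length-c walk
        u, v = edges[rng.randrange(len(edges))]
        if rng.random() < 0.5:                    # orient the edge uniformly
            u, v = v, u
        a = walk(u, k - 1)                        # (k-1)-step walk from one endpoint
        b = walk(v, c - k)                        # (c-k)-step walk from the other
        sparsifier[(a, b)] += 1.0 / num_samples   # add a (crudely) scaled copy
    return sparsifier

# Toy usage: 3-step walks on a 5-cycle.
edges = [(i, (i + 1) % 5) for i in range(5)]
print(sample_walk_polynomial(5, edges, c=3, num_samples=2000))
```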

26 GAUSSIAN ELIMINATION Partial state of Gaussian elimination: a linear system on a subset of the variables Graph theoretic interpretation: equivalent circuit on the boundary vertices, via Y-Δ transforms [Lee-P-Spielman, in progress]: approximate such circuits in O(m log^c n) time
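
A small numpy illustration of the "equivalent circuit" view: eliminating the interior vertices of a Laplacian leaves the Schur complement on the boundary, which is again a Laplacian; the star-to-triangle example below is the classic Y-Δ transform. Names and the example graph are illustrative.

```python
import numpy as np

def schur_complement(L, boundary):
    """L/interior = L_BB - L_BI L_II^{-1} L_IB, the equivalent circuit on `boundary`."""
    idx = np.arange(L.shape[0])
    B = np.asarray(boundary)
    I = np.setdiff1d(idx, B)                     # interior = eliminated vertices
    L_BB, L_BI = L[np.ix_(B, B)], L[np.ix_(B, I)]
    L_IB, L_II = L[np.ix_(I, B)], L[np.ix_(I, I)]
    return L_BB - L_BI @ np.linalg.solve(L_II, L_IB)

# Y-Delta example: a unit-weight star with center 3 and leaves 0,1,2 becomes a
# triangle on {0,1,2} with weight 1/3 on each edge after eliminating vertex 3.
L = np.array([[ 1,  0,  0, -1],
              [ 0,  1,  0, -1],
              [ 0,  0,  1, -1],
              [-1, -1, -1,  3]], dtype=float)
print(schur_complement(L, [0, 1, 2]))
```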

27 WHAT THIS ENABLES [Lee-P-Spielman, in progress] O(n) time approximate Cholesky factorization for graph Laplacians [Lee-Sun, `15] constructible in nearly-linear work

28 OUTLINE Structure preserving sampling Sampling as a recursive ‘driver’ Sampling the inaccessible What can sampling preserve?

29 MORE GENERAL STRUCTURES Non-linear structures Directed constraints: Ax ≤ b

30 OTHER NORMS Generalization of row sampling: given A and q, find A' s.t. ║Ax║_q ≈ ║A'x║_q ∀x q-norm: ║y║_q = (Σ_i |y_i|^q)^{1/q} 1-norm: standard for representing cuts, used in sparse recovery / robust regression Applications (for general A): feature selection, low rank approximation / PCA

31 L_1 ROW SAMPLING L_1 Lewis weights ([Lewis `78]): w s.t. w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i, a recursive definition! Sampling with p_i ≥ w_i · O(log n) gives ║Ax║_1 ≈ ║A'x║_1 ∀x Can check: Σ_i w_i ≤ n ⇒ O(n log n) rows [Talagrand `90, "Embedding subspaces of L_1 into L_1^N"]: can be analyzed as row-sampling / sparsification [Cohen-P `15]: iterate w'_i ← (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2}; converges in log log n steps
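
A compact numpy sketch of that fixed-point iteration, assuming a dense A small enough to form (A^T W^{-1} A)^+ directly; the iteration cap and the test matrix are hypothetical choices.

```python
import numpy as np

def l1_lewis_weights(A, iters=10):
    """Fixed-point iteration for L1 Lewis weights:
       w_i <- ( a_i^T (A^T W^{-1} A)^{-1} a_i )^{1/2},  W = diag(w)."""
    m, n = A.shape
    w = np.ones(m)
    for _ in range(iters):                               # log log n steps suffice in theory
        M = np.linalg.pinv(A.T @ (A / w[:, None]))       # (A^T W^{-1} A)^+
        w = np.sqrt(np.einsum('ij,jk,ik->i', A, M, A))
    return w

A = np.vstack([np.random.randn(500, 8), 50 * np.eye(8)])  # a few dominant rows
w = l1_lewis_weights(A)
print(w.sum(), w[-8:])    # sum_i w_i <= n (per the slide); dominant rows get weight near 1
```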

32 WHERE THIS FITS IN
Reference                        #rows for q=2     #rows for q=1      Runtime
Dasgupta et al. `09              -                 n^2.5              mn^5
Magdon-Ismail `10                n log^2 n         -                  mn^2
Sohler-Woodruff `11              -                 n^3.5              mn^{ω-1+θ}
Drineas et al. `12               n log n           -                  mn log n
Clarkson et al. `12              -                 n^4.5 log^1.5 n    mn log n
Clarkson-Woodruff `12            n^2 log n         n^8                nnz
Mahoney-Meng `12                 n^2               n^3.5              nnz + n^6
Nelson-Nguyen `12                n^{1+θ}           -                  nnz
Li et al. `13                    n log n           n^3.66             nnz + n^{ω+θ}
Cohen et al. `14, Cohen-P `15    n log n           n log n            nnz + n^{ω+θ}
[Cohen-P `15]: elementary, optimization-motivated proof of w.h.p. concentration for L_1

33 CONNECTION TO LEARNING THEORY Sparsely-used dictionary learning: given Y, find A, X so that ║Y - AX║ is small and X is sparse [Spielman-Wang-Wright `12]: L_1 regression solves this using about n^2 samples [Luh-Vu `15]: generic chaining, O(n log^4 n) samples suffice Proof in [Cohen-P `15] gives O(n log^2 n) samples Key: if X satisfies the Bernoulli-Subgaussian model, then ║Xy║_1 is close to its expectation for all y 'Right' bound should be O(n log n)

34 UNSPARSIFIABLE INSTANCE Directed complete bipartite graph: removing any edge u → v makes v unreachable from u Preserve less structure?

35 WEAKER REQUIREMENT Sample only needs to make gains in some directions [Cohen-Kyng-Pachocki-P-Rao `14]: point-wise convergence without matrix concentration

36 UNIFORM SAMPLING? Nystrom method (on matrices): pick a random subset of the data, compute on the subset, post-process the result Post-processing: theoretical works before us copy x over; practical versions use projection or least-squares fitting [CLMMPS `15]: half the rows as A' gives good sampling probabilities for A that sum to ≤ 2n How powerful is (recursive) post-processing?
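
A rough numpy sketch of the [CLMMPS `15] statement above as I read it: leverage scores measured against a uniform half of the rows overestimate the true ones while keeping a small total. The clipping at 1 and the toy matrix are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def estimated_leverage_scores(A, rng=np.random.default_rng(0)):
    """Estimate A's leverage scores using only a uniform half A' of its rows:
       tau_hat_i = min(1, a_i^T (A'^T A')^+ a_i)."""
    m, n = A.shape
    half = rng.random(m) < 0.5                       # uniform half of the rows
    M = np.linalg.pinv(A[half].T @ A[half])
    return np.minimum(1.0, np.einsum('ij,jk,ik->i', A, M, A))

A = np.vstack([np.random.randn(4000, 10), 100 * np.eye(10)])
tau_hat = estimated_leverage_scores(A)
print(tau_hat.sum())      # roughly <= 2n, even though half the rows were never examined
```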

37 WHY IS THIS EFFECTIVE? Needle in a haystack: only d dimensions, can’t have too many, easy to find via post-process Hay in a haystack: half the data should still contain some info

38 FUTURE WORK What structures can sampling preserve? What does sampling need to preserve? More concretely: more sparsification-based algorithms (e.g. multi-grid maxflow?), sampling directed graphs, hardness results?

