Fast Regression Algorithms Using Spectral Graph Theory Richard Peng.


1 Fast Regression Algorithms Using Spectral Graph Theory Richard Peng

2 OUTLINE Regression: why and how Spectra: fast solvers Graphs: tree embeddings

3 LEARNING / INFERENCE Find (hidden) pattern in (noisy) data. Input: signal s; Output: recovered pattern.

4 REGRESSION Minimize: |x|_p; Subject to: convex constraints on x, e.g. linear equalities. p ≥ 1 makes the objective convex.

5 APPLICATION 0: LASSO [Tibshirani `96]: Min |x|_1 s.t. Ax = s. Widely used in practice: structured output, robust to noise.
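The LASSO slide above states the constrained form min |x|_1 s.t. Ax = s. As a minimal sketch, the code below solves the common penalized variant via ISTA (iterative soft-thresholding); the penalty weight lam, the problem sizes, and the sparse x_true are all illustrative choices, not from the talk.

```python
import numpy as np

def ista(A, s, lam=0.1, iters=500):
    # step size 1/L, where L = ||A||_2^2 bounds the gradient's Lipschitz constant
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * (A.T @ (A @ x - s))   # gradient step on 0.5|Ax - s|^2
        # soft-threshold: the proximal operator of lam*|x|_1
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))            # underdetermined system
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]       # sparse ground truth
s = A @ x_true
x_hat = ista(A, s, lam=0.05)
```

ISTA decreases the penalized objective monotonically from the zero start, which is what makes the L1 term drive most coordinates toward zero.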

6 APPLICATION 1: IMAGES Poisson image processing: Min Σ_{i~j∈E} (x_i - x_j - s_{i~j})^2. No bears were harmed in the making of these slides.
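The Poisson objective above is least squares in the graph's edge-vertex incidence matrix B, and its normal equations are exactly a graph Laplacian system (BᵀB)x = Bᵀs. A minimal sketch on a tiny path graph (the edge list and target differences s are illustrative):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]        # path graph on 4 vertices
s = np.array([1.0, 2.0, -0.5])          # desired difference x_i - x_j per edge
n = 4
B = np.zeros((len(edges), n))           # edge-vertex incidence matrix
for k, (i, j) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0
L = B.T @ B                             # graph Laplacian; singular (constants in nullspace)
x = np.linalg.lstsq(L, B.T @ s, rcond=None)[0]   # solve the normal equations
```

Because this graph is a tree, the prescribed differences can be matched exactly; on a graph with cycles the solve returns the least-squares compromise instead.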

7 APPLICATION 2: MIN CUT Remove fewest edges to separate vertices s and t: Min Σ_{ij∈E} |x_i - x_j| s.t. x_s = 0, x_t = 1. Fractional solution = integral solution.

8 REGRESSION ALGORITHMS Convex optimization: 1940~1960: simplex, tractable. 1960~1980: ellipsoid, poly time. 1980~2000: interior point, efficient: Õ(m^{1/2}) interior steps (m = # non-zeros, Õ hides log factors).

9 EFFICIENCY MATTERS m > 10^6 for most images. Even bigger (10^9): videos, 3D medical data.

10 KEY SUBROUTINE Each of the Õ(m^{1/2}) steps of interior point algorithms finds a step direction via linear system solves.

11 MORE REASONS FOR FAST SOLVERS [Boyd-Vandenberghe `04], Figure 11.20: "The growth in the average number of Newton iterations (on randomly generated SDPs)... is very small"

12 LINEAR SYSTEM SOLVERS [1st century CE] Gaussian Elimination: O(m^3). [Strassen `69] O(m^2.8). [Coppersmith-Winograd `90] O(m^2.3755). [Stothers `10] O(m^2.3737). [Vassilevska Williams `11] O(m^2.3727). Total: still > m^2.

13 NOT FAST → NOT USED Preferred in practice: coordinate descent, subgradient methods. Solution quality traded for time.

14 FAST GRAPH BASED L_2 REGRESSION [SPIELMAN-TENG `04] Input: linear system Ax = b where A is related to graphs. Output: solution to Ax = b. Runtime: nearly linear, Õ(m).

15 GRAPHS USING ALGEBRA Fast convergence + Low cost per step = state of the art algorithms

16 LAPLACIAN PARADIGM [Daitch-Spielman `08]: mincost flow. [Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut.

17 EXTENSION 1 [Chin-Mądry-Miller-P `12]: regression, image processing, grouped L_2.

18 EXTENSION 2 [Kelner-Miller-P `12]: k-commodity flow. Dual: k-variate labeling of graphs.

19 EXTENSION 3 [Miller-P `13] : faster for structured images / separable graphs

20 NEED: FAST LINEAR SYSTEM SOLVERS Implications of fast solvers: fast regression routines; parallel, work-efficient graph algorithms.

21 OTHER APPLICATIONS [Tutte `66]: planar embedding. [Boman-Hendrickson-Vavasis `04]: PDEs. [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator.

22 OUTLINE Regression: why and how Spectra: Linear system solvers Graphs: tree embeddings

23 PROBLEM Given: matrix A, vector b Size of A : n-by-n m non-zeros

24 SPECIAL STRUCTURE OF A A = Deg - Adj, where Deg = diag(degree) and Adj = adjacency matrix: A_ij = deg(i) if i = j, and -w(ij) otherwise. [Gremban-Miller `96]: extensions to SDD matrices.
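As a minimal sketch of the structure above, the code builds A = Deg - Adj for a small weighted graph (the weights are illustrative) and checks the Laplacian's defining properties: zero row sums, symmetry, and positive semidefiniteness.

```python
import numpy as np

# Weighted triangle: symmetric adjacency matrix with illustrative weights
Adj = np.array([[0.0, 2.0, 1.0],
                [2.0, 0.0, 3.0],
                [1.0, 3.0, 0.0]])
Deg = np.diag(Adj.sum(axis=1))   # weighted degrees on the diagonal
A = Deg - Adj                    # graph Laplacian
```

Zero row sums mean the all-ones vector is in the nullspace, which is why the solver slides later measure error in the A-norm rather than asking for an exact inverse.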

25 UNSTRUCTURED GRAPHS Social network Intermediate systems of other algorithms are almost adversarial

26 NEARLY LINEAR TIME SOLVERS [SPIELMAN-TENG `04] Input: n-by-n graph Laplacian A with m non-zeros, vector b where b = Ax for some x. Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A, where |y|_A = √(yᵀAy) is the A-norm. Runtime: nearly linear, O(m log^c n log(1/ε)) expected; the runtime is the cost per bit of accuracy.

27 HOW MANY LOGS Runtime: O(m log^c n log(1/ε)). Value of c: I don't know. [Spielman]: c ≤ 70. [Koutis]: c ≤ 15. [Miller]: c ≤ 32. [Teng]: c ≤ 12. [Orecchia]: c ≤ 6. When n = 10^6, log^6 n > 10^6.

28 PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `10] Input: n-by-n graph Laplacian A with m non-zeros, vector b where b = Ax for some x. Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A. Runtime: O(m log^2 n log(1/ε)).

29 PRACTICAL NEARLY LINEAR TIME SOLVERS [KOUTIS-MILLER-P `11] Input: n-by-n graph Laplacian A with m non-zeros, vector b where b = Ax for some x. Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A. Runtime: O(m log n log(1/ε)).
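The fast solvers above are not in standard libraries; as a stand-in sketch of the interface, here is textbook conjugate gradient on a small path-graph Laplacian, with error measured in the A-norm exactly as in the guarantee. The tree-based preconditioning that gives the stated runtimes is omitted; the graph and right-hand side are illustrative.

```python
import numpy as np

def cg(A, b, iters=200, tol=1e-12):
    # plain conjugate gradient; works on a singular PSD system when b is in range(A)
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Laplacian of a unit-weight path on 5 vertices
n = 5
A = 2 * np.eye(n)
A[0, 0] = A[-1, -1] = 1
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = -1
x_true = np.arange(n, dtype=float)
x_true -= x_true.mean()          # mean-zero, so b = A x_true is in range(A)
b = A @ x_true
x_hat = cg(A, b)
```

The "where b = Ax for some x" condition in the slides is what makes the singular system solvable; CG started at zero stays in range(A) and recovers the mean-zero solution.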

30 STAGES OF THE SOLVER Iterative Methods Spectral Sparsifiers Low Stretch Spanning Trees

31 ITERATIVE METHODS Numerical analysis: Can solve systems in A by iteratively solving spectrally similar, but easier, B

32 WHAT IS SPECTRALLY SIMILAR? A ≺ B ≺ kA for some small k. Ideas from scalars hold! A ≺ B: for any vector x, |x|_A^2 ≤ |x|_B^2. [Vaidya `91]: since A is a graph, B should be too!
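The quadratic-form definition above can be checked numerically on a toy pair: G a unit-weight triangle and T a spanning path inside it. T ≺ G is immediate (G is T plus another PSD term), and (x_0 - x_2)^2 ≤ 2(x_0 - x_1)^2 + 2(x_1 - x_2)^2 gives G ≺ 3T, so k = 3 here; the example graphs are my own illustration, not from the talk.

```python
import numpy as np

def laplacian(n, edges):
    # unit-weight graph Laplacian from an edge list
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

G = laplacian(3, [(0, 1), (1, 2), (0, 2)])   # triangle
T = laplacian(3, [(0, 1), (1, 2)])           # spanning path, T ≺ G ≺ 3T
```

Checking the inequalities on random vectors is exactly the "for any vector x" clause of the definition.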

33 `EASIER' H Goal: H with fewer edges that's similar to G. Two ways of being easier: fewer vertices, fewer edges. Can reduce vertex count if edge count is small.

34 GRAPH SPARSIFIERS Sparse equivalents of graphs that preserve something Spanners: distance, diameter. Cut sparsifier: all cuts. What we need: spectrum

35 WHAT WE NEED: ULTRASPARSIFIERS [Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n / k) edges imply solvers with O(m log^p n) running time. Given: G with n vertices, m edges, parameter k. Output: H with n vertices, n-1+O(m log^p n / k) edges. Goal: G ≺ H ≺ kG.

36 EXAMPLE: COMPLETE GRAPH O(n log n) random edges (with scaling) suffice w.h.p.

37 GENERAL GRAPH SAMPLING MECHANISM For each edge e, flip a coin with Pr(keep) = P(e); rescale kept edges to maintain expectation. Expected number of edges kept: ∑_e P(e). Also need to prove concentration.
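A minimal sketch of the keep-and-rescale mechanism above: each edge survives with probability P(e) and its weight is divided by P(e), so every edge's expected weight, and hence the expected Laplacian, is unchanged. The edge list and probabilities are illustrative.

```python
import random

def sample_edges(edges, P):
    # edges: list of (edge_id, weight); P: dict of keep-probabilities
    H = []
    for e, w in edges:
        p = P[e]
        if random.random() < p:
            H.append((e, w / p))     # rescale so E[kept weight] = w
    return H

edges = [("ab", 1.0), ("bc", 2.0), ("ca", 4.0)]
```

With all probabilities 1 the graph is returned unchanged; with smaller probabilities the kept-weight average over many trials concentrates around the original total (the concentration proof the slide mentions).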

38 EFFECTIVE RESISTANCE View the graph as a circuit. R(u,v) = pass 1 unit of current from u to v, measure the resistance of the circuit.

39 EE101 Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector of the pair; then R(u,v) = x_u - x_v.

40 (REMEDIAL?) EE101 Single edge: R(e) = 1/w(e). Series: R(u,v) = R(e_1) + ... + R(e_l); e.g. one edge of weight w_1 gives R(u,v) = 1/w_1, and two edges w_1, w_2 in series give R(u,v) = 1/w_1 + 1/w_2.
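The two recipes above can be cross-checked on a two-edge path u - m - v: solve Gx = e_uv (via the pseudoinverse, since the Laplacian is singular) and compare x_u - x_v against the series rule 1/w_1 + 1/w_2. The weights are illustrative.

```python
import numpy as np

w1, w2 = 2.0, 3.0
# Laplacian of the weighted path u - m - v
G = np.array([[ w1,     -w1,     0.0],
              [-w1,  w1 + w2,   -w2],
              [0.0,    -w2,      w2]])
e_uv = np.array([1.0, 0.0, -1.0])    # indicator vector of the pair (u, v)
x = np.linalg.pinv(G) @ e_uv         # G is singular: use the pseudoinverse
R_uv = x[0] - x[2]                   # effective resistance between u and v
```

The pseudoinverse solve works because e_uv is orthogonal to the constant nullspace, the same consistency condition the solver slides rely on.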

41 SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE [Spielman-Srivastava `08]: setting P(e) to W(e)R(e)·O(log n) gives G ≺ H ≺ 2G* (*ignoring probabilistic issues). [Foster `49]: ∑_e W(e)R(e) = n-1, so this is a spectral sparsifier with O(n log n) edges. Ultrasparsifier? Solver???

42 THE CHICKEN AND EGG PROBLEM How to find effective resistance? [Spielman-Srivastava `08] : use solver [Spielman-Teng `04] : need sparsifier

43 OUR WORK AROUND Use upper bounds of effective resistance, R’(u,v) Modify the problem

44 RAYLEIGH'S MONOTONICITY LAW R(u,v) can only increase when edges are removed. So: calculate effective resistance w.r.t. a tree T.

45 SAMPLING PROBABILITIES ACCORDING TO TREE Sample probability: edge weight times effective resistance of its tree path, i.e. its stretch. Goal: small total stretch.

46 GOOD TREES EXIST Every graph has a spanning tree with total stretch O(m log n) (hiding log log n factors): ∑_e W(e)R'(e) = O(m log n). But that still gives O(m log^2 n) sampled edges, too many!
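In the unit-weight case, an edge's stretch is just the length of the tree path between its endpoints, so total stretch is easy to compute by hand. A minimal sketch on a 4-cycle with a path as the spanning tree (my own toy instance): the three tree edges have stretch 1 each and the off-tree edge (0,3) has stretch 3, for a total of 6.

```python
from collections import deque

def tree_path_length(tree_adj, u, v):
    # BFS distance along the tree (unit weights); trees have unique paths
    q, seen = deque([(u, 0)]), {u}
    while q:
        node, d = q.popleft()
        if node == v:
            return d
        for nb in tree_adj[node]:
            if nb not in seen:
                seen.add(nb)
                q.append((nb, d + 1))

tree_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path tree 0-1-2-3
cycle_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]       # the 4-cycle's edges
total_stretch = sum(tree_path_length(tree_adj, u, v) for u, v in cycle_edges)
```

Summing this quantity over all edges is exactly the ∑_e W(e)R'(e) total-stretch measure the slide bounds by O(m log n).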

47 'GOOD' TREE??? Unit weight case: stretch ≥ 1 for all edges; e.g. an off-tree edge whose endpoints are joined by a 2-edge tree path has stretch 1 + 1 = 2.

48 WHAT ARE WE MISSING? Need: G ≺ H ≺ kG with n-1+O(m log^p n / k) edges. Generated: G ≺ H ≺ 2G with n-1+O(m log^2 n) edges. Haven't used k!

49 USE K, SOMEHOW The tree is good! Increase weights of tree edges by a factor of k: G ≺ G' ≺ kG.

50 RESULT Tree heavier by a factor of k, so tree effective resistances decrease by a factor of k: stretch = 1/k + 1/k = 2/k.

51 NOW SAMPLE? Expected edges in H: tree edges: n-1; off-tree edges: O(m log^2 n / k). Total: n-1+O(m log^2 n / k).

52 BUT WE CHANGED G! G ≺ G' ≺ kG and G' ≺ H ≺ 2G' give G ≺ H ≺ 2kG.

53 WHAT WE NEED: ULTRASPARSIFIERS [Spielman-Teng `04]: ultrasparsifiers with n-1+O(m log^p n / k) edges imply solvers with O(m log^p n) running time. Goal: G ≺ H ≺ kG; we obtained G ≺ H ≺ 2kG with n-1+O(m log^2 n / k) edges.

54 PSEUDOCODE OF O(M LOG N) SOLVER Input: graph Laplacian G. Compute low-stretch tree T of G; T ← O(log^2 n)·T; H ← G + T; H ← Sample_T(H); solve G by iterating on H and solving recursively, but reuse T.

55 EXTENSIONS / GENERALIZATIONS [Koutis-Levin-P `12]: sparsify mildly dense graphs in O(m) time. [Miller-P `12]: general matrices: find a 'simpler' matrix that's similar, in O(m + n^{2.38+a}) time.

56 SUMMARY OF SOLVERS Spectral graph theory allows one to find similar, easier-to-solve graphs. Backbone: good trees.

57 SOLVERS USING GRAPH THEORY Fast solvers for graph Laplacians use combinatorial graph theory

58 OUTLINE Regression: why and how Spectra: linear system solvers Graphs: tree embeddings

59 LOW STRETCH SPANNING TREE Sampling probability: edge weight times effective resistance of tree path Unit weight case: length of tree path Low stretch spanning tree: small total stretch

60 DIFFERENT THAN USUAL TREES n^{1/2}-by-n^{1/2} unit-weighted mesh: the 'haircomb' is both a shortest path tree and a max weight spanning tree, yet while most edges have stretch(e) = O(1), some have stretch(e) = Ω(n^{1/2}), for total stretch Ω(n^{3/2}).

61 A BETTER TREE FOR THE GRID Recursive ‘C’

62 LOW STRETCH SPANNING TREES [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08]: any graph has a spanning tree with total stretch O(m log n) (hiding log log n factors).

63 ISSUE: RUNNING TIME Algorithms given by [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08] take O(n log^2 n + m log n) time. Reason: O(log n) shortest path computations.

64 SPEED UP [Koutis-Miller-P `11]: round edge weights to powers of 2. [Orlin-Madduri-Subramani-Williamson `10]: shortest path on graphs with k distinct weights runs in O(m log_{m/n} k) time; with k = log n, total work = O(m log n) (hiding log log n; we actually improve these).

65 PARALLEL ALGORITHM? [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]: current framework parallelizes to O(m^{1/3+a}) depth. Combine with the Laplacian paradigm → fast parallel graph algorithms.

66 PARALLEL GRAPH ALGORITHMS? Before this work: parallel time > state-of-the-art sequential time. Our result: parallel work close to sequential, and O(m^{2/3}) time.

67 FUNDAMENTAL PROBLEM Long standing open problem: theoretical speedups for BFS / shortest path in directed graphs Sequential algorithms are too fast!

68 PARALLEL ALGORITHM? First step of the framework by [Elkin-Emek-Spielman-Teng `05]: shortest path, which is hard to parallelize.

69 PARALLEL TREE EMBEDDING Workaround: use the earlier algorithm by [Alon-Karp-Peleg-West `95]. Idea: repeated clustering, based on ideas from [Cohen `93, `00] for approximating shortest paths.


71 THE BIG PICTURE Need fast linear system solvers for graph regression; need combinatorial graph algorithms for fast solvers.

72 ONGOING / FUTURE WORK Better regression? Faster/parallel solver? Sparse approximate (pseudo) inverse? Other types of systems?

73 THANK YOU! Questions?

