
1
Fast Regression Algorithms Using Spectral Graph Theory Richard Peng

2
OUTLINE Regression: why and how Spectra: fast solvers Graphs: tree embeddings

3
LEARNING / INFERENCE Find (hidden) pattern in (noisy) data Output:Input signal, s:

4
REGRESSION Minimize |x|_p subject to convex constraints on x, e.g. linear equalities; p ≥ 1 makes the objective convex.

5
APPLICATION 0: LASSO [Tibshirani `96]: Min |x|_1 s.t. Ax = s. Widely used in practice: structured output, robust to noise.

6
APPLICATION 1: IMAGES Poisson image processing: Min Σ_{i~j∈E} (x_i - x_j - s_{i~j})^2. No bears were harmed in the making of these slides.

7
APPLICATION 2: MIN CUT Remove fewest edges to separate vertices s and t: Min Σ_{ij∈E} |x_i - x_j| s.t. x_s = 0, x_t = 1. Fractional solution = integral solution.
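The objective above can be made concrete with a minimal sketch (my code, not from the talk): since the slide notes that the fractional optimum equals an integral one, brute-forcing 0/1 labels on a toy graph already recovers the min cut value of the relaxation.

```python
# Hedged sketch of the min cut objective: minimize sum_{ij in E} |x_i - x_j|
# with x_s = 0, x_t = 1. Graph and function names are mine, for illustration.
from itertools import product

def min_cut_value(n, edges, s, t):
    """Brute-force the integral (0/1) labelings; valid only because the
    fractional and integral optima coincide for this objective."""
    best = float("inf")
    for bits in product([0, 1], repeat=n):
        if bits[s] != 0 or bits[t] != 1:
            continue
        best = min(best, sum(abs(bits[i] - bits[j]) for i, j in edges))
    return best

# Toy instance: path s - a - t plus a direct edge s - t.
val = min_cut_value(3, [(0, 1), (1, 2), (0, 2)], s=0, t=2)
```

Either labeling of the middle vertex cuts two edges here, so the minimum is 2.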

8
REGRESSION ALGORITHMS Convex optimization: 1940~1960: simplex, tractable; 1960~1980: ellipsoid, poly time; 1980~2000: interior point, efficient. Õ(m^(1/2)) interior steps, where m = # non-zeros and Õ hides log factors.

9
EFFICIENCY MATTERS m > 10^6 for most images. Even bigger (10^9): videos, 3D medical data.

10
KEY SUBROUTINE Each of the Õ(m^(1/2)) interior point steps finds a step direction via a linear system solve.

11
MORE REASONS FOR FAST SOLVERS [Boyd-Vandenberghe `04], Figure 11.20: The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small

12
LINEAR SYSTEM SOLVERS [1st century CE] Gaussian Elimination: O(m^3). [Strassen `69] O(m^2.8). [Coppersmith-Winograd `90] O(m^2.3755). [Stothers `10] O(m^2.3737). [Vassilevska Williams `11] O(m^2.3727). Total: > m^2.

13
NOT FAST NOT USED: Preferred in practice: coordinate descent, subgradient methods Solution quality traded for time

14
FAST GRAPH-BASED L_2 REGRESSION [Spielman-Teng `04] Input: linear system Ax = b where A is related to a graph. Output: solution to Ax = b. Runtime: nearly linear, Õ(m).

15
GRAPHS USING ALGEBRA Fast convergence + Low cost per step = state of the art algorithms

16
LAPLACIAN PARADIGM [Daitch-Spielman `08]: min-cost flow. [Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut.

17
EXTENSION 1 [Chin-Mądry-Miller-P `12]: regression, image processing, grouped L_2.

18
EXTENSION 2 [Kelner-Miller-P `12]: k-commodity flow. Dual: k-variate labeling of graphs.

19
EXTENSION 3 [Miller-P `13] : faster for structured images / separable graphs

20
NEED: FAST LINEAR SYSTEM SOLVERS Implications of fast solvers: fast regression routines; parallel, work-efficient graph algorithms.

21
OTHER APPLICATIONS [Tutte `66]: planar embedding. [Boman-Hendrickson-Vavasis `04]: PDEs. [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator.

22
OUTLINE Regression: why and how Spectra: Linear system solvers Graphs: tree embeddings

23
PROBLEM Given: matrix A, vector b Size of A : n-by-n m non-zeros

24
SPECIAL STRUCTURE OF A A = Deg − Adj, where Deg = diag(degree) and Adj = adjacency matrix; entrywise, A_ij = deg(i) if i = j and −w(ij) otherwise. [Gremban-Miller `96]: extensions to SDD matrices.
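The structure A = Deg − Adj can be sketched in a few lines of code (mine, not the authors'; the edge-list representation is an assumption for illustration):

```python
# Build the graph Laplacian A = Deg - Adj for a small weighted graph.
def laplacian(n, edges):
    """edges: list of (i, j, w). Returns A with A[i][i] = weighted degree
    of i and A[i][j] = -w(ij) for each edge ij, as a dense list of lists."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        A[i][i] += w      # diagonal: degree
        A[j][j] += w
        A[i][j] -= w      # off-diagonal: minus the edge weight
        A[j][i] -= w
    return A

# Unit-weight triangle: each diagonal entry is 2, each row sums to zero
# (Laplacians annihilate the all-ones vector, hence are singular).
A = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
```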

25
UNSTRUCTURED GRAPHS Social network Intermediate systems of other algorithms are almost adversarial

26
NEARLY LINEAR TIME SOLVERS [Spielman-Teng `04] Input: n-by-n graph Laplacian A with m non-zeros, vector b, where b = Ax for some x. Output: approximate solution x’ s.t. |x − x’|_A < ε|x|_A, with error in the A-norm |y|_A = √(yᵀAy). Runtime: nearly linear, O(m log^c n log(1/ε)) expected; the runtime is the cost per bit of accuracy.

27
HOW MANY LOGS Runtime: O(m log^c n log(1/ε)). Value of c: I don’t know. [Spielman]: c ≤ 70. [Koutis]: c ≤ 15. [Miller]: c ≤ 32. [Teng]: c ≤ 12. [Orecchia]: c ≤ 6. When n = 10^6, log^6 n > 10^6.

28
PRACTICAL NEARLY LINEAR TIME SOLVERS [Koutis-Miller-P `10] Input: n-by-n graph Laplacian A with m non-zeros, vector b, where b = Ax for some x. Output: approximate solution x’ s.t. |x − x’|_A < ε|x|_A. Runtime: O(m log^2 n log(1/ε)), the cost per bit of accuracy.

29
PRACTICAL NEARLY LINEAR TIME SOLVERS [Koutis-Miller-P `11] Input: n-by-n graph Laplacian A with m non-zeros, vector b, where b = Ax for some x. Output: approximate solution x’ s.t. |x − x’|_A < ε|x|_A. Runtime: O(m log n log(1/ε)), the cost per bit of accuracy.

30
STAGES OF THE SOLVER Iterative Methods Spectral Sparsifiers Low Stretch Spanning Trees

31
ITERATIVE METHODS Numerical analysis: Can solve systems in A by iteratively solving spectrally similar, but easier, B
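The idea of iterating with an easier B can be sketched on a toy system (my code; the talk's solvers are far more sophisticated). Here B is just the diagonal of A, i.e. Jacobi-style preconditioned Richardson iteration, which converges when B is spectrally close to A:

```python
# Solve A x = b by repeatedly solving the easier B (here B = diag(A)):
# x <- x + B^{-1} (b - A x). Toy 2x2 SPD system; names are mine.
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def precond_richardson(A, b, iters=100):
    x = [0.0] * len(b)
    for _ in range(iters):
        r = [bi - vi for bi, vi in zip(b, matvec(A, x))]   # residual b - A x
        x = [xi + ri / A[i][i]                             # solve with B
             for i, (xi, ri) in enumerate(zip(x, r))]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = precond_richardson(A, b)   # converges to (1/11, 7/11)
```

The cost per step is one multiply by A plus one (trivial) solve with B; the number of steps is governed by how spectrally similar B is to A, which is exactly what the next slides quantify.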

32
WHAT IS SPECTRALLY SIMILAR? A ≺ B ≺ kA for some small k. Ideas from scalars hold! A ≺ B: for any vector x, |x|_A^2 < |x|_B^2. [Vaidya `91]: Since A is a graph, B should be too!

33
‘EASIER’ H Goal: H with fewer edges that’s similar to G. Ways to be easier: fewer vertices, fewer edges. Can reduce vertex count if edge count is small.

34
GRAPH SPARSIFIERS Sparse equivalents of graphs that preserve something Spanners: distance, diameter. Cut sparsifier: all cuts. What we need: spectrum

35
WHAT WE NEED: ULTRASPARSIFIERS [Spielman-Teng `04]: ultrasparsifiers with n−1+O(m log^p n/k) edges imply solvers with O(m log^p n) running time. Given: G with n vertices, m edges, parameter k. Output: H with n vertices, n−1+O(m log^p n/k) edges. Goal: G ≺ H ≺ kG.

36
EXAMPLE: COMPLETE GRAPH O(n log n) random edges (with scaling) suffice w.h.p.

37
GENERAL GRAPH SAMPLING MECHANISM For each edge e, flip a coin with Pr(keep) = P(e); rescale kept edges to maintain expectation. Number of edges kept: ∑_e P(e). Also need to prove concentration.
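A minimal sketch of this mechanism (my code; the probabilities here are arbitrary, not the resistance-based ones of the later slides): keep edge e with probability P(e) and rescale its weight by 1/P(e), so every edge, and hence the sampled graph, matches the original in expectation.

```python
# Keep-and-rescale edge sampling: E[weight of each edge in H] = weight in G.
import random

def sample_edges(edges, p, rng):
    """edges: list of (u, v, w); p: dict mapping (u, v) -> keep probability."""
    kept = []
    for (u, v, w) in edges:
        if rng.random() < p[(u, v)]:
            kept.append((u, v, w / p[(u, v)]))   # rescale to keep expectation
    return kept

rng = random.Random(0)
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
p = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 1.0}
# Averaged over many trials, the kept weight of edge (0, 1) approaches 1.0.
trials = [sample_edges(edges, p, rng) for _ in range(20000)]
avg01 = sum(w for t in trials for (u, v, w) in t if (u, v) == (0, 1)) / len(trials)
```

Expectation is the easy half; the slide's point is that one also needs concentration, which is where the O(log n) oversampling factor enters.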

38
EFFECTIVE RESISTANCE View the graph as a circuit. R(u,v): pass 1 unit of current from u to v, measure the resistance of the circuit.

39
EE101 Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector; then R(u,v) = x_u − x_v.
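This recipe can be sketched directly (my code; the tiny dense Gaussian elimination stands in for the fast solvers the talk is about): ground node v so that x_v = 0, solve the reduced Laplacian system for the potentials, and read off R(u,v) = x_u − x_v = x_u.

```python
# Effective resistance R(u, v): solve G x = e_uv with node v grounded.
def solve(M, b):
    """Naive Gauss-Jordan elimination with partial pivoting; toy sizes only."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def effective_resistance(n, edges, u, v):
    # Build the Laplacian, delete the grounded row/column, inject 1 unit at u.
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w; L[j][j] += w; L[i][j] -= w; L[j][i] -= w
    keep = [i for i in range(n) if i != v]
    M = [[L[i][j] for j in keep] for i in keep]
    b = [1.0 if i == u else 0.0 for i in keep]
    x = solve(M, b)
    return x[keep.index(u)]          # x_v = 0 by grounding, so R = x_u

# Two unit edges in series: R(0, 2) = 1/1 + 1/1 = 2.
R = effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0)], 0, 2)
```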

40
(REMEDIAL?) EE101 Single edge: R(e) = 1/w(e). Series: R(u,v) = R(e_1) + … + R(e_l). Examples: one edge of weight w_1 gives R(u,v) = 1/w_1; two edges of weights w_1, w_2 in series give R(u,v) = 1/w_1 + 1/w_2.

41
SPECTRAL SPARSIFICATION BY EFFECTIVE RESISTANCE [Spielman-Srivastava `08]: setting P(e) = w(e)R(e)O(log n) gives G ≺ H ≺ 2G* (*ignoring probabilistic issues). [Foster `49]: ∑_e w(e)R(e) = n−1, so this is a spectral sparsifier with O(n log n) edges. Ultrasparsifier? Solver???
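Foster's identity above can be checked numerically on a small weighted graph (my code, not from the talk; resistances are computed by grounding one endpoint and solving the reduced Laplacian with naive elimination):

```python
# Check [Foster `49]: sum_e w(e) * R(e) = n - 1 for a connected graph.
def solve(M, b):
    """Naive Gauss-Jordan elimination with partial pivoting; toy sizes only."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def resistance(n, edges, u, v):
    """R(u, v) via the grounded Laplacian system."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w; L[j][j] += w; L[i][j] -= w; L[j][i] -= w
    keep = [i for i in range(n) if i != v]
    M = [[L[i][j] for j in keep] for i in keep]
    x = solve(M, [1.0 if i == u else 0.0 for i in keep])
    return x[keep.index(u)]

# Weighted triangle on n = 3 vertices: the sum should be n - 1 = 2.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
foster_sum = sum(w * resistance(3, edges, u, v) for u, v, w in edges)
```

This is why the expected edge count of the sparsifier is O(n log n): it is Foster's sum, n−1, times the O(log n) oversampling factor.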

42
THE CHICKEN AND EGG PROBLEM How to find effective resistance? [Spielman-Srivastava `08] : use solver [Spielman-Teng `04] : need sparsifier

43
OUR WORK AROUND Use upper bounds of effective resistance, R’(u,v) Modify the problem

44
RAYLEIGH’S MONOTONICITY LAW R(u,v) can only increase when edges are removed, so calculate effective resistances w.r.t. a spanning tree T.

45
SAMPLING PROBABILITIES ACCORDING TO TREE Sample probability: edge weight times effective resistance of the tree path (the edge’s stretch). Goal: small total stretch.

46
GOOD TREES EXIST Every graph has a spanning tree with total stretch O(m log n) (hiding log log n factors), i.e. ∑_e w(e)R’(e) = O(m log n). But that gives O(m log^2 n) sampled edges: too many!

47
‘GOOD’ TREE??? Unit weight case: stretch ≥ 1 for all edges; e.g. an edge whose tree path has two edges has stretch = 1 + 1 = 2.
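The unit-weight stretch above is easy to compute explicitly (my code, for illustration): the stretch of an edge is the number of tree edges on the path between its endpoints, and total stretch sums this over all edges of G.

```python
# Unit-weight stretch of each edge w.r.t. a spanning tree T, via BFS in T.
from collections import deque

def tree_path_len(tree_adj, u, v):
    """Number of tree edges on the (unique) tree path from u to v."""
    dist = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        if x == v:
            return dist[x]
        for y in tree_adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    raise ValueError("tree is not spanning")

def total_stretch(n, edges, tree_edges):
    adj = {i: [] for i in range(n)}
    for u, v in tree_edges:
        adj[u].append(v); adj[v].append(u)
    return sum(tree_path_len(adj, u, v) for u, v in edges)

# Triangle with tree {01, 12}: edges 01, 12 have stretch 1 each, and
# edge 02 has stretch 1 + 1 = 2, so total stretch is 4.
S = total_stretch(3, [(0, 1), (1, 2), (0, 2)], [(0, 1), (1, 2)])
```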

48
WHAT ARE WE MISSING? Need: G ≺ H ≺ kG with n−1+O(m log^p n/k) edges. Generated: G ≺ H ≺ 2G with n−1+O(m log^2 n) edges. Haven’t used k!

49
USE k, SOMEHOW Tree is good! Increase weights of tree edges by a factor of k: G ≺ G’ ≺ kG.

50
RESULT Tree heavier by a factor of k, so tree effective resistances decrease by a factor of k. Stretch = 1/k + 1/k = 2/k.

51
NOW SAMPLE? Expected edges in H: tree edges: n−1; off-tree edges: O(m log^2 n/k). Total: n−1+O(m log^2 n/k).

52
BUT WE CHANGED G! G ≺ G’ ≺ kG and G’ ≺ H ≺ 2G’ give G ≺ H ≺ 2kG.

53
WHAT WE NEED: ULTRASPARSIFIERS [Spielman-Teng `04]: ultrasparsifiers with n−1+O(m log^p n/k) edges imply solvers with O(m log^p n) running time. Given: G with n vertices, m edges, parameter k. Output: H with n vertices, n−1+O(m log^p n/k) edges. Goal: G ≺ H ≺ kG. We obtained G ≺ H ≺ 2kG with n−1+O(m log^2 n/k) edges.

54
PSEUDOCODE OF O(m log n) SOLVER Input: graph Laplacian G. Compute a low stretch tree T of G; scale T up by a factor of O(log^2 n); H ← sample(G + T); solve G by iterating on H and solving H recursively, reusing T.

55
EXTENSIONS / GENERALIZATIONS [Koutis-Levin-P `12]: sparsify mildly dense graphs in O(m) time. [Miller-P `12]: general matrices: find a ‘simpler’ similar matrix in O(m + n^(2.38+a)) time.

56
SUMMARY OF SOLVERS Spectral graph theory allows one to find similar, easier-to-solve graphs. Backbone: good trees.

57
SOLVERS USING GRAPH THEORY Fast solvers for graph Laplacians use combinatorial graph theory

58
OUTLINE Regression: why and how Spectra: linear system solvers Graphs: tree embeddings

59
LOW STRETCH SPANNING TREE Sampling probability: edge weight times effective resistance of tree path Unit weight case: length of tree path Low stretch spanning tree: small total stretch

60
DIFFERENT THAN USUAL TREES n^(1/2)-by-n^(1/2) unit weighted mesh: the ‘haircomb’ is both a shortest path and a max weight spanning tree, yet while most edges have stretch(e) = O(1), the crossing edges have stretch(e) = Ω(n^(1/2)), for total stretch Ω(n^(3/2)).

61
A BETTER TREE FOR THE GRID Recursive ‘C’

62
LOW STRETCH SPANNING TREES [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08]: any graph has a spanning tree with total stretch O(m log n) (hiding log log n factors).

63
ISSUE: RUNNING TIME Algorithms given by [Elkin-Emek-Spielman-Teng `05], [Abraham-Bartal-Neiman `08] take O(n log^2 n + m log n) time. Reason: O(log n) shortest path computations.

64
SPEED UP [Koutis-Miller-P `11]: round edge weights to powers of 2. [Orlin-Madduri-Subramani-Williamson `10]: shortest path on graphs with k distinct weights can run in O(m log_{m/n} k) time. With k = log n, total work = O(m log n) (hiding log log n factors; we actually improve these).

65
PARALLEL ALGORITHM? [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]: current framework parallelizes to O(m^(1/3+a)) depth. Combined with the Laplacian paradigm: fast parallel graph algorithms.

66
PARALLEL GRAPH ALGORITHMS? Before this work: parallel time > state-of-the-art sequential time. Our result: parallel work close to sequential, and O(m^(2/3)) time.

67
FUNDAMENTAL PROBLEM Long standing open problem: theoretical speedups for BFS / shortest path in directed graphs Sequential algorithms are too fast!

68
PARALLEL ALGORITHM? First step of the framework by [Elkin-Emek-Spielman-Teng `05]: shortest path.

69
Workaround: use earlier algorithm by [Alon-Karp-Peleg-West `95] Idea: repeated clustering Based on ideas from [Cohen `93, `00] for approximating shortest path PARALLEL TREE EMBEDDING

71
THE BIG PICTURE Need fast linear system solvers for graph regression; need combinatorial graph algorithms for fast solvers.

72
ONGOING / FUTURE WORK Better regression? Faster/parallel solver? Sparse approximate (pseudo) inverse? Other types of systems?

73
THANK YOU! Questions?
