Sampling from Gaussian Graphical Models via Spectral Sparsification
Richard Peng (M.I.T.)
Joint work with Dehua Cheng, Yu Cheng, Yan Liu, and Shanghua Teng (U.S.C.)

OUTLINE
- Gaussian sampling, linear systems, matrix roots
- Sparse factorizations of L^p
- Sparsification of random walk polynomials

SAMPLING FROM GRAPHICAL MODELS
Joint probability distribution over the entries of an n-dimensional random variable x. Graphical models encode this distribution as local dependencies via a graph. Sampling: draw a random point according to the model's distribution.

APPLICATIONS
Often need many samples: rejection / importance sampling, estimation of quantities on the samples. An ideal sampling routine is efficient, parallel, and uses limited randomness.

PREVIOUS WORKS
An instance of Markov chain Monte Carlo. Gibbs sampling: locally resample each variable from its conditional distribution given its neighbors. Parallel sampling algorithms:
- [Gonzalez-Low-Gretton-Guestrin `11]: coloring
- [Niu-Recht-Re-Wright `11] Hogwild!: go lock-free
- [Williamson-Dubey-Xing `13]: auxiliary variables

GAUSSIAN GRAPHICAL MODELS AND LINEAR SYSTEMS
Joint distribution specified by a precision matrix M (often denoted Λ), so the covariance is M^{-1}. Goal: sample from the Gaussian distribution N(0, M^{-1}). Gibbs sampling: resample based on neighbors. Iterative methods: x' ← x + αMx, also recomputing on neighbors.
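To make the Gibbs step concrete, here is a minimal sketch (my addition, not from the talk): for a Gaussian with precision matrix M, the conditional of coordinate i given the rest is N(-(Σ_{j≠i} M_ij x_j)/M_ii, 1/M_ii), so each resampling step only touches i's neighbors in the graph of M.

```python
import numpy as np

def gibbs_sample(M, sweeps=200, rng=None):
    """One Gibbs chain for N(0, M^{-1}), M a positive definite precision
    matrix. Coordinate i is resampled from its Gaussian conditional:
    x_i | x_{-i} ~ N(-sum_{j != i} M_ij x_j / M_ii, 1 / M_ii)."""
    rng = rng or np.random.default_rng()
    n = M.shape[0]
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            mean = -(M[i] @ x - M[i, i] * x[i]) / M[i, i]
            x[i] = mean + rng.standard_normal() / np.sqrt(M[i, i])
    return x

# Sanity check: empirical covariance of many chains approaches M^{-1}.
M = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
samples = np.stack([gibbs_sample(M, rng=np.random.default_rng(s)) for s in range(500)])
print(np.cov(samples.T))   # roughly np.linalg.inv(M)
```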

CONNECTION TO SOLVING LINEAR SYSTEMS
[Johnson-Saunderson-Willsky `13]: if the precision matrix M is (generalized) diagonally dominant, then Hogwild Gibbs sampling converges.
Further simplification: the graph Laplacian matrix L of a graph with n vertices and m edges (n rows/columns, O(m) non-zeros):
- Diagonal: degrees
- Off-diagonal: negated edge weights
Much more restrictive than the `graph' in graphical models!
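For concreteness, a small sketch (my addition) of the Laplacian just described, built from a weighted edge list:

```python
import numpy as np
import scipy.sparse as sp

def laplacian(n, edges):
    """Graph Laplacian from a weighted edge list [(u, v, w), ...]:
    diagonal holds weighted degrees, off-diagonals hold -w per edge.
    Duplicate (row, col) entries are summed by the sparse constructor."""
    rows, cols, vals = [], [], []
    for u, v, w in edges:
        rows += [u, v, u, v]
        cols += [v, u, u, v]
        vals += [-w, -w, w, w]
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

# Path graph on 3 vertices with unit weights.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])
print(L.toarray())   # [[ 1, -1, 0], [-1, 2, -1], [0, -1, 1]]
```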

LOCAL METHODS
The number of steps required is lower bounded by information propagation: forming b, Mb, M^2 b, ..., information travels one hop per multiplication, so roughly M^{diameter} b is needed. Need n matrix operations? What if we have more powerful algorithmic primitives?

ALGEBRAIC PRIMITIVE
Goal: generate a random variable from the Gaussian distribution N(0, L^{-1}) (assume L is full rank for simplicity). We can generate standard Gaussians, N(0, I). Need: an efficiently evaluable linear operator C s.t. C C^T = L^{-1}; then x ~ N(0, I), y = Cx gives y ~ N(0, C C^T) = N(0, L^{-1}).

DIRECT SOLUTION
Factorize L = B^T B, where B is the edge-vertex incidence matrix: B_eu = ±1 if u is an endpoint of edge e, 0 otherwise. Set C = L^{-1} B^T. Check: C C^T = L^{-1} B^T (L^{-1} B^T)^T = L^{-1} B^T B L^{-1} = L^{-1}. So a factorization plus black-box access to solvers gives a sampling algorithm.

PARALLEL SAMPLING ROUTINE
[P-Spielman `14]: Z ≈_ε L^{-1} in polylog depth and nearly-linear work. Here ≈ is spectral similarity: A ≈_k B iff for all x, e^{-k} x^T A x ≤ x^T B x ≤ e^k x^T A x; it lets us use B `in place' of A, and accuracy can be boosted. This gives a parallel sampling routine, y' ← B^T y, x ← solve(L, y'), whose corresponding operator C satisfies C C^T ≈ L^{-1}.

RANDOMNESS REQUIREMENT
Sample y from N(0, I); y' ← B^T y; x ← solve(L, y'); return x. Since B is an m-by-n matrix (m = # of edges), y needs to be an m-dimensional Gaussian. The optimal randomness requirement is n, so we want a C that is a square matrix. Fewer random variables? (Can get to O(n log n) with some work.)
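A dense toy version of this routine (my sketch, not the talk's implementation: np.linalg.pinv plays the role of the fast Laplacian solver, and the graph is unit-weight):

```python
import numpy as np

def sample_from_laplacian(edges, n, rng=None):
    """Sketch of the routine above for a unit-weight graph: B is the
    edge-vertex incidence matrix, L = B^T B, and x = L^+ B^T y for
    y ~ N(0, I_m) has covariance L^+ B^T B L^+ = L^+.
    The pseudoinverse stands in for a nearly-linear-time solver."""
    rng = rng or np.random.default_rng()
    m = len(edges)
    B = np.zeros((m, n))
    for e, (u, v) in enumerate(edges):
        B[e, u], B[e, v] = 1.0, -1.0
    L = B.T @ B
    y = rng.standard_normal(m)             # m-dimensional Gaussian input
    return np.linalg.pinv(L) @ (B.T @ y)   # one sample, covariance ~ L^+

x = sample_from_laplacian([(0, 1), (1, 2), (2, 0)], 3)
```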

GENERALIZATIONS
Lower randomness requirement: L ≈ C^T C where C is a square matrix (akin to a QR factorization), giving an alternate definition of the matrix square root. Application of matrix roots: `half a step' of a random walk. Can also view as matrix roots under spectral approximation: Z s.t. Z ≈ L^{-1/2}? Z s.t. Z ≈ L^{-1/3}?

OUR RESULT
Input: graph Laplacian L with condition number κ, parameter -1 ≤ p ≤ 1.
Output: access to a square operator C s.t. C^T C ≈_ε L^p.
Cost: O(log^{c1} m · log^{c2} κ · ε^{-4}) depth and O(m · log^{c1} m · log^{c2} κ · ε^{-4}) work.
Here κ, the condition number, is closely related to the bit complexity of solve(L, b). The result extends to symmetric diagonally dominant (SDD) matrices.

SUMMARY
- Gaussian sampling is closely related to linear system solves and matrix p-th roots
- Can approximately factor L^p into a product of sparse matrices
- Random walk polynomials can be sparsified by sampling random walks

OUTLINE
- Gaussian sampling, linear systems, matrix roots
- Sparse factorizations of L^p
- Sparsification of random walk polynomials

SIMPLIFICATION
Adjust/rescale so the diagonal = I, and add to the diagonal to make L full rank: L = I - A, where A is the random walk matrix, with ‖A‖ < 1.

PROBLEM
Given random walk matrix A and parameter p, produce an easily evaluable C s.t. C^T C ≈ (I - A)^p. Local approach for p = -1: I + A + A^2 + A^3 + ... = (I - A)^{-1}. But each step only passes information to neighbors, so the partial sums I, I + A, I + A + A^2, ... need roughly A^{diameter}. Evaluate using O(diameter) matrix operations?
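A sketch of this local approach (my addition): truncating the Neumann series approximates (I - A)^{-1} b, with information spreading one hop per term.

```python
import numpy as np

def neumann_solve(A, b, num_terms):
    """Approximate (I - A)^{-1} b by the truncated series
    b + A b + A^2 b + ... ; each term is one more step of the walk,
    so information crosses the graph only after ~diameter terms."""
    x, term = b.copy(), b.copy()
    for _ in range(num_terms - 1):
        term = A @ term
        x += term
    return x
```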

FASTER INFORMATION PROPAGATION
Recall ‖A‖ < 1; in fact I - A^{n^3} ≈ I if A corresponds to a random walk on an unweighted graph. Repeated squaring gets there fast: A^16 = ((((A^2)^2)^2)^2, 4 operations. Framework from [P-Spielman `14]: reduce (I - A)^p to computing (I - A^2)^p; O(log κ) reduction steps suffice.

SQUARING → DENSE GRAPHS?!?
Graph sparsification: find a sparse A' s.t. I - A' ≈_ε I - A^2 (this also preserves p-th powers).
- [ST `04][SS `08][OV `11] + some modifications, or [Koutis `14]: O(n log^c n ε^{-2}) entries, efficient, parallel
- [BSS `09, ALZ `14]: O(n ε^{-2}) entries, but quadratic cost

ABSORBING ERRORS
Simplification: work with p = -1. Direct factorization: (I - A)^{-1} = (I + A)(I - A^2)^{-1}. We have I - A' ≈ I - A^2, which implies (I - A')^{-1} ≈ (I - A^2)^{-1}, but NOT (I + A)(I - A')^{-1} ≈ (I + A)(I - A^2)^{-1}: matrix approximations must be incorporated symmetrically, X ≈ X' ⟹ U^T X U ≈ U^T X' U. Instead use
(I - A)^{-1} = (I + A)^{1/2} (I - A^2)^{-1} (I + A)^{1/2} ≈ (I + A)^{1/2} (I - A')^{-1} (I + A)^{1/2}.
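A quick numerical check of the symmetrized identity (my addition; uses scipy.linalg.sqrtm, with a random symmetric A of norm < 1 standing in for a walk matrix):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
# Symmetric A with spectral norm 0.4, a stand-in for a lazy walk matrix.
X = rng.standard_normal((5, 5))
A = 0.4 * (X + X.T) / np.linalg.norm(X + X.T, 2)

lhs = np.linalg.inv(np.eye(5) - A)
half = sqrtm(np.eye(5) + A)
rhs = half @ np.linalg.inv(np.eye(5) - A @ A) @ half
print(np.allclose(lhs, rhs))   # True: (I-A)^{-1} = (I+A)^{1/2}(I-A^2)^{-1}(I+A)^{1/2}
```

The identity holds because I - A^2 = (I - A)(I + A) and all factors, being functions of A, commute.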

SIMILAR TO

                     Connectivity        Our algorithm
Iteration            A_{i+1} ≈ A_i^2     I - A_{i+1} ≈ I - A_i^2
                                         (both until ‖A_d‖ is small)
Size reduction       Low degree          Sparse graph
Method               Derandomized        Randomized
Solution transfer    Connectivity        Solution vectors

Related: multiscale methods; NC algorithm for shortest path; logspace connectivity [Reingold `02]; deterministic squaring [Rozenman-Vadhan `05].

EVALUATING (I + A)^{1/2}?
Recall (I - A)^{-1} ≈ (I + A)^{1/2} (I - A')^{-1} (I + A)^{1/2}. Since A_1 ≈ A_0^2, the eigenvalues of A_i lie in [0, 1] for i > 0, so I + A_i has eigenvalues in [1, 2]: a well-conditioned matrix, whose square root has a Maclaurin series expansion approximated well by a low degree polynomial T_{1/2}(A_i). This doesn't work for (I + A_0)^{1/2}: the eigenvalues of A_0 can be -1.

MODIFIED IDENTITY
(I - A)^{-1} = (I + A/2)^{1/2} (I - A/2 - A^2/2)^{-1} (I + A/2)^{1/2}
Modified reduction: I - A_{i+1} ≈ I - A_i/2 - A_i^2/2. Now I + A_i/2 has eigenvalues in [1/2, 3/2], so it can be approximated to very high accuracy by a low degree polynomial / Maclaurin series T_{1/2}(A_i/2).
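A scalar check (my addition; by the spectral theorem the matrix case reduces to scalars) that a low degree truncation of the binomial/Maclaurin series for (1 + t/2)^{1/2} is already accurate for t in [-1, 1], i.e., when I + A_i/2 has spectrum in [1/2, 3/2]:

```python
import numpy as np
from scipy.special import binom

def sqrt_series(t, degree):
    """Truncated binomial (Maclaurin) series for (1 + t/2)^{1/2};
    converges geometrically since |t/2| <= 1/2 on the relevant spectrum."""
    return sum(binom(0.5, k) * (t / 2) ** k for k in range(degree + 1))

t = np.linspace(-1.0, 1.0, 1001)           # eigenvalues of A_i lie in [-1, 1]
for d in (3, 6, 9):
    err = np.max(np.abs(sqrt_series(t, d) - np.sqrt(1 + t / 2)))
    print(d, err)                          # error drops geometrically with degree
```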

APPROX. FACTORIZATION CHAIN
Build the chain, with d = O(log κ) levels:
I - A_1 ≈_ε I - A_0/2 - A_0^2/2
I - A_2 ≈_ε I - A_1/2 - A_1^2/2
...
I - A_i ≈_ε I - A_{i-1}/2 - A_{i-1}^2/2
...
I - A_d ≈ I
Then (I - A_i)^{-1} ≈ T_{1/2}(A_i/2) (I - A_{i+1})^{-1} T_{1/2}(A_i/2), so C_i = T_{1/2}(A_i/2) T_{1/2}(A_{i+1}/2) ... T_{1/2}(A_d/2) gives (I - A_i)^{-1} ≈ C_i^T C_i. For the p-th root (-1 ≤ p ≤ 1), use T_{p/2}(A_0/2) T_{p/2}(A_1/2) ... T_{p/2}(A_d/2) instead.
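Putting the pieces together for p = -1, a dense illustration of the chain (my sketch under simplifying assumptions: exact squaring with no sparsification, and hypothetical helpers sqrt_half_poly / inverse_chain):

```python
import numpy as np
from scipy.special import binom

def sqrt_half_poly(Ai, degree=8):
    """Polynomial T_{1/2}(A_i/2) ~= (I + A_i/2)^{1/2} via the truncated
    binomial series (valid when the eigenvalues of A_i lie in [-1, 1])."""
    n = Ai.shape[0]
    out, power = np.zeros((n, n)), np.eye(n)
    for k in range(degree + 1):
        out += binom(0.5, k) * power
        power = power @ (Ai / 2)
    return out

def inverse_chain(A, tol=1e-3):
    """Dense sketch of the chain for p = -1: iterate A_{i+1} = A_i/2 + A_i^2/2
    (the real algorithm sparsifies each level) until ||A_d|| is small, then
    unroll (I - A)^{-1} ~= T_0 T_1 ... T_{d-1} I T_{d-1} ... T_1 T_0."""
    factors, Ai = [], A
    while np.linalg.norm(Ai, 2) > tol:
        factors.append(sqrt_half_poly(Ai))
        Ai = Ai / 2 + Ai @ Ai / 2
    Z = np.eye(A.shape[0])                # (I - A_d)^{-1} ~= I in the middle
    for T in reversed(factors):
        Z = T @ Z @ T
    return Z

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 6))
S = 0.8 * (X + X.T) / np.linalg.norm(X + X.T, 2)
# Small error, limited by tol and the series degree.
print(np.max(np.abs(inverse_chain(S) - np.linalg.inv(np.eye(6) - S))))
```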

WORKING AROUND EXPANSIONS
Alternate reduction step: (I - A)^{-1} = (I + A/2)(I - 3/4·A^2 - 1/4·A^3)^{-1}(I + A/2). Composition is now done with I + A/2: easy. The hard part is finding a sparse approximation to I - 3/4·A^2 - 1/4·A^3 = 3/4·(I - A^2) + 1/4·(I - A^3): the 3/4·(I - A^2) term is the same as before; 1/4·(I - A^3) involves a cubic power.

GENERALIZATION TO P-TH POWER
(I - A)^p = (I + kA) ((I + kA)^{-2/p} (I - A))^p (I + kA)
Intuition: scalar operations commute, so the extra outer terms cancel against the inner ones. Can show: if 2/p is an integer and k > 2/p, then (I + kA)^{-2/p} (I - A) is a combination of matrices (I - A^c) for integers c up to the degree of the resulting polynomial (for p = -1 and k = 1/2, this recovers the cubic identity above). Difficulty: sparsifying (I - A^c) for large values of c.

SUMMARY
- Gaussian sampling is closely related to linear system solves and matrix p-th roots
- Can approximately factor L^p into a product of sparse matrices

OUTLINE
- Gaussian sampling, linear systems, matrix roots
- Sparse factorizations of L^p
- Sparsification of random walk polynomials

SPECTRAL SPARSIFICATION VIA EFFECTIVE RESISTANCE
[Spielman-Srivastava `08]: it suffices to sample each edge with probability at least O(log n) × weight × effective resistance, i.e., ~ log n · A_uv · R(u, v). Issues: I - A^3 is dense, so we need to sample without explicitly generating all edges / resistances. Two-step approach: first get a sparsifier with edge count close to m, then run a full sparsifier.
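A dense sketch of resistance-based sampling (my addition; the 4·log n/ε² oversampling constant and the pseudoinverse are illustrative placeholders for the tuned constants and fast resistance estimation used in practice):

```python
import numpy as np

def sparsify_by_resistance(L, edges, eps=0.5, rng=None):
    """Keep edge (u, v, w) with prob ~ min(1, c * w * R_eff(u, v)) and
    reweight kept edges by 1/prob, so the sparsifier is unbiased.
    R_eff(u, v) = (e_u - e_v)^T L^+ (e_u - e_v), via a dense pseudoinverse."""
    rng = rng or np.random.default_rng()
    n = L.shape[0]
    Lpinv = np.linalg.pinv(L)
    c = 4 * np.log(n) / eps ** 2
    out = []
    for u, v, w in edges:
        r = Lpinv[u, u] + Lpinv[v, v] - 2 * Lpinv[u, v]   # effective resistance
        p = min(1.0, c * w * r)
        if rng.random() < p:
            out.append((u, v, w / p))
    return out
```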

TWO STEP APPROACH FOR I - A^2
A: 1 step of the random walk; A^2: 2 steps. [P-Spielman `14]: for a fixed midpoint, the edges of A^2 form a (weighted) complete graph. Replace each such clique with an expander → O(m log n) edges, then run a black-box sparsifier.

I - A^3
A: one step of the random walk; A^3: 3 steps. (Part of) an edge uv in I - A^3 corresponds to a length-3 path u-y-z-v in A, with weight A_uy · A_yz · A_zv.

BOUND RESISTANCE ON I - A
Rayleigh's monotonicity law: resistances in subgraphs of I - A are good upper bounds. Can check that I - A ≈_3 I - A^3 (by the spectral theorem, we can work with scalars), so the resistance between u and v in I - A gives an upper bound for the sampling probability: sampling probability = log n × w(path) × R(path). Bound R(u, v) using the length-3 path u-y-z-v in A.

SAMPLING DISTRIBUTION
For the path u-y-z-v: weight = A_uy · A_yz · A_zv; resistance (series rule along the path) = 1/A_uy + 1/A_yz + 1/A_zv. Sampling probability = log n × weight × resistance, and the product of weight and resistance simplifies to A_yz·A_zv + A_uy·A_zv + A_uy·A_yz.
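A quick numeric check (my addition) that weight × resistance collapses to this three-term sum:

```python
import numpy as np

rng = np.random.default_rng(2)
a_uy, a_yz, a_zv = rng.uniform(0.1, 1.0, 3)
weight = a_uy * a_yz * a_zv
resistance = 1 / a_uy + 1 / a_yz + 1 / a_zv      # series rule on path u-y-z-v
three_terms = a_yz * a_zv + a_uy * a_zv + a_uy * a_yz
print(np.isclose(weight * resistance, three_terms))   # True
```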

ONE TERM AT A TIME
Probability of picking u-y-z-v: A_yz·A_zv + A_uy·A_zv + A_uy·A_yz. Interpretation of the first term: pick edge uy, take 2 steps of the random walk, then sample the edge in A^3 corresponding to u-y-z-v. Since A holds random walk transition probabilities, the total for a fixed choice of uy is Σ_zv A_yz·A_zv = Σ_z A_yz (Σ_v A_zv) ≤ Σ_z A_yz ≤ 1, so the total over all choices of uy is at most m.
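A Monte Carlo sketch of this first-term sampler (my rendering, not the paper's code: it assumes the rows of A sum to 1 so entries are walk probabilities, and it ignores the reweighting across the three terms):

```python
import numpy as np

def sample_cube_term(A, num_samples, rng=None):
    """First-term sampler for I - A^3: pick a directed edge (u, y) with
    probability A_uy / W, take two walk steps y -> z -> v, and credit the
    path's A^3 edge (u, v) with weight W / num_samples. The expected credit
    on (u, v) is sum_{y,z} (A_uy / W) A_yz A_zv * W = (A^3)_{uv}."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    W = A.sum()
    edge_dist = (A / W).ravel()
    kept = {}
    for _ in range(num_samples):
        u, y = np.unravel_index(rng.choice(n * n, p=edge_dist), (n, n))
        z = rng.choice(n, p=A[y])          # one random walk step from y
        v = rng.choice(n, p=A[z])          # one random walk step from z
        kept[(u, v)] = kept.get((u, v), 0.0) + W / num_samples
    return kept
```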

MIDDLE TERM
Interpretation of A_uy·A_zv: pick edge yz, take one step from y to get u and one step from z to get v, giving the edge of A^3 corresponding to u-y-z-v. Total: m again. The term A_uy·A_yz is handled similarly. Result: an O(m log n)-size approximation to I - A^3 in O(m log n) time, which can then be further sparsified in nearly-linear time.

EXTENSIONS
I - A^k in O(mk log^c n) time. For even powers, I - A ≈ I - A^2 does not hold (A may have eigenvalues near -1), but I - A^2 ≈_2 I - A^4, certified via the 2-step matrix, with the same algorithm. This gives I - A^k in O(m log k log^c n) time when k is a multiple of 4.

SUMMARY
- Gaussian sampling is closely related to linear system solves and matrix p-th roots
- Can approximately factor L^p into a product of sparse matrices
- Random walk polynomials can be sparsified by sampling random walks

OPEN QUESTIONS
Generalizations:
- Batch sampling?
- Connections to multigrid / multiscale methods?
- Other functionals of L?
Sparsification of random walk polynomials:
- Degree-n polynomials in nearly-linear time?
- Positive and negative coefficients?
- Connections with other algorithms based on sampling random walks?

THANK YOU! Questions? Manuscripts on arXiv: