Monte Carlo Linear Algebra Techniques and Their Parallelization
Ashok Srinivasan, Computer Science, Florida State University
www.cs.fsu.edu/~asriniva

Outline
- Background
  - Monte Carlo matrix-vector multiplication
  - MC linear solvers
- Non-diagonal splitting
  - Dense implementation
  - Sparse implementation
- Parallelization
- Conclusions and future work

Background
- MC linear solvers are old!
  - von Neumann and Ulam (1950)
  - They were not competitive with deterministic techniques
- Advantages of MC
  - Can give approximate solutions fast, which is feasible in applications such as preconditioning, graph partitioning, information retrieval, and pattern recognition
  - Can yield selected components of the solution fast
  - Are very latency tolerant

Matrix-vector multiplication
- Compute C^j h, where C ∈ R^{n×n}
  - Choose probability and weight matrices such that C_ij = P_ij W_ij and h_i = p_i w_i
  - Take a random walk k_0, k_1, ..., k_j based on these probabilities
  - Define random variables X_i: X_0 = w_{k_0} and X_i = X_{i-1} W_{k_i k_{i-1}}
    - Then E(X_j δ_{i k_j}) = (C^j h)_i
  - Each random walk can therefore be used to estimate the k_j-th component of C^j h
- Convergence rate is independent of n
[Figure: random walk k_0 -> k_1 -> k_2 -> ... -> k_j, started with probability p_{k_0}, with transition probabilities P_{k_1 k_0}, P_{k_2 k_1}, ...; at the end the walk updates the estimate of (C^j h)_{k_j}]
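
To make the walk above concrete, here is a minimal NumPy sketch of the estimator (not from the original slides). The choice P_ij proportional to |C_ij| and p_i proportional to |h_i| is just one common way to satisfy C_ij = P_ij W_ij and h_i = p_i w_i, and the function name mc_matvec_power is illustrative.

```python
import numpy as np

def mc_matvec_power(C, h, j, n_walks=10_000, rng=None):
    """Estimate C^j h by random walks, using E(X_j * delta_{i,k_j}) = (C^j h)_i.
    Assumes no column of C is entirely zero."""
    rng = rng or np.random.default_rng()
    C = np.asarray(C, dtype=float)
    h = np.asarray(h, dtype=float)
    n = C.shape[0]

    # One common choice of probabilities (the slides leave the choice open):
    # P[i, k] = |C[i, k]| / sum_i |C[i, k]|, and W chosen so that C_ij = P_ij W_ij.
    P = np.abs(C) / np.abs(C).sum(axis=0)
    W = np.divide(C, P, out=np.zeros_like(C), where=P > 0)
    p = np.abs(h) / np.abs(h).sum()               # initial distribution, h_i = p_i w_i
    w = np.divide(h, p, out=np.zeros_like(h), where=p > 0)

    est = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p)                    # k_0 ~ p
        X = w[k]                                  # X_0 = w_{k_0}
        for _ in range(j):
            k_next = rng.choice(n, p=P[:, k])     # k_m ~ P[., k_{m-1}]
            X *= W[k_next, k]                     # X_m = X_{m-1} W_{k_m k_{m-1}}
            k = k_next
        est[k] += X                               # the walk scores component k_j
    return est / n_walks                          # Monte Carlo average over all walks
```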

Matrix-vector multiplication... continued
- Σ_{j=0}^{m} C^j h can be estimated in the same way
- We will need Σ_{j=0}^{m} (BC)^j B h
  - It can be estimated using probabilities on both matrices, B and C
  - The length of each random walk is twice that of the previous case
[Figure: random walk k_0 -> k_1 -> k_2 -> k_3 -> ... -> k_{2m+1}, started with probability p_{k_0}, with transition probabilities P_{k_1 k_0}, P_{k_2 k_1}, P_{k_3 k_2}, ...; after each odd-numbered step the walk updates the estimate of component k_{2j+1}]
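
A sketch of the two-matrix variant, assuming the same magnitude-based probabilities as above; the names _prob_weight and mc_bc_series are illustrative. Each walk alternates a B-step with a C-step and scores its running weight after every B-step, so one walk of length 2m+1 contributes to all m+1 terms of the sum.

```python
import numpy as np

def _prob_weight(Mat):
    """Split a matrix into column-stochastic probabilities P and weights W with
    Mat_ij = P_ij * W_ij (P_ij proportional to |Mat_ij|; assumes no zero column)."""
    Mat = np.asarray(Mat, dtype=float)
    P = np.abs(Mat) / np.abs(Mat).sum(axis=0)
    W = np.divide(Mat, P, out=np.zeros_like(Mat), where=P > 0)
    return P, W

def mc_bc_series(B, C, h, m, n_walks=20_000, rng=None):
    """Estimate sum_{j=0}^{m} (BC)^j B h: each walk alternates B-steps and
    C-steps, and scores its running weight after every B-step."""
    rng = rng or np.random.default_rng()
    h = np.asarray(h, dtype=float)
    n = len(h)
    PB, WB = _prob_weight(B)
    PC, WC = _prob_weight(C)
    p = np.abs(h) / np.abs(h).sum()
    w = np.divide(h, p, out=np.zeros_like(h), where=p > 0)

    y = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p)                      # k_0 ~ p
        X = w[k]
        for j in range(m + 1):
            k_next = rng.choice(n, p=PB[:, k])      # B-step
            X *= WB[k_next, k]
            k = k_next
            y[k] += X                               # estimates ((BC)^j B h)_{k_{2j+1}}
            if j < m:
                k_next = rng.choice(n, p=PC[:, k])  # C-step
                X *= WC[k_next, k]
                k = k_next
    return y / n_walks
```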

MC linear solvers
- Solve Ax = b
  - Split A as A = N - M
  - Write the fixed-point iteration x_{m+1} = N^{-1} M x_m + N^{-1} b = C x_m + h
  - If we choose x_0 = h, then x_m = Σ_{j=0}^{m} C^j h
  - Estimate this sum using the Markov chain technique described earlier
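
A minimal end-to-end sketch of such a solver, assuming the diagonal (Jacobi-style) splitting discussed on the next slide so that C and h can be formed explicitly; the name mc_solve and the default parameters are illustrative, and the series only converges when the spectral radius of C is below 1.

```python
import numpy as np

def mc_solve(A, b, m=20, n_walks=20_000, rng=None):
    """Estimate x_m = sum_{j=0}^{m} C^j h with C = N^{-1} M and h = N^{-1} b,
    accumulating every power of C along each random walk.
    Assumes the spectral radius of C is below 1 and no column of C is zero."""
    rng = rng or np.random.default_rng()
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[0]

    N = np.diag(np.diag(A))                   # diagonal splitting A = N - M (Jacobi)
    M = N - A
    C = np.linalg.solve(N, M)                 # explicit C, only for this small sketch
    h = np.linalg.solve(N, b)

    P = np.abs(C) / np.abs(C).sum(axis=0)     # transition probabilities, as before
    W = np.divide(C, P, out=np.zeros_like(C), where=P > 0)
    p = np.abs(h) / np.abs(h).sum()
    w = np.divide(h, p, out=np.zeros_like(h), where=p > 0)

    x = np.zeros(n)
    for _ in range(n_walks):
        k = rng.choice(n, p=p)
        X = w[k]
        x[k] += X                             # j = 0 term: h itself
        for _ in range(m):
            k_next = rng.choice(n, p=P[:, k])
            X *= W[k_next, k]
            k = k_next
            x[k] += X                         # contribution to (C^j h)_{k_j}
    return x / n_walks
```

For a small, diagonally dominant A, mc_solve(A, b) should approach np.linalg.solve(A, b) as n_walks and m grow.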

Current techniques
- Choose N to be a diagonal matrix
  - Ensures efficient computation of C
  - C is sparse when A is sparse
  - Example: N = diagonal of A yields the Jacobi iteration and the corresponding MC estimate

Properties of MC linear solvers
- MC techniques estimate the result of a stationary iteration
  - Errors from the iterative process
  - Errors from MC sampling
- Reduce the error by
  - Variance reduction techniques
  - Residual correction
  - Choosing a better iterative scheme!

Non-diagonal splitting
- Observations
  - It is possible to construct an efficient MC technique for specific splittings, even if explicit construction of C is computationally expensive
  - It may be possible to represent C sparsely in implicit form, even when C itself is not sparse

Our example
- Choose N to be the diagonal and sub-diagonal of A
  - N is lower bidiagonal, with diagonal entries d_1, ..., d_n and sub-diagonal entries s_2, ..., s_n
  - N^{-1} is then a fully dense lower triangular matrix
- Computing C = N^{-1} M explicitly is too expensive
  - Compute x_m = Σ_{j=0}^{m} (N^{-1} M)^j N^{-1} b instead

Computing N^{-1}
- Using O(n) storage and precomputation time, any element of N^{-1} can be computed in constant time
  - Define T(1) = 1 and T(i+1) = T(i) s_{i+1} / d_{i+1}
  - Then (N^{-1})_ij = 0 if i < j, 1/d_i if i = j, and (-1)^{i-j} (1/d_j) T(i)/T(j) otherwise
- The entire N^{-1}, if needed, can be computed in O(n^2) time
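
A short sketch of this constant-time formula, using 0-based indices (the slides use 1-based ones) and assuming every s_i is nonzero; the helper name ninv_entry_factory is illustrative.

```python
import numpy as np

def ninv_entry_factory(d, s):
    """Return a function giving any entry of N^{-1} in O(1) time after O(n)
    precomputation, for lower-bidiagonal N with diagonal d[0..n-1] and
    sub-diagonal s[1..n-1] (s[0] is unused). Assumes every s[i] != 0."""
    d = np.asarray(d, dtype=float)
    s = np.asarray(s, dtype=float)
    n = len(d)
    T = np.ones(n)
    for i in range(1, n):
        T[i] = T[i - 1] * s[i] / d[i]          # T(i+1) = T(i) * s_{i+1} / d_{i+1}

    def ninv_entry(i, j):
        if i < j:
            return 0.0                         # strictly upper part is zero
        if i == j:
            return 1.0 / d[i]                  # diagonal of the inverse
        return (-1.0) ** (i - j) / d[j] * T[i] / T[j]   # strictly lower part
    return ninv_entry
```

As a quick check, for a small n one can form N explicitly and compare ninv_entry(i, j) against np.linalg.inv(N)[i, j].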

Dense implementation
- Compute N^{-1} and store it in O(n^2) space
- Choose probabilities proportional to the weights of the elements
- Use the alias method to sample
  - Precomputation time proportional to the number of elements
  - Constant time to generate each sample
- Estimate Σ_{j=0}^{m} (N^{-1} M)^j N^{-1} b
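
The alias method itself is standard; below is a minimal sketch of the Walker/Vose construction (not the authors' code). Building one sampler per column of the matrix being sampled, with weights proportional to the element magnitudes, then gives constant-time selection of the next state in each walk.

```python
import numpy as np

class AliasSampler:
    """Walker/Vose alias method: O(K) setup and O(1) per sample from a
    discrete distribution over K outcomes (weights need not be normalized)."""

    def __init__(self, weights, rng=None):
        self.rng = rng or np.random.default_rng()
        w = np.asarray(weights, dtype=float)
        K = len(w)
        scaled = w * K / w.sum()               # rescale so the mean bin mass is 1
        self.K = K
        self.prob = np.ones(K)
        self.alias = np.arange(K)
        small = [i for i in range(K) if scaled[i] < 1.0]
        large = [i for i in range(K) if scaled[i] >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s] = scaled[s]           # probability of keeping bin s
            self.alias[s] = l                  # otherwise redirect to l
            scaled[l] -= 1.0 - scaled[s]       # l donates its excess mass to s
            (small if scaled[l] < 1.0 else large).append(l)
        # leftover bins keep probability 1 (prob was initialized to ones)

    def sample(self):
        i = int(self.rng.integers(self.K))     # pick a bin uniformly
        return i if self.rng.random() < self.prob[i] else int(self.alias[i])
```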

Experimental results
[Figure: results for walk length = 2; plots not reproduced in this transcript]

Sparse implementation
- We cannot use O(n^2) space or time!
- A sparse implementation for M is simple
- Sparse representation of N^{-1}
  - Choose P_ij = 0 if i < j, and 1/(n-j+1) otherwise (the next row is sampled from the uniform distribution)
  - Choose W_ij = (N^{-1})_ij / P_ij
    - Constant time to determine any W_ij
  - Minor modifications are needed when s_i = 0
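
A sketch of this implicit representation, again with 0-based indices (so P_ij becomes 1/(n-j) for i >= j) and assuming every s_i is nonzero; the slides note that minor modifications handle s_i = 0. The class name SparseNinvSampler is illustrative.

```python
import numpy as np

class SparseNinvSampler:
    """Implicit representation of N^{-1} for lower-bidiagonal N (0-based indices):
    O(n) setup, then O(1) time to sample a transition out of column j and to
    evaluate its weight W_ij = (N^{-1})_ij / P_ij. Assumes every s[i] != 0."""

    def __init__(self, d, s, rng=None):
        self.d = np.asarray(d, dtype=float)    # diagonal d[0..n-1]
        self.s = np.asarray(s, dtype=float)    # sub-diagonal s[1..n-1], s[0] unused
        self.rng = rng or np.random.default_rng()
        n = len(self.d)
        self.T = np.ones(n)
        for i in range(1, n):
            self.T[i] = self.T[i - 1] * self.s[i] / self.d[i]  # T(i+1) = T(i) s_{i+1}/d_{i+1}

    def ninv(self, i, j):
        """Any entry of N^{-1} in constant time."""
        if i < j:
            return 0.0
        if i == j:
            return 1.0 / self.d[i]
        return (-1.0) ** (i - j) / self.d[j] * self.T[i] / self.T[j]

    def sample(self, j):
        """Return (i, W_ij): row i uniform on {j, ..., n-1}, so P_ij = 1/(n-j)."""
        n = len(self.d)
        i = int(self.rng.integers(j, n))
        return i, self.ninv(i, j) * (n - j)
```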

Parallelization
- MC is "embarrassingly" parallel
- Identical algorithms are run independently on each processor, with only the random number sequences differing
[Figure: Proc 1 with RNG 1, Proc 2 with RNG 2, Proc 3 with RNG 3, ..., each processor running the same algorithm on its own random number stream]
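
The replicated scheme might look like the following mpi4py sketch. The original implementation used MPI on the Origin 2000, presumably from compiled code; mpi4py, the seeding scheme, and the placeholder estimate below are assumptions for illustration only.

```python
# Replicated Monte Carlo parallelization sketch.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Independent, non-overlapping random number streams, one per process.
rng = np.random.default_rng(np.random.SeedSequence(12345).spawn(size)[rank])

# Each process runs the identical MC solver; only the stream differs.
# local_estimate = mc_solve(A, b, rng=rng)       # e.g. the solver sketched earlier
local_estimate = rng.standard_normal(4)          # placeholder so this sketch runs alone

# Average the independent estimates across all processes.
global_estimate = np.zeros_like(local_estimate)
comm.Allreduce(local_estimate, global_estimate, op=MPI.SUM)
global_estimate /= size
if rank == 0:
    print(global_estimate)
```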

MPI vs OpenMP on the Origin 2000
- Cache misses cause poor performance of the OpenMP parallelization

Conclusions and future work
- Demonstrated that it is possible to have effective MC implementations with non-diagonal splittings too
- This needs to be extended to better iterative schemes
- Non-replicated parallelization needs to be considered