An Investigation into Concurrent Expectation Propagation

Presentation transcript:

An Investigation into Concurrent Expectation Propagation David Hall, Alex Kantchelian CS252 5/4/2012

Graphical Models [Slide figure: a graph with variables X1–X13 as nodes, connected by edges]

Graphical Models [Slide figure: a variable potential φ(X8) attached to node X8, and an edge potential ψ(X12, X8) attached to the edge between X12 and X8]

Graphical Models Used in robotics, vision, natural language processing, computational biology, and data mining. http://scripts.mit.edu/~zong/wpress/wp-content/uploads/images/voting.jpg, http://www.stanford.edu/~montanar/TEACHING/Stat375/stat375.html

Graphical Models: Inference The main tasks are: determine the most likely configuration of the variables (usually NP-hard), and determine Z or marginal distributions such as p(x1) (usually #P-hard).
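The two tasks above can be made concrete by brute force on a tiny, hypothetical model (a 3-variable binary chain with illustrative potentials). Exact enumeration like this is exponential in the number of variables, which is exactly why the general problems are NP-hard and #P-hard.

```python
import itertools

# Hypothetical 3-variable binary chain x1 - x2 - x3 (illustrative potentials).
def psi(a, b):
    # attractive edge potential: prefers equal neighbors
    return 2.0 if a == b else 1.0

def phi(a):
    # variable potential with a slight bias toward state 1
    return 1.5 if a == 1 else 1.0

# Unnormalized probability of a full assignment
def score(x1, x2, x3):
    return phi(x1) * phi(x2) * phi(x3) * psi(x1, x2) * psi(x2, x3)

# Partition function Z: sum over all 2^3 assignments (exponential in general)
Z = sum(score(*x) for x in itertools.product([0, 1], repeat=3))

# Marginal p(x1 = 1): sum out the other variables, then normalize
p_x1_is_1 = sum(score(1, x2, x3)
                for x2, x3 in itertools.product([0, 1], repeat=2)) / Z

# Most likely configuration (MAP assignment)
x_map = max(itertools.product([0, 1], repeat=3), key=lambda x: score(*x))
print(Z, round(p_x1_is_1, 3), x_map)
```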

Approximate Inference There are many kinds! The basic goal is to approximate the sum with something simpler. We focus on Expectation Propagation.

Basic Question Most inference algorithms are defined sequentially: update one potential at a time. But we'd like to use them in parallel, since models are getting bigger and more intricate, and computers are getting more parallel. How do they perform? Can we construct an algorithm with better performance?

Expectation Propagation Coupling

Expectation Propagation [Animated slide sequence: the coupled factors are handled one at a time; each step applies project( ) to refine the approximation, then the process repeats across all factors until convergence: project( ), repeat!]
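A minimal sketch of what the project( ) step does, under the assumption of a fully factorized approximating family over discrete variables: projection then amounts to moment matching, which for discrete distributions means matching the single-variable marginals (this minimizes KL(p || q) over factorized q). This is an illustration of the projection operation, not the authors' full EP implementation.

```python
import numpy as np

def project(p):
    """p: joint over two binary variables as a 2x2 array.
    Returns the closest fully factorized approximation q(x1) * q(x2),
    obtained by matching the single-variable marginals."""
    p = p / p.sum()
    q1 = p.sum(axis=1)   # marginal of x1
    q2 = p.sum(axis=0)   # marginal of x2
    return np.outer(q1, q2)

# A correlated joint: the projection keeps both marginals (0.5, 0.5)
# but necessarily drops the correlation between the variables.
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])
q = project(p)
print(q)
```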

Parallel EP [Slide figure: the variables X1–X11 are split into subgraphs, and proj( ) is applied to each subgraph concurrently]
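The parallel scheme pictured above can be sketched as follows: the graph's variables are partitioned, and each partition's local projection is computed concurrently rather than one potential at a time. The partition boundaries and the update body here are placeholders, not the actual implementation from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitioning of the 11 variables shown on the slide.
partitions = [["X1", "X2", "X3", "X4"],
              ["X5", "X6", "X7", "X8"],
              ["X9", "X10", "X11"]]

def update_partition(part):
    # Placeholder for the real work: remove this partition's approximate
    # factors, incorporate the exact ones, and project back onto the family.
    return {v: "updated" for v in part}

# Each partition is updated concurrently instead of sequentially.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(update_partition, partitions))

# Merge the per-partition results into one belief table.
beliefs = {k: v for r in results for k, v in r.items()}
print(len(beliefs))
```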

Potential Problem EP is an approximation that might not converge, and multiple local optima are likely. Hypothesis: unrestricted concurrency exacerbates the multiple-optima problem, because different subgraphs are attracted to different optima.

Convex EP A new algorithm. By naively splitting, EP overcounts the graph structure. Downweighting the graph structure guarantees a single fixed point. The algorithm is more approximate, and may still not converge. Surprisingly, convexification is achieved by adding hysteresis to the updates.
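One common way to add hysteresis to a fixed-point iteration like EP is a damped update in log space, with a step size alpha in (0, 1]: alpha = 1 recovers the undamped update, while smaller alpha blends the proposed message with the old one. The step size here is an assumed illustrative value, not the one used in the experiments.

```python
def damped_update(old, proposed, alpha=0.5):
    """Damped ("hysteresis") update on log-potentials: geometric mixing
    of distributions is linear mixing in log space."""
    return [(1 - alpha) * o + alpha * p for o, p in zip(old, proposed)]

old_logmsg = [0.0, 2.0]       # current log-message
new_logmsg = [1.0, 0.0]       # freshly computed (undamped) log-message
print(damped_update(old_logmsg, new_logmsg, alpha=0.5))
```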

Experiments Basic questions: How does EP perform when naively parallelized? Accuracy, convergence, speed. GPU and CPU via OpenCL: an AMD Radeon HD 6490M at 800 MHz (i.e. what's in our MacBooks) and a Core i7 at 2 GHz.

Experiments Ising Model Graph conditions. Edge potential conditions: attractive edge potentials, repulsive edge potentials, mixed. Variable potential conditions: on-biased variable potentials, off-biased variable potentials, neutral. Variables are either 0 or 1.
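The experimental conditions above can be written down as potential tables for a binary Ising model. The magnitudes below are illustrative assumptions, not the values used in the experiments.

```python
import numpy as np

def edge_potential(kind, strength=2.0):
    """Pairwise potential table psi(a, b) for binary neighbors."""
    if kind == "attractive":      # prefers neighbors to agree
        return np.array([[strength, 1.0], [1.0, strength]])
    if kind == "repulsive":       # prefers neighbors to disagree
        return np.array([[1.0, strength], [strength, 1.0]])
    raise ValueError(kind)

def variable_potential(kind, bias=1.5):
    """Unary potential table phi(a) for a binary variable."""
    if kind == "on-biased":       # favors state 1
        return np.array([1.0, bias])
    if kind == "off-biased":      # favors state 0
        return np.array([bias, 1.0])
    if kind == "neutral":
        return np.array([1.0, 1.0])
    raise ValueError(kind)

print(edge_potential("attractive"))
print(variable_potential("on-biased"))
```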

Accuracy

*pseudo-convexified

Runtime

Convergence

Conclusion We investigated the behavior of EP under a variety of conditions and introduced a new algorithm, Convex EP, with better convergence properties in large graphs when used in parallel. We found that a combination of Convex EP and EP was actually best.

Future Work Different graph topologies “Structured” approximations Different kinds of distributions