An Investigation into Concurrent Expectation Propagation
David Hall, Alex Kantchelian
CS252, 5/4/2012
Graphical Models [Figure: a graph whose nodes are variables X1–X13, connected by edges]
Graphical Models [Figure: each variable, e.g. X8, carries a variable potential φ(X8); each edge, e.g. (X12, X8), carries an edge potential ψ(X12, X8)]
Graphical Models Used in Robotics, Vision, Natural Language Processing, Computational Biology, and Data Mining. [Image credits: http://scripts.mit.edu/~zong/wpress/wp-content/uploads/images/voting.jpg, http://www.stanford.edu/~montanar/TEACHING/Stat375/stat375.html]
Graphical Models: Inference Main tasks: (1) determine the most likely configuration of the variables (usually NP-hard); (2) compute the partition function Z or marginal distributions such as p(x1) (usually #P-hard).
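For intuition, the sums behind Z and the marginals can be written out directly for a toy model. The brute-force sketch below (the chain, potentials, and values are purely illustrative, not from the talk) makes plain why these tasks are hard: exact enumeration visits all 2^n assignments.

```python
import itertools
import math

# Toy pairwise model over n binary variables x_i in {0, 1}.
# Potentials are illustrative, not from the slides.
def phi(i, x):            # variable potential phi(x_i): mildly prefers x_i = 1
    return math.exp(0.5 * x)

def psi(x, y):            # edge potential psi(x_i, x_j): rewards agreement
    return math.exp(1.0 if x == y else -1.0)

edges = [(0, 1), (1, 2), (2, 3)]   # a small chain
n = 4

def unnormalized(assign):
    p = 1.0
    for i in range(n):
        p *= phi(i, assign[i])
    for (i, j) in edges:
        p *= psi(assign[i], assign[j])
    return p

# Partition function Z: a sum over all 2^n assignments (exponential!).
Z = sum(unnormalized(a) for a in itertools.product((0, 1), repeat=n))

# Marginal p(x_0 = 1): sum over the assignments consistent with x_0 = 1.
p_x0 = sum(unnormalized(a)
           for a in itertools.product((0, 1), repeat=n) if a[0] == 1) / Z
print(Z, p_x0)
```

Approximate inference replaces exactly this exponential sum with something cheaper.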
Approximate Inference Many kinds! Basic goal: approximate the sum with something simpler. We focus on Expectation Propagation.
Basic Question Most inference algorithms are defined sequentially: update one potential at a time. But we’d like to run them in parallel: models are getting bigger and more intricate, and computers are getting more parallel. How do sequential algorithms perform when parallelized? Can we construct an algorithm with better parallel behavior?
Expectation Propagation Coupling
Expectation Propagation [Animation: … remove one potential, project( ) onto the approximating family, repeat!]
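The delete / project( ) / repeat! loop can be sketched for a toy pairwise binary model. Everything below (the triangle graph, the potentials, the fully factorized approximating family) is an illustrative assumption, not the talk's exact setup:

```python
import math

# Minimal, hypothetical EP sketch: q(x) = prod_i Bernoulli(x_i; p[i]).
# For each edge factor: remove its site approximation, project( ) the
# tilted distribution back onto q by moment matching, and repeat!

edges = [(0, 1), (1, 2), (2, 0)]      # toy triangle
coupling = 0.8                        # psi(x, y) = exp(+/-coupling)
bias = [0.3, -0.1, 0.0]               # variable log-potentials

def psi(x, y):
    return math.exp(coupling if x == y else -coupling)

def div_bern(p_num, p_den):
    """Divide one Bernoulli out of another, renormalizing."""
    eps = 1e-12
    a = p_num / max(p_den, eps)
    b = (1 - p_num) / max(1 - p_den, eps)
    return a / (a + b)

p = [1 / (1 + math.exp(-b)) for b in bias]     # current marginals of q
msg = {(e, v): 0.5 for e in edges for v in e}  # site approximations

for _ in range(100):
    for e in edges:                   # one potential at a time
        i, j = e
        ci = div_bern(p[i], msg[(e, i)])   # cavity: this factor removed
        cj = div_bern(p[j], msg[(e, j)])
        # tilted distribution: cavity marginals times the TRUE potential
        t = [[(ci if xi else 1 - ci) * (cj if xj else 1 - cj) * psi(xi, xj)
              for xj in (0, 1)] for xi in (0, 1)]
        Zt = sum(sum(row) for row in t)
        # project( ): moment matching keeps the tilted marginals
        p[i] = (t[1][0] + t[1][1]) / Zt
        p[j] = (t[0][1] + t[1][1]) / Zt
        msg[(e, i)] = div_bern(p[i], ci)   # refreshed site approximation
        msg[(e, j)] = div_bern(p[j], cj)

print([round(x, 3) for x in p])
```

For fully factorized binary q, the project( ) step is just "keep the tilted marginals"; richer families moment-match other expectations.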
Parallel EP [Figure: the graph over X1–X11 is partitioned into subgraphs, with proj( ) applied to each subgraph concurrently]
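One way to picture naive parallelization is a Jacobi-style schedule: every edge computes its update from the same snapshot of q, and all updates land at a barrier. The toy model below is an illustrative assumption (a real run would map edges to OpenCL work-items rather than a Python loop):

```python
import math

# Hypothetical sketch of naively parallelized EP. Every edge computes
# its site update from the SAME snapshot of q, and all updates are
# applied at once at a barrier. Graph and potentials are illustrative.

edges = [(0, 1), (1, 2), (2, 0)]      # toy triangle
coupling = 0.8
bias = [0.3, -0.1, 0.0]
n = 3

def psi(x, y):
    return math.exp(coupling if x == y else -coupling)

def div_bern(p_num, p_den):
    """Divide one Bernoulli out of another, renormalizing."""
    eps = 1e-12
    a = p_num / max(p_den, eps)
    b = (1 - p_num) / max(1 - p_den, eps)
    return a / (a + b)

msg = {(e, v): 0.5 for e in edges for v in e}  # per-edge site approximations

def recompute_q():
    """q(x_i) is the variable potential times its site approximations."""
    q = []
    for i in range(n):
        odds1, odds0 = math.exp(bias[i]), 1.0
        for e in edges:
            if i in e:
                odds1 *= msg[(e, i)]
                odds0 *= 1 - msg[(e, i)]
        q.append(odds1 / (odds1 + odds0))
    return q

for _ in range(100):
    q = recompute_q()                 # shared snapshot
    new_msg = {}
    for e in edges:                   # conceptually: all edges in parallel
        i, j = e
        ci = div_bern(q[i], msg[(e, i)])   # cavity: this edge removed
        cj = div_bern(q[j], msg[(e, j)])
        t = [[(ci if xi else 1 - ci) * (cj if xj else 1 - cj) * psi(xi, xj)
              for xj in (0, 1)] for xi in (0, 1)]
        Zt = sum(sum(r) for r in t)
        pi, pj = (t[1][0] + t[1][1]) / Zt, (t[0][1] + t[1][1]) / Zt
        new_msg[(e, i)] = div_bern(pi, ci)  # project( ) and re-divide
        new_msg[(e, j)] = div_bern(pj, cj)
    msg = new_msg                     # barrier: apply all updates at once

print([round(x, 3) for x in recompute_q()])
```

Because every edge sees only the stale snapshot, updates that a sequential sweep would reconcile immediately can instead pull the approximation in different directions, which is the concern raised next.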
Potential Problem EP is an approximation and might not converge; multiple local optima are likely. Hypothesis: unrestricted concurrency exacerbates the multiple-optima problem, with different subgraphs attracted to different optima.
Convex EP A new algorithm. By naively splitting, EP overcounts graph structure; downweighting the graph structure guarantees a single fixed point. The algorithm is more approximate, and may still not converge. Surprisingly, convexification is achieved by adding hysteresis to the updates.
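The "hysteresis" mentioned here can be pictured as damped updates: instead of jumping all the way to a newly projected site approximation, each update moves only part of the way in natural-parameter (here, log-odds) space. The mixing rule and step size α below are illustrative assumptions, not the talk's exact Convex EP:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(u):
    return 1 / (1 + math.exp(-u))

def damped_update(old_p, proposed_p, alpha=0.5):
    """Hysteresis sketch: move only a fraction alpha toward the newly
    projected value, mixing in log-odds (natural-parameter) space."""
    return sigmoid((1 - alpha) * logit(old_p) + alpha * logit(proposed_p))

# An aggressive proposal is pulled back toward the old value:
print(damped_update(0.5, 0.99, alpha=0.5))
```

Smaller α makes each update more conservative, trading convergence speed for stability under concurrent updates.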
Experiments Basic questions: How does EP perform when naively parallelized? Accuracy, convergence, speed. GPU and CPU via OpenCL: AMD Radeon HD 6490M (i.e., what’s in our MacBooks), an 800 MHz GPU, and a 2 GHz Core i7.
Experiments Ising Model (variables are either 0 or 1). Graph conditions — Edge potential conditions: attractive, repulsive, or mixed. Variable potential conditions: on-biased, off-biased, or neutral.
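One illustrative way to instantiate these conditions on a grid (the generator, its names, and the strengths are hypothetical, not the talk's actual experimental harness):

```python
import random

# Hypothetical generator for the slide's experimental conditions: an
# Ising grid whose edge potentials are attractive, repulsive, or mixed,
# and whose variable potentials are on-biased, off-biased, or neutral.

def make_ising(size, edge_cond="attractive", var_cond="neutral",
               strength=1.0, seed=0):
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(size) for c in range(size)]
    edges = [((r, c), (r, c + 1)) for r in range(size) for c in range(size - 1)]
    edges += [((r, c), (r + 1, c)) for r in range(size - 1) for c in range(size)]

    def edge_weight():
        if edge_cond == "attractive":
            return strength                          # psi rewards agreement
        if edge_cond == "repulsive":
            return -strength                         # psi rewards disagreement
        return rng.choice([strength, -strength])     # mixed

    def var_weight():
        if var_cond == "on":
            return strength                          # phi prefers x = 1
        if var_cond == "off":
            return -strength                         # phi prefers x = 0
        return 0.0                                   # neutral

    J = {e: edge_weight() for e in edges}  # psi(x,y) = exp(J) if x == y else exp(-J)
    h = {v: var_weight() for v in nodes}   # phi(x) = exp(h) if x == 1 else 1
    return nodes, edges, J, h

nodes, edges, J, h = make_ising(4, edge_cond="mixed", var_cond="on")
print(len(nodes), len(edges))
```

Crossing the three edge conditions with the three variable conditions yields the nine regimes the experiments sweep over.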
Accuracy
*pseudo-convexified
Runtime
Convergence
Conclusion Investigated the behavior of EP under a variety of conditions. Introduced a new algorithm, Convex EP, with better convergence properties in large graphs when run in parallel. Found that a combination of Convex EP and standard EP was actually best.
Future Work Different graph topologies “Structured” approximations Different kinds of distributions