Presentation transcript:

Distributed Adaptive Importance Sampling on Graphical Models using MapReduce
Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Charu Aggarwal+
* Department of Computer Science, University of Texas at Dallas
+ IBM T. J. Watson Research Center, Yorktown NY, USA
This material is based upon work supported by the University of Texas at Dallas.

Slide 2: Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

Slide 4: Graphical Models
A probabilistic graphical model G is a collection of functions over a set of random variables, generally represented as a network of nodes:
- Each node denotes a random variable (e.g., a data feature).
- Each edge denotes a relationship between two random variables.
There are two types of representation (a minimal encoding sketch follows this list):
- A Bayesian network is represented by a directed graph.
- A Markov network is represented by an undirected graph.
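As a rough illustration of what the two representations store, here is one way they might be encoded in Python. The variable names and all numbers are invented for this sketch; none of this is from the paper.

```python
# A minimal illustration (numbers invented): a Bayesian network stores a
# conditional probability table per node, keyed by parent values, while a
# Markov network stores an unnormalized potential table per edge or clique.
bayes_net = {
    "A": {(): {0: 0.6, 1: 0.4}},             # P(A); A has no parents
    "B": {(0,): {0: 0.7, 1: 0.3},            # P(B | A=0)
          (1,): {0: 0.2, 1: 0.8}},           # P(B | A=1)
}
markov_net = {
    ("A", "B"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
}
```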

Slide 5: Example Graphical Model
[Figure: an example Markov network over variables A, B, C, D, E, F with pairwise factors φ(A,B), φ(A,C), φ(B,D), φ(C,D), φ(C,E), φ(D,F), φ(E,F); a sample factor table φ(A,C) is shown.]
Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE), and Maximum a Posteriori (MAP) queries. Probability of Evidence must be evaluated in classification problems (a brute-force sketch of this query follows).
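To make the Probability of Evidence query concrete, here is a brute-force enumeration sketch on a network shaped like the one in the figure. The potential values are made up for illustration, and real networks are far too large for enumeration, which is exactly why the approximate methods below exist.

```python
# Minimal sketch: probability of evidence by brute-force enumeration on a
# small Markov network like the one above. Potential values are invented.
from itertools import product

VARS = ["A", "B", "C", "D", "E", "F"]  # all binary: 0 or 1

# Pairwise factors phi(X, Y) as dicts: (x, y) -> potential value.
factors = {
    ("A", "B"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    ("A", "C"): {(0, 0): 1.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.5},
    ("B", "D"): {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0},
    ("C", "D"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    ("C", "E"): {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 2.0},
    ("D", "F"): {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 1.5, (1, 1): 0.5},
    ("E", "F"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
}

def unnormalized_prob(assignment):
    """Product of all factor values under a full assignment {var: value}."""
    p = 1.0
    for (x, y), table in factors.items():
        p *= table[(assignment[x], assignment[y])]
    return p

def prob_of_evidence(evidence):
    """P(e) = (sum over assignments consistent with e) / Z."""
    consistent = total = 0.0
    for values in product([0, 1], repeat=len(VARS)):
        assignment = dict(zip(VARS, values))
        p = unnormalized_prob(assignment)
        total += p  # partition function Z
        if all(assignment[v] == val for v, val in evidence.items()):
            consistent += p
    return consistent / total

print(prob_of_evidence({"A": 1, "F": 0}))
```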

Slide 6: Exact Inference
Exact inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence.
Challenges:
- Exponential time and space complexity.
- Computationally intractable on large graphs.
Approximate inference algorithms are therefore widely used in practice to evaluate queries within resource limits:
- Sampling based, e.g., Gibbs Sampling, Importance Sampling.
- Propagation based, e.g., Iterative Join Graph Propagation.
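For reference, here is a minimal sketch of the two core variable elimination operations, factor product and summing out, assuming binary variables throughout. This is textbook VE, not the paper's implementation; the exponential blow-up shows up as the intermediate factor tables growing with the union of scopes.

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their scopes."""
    (scope1, t1), (scope2, t2) = f1, f2
    scope = tuple(dict.fromkeys(scope1 + scope2))   # ordered union of scopes
    table = {}
    for vals in product([0, 1], repeat=len(scope)):  # binary variables assumed
        asg = dict(zip(scope, vals))
        table[vals] = (t1[tuple(asg[v] for v in scope1)]
                       * t2[tuple(asg[v] for v in scope2)])
    return scope, table

def sum_out(factor, var):
    """Eliminate var from a factor by summing over its values."""
    scope, table = factor
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return new_scope, new_table

def eliminate(factors, order):
    """Sum out each variable in order, multiplying the factors that mention it."""
    factors = list(factors)
    for var in order:
        touched = [f for f in factors if var in f[0]]
        if not touched:
            continue
        combined = touched[0]
        for f in touched[1:]:
            combined = multiply(combined, f)
        factors = [f for f in factors if var not in f[0]] + [sum_out(combined, var)]
    return factors  # factors over any remaining (e.g., evidence) variables

# Summing out B from phi(A, B) leaves a factor over A alone.
phi_ab = (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0})
print(eliminate([phi_ab], ["B"]))   # [(('A',), {(0,): 3.0, (1,): 4.0})]
```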

Slide 7: Adaptive Importance Sampling (AIS)
Importance sampling estimates a quantity such as the partition function Z by drawing samples from a tractable proposal distribution Q and weighting each sample x by w(x) = P̃(x)/Q(x), where P̃ is the unnormalized target distribution. AIS additionally updates Q between iterations using the weighted samples, so that the proposal gradually approaches the target and the variance of the estimate decreases.
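A toy end-to-end sketch of the idea on a two-value distribution follows. The target distribution and the proposal-update rule here are illustrative stand-ins, not the paper's exact scheme.

```python
# Toy sketch of adaptive importance sampling estimating a normalization
# constant Z = sum_x P_tilde(x) over a single binary variable. Here the
# "unnormalized" target already sums to 1, so the true Z is 1.0.
import random

P_tilde = {0: 0.2, 1: 0.8}          # unnormalized target
Q = {0: 0.5, 1: 0.5}                # initial proposal

for iteration in range(20):
    samples = [0 if random.random() < Q[0] else 1 for _ in range(1000)]
    weights = [P_tilde[x] / Q[x] for x in samples]
    Z_hat = sum(weights) / len(weights)      # importance-sampling estimate of Z
    # Adapt Q toward the weighted empirical distribution of the samples.
    w0 = sum(w for x, w in zip(samples, weights) if x == 0)
    Q = {0: w0 / sum(weights), 1: 1.0 - w0 / sum(weights)}

print(Z_hat, Q)  # Z_hat ~ 1.0; Q approaches the normalized target {0: 0.2, 1: 0.8}
```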

Slide 8: RB-AIS
In this paper, we focus on a special type of AIS called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
- In RB-AIS, only a set of variables X_w ⊂ X \ X_e (called the w-cutset variables) is sampled.
- X_w is chosen in such a way that exact inference over X \ (X_w ∪ X_e) is tractable.
- A large |X_w| yields quicker query evaluation but a more erroneous result; a small |X_w| yields a more accurate result but takes more time. Trade-off!
V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI. AUAI Press, 2005, pp. 209–216.

Slide 9: RB-AIS: Steps
1. Start with an initial proposal Q on X_w.
2. Generate samples.
3. Calculate sample weights.
4. Update Q and Z.
5. If not converged, go to step 2; otherwise, end.

Slide 11: Problem
- Real-world applications require good-quality results within a time constraint.
- Typically, real-world networks are large and complex (i.e., have large treewidth). For instance, modeling Facebook users with a graphical model would require billions of nodes!
- Even RB-AIS may run out of time before providing a quality estimate. For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.

Slide 13: Challenges
To design a parallel and distributed approach for RB-AIS, the following challenges must be addressed:
- RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on their values at iteration i-1, a proper synchronization mechanism is needed.
- The task of generating samples on X_w must be distributed over the worker nodes.

Slide 14: Proposed Approaches
We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS:
- Distributed Sampling in Mappers (DSM): parallel sampling, sequential weight calculation.
- Distributed Weight Calculation in Mappers (DWCM): sequential sampling, parallel weight calculation.

Slide 15: Distributed Sampling in Mappers (DSM)
[Diagram: dataflow of one MapReduce iteration of DSM.]
- Input to the i-th MR job: X_w and Q_i.
- Map phase: mapper j (one per cutset variable X_j, j = 1..m) generates n samples of X_j, emitting key-value pairs (X_j, x_js, Q_i[X_j]) for s = 1..n.
- Shuffle and sort: values are aggregated by key.
- Reduce phase: the reducer combines x_1s, x_2s, ..., x_ms into a joint sample x_s for each s = 1..n, then updates Z and updates Q_i to Q_i+1.
A toy simulation of one such iteration follows.
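The sketch below simulates one DSM iteration in plain Python rather than on Hadoop, to show the shape of the dataflow. The weight function is a constant stand-in; in RB-AIS it would involve exact inference over the non-cutset variables.

```python
# Toy simulation of one DSM iteration. Each "mapper" samples n values of a
# single cutset variable from its proposal; the single "reducer" stitches the
# per-variable samples into joint samples and computes weights sequentially.
import random

def mapper(var, q, n):
    """Emit (var, [n samples of var drawn from its proposal q])."""
    return var, [0 if random.random() < q[0] else 1 for _ in range(n)]

def reducer(per_var_samples, Q, weight_fn):
    """Combine x_1s..x_ms into joint samples x_s, then weight sequentially."""
    variables = sorted(per_var_samples)
    n = len(per_var_samples[variables[0]])
    joint = [{v: per_var_samples[v][s] for v in variables} for s in range(n)]
    weights = [weight_fn(x, Q) for x in joint]   # sequential in DSM
    Z_hat = sum(weights) / n
    return joint, weights, Z_hat

# Hypothetical two-variable cutset with uniform proposals and a stand-in weight.
Q = {"X1": {0: 0.5, 1: 0.5}, "X2": {0: 0.5, 1: 0.5}}
per_var = dict(mapper(v, q, 1000) for v, q in Q.items())   # "map" phase
joint, weights, Z_hat = reducer(per_var, Q, lambda x, Q: 1.0)
print(Z_hat)  # 1.0 with the constant stand-in weight
```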

Slide 16: Distributed Weight Calculation in Mappers (DWCM)
[Diagram: dataflow of one MapReduce iteration of DWCM.]
- Input to the i-th MR job: X_w and the list of samples x_1, ..., x_n.
- Map phase: mapper s (one per sample, s = 1..n) computes the weight of sample x_s under Q_i[X_w = x_s].
- Shuffle and sort: values are aggregated by key.
- Reduce phase: the reducer aggregates the weights, updates Z, and updates Q_i to Q_i+1.
A mirror-image sketch of this variant follows.
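The DWCM counterpart of the previous sketch: samples arrive already generated, and the per-mapper work is one weight computation per sample. Again the weight function is a stand-in.

```python
def dwcm_iteration(samples, Q, weight_fn):
    """One simulated DWCM iteration: one weight computation per 'mapper'."""
    # Map phase: each mapper handles one sample x_s and emits (s, w_s);
    # this is the step that runs in parallel in DWCM.
    keyed_weights = [(s, weight_fn(x, Q)) for s, x in enumerate(samples)]
    # Reduce phase: aggregate the weights and update the estimate of Z
    # (a full RB-AIS iteration would also update Q here).
    Z_hat = sum(w for _, w in keyed_weights) / len(keyed_weights)
    return Z_hat

# With the constant stand-in weight, the estimate is trivially 1.0.
print(dwcm_iteration([{"X1": 0}, {"X1": 1}],
                     {"X1": {0: 0.5, 1: 0.5}},
                     lambda x, Q: 1.0))
```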

Slide 18: Setup
Performance metrics:
- Speedup = T_sq / T_d, where T_sq is the execution time of the sequential approach and T_d is the execution time of the distributed approach.
- Scaleup = T_s / T_p, where T_s is the execution time using a single mapper and T_p is the execution time using multiple mappers.
Cluster: Hadoop, with 1 name node plus data nodes; each machine has a 2.2 GHz processor and 4 GB of RAM.
[Table: three benchmark networks from the Probabilistic Inference Challenge [1], including 54.wcsp, with their node and factor counts.]
[1] "The Probabilistic Inference Challenge (PIC2011)."
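A quick worked example of the two metrics, with hypothetical timings that are not the paper's measurements:

```python
# Worked example of the two metrics with hypothetical timings in seconds
# (these numbers are NOT the paper's measurements).
T_sq, T_d = 3600.0, 450.0   # sequential vs. distributed execution time
T_s, T_p = 1800.0, 300.0    # single mapper vs. multiple mappers
print("speedup:", T_sq / T_d)  # 8.0
print("scaleup:", T_s / T_p)   # 6.0
```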

Slide 19: Speedup
[Chart: speedup results.]

Slide 20: Scaleup
[Chart: scaleup results.]

Slide 21: Discussion
- Both approaches achieve substantial speedup and scaleup compared with the sequential execution.
- DWCM has better speedup and scalability than DSM: weight calculation is computationally more expensive than sample generation, and DWCM parallelizes the weight calculation, so it outperforms DSM.
- Both approaches asymptotically match the accuracy of the sequential execution.

Slide 22: Questions?