MapReduce Guided Approximate Inference Over Graphical Models
Ahsanul Haque *, Swarup Chandra *, Latifur Khan * and Michael Baron +
* Department of Computer Science, University of Texas at Dallas
+ Department of Mathematical Sciences, University of Texas at Dallas
This material is based upon work supported by University Of Texas at Dallas

2 Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

4 Graphical Models
- A probabilistic graphical model G is a collection of functions over a set of random variables.
- It is generally represented as a network of nodes:
  - Each node denotes a random variable (e.g., a data feature).
  - Each edge denotes a relationship between two random variables.
- Two types of representation:
  - A Bayesian network is represented by a directed graph.
  - A Markov network is represented by an undirected graph.

5 Example Graphical Model
- Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE) and Maximum a Posteriori (MAP) queries.
- Probability of Evidence needs to be evaluated in classification problems.
- Example network: nodes A, B, C, D, E, F with factors φ(A,C), φ(C,E), φ(D,F), φ(B,D), φ(C,D), φ(A,B), φ(E,F).
- Sample factor φ(A,C):
  A  C  φ(A,C)
  0  0  5
  0  1  100
  1  0  15
  1  1  20
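To make the Probability of Evidence query concrete, here is a minimal sketch that evaluates it by brute-force enumeration on the six-variable example network. Only the φ(A,C) table (values 5, 100, 15, 20) comes from the slide; the other six factors are filled with a hypothetical all-ones table, the variables are assumed binary, and the function names are illustrative.

```python
from itertools import product

# phi(A,C) is the sample factor from the slide.
PHI_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}
# ASSUMPTION: the slide does not give the other tables, so they are all ones.
ONES = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}

VARIABLES = ["A", "B", "C", "D", "E", "F"]
FACTORS = [
    (("A", "C"), PHI_AC),
    (("C", "E"), ONES),
    (("D", "F"), ONES),
    (("B", "D"), ONES),
    (("C", "D"), ONES),
    (("A", "B"), ONES),
    (("E", "F"), ONES),
]

def weight(assignment):
    """Product of all factor entries selected by a full assignment."""
    w = 1
    for scope, table in FACTORS:
        w *= table[tuple(assignment[v] for v in scope)]
    return w

def evidence_weight(evidence):
    """Sum of weights over every assignment that agrees with the evidence."""
    free = [v for v in VARIABLES if v not in evidence]
    total = 0
    for values in product((0, 1), repeat=len(free)):
        total += weight({**evidence, **dict(zip(free, values))})
    return total

def probability_of_evidence(evidence):
    """Unnormalized evidence weight divided by the partition function."""
    return evidence_weight(evidence) / evidence_weight({})

print(probability_of_evidence({"A": 0}))
```

The loop visits 2^k assignments for k free variables, which is why enumeration stops being practical long before a network reaches realistic size.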

6 Exact Inference
- Exact inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence.
- Challenges:
  - Exponential time and space complexity.
  - Computationally intractable on large graphs.
- Approximate inference algorithms are widely used in practice to evaluate queries within resource limits.
  - Sampling based, e.g., Gibbs Sampling, Importance Sampling.
  - Propagation based, e.g., Iterative Join Graph Propagation.
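As an illustration of the exact approach, the following is a minimal sketch of Variable Elimination over binary variables. The factor representation (a scope tuple plus a table keyed by value tuples), the function names, and the all-ones φ(C,E) table are assumptions made for this example; only φ(A,C) is taken from the slides.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors, each a (scope, table) pair."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))  # ordered union of both scopes
    table = {}
    for values in product((0, 1), repeat=len(scope)):
        row = dict(zip(scope, values))
        table[values] = (f[1][tuple(row[v] for v in f[0])]
                         * g[1][tuple(row[v] for v in g[0])])
    return scope, table

def sum_out(f, var):
    """Marginalize one variable out of a factor."""
    scope = tuple(v for v in f[0] if v != var)
    table = {}
    for values, x in f[1].items():
        key = tuple(val for v, val in zip(f[0], values) if v != var)
        table[key] = table.get(key, 0) + x
    return scope, table

def eliminate(factors, order):
    """Eliminate every variable in `order`; the order must cover all variables."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        joined = touching[0]
        for f in touching[1:]:
            joined = multiply(joined, f)
        factors = rest + [sum_out(joined, var)]
    result = 1
    for _, table in factors:
        result *= table[()]  # every remaining factor has an empty scope
    return result

PHI_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}
ONES = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}  # hypothetical phi(C,E)
print(eliminate([(("A", "C"), PHI_AC), (("C", "E"), ONES)], ["A", "C", "E"]))
```

The intermediate factor built in `multiply` grows exponentially with the number of variables it joins, which is the source of the time and space cost listed above.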

7 Adaptive Importance Sampling (AIS)

8 RB-AIS
- We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
- In RB-AIS, a set of variables X_w ⊂ X \ X_e (called w-cutset variables) is sampled.
- X_w is chosen in such a way that exact inference over X \ X_w, X_e is tractable.
- A large |X_w| gives faster query evaluation but a less accurate result.
- A small |X_w| gives a more accurate result but takes more time.
- This is a trade-off.
V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI. AUAI Press, 2005, pp. 209–216.

9 RB-AIS: Steps
Start -> Initial Q on X_w -> Generate Samples -> Calculate Sample Weights -> Update Q and Z -> Converge? (Yes -> End; No -> Generate Samples)
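The loop in this flowchart can be sketched on a toy problem. The sketch below assumes a two-variable model with the single factor φ(A,C), a cutset X_w = {A} that is sampled, and C summed out exactly for each sample (the Rao-Blackwellized part). The Q update rule (mixing the old proposal with the normalized per-value weights), the convergence test, and all parameter names are assumptions for illustration, not the authors' exact procedure.

```python
import random

PHI_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}

def exact_mass(a):
    """Sum out C exactly for a fixed cutset value A = a."""
    return sum(PHI_AC[(a, c)] for c in (0, 1))

def rb_ais(samples_per_round=1000, max_rounds=20, tol=1e-3, rate=0.5, seed=0):
    rng = random.Random(seed)
    q = [0.5, 0.5]                       # initial proposal Q on X_w = {A}
    total_weight, total_samples, z = 0.0, 0, None
    for _ in range(max_rounds):
        mass_by_value = [0.0, 0.0]
        for _ in range(samples_per_round):
            a = 0 if rng.random() < q[0] else 1   # generate a sample
            w = exact_mass(a) / q[a]              # calculate its weight
            mass_by_value[a] += w
            total_weight += w
        total_samples += samples_per_round
        new_z = total_weight / total_samples      # update Z (running mean)
        round_total = sum(mass_by_value)
        # Update Q: move part of the way toward the normalized weights (assumed rule).
        q = [(1 - rate) * q[a] + rate * mass_by_value[a] / round_total
             for a in (0, 1)]
        converged = z is not None and abs(new_z - z) <= tol * z
        z = new_z
        if converged:
            break
    return z, q

z, q = rb_ais()
print(z, q)
```

For this toy model the exact answer is Z = 140, and the ideal proposal is Q = (0.75, 0.25); as Q approaches it, every sample weight approaches 140 and the variance of the estimate shrinks.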

11 Problem
- Real-world applications require a good-quality result within a time constraint.
- Typically, real-world networks are large and complex (i.e., they have a large tree width).
  - For instance, a graphical model of Facebook users would have billions of nodes.
- Even RB-AIS may run out of time before it can provide a quality estimate.
  - For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.

13 Challenges
To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed:
- RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on their values at iteration i-1, a proper synchronization mechanism is needed.
- The task of generating samples on X_w must be distributed over the worker nodes.

14 Proposed Approaches
We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS.
- Distributed Sampling in Mappers (DSM)
  - Parallel sampling.
  - Sequential weight calculation.
  - Each MapReduce Job Unit (MJU) contains only one MapReduce job.
- Distributed Weight Calculation in Reducers (DWCR)
  - Parallel sampling.
  - Parallel weight calculation.
  - Each MapReduce Job Unit (MJU) contains two MapReduce jobs.

15 Distributed Sampling in Mappers (DSM)
- Input to the i-th MJU: X_w and Q_i, i.e., the entries (X_1, Q_i[x_1]), (X_2, Q_i[x_2]), (X_3, Q_i[x_3]), ..., (X_m, Q_i[x_m]), together with Z.
- Map 1, Map 2, Map 3, ..., Map m: mapper j handles X_j and emits, for each sample index s = 1, ..., n, the key s with the value (X_j, x_js, Q_i[X_j]).
- Shuffle and Sort: aggregate values by keys, so that key s collects (X_1, x_1s, Q[X_1]), (X_2, x_2s, Q[X_2]), (X_3, x_3s, Q[X_3]), ..., (X_m, x_ms, Q[X_m]).
- Reducer:
  - Combine x_1s, x_2s, ..., x_ms to form x_s, where s = {1, 2, ..., n}.
  - Update Z, and Q_i to Q_i+1.
- Output: (X_1, Q_i+1[x_1]), (X_2, Q_i+1[x_2]), (X_3, Q_i+1[x_3]), ..., (X_m, Q_i+1[x_m]), together with Z.
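The DSM data flow can be imitated in a single process without Hadoop. The sketch below is a simulation under stated assumptions: the cutset is X_w = {A, C}, the unnormalized weight of a full sample is φ(A,C), the variables are binary, one shared random generator stands in for the per-mapper generators a real cluster would use, and the Q update is left out so that only the estimate of Z for one MJU is shown. All function names are hypothetical.

```python
import random
from collections import defaultdict

# Toy target: the weight of a full cutset sample (A, C) is phi(A,C).
PHI_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}

def mapper(var, q_var, n_samples, rng):
    """One mapper per cutset variable: emit (s, (X_j, x_js, Q_i[x_js]))."""
    for s in range(n_samples):
        value = 0 if rng.random() < q_var[0] else 1
        yield s, (var, value, q_var[value])

def shuffle_and_sort(pairs):
    """Aggregate mapper output by key, i.e., by sample index s."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(groups):
    """Single reducer: combine partial samples, then weight them one by one."""
    total = 0.0
    for parts in groups.values():
        sample = {var: value for var, value, _ in parts}
        proposal = 1.0
        for _, _, prob in parts:
            proposal *= prob
        total += PHI_AC[(sample["A"], sample["C"])] / proposal
    return total / len(groups)

def dsm_round(q, n_samples=4000, seed=0):
    """One simulated MJU: all mappers, then shuffle, then the single reducer."""
    rng = random.Random(seed)
    pairs = [pair for var, q_var in q.items()
             for pair in mapper(var, q_var, n_samples, rng)]
    return reducer(shuffle_and_sort(pairs))

print(dsm_round({"A": [0.5, 0.5], "C": [0.5, 0.5]}))
```

The estimate should land near the exact value Z = 140. Note that all weight calculations run inside the one reducer, which is the sequential step that DSM does not parallelize.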

16 Distributed Weight Calculation in Reducers (DWCR)
- Input to the i-th MJU: X_w, Q_i.
- First MapReduce job:
  - Map 1: input X_1 ⊂ X_w; output partial samples x_1 ∈ X_1.
  - Map 2: input X_2 ⊂ X_w; output partial samples x_2 ∈ X_2.
  - Map m: input X_m ⊂ X_w; output partial samples x_m ∈ X_m.
  - Reducer 1, Reducer 2, ..., Reducer r: each combines the partial samples for sample s (x_i -> x, i ∈ {1, ..., m}) and calculates the weight Ψ_x.
- Second MapReduce job:
  - Map 1, Map 2, ..., Map j: output Ψ_x.
  - Reducer: update Z, and Q_i to Q_i+1.
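The difference from DSM is that the weights are computed by several reducers at once. The following single-process sketch illustrates that idea under assumptions: sample indices are routed to reducers by `s % n_reducers` (a hypothetical partitioning rule), a thread pool stands in for the r reducers of the first job, the weight of a full sample is φ(A,C), and the second job is reduced to averaging the weights into Z with the Q update omitted.

```python
from concurrent.futures import ThreadPoolExecutor

PHI_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}

def partition(partials, n_reducers):
    """Job 1 shuffle: send each partial sample to the reducer owning its index."""
    buckets = [dict() for _ in range(n_reducers)]
    for s, var, value, prob in partials:
        buckets[s % n_reducers].setdefault(s, []).append((var, value, prob))
    return buckets

def weight_reducer(bucket):
    """Job 1 reducer: combine partial samples into x and compute its weight."""
    out = []
    for s, parts in bucket.items():
        sample = {var: value for var, value, _ in parts}
        proposal = 1.0
        for _, _, prob in parts:
            proposal *= prob
        out.append((s, PHI_AC[(sample["A"], sample["C"])] / proposal))
    return out

def dwcr_round(partials, n_reducers=2):
    buckets = partition(partials, n_reducers)
    # The reducers of job 1 run side by side; threads stand in for worker nodes.
    with ThreadPoolExecutor(max_workers=n_reducers) as pool:
        weighted = [pair for part in pool.map(weight_reducer, buckets)
                    for pair in part]
    # Job 2: a single reducer aggregates the weights into the estimate of Z.
    return sum(w for _, w in weighted) / len(weighted)

# Four partial-sample pairs, one for each (A, C) value, drawn from a uniform Q.
partials = []
for s, (a, c) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    partials += [(s, "A", a, 0.5), (s, "C", c, 0.5)]
print(dwcr_round(partials))
```

With these four hand-picked samples the weights are 20, 400, 60 and 80, so the estimate is exactly 140 regardless of how many reducers share the work.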

18 Setup
- Performance metrics:
  - Speedup = T_sq / T_d, where T_sq = execution time of the sequential approach and T_d = execution time of the distributed approach.
  - Scaleup = T_s / T_p, where T_s = execution time using a single machine and T_p = execution time using multiple machines.
- Hadoop version 1.2.1.
- 8 data nodes, 1 name node.
- Each machine has a 2.2 GHz processor and 4 GB of RAM.
- Networks:
  Network        Number of Nodes   Number of Factors
  54.wcsp [1]    67                271
  29.wcsp [1]    82                462
  404.wcsp [1]   100               710
[1] "The probabilistic inference challenge (pic2011)," http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated on 10.23.2014.
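The two metrics are plain ratios of execution times. The helper below is a small sketch with hypothetical function names, and the timings in the example call are invented for illustration only; they are not measurements from the experiments.

```python
def speedup(t_sequential, t_distributed):
    """Speedup = T_sq / T_d."""
    return t_sequential / t_distributed

def scaleup(t_single, t_multiple):
    """Scaleup = T_s / T_p."""
    return t_single / t_multiple

# Hypothetical timings in seconds, chosen only to show the arithmetic.
print(speedup(21600.0, 5400.0))
```

A value above 1 means the distributed or multi-machine run finished faster than its baseline.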

19 Speedup

20 Scaleup

21 Discussion
- Both approaches achieve substantial speedup and scaleup compared with sequential execution.
- DWCR has better speedup and scalability than DSM.
  - Weight calculation is computationally more expensive than sample generation.
  - DWCR performs both weight calculation and sampling in parallel, so it outperforms DSM.
- Asymptotically, both approaches show accuracy similar to that of sequential execution.

22 Questions?
