A Distributed Bucket Elimination Algorithm


Presentation on theme: "A Distributed Bucket Elimination Algorithm"— Presentation transcript:

1 A Distributed Bucket Elimination Algorithm
Andrew Gelfand William Lam CS 230 Project – March 15, 2011

2 Motivation
Elimination is exponential in induced width.
Ex: Consider a Bayesian network (BN) with binary variables (k = 2) and induced width w* = 20.
The largest table has 2^19 entries and requires only 4 MB of memory.
With k = 3, the same table needs 9 GB of memory!
Elimination (in particular BE) is a common, simple and non-confounded algorithm for exact inference.
It is efficient but not scalable: it either solves a problem rapidly or runs out of memory.
=> The real bottleneck of exact elimination algorithms is space!
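The memory figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes 8 bytes per table entry (one double-precision value); the slide does not state the entry size, but 8 bytes is consistent with both the 4 MB and the 9 GB figures. `table_bytes` is a hypothetical helper name.

```python
BYTES_PER_ENTRY = 8  # assumption: one double-precision float per table entry


def table_bytes(k, num_vars, bytes_per_entry=BYTES_PER_ENTRY):
    """Memory for a full table over num_vars variables, each with k values."""
    return (k ** num_vars) * bytes_per_entry


binary = table_bytes(2, 19)   # 2^19 entries, as on the slide
ternary = table_bytes(3, 19)  # same number of variables, k = 3

print(f"k=2: {binary / 2**20:.1f} MiB")
print(f"k=3: {ternary / 1e9:.1f} GB")
```

Going from k = 2 to k = 3 multiplies the table size by (3/2)^19, roughly a factor of 2200, which is why the same problem structure moves from trivial to infeasible on a single machine.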

3 Idea
Extend available memory and processing power by using multiple machines
Framework: Message Passing Interface (MPI), Master/Worker paradigm
Not without cost: communication overhead
Challenge: Perform BE for Pr(e) with a distributed system while minimizing communication overhead

4 Bucket Elimination [Dechter96]
[Figure: bucket tree for elimination order d, e, b, c. Buckets D, E, B, C, A hold the original functions and the messages; node labels are D,A,B; E,B,C; B,A,C; C,A; A.]
BE is a unifying algorithmic framework for probabilistic inference that organizes computations using ‘buckets’.
A bucket is an organizational device that contains a set of functions, either the original functions/CPTs or functions generated by the algorithm.
Say we want to compute Pr(a|e=0), given elimination order d,e,b,c. Using BE, we proceed as follows:
1) Partition/assign the original functions/CPTs into ‘buckets’ using the specified elimination order;
2) Process the buckets from top to bottom, eliminating each bucket's variable from subsequent computations.
BE is also a special case of tree elimination in which the tree structure upon which messages are passed, the bucket tree, is determined by the variable elimination order used.
The nodes of the tree are referred to as buckets.
BE processes along the bucket tree from leaves to root, performing two steps at each bucket: 1) combination; and 2) elimination.
Time/Space is O(exp(w*))
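The combination and elimination steps can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: factors are stored as `(scope, table)` pairs, buckets are gathered lazily (all functions mentioning the variable at the time it is processed), and the three-variable chain A -> B -> C with its probabilities is a made-up example rather than the network on the slide.

```python
from itertools import product


def multiply(f, g, domains):
    """Combination step: pointwise product of two factors."""
    (f_scope, f_table), (g_scope, g_table) = f, g
    scope = tuple(dict.fromkeys(f_scope + g_scope))  # ordered union of scopes
    table = {}
    for values in product(*(domains[v] for v in scope)):
        row = dict(zip(scope, values))
        table[values] = (f_table[tuple(row[v] for v in f_scope)]
                         * g_table[tuple(row[v] for v in g_scope)])
    return scope, table


def sum_out(f, var):
    """Elimination step: sum a variable out of a factor."""
    scope, table = f
    keep = tuple(v for v in scope if v != var)
    out = {}
    for values, p in table.items():
        key = tuple(x for v, x in zip(scope, values) if v != var)
        out[key] = out.get(key, 0.0) + p
    return keep, out


def bucket_elimination(factors, order, domains):
    """Process one bucket per variable in `order`; multiply whatever remains."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not bucket:
            continue
        combined = bucket[0]
        for f in bucket[1:]:
            combined = multiply(combined, f, domains)
        factors.append(sum_out(combined, var))  # message passed to a later bucket
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    return result


# Hypothetical chain A -> B -> C with binary variables.
domains = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_a = (("B", "A"), {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7})
p_c_b = (("C", "B"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5})

scope, marginal = bucket_elimination([p_a, p_b_a, p_c_b], ["A", "B"], domains)
print(scope, {k: round(v, 4) for k, v in marginal.items()})
```

For this toy chain the result is the marginal over C, approximately 0.74 for C=0 and 0.26 for C=1. The largest intermediate table is the one built by `multiply` inside a bucket, which is where the O(exp(w*)) space cost comes from.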

5 Processing a Bucket
[Figure: f1 × f2, summed over X20, yields f3; all tables held in RAM]
Further illustration of the combination and elimination steps.
Function f1 is over variables X1…X20 and function f2 is over variables X20…X30; combining them and eliminating variable X20 yields function f3.
The problem occurs when intermediate functions don’t fit into memory.
Note that this holds for BE; other algorithms need only store tables the size of the separator.

6 Processing a Bucket
[Figure: f1, f2 and f3 split into blocks f1,1, f1,2, f2,1, f2,2, f3,1, f3,2; each processor has its own RAM]
Proc. 1: Σ_X20 f1,1 × f2,1 = f3,1
Proc. 2: Σ_X20 f1,1 × f2,2 = f3,2

7 Processing a Bucket
[Figure: f1, f2 and f3 split into blocks f1,1, f1,2, f2,1, f2,2, f3,1, f3,2; each processor has its own RAM]
Strategy: Decompose each f into blocks and compute piecemeal on m worker nodes
Proc. 1: Σ_X20 f1,1 × f2,1 = f3,1
Proc. 2: Σ_X20 f1,1 × f2,2 = f3,2
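The block strategy can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the slides: f1 is reduced to a two-dimensional table f1[x1][x20], f2 to f2[x20][x30], only f2 is split (along X30), and a thread pool stands in for the MPI worker nodes. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def combine_and_eliminate(f1_block, f2_block):
    """One block of f3: sum over the shared variable of f1 * f2.

    f1_block[x1][x20] and f2_block[x20][x30] give f3_block[x1][x30].
    """
    shared = range(len(f2_block))
    cols = range(len(f2_block[0]))
    return [[sum(row[s] * f2_block[s][c] for s in shared) for c in cols]
            for row in f1_block]


def split_columns(table, num_blocks):
    """Split a table into at most num_blocks contiguous column blocks."""
    n = len(table[0])
    step = -(-n // num_blocks)  # ceiling division
    return [[row[i:i + step] for row in table] for i in range(0, n, step)]


def distributed_bucket(f1, f2, num_workers):
    """Each worker computes one block f3,j from f1 and block f2,j."""
    f2_blocks = split_columns(f2, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        f3_blocks = list(pool.map(
            lambda blk: combine_and_eliminate(f1, blk), f2_blocks))
    # Stitch the column blocks of f3 back together, row by row.
    return [sum((blk[r] for blk in f3_blocks), []) for r in range(len(f1))]


f1 = [[1, 2], [3, 4]]                 # toy f1[x1][x20]
f2 = [[5, 6, 7, 8], [9, 10, 11, 12]]  # toy f2[x20][x30]
f3 = distributed_bucket(f1, f2, num_workers=2)
print(f3)
```

In a real distributed setting each worker would hold only its own blocks in memory and receive them as messages, so the cost of shipping blocks has to be weighed against the memory saved.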

8 Algorithm Design Issues
Task 1: Function Table Decomposition. How should function tables be decomposed?
Task 2: Job Scheduling. In what order should blocks be processed?

9 Task 1: Function Table Decomposition
Variable order dictates the location of entries within a table/block.
Poor ordering: h(Y,X2,X1), with f(X1,X2) = Σ_Y h(Y,X2,X1)
Good ordering: h(X1,X2,Y), with f(X1,X2)
Ordering Rule:
1) The variable to be eliminated comes last in the scope of h.
2) The order of the remaining variables in the scope of h agrees with the scope of f.
All of a table's entries are enumerated and assigned an index consistent with the order of the variables in the function's scope.
Ex: we need entries 3, 12, 21 of h to compute entry 1 of f(X1,X2). Assuming blocks of h are only 8 entries in size, entries (3, 12, 21) reside in different blocks, requiring 3 loads/unloads.
Worse yet, when we go on to compute entry 2 of f(X1,X2), we will have just unloaded block 1 of h, which contains entry 6. This is thrashing at its worst.
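The entry numbers in this example can be reproduced with a short index calculation. The sketch assumes three values per variable, 0-based entry numbers, and row-major layout with the last variable in the scope varying fastest; none of these is stated on the slide, but together they yield exactly entries 3, 12 and 21. The function names are hypothetical.

```python
def flat_index(assignment, scope, domain_size):
    """Row-major index of an assignment; the last variable varies fastest."""
    idx = 0
    for var in scope:
        idx = idx * domain_size + assignment[var]
    return idx


def entries_needed(h_scope, f_scope, f_entry, elim_var, domain_size):
    """Indices of h read to compute entry f_entry of f = sum over elim_var of h."""
    assignment = {}
    rest = f_entry
    for var in reversed(f_scope):  # decode f_entry into variable values
        rest, assignment[var] = divmod(rest, domain_size)
    return [flat_index({**assignment, elim_var: y}, h_scope, domain_size)
            for y in range(domain_size)]


def blocks_touched(indices, block_size):
    """Distinct blocks that must be loaded to read the given entries."""
    return sorted({i // block_size for i in indices})


K, BLOCK = 3, 8  # assumed domain size; block size from the slide
poor = entries_needed(("Y", "X2", "X1"), ("X1", "X2"), 1, "Y", K)
good = entries_needed(("X1", "X2", "Y"), ("X1", "X2"), 1, "Y", K)

print("poor ordering:", poor, "blocks", blocks_touched(poor, BLOCK))
print("good ordering:", good, "blocks", blocks_touched(good, BLOCK))
```

Under the poor ordering the three entries are 9 apart and fall in three different blocks; under the good ordering they are adjacent and sit in a single block, so consecutive entries of f sweep through h sequentially.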

10 Task 2: Job Scheduling
Assign each block i to a processor j
Send ‘input’ blocks as messages
Computation imposes scheduling constraints
Ex: D > B, E > B, B > C, C > A
Formulate as an IP
[Figure: example schedule on Proc 1 and Proc 2]
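To show how precedence constraints shape a schedule, here is a greedy list scheduler. It is a simple stand-in, not the integer program mentioned on the slide, and it rests on assumptions: "D > B" is read as "D must finish before B starts", every job takes one time step, and communication cost is ignored. `list_schedule` is a hypothetical name.

```python
def list_schedule(jobs, precedes, num_procs):
    """Greedy unit-time list scheduling under precedence constraints.

    precedes is a list of (before, after) pairs. Returns a list of time
    steps, each a list of at most num_procs jobs run in parallel.
    """
    preds = {j: set() for j in jobs}
    for before, after in precedes:
        preds[after].add(before)

    done, schedule = set(), []
    while len(done) < len(jobs):
        # A job is ready once all of its predecessors have completed.
        ready = [j for j in jobs if j not in done and preds[j] <= done]
        if not ready:
            raise ValueError("constraints contain a cycle")
        step = ready[:num_procs]
        schedule.append(step)
        done.update(step)
    return schedule


jobs = ["D", "E", "B", "C", "A"]
constraints = [("D", "B"), ("E", "B"), ("B", "C"), ("C", "A")]
schedule = list_schedule(jobs, constraints, num_procs=2)
for t, step in enumerate(schedule, start=1):
    print(f"step {t}: {step}")
```

With these constraints only D and E can run in parallel; B, C and A form a chain, so the second processor sits idle after the first step. An IP formulation could additionally account for the cost of sending input blocks between processors, which this heuristic does not.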

11 Preliminary Results
Tested on a small problem, “pedigree1” (w* = 15)

12 Conclusion
Described an extension of BE utilizing multiple machines
Identified and addressed key design issues: decomposition of functions and job scheduling
Future experiments: test on more problems, specifically ones with structure suitable for parallelization

