Statistical Methods in AI/ML: Bucket elimination. Vibhav Gogate.


Bucket Elimination: Initialization
Variables: A, B, C, D, E, F
Functions: φ(A,B), φ(A,C), φ(B,D), φ(C,D), φ(C,E), φ(D,F), φ(E,F)
Bucket order: A, E, D, F, B, C
– You put each function in exactly one bucket
– How? Along the order, find the first bucket whose variable is in the function's scope

Bucket elimination: Processing Buckets
– Process the buckets in order
– Multiply all the functions in the bucket
– Sum out the bucket variable
– Put the new function in one of the buckets, obeying the initialization constraint
Buckets along the order A, E, D, F, B, C (functions in the bucket → function generated):
A: φ(A,B), φ(A,C) → ψ(B,C)
E: φ(C,E), φ(E,F) → ψ(C,F)
D: φ(D,F), φ(B,D), φ(C,D) → ψ(B,C,F)
F: ψ(C,F), ψ(B,C,F) → ψ₂(B,C)
B: ψ(B,C), ψ₂(B,C) → ψ(C)
C: ψ(C) → Z
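The processing steps can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: it assumes binary variables, represents a function as a hypothetical `(scope, table)` pair, and fills the tables with random numbers purely for illustration. The helper names `eliminate`, `bucket_elimination` and `brute_force_z` are invented here.

```python
import itertools
import random

def eliminate(factors, var):
    """Multiply `factors` and sum out `var` (all variables are binary here).

    A factor is a pair (scope, table): `scope` is a tuple of variable names
    and `table` maps each 0/1 assignment tuple (in scope order) to a number.
    """
    scope = tuple(sorted({v for s, _ in factors for v in s} - {var}))
    table = {}
    for assignment in itertools.product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, assignment))
        total = 0.0
        for value in (0, 1):              # sum out the bucket variable
            env[var] = value
            prod = 1.0
            for s, t in factors:          # multiply all functions in the bucket
                prod *= t[tuple(env[v] for v in s)]
            total += prod
        table[assignment] = total
    return scope, table

def bucket_elimination(factors, order):
    """Return Z, the sum over all assignments of the product of `factors`."""
    buckets = {v: [] for v in order}

    def place(factor):
        # Initialization constraint: first bucket along the order whose
        # variable appears in the function's scope.
        for v in order:
            if v in factor[0]:
                buckets[v].append(factor)
                return

    for f in factors:
        place(f)
    z = 1.0
    for v in order:
        scope, table = eliminate(buckets[v], v)
        if scope:
            place((scope, table))         # new function goes to a later bucket
        else:
            z *= table[()]                # a constant: fold it into Z
    return z

def brute_force_z(factors, variables):
    """Reference value of Z by enumerating every joint assignment."""
    z = 0.0
    for assignment in itertools.product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, assignment))
        prod = 1.0
        for s, t in factors:
            prod *= t[tuple(env[v] for v in s)]
        z += prod
    return z

# The seven functions and the bucket order from the slides; the table
# entries are random placeholders, not values from the lecture.
random.seed(0)
scopes = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
          ("C", "E"), ("D", "F"), ("E", "F")]
factors = [(s, {a: random.random() for a in itertools.product((0, 1), repeat=2)})
           for s in scopes]
order = ["A", "E", "D", "F", "B", "C"]
z = bucket_elimination(factors, order)
```

Comparing `z` with `brute_force_z(factors, order)` is a quick sanity check that the two agree up to floating-point error.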

Bucket elimination: Why it works? Processing the buckets A, E, D, F, B, C one at a time generates ψ(B,C), ψ(C,F), ψ(B,C,F), ψ₂(B,C), ψ(C) and finally Z, and so on.
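The equations on these slides did not survive in the transcript. A plausible reconstruction, assuming the seven functions and the order A, E, D, F, B, C used throughout, is the standard argument that each sum can be pushed inside the product until it touches only the functions that mention its variable:

```latex
\begin{align*}
Z &= \sum_{C}\sum_{B}\sum_{F}\sum_{D}\sum_{E}\sum_{A}
     \phi(A,B)\,\phi(A,C)\,\phi(B,D)\,\phi(C,D)\,\phi(C,E)\,\phi(D,F)\,\phi(E,F) \\
  &= \sum_{C}\sum_{B}\sum_{F}\sum_{D}\sum_{E}
     \phi(B,D)\,\phi(C,D)\,\phi(C,E)\,\phi(D,F)\,\phi(E,F)
     \underbrace{\sum_{A}\phi(A,B)\,\phi(A,C)}_{\psi(B,C)} \\
  &= \sum_{C}\sum_{B}\sum_{F}\sum_{D}
     \phi(B,D)\,\phi(C,D)\,\phi(D,F)\,\psi(B,C)
     \underbrace{\sum_{E}\phi(C,E)\,\phi(E,F)}_{\psi(C,F)} \\
  &= \sum_{C}\sum_{B}\sum_{F}
     \psi(B,C)\,\psi(C,F)
     \underbrace{\sum_{D}\phi(B,D)\,\phi(C,D)\,\phi(D,F)}_{\psi(B,C,F)} \\
  &= \sum_{C}\sum_{B}
     \psi(B,C)
     \underbrace{\sum_{F}\psi(C,F)\,\psi(B,C,F)}_{\psi_2(B,C)} \\
  &= \sum_{C}
     \underbrace{\sum_{B}\psi(B,C)\,\psi_2(B,C)}_{\psi(C)}
   = \sum_{C}\psi(C).
\end{align*}
```

Each underbraced term is exactly what one bucket computes: multiply the functions in the bucket, then sum out the bucket variable.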

Bucket elimination: Complexity
Costs shown next to the buckets: exp(3), exp(4), exp(3), exp(2), exp(1); total ≈ 6 exp(3)
Complexity: O(n exp(w))
– w: size of the scope of the largest function generated
– n: number of variables

Bucket elimination: Determining complexity graphically
Schematic operation on a graph (nodes A, B, C, D, E, F):
– Process nodes in order
– Connect all children of a node to each other

Bucket elimination: Complexity
Complexity of processing a bucket i – exp(children_i)
Complexity of bucket elimination – n exp(max_i(children_i))
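The schematic operation can be simulated directly on the graph. The sketch below is illustrative only: it reads "children" as the neighbors of a node that have not been processed yet (an assumption about the slide's terminology), and the function name `schematic_elimination` is invented here.

```python
def schematic_elimination(edges, order):
    """Process nodes in `order`; return a list of (node, children) pairs.

    "Children" is read here as the neighbors of the node that have not
    been processed yet (an assumption about the slide's terminology).
    """
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = []
    for v in order:
        children = adj.pop(v)
        for u in children:
            adj[u].discard(v)
            adj[u] |= children - {u}      # connect the children to each other
        result.append((v, sorted(children)))
    return result

# Interaction graph of the running example: one edge per function scope.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("D", "F"), ("E", "F")]
steps = schematic_elimination(edges, ["A", "E", "D", "F", "B", "C"])
bucket_sizes = [len(children) + 1 for _, children in steps]   # node plus children
max_children = max(len(children) for _, children in steps)
```

For this order, `bucket_sizes` gives the number of variables touched when each bucket is processed, and `max_children` is the width induced by the order.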

Treewidth and Tree Decompositions
Running schematic bucket elimination yields a chordal graph
– Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle)
Every chordal graph can be represented using a tree decomposition

Tree Decomposition of Chordal graphs
Clusters: ABC, EFC, DBCF, FBC, BC, C
Separators: FC, FBC, BC, C

Tree Decomposition and Treewidth: Definition
Given a network and its interaction graph, a tree decomposition is a set of subsets of variables connected by a tree such that:
– Each variable is present in at least one subset
– Each edge is present in at least one subset
– The subsets containing a variable “X” form a connected sub-tree (running intersection property)
Width of a tree decomposition: cardinality of the largest subset minus 1
Treewidth: minimum width over all possible tree decompositions
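The three conditions can be checked mechanically. The sketch below is one possible check, not part of the lecture: the function names are invented, the caller is assumed to pass edges that really form a tree (this is not verified), and the tree edges used in the example are an assumption derived from the bucket order, since the transcript lists only clusters and separators.

```python
def is_tree_decomposition(clusters, tree_edges, graph_edges):
    """Check the three conditions of the definition.

    `clusters` maps a cluster name to a set of variables. `tree_edges` is
    assumed to describe a tree over the cluster names (not verified here).
    """
    variables = {v for e in graph_edges for v in e}
    # 1. Each variable is present in at least one subset.
    if not all(any(v in c for c in clusters.values()) for v in variables):
        return False
    # 2. Each edge is present in at least one subset.
    if not all(any(set(e) <= c for c in clusters.values()) for e in graph_edges):
        return False
    # 3. Running intersection: the clusters holding a variable are connected.
    for v in variables:
        holders = {n for n, c in clusters.items() if v in c}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for a, b in tree_edges:
                for x, y in ((a, b), (b, a)):
                    if x == node and y in holders and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen != holders:
            return False
    return True

def width(clusters):
    """Cardinality of the largest subset minus 1."""
    return max(len(c) for c in clusters.values()) - 1

graph_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
               ("C", "E"), ("D", "F"), ("E", "F")]
clusters = {name: set(name) for name in ["ABC", "EFC", "DBCF", "FBC", "BC", "C"]}
# Assumed tree structure: each bucket's cluster is linked to the cluster of
# the bucket that receives its generated function.
tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"),
              ("FBC", "BC"), ("BC", "C")]
valid = is_tree_decomposition(clusters, tree_edges, graph_edges)
w = width(clusters)
```

Under these assumptions the decomposition is valid and has width 3, which matches the largest cluster DBCF.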

Bucket elimination: Complexity
Best possible complexity: O(n exp(w+1)), where w is the treewidth of the graph
Thus, we have a graph-based algorithm for determining the complexity of bucket elimination.
If w is small, we can solve the problem efficiently!

Generating Tree Decompositions
– Computing treewidth is NP-hard
– Branch and bound algorithm (Gogate & Dechter, 2004)
– Best-first search algorithm (Dow and Korf, 2009)
– Heuristics in practice
  – min-fill heuristic
  – min-degree heuristic

Min-degree and min-fill
min-degree
– At each point, select a variable with minimum degree (ties broken arbitrarily)
– Connect the children of the variable to each other
min-fill
– At each point, select a variable that adds the minimum number of edges to the current graph
– Connect the children of the selected variable to each other
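Both heuristics fit the same greedy loop and differ only in how a variable is scored. The following is a minimal sketch under stated assumptions: the function name `greedy_order` is hypothetical, ties are broken alphabetically so the result is reproducible (the slide allows any tie-break), and the graph is the interaction graph of the running example.

```python
from itertools import combinations

def greedy_order(edges, heuristic="min-fill"):
    """Return (elimination order, induced width) for a greedy heuristic.

    heuristic: "min-degree" or "min-fill". Ties are broken alphabetically
    here so the result is reproducible; the slide allows any tie-break.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def fill_in(v):
        # Number of edges that eliminating v would add to the current graph.
        return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

    def score(v):
        return len(adj[v]) if heuristic == "min-degree" else fill_in(v)

    order, induced_width = [], 0
    while adj:
        v = min(sorted(adj), key=score)
        neighbors = adj.pop(v)
        induced_width = max(induced_width, len(neighbors))
        for a in neighbors:
            adj[a].discard(v)
            adj[a] |= neighbors - {a}     # connect the remaining neighbors
        order.append(v)
    return order, induced_width

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("D", "F"), ("E", "F")]
degree_order, degree_width = greedy_order(edges, "min-degree")
fill_order, fill_width = greedy_order(edges, "min-fill")
```

On this small graph both heuristics find an order with induced width 2, smaller than the width 3 induced by the order A, E, D, F, B, C, which illustrates how much the choice of order matters.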

Computing all Marginals
Bucket elimination computes
– P(e) or Z
– P(X_i | e), where X_i is the last variable eliminated
To compute all marginals P(X_i | e) for all variables X_i
– Run bucket elimination n times
Efficient algorithm
– Junction tree algorithm or bucket tree propagation
– Requires only two passes to compute all marginals

Junction tree algorithm: An exact message passing algorithm
– Construct a tree decomposition T
– Initialize the tree decomposition as in bucket elimination
– Select an arbitrary node of T as root
– Pass messages from leaves to root (upward pass)
– Pass messages from root to leaves (downward pass)

Message passing Equations
Message from cluster S to its neighbor R:
– Multiply all received messages except the one from R
– Multiply all functions
– Sum out all variables except the separator
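The equation itself is missing from the transcript. One way to write the three steps as a single formula, using notation that is assumed here rather than taken from the slide (Φ(S) for the functions assigned to cluster S, nb(S) for its neighbors in the tree, sep(S,R) for the variables shared by S and R), is:

```latex
\[
m_{S \to R}\bigl(\mathrm{sep}(S,R)\bigr)
  \;=\; \sum_{S \,\setminus\, \mathrm{sep}(S,R)}
        \;\prod_{\phi \in \Phi(S)} \phi
        \;\prod_{N \in \mathrm{nb}(S) \setminus \{R\}} m_{N \to S}
\]
```

The sum ranges over the variables of S that are not in the separator, so the message is a function of the separator variables only.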

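Computing all marginals
After both passes, the marginal over a cluster S is proportional to the product of the functions in S and all the messages S received.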
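The formula on this slide was lost in extraction. A standard form consistent with the message passing steps, again with assumed notation (Φ(S) for the functions assigned to cluster S, nb(S) for its neighbors, m_{N→S} for the message S received from N), is:

```latex
\[
P(S \mid e) \;\propto\;
  \prod_{\phi \in \Phi(S)} \phi
  \;\prod_{N \in \mathrm{nb}(S)} m_{N \to S}
\]
```

Summing the right-hand side over all assignments to S gives the normalizing constant Z, or P(e), and the marginal of a single variable follows by summing out the other variables of S.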

Message passing Equations
– Select “EFC” as root
– Pass messages from leaves to root
– Pass messages from root to leaves
Clusters: ABC, EFC, DBCF, FBC, BC, C; separators: FC, FBC, BC, C
Functions: φ(C,E), φ(E,F), φ(D,F), φ(B,D), φ(C,D), φ(A,C), φ(A,B)
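The two passes on this example can be sketched as below. Several things are assumptions rather than content from the slides: variables are binary, the tables are random placeholders, the tree edges are derived from the bucket order (the transcript lists only clusters and separators), each function is assigned to the cluster of its bucket, and all helper names are invented.

```python
import itertools
import random

def product_of(factors, scope):
    """Table over `scope` holding the product of all `factors` (binary vars)."""
    table = {}
    for a in itertools.product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, a))
        p = 1.0
        for s, t in factors:
            p *= t[tuple(env[v] for v in s)]
        table[a] = p
    return table

def marginalize(scope, table, keep):
    """Sum out every variable of `scope` that is not in `keep`."""
    out = {}
    for a, p in table.items():
        key = tuple(x for v, x in zip(scope, a) if v in keep)
        out[key] = out.get(key, 0.0) + p
    return tuple(v for v in scope if v in keep), out

def junction_tree_beliefs(clusters, tree_edges, assigned, root):
    """Two-pass message passing; returns an unnormalized table per cluster."""
    nbrs = {c: [] for c in clusters}
    for a, b in tree_edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    msgs = {}

    def send(src, dst):
        # Multiply functions and all received messages except the one from dst,
        # then sum out everything except the separator.
        incoming = [msgs[(n, src)] for n in nbrs[src] if n != dst]
        table = product_of(assigned[src] + incoming, clusters[src])
        separator = set(clusters[src]) & set(clusters[dst])
        msgs[(src, dst)] = marginalize(clusters[src], table, separator)

    def upward(node, parent):             # leaves to root
        for n in nbrs[node]:
            if n != parent:
                upward(n, node)
        if parent is not None:
            send(node, parent)

    def downward(node, parent):           # root to leaves
        for n in nbrs[node]:
            if n != parent:
                send(node, n)
                downward(n, node)

    upward(root, None)
    downward(root, None)
    return {c: product_of(assigned[c] + [msgs[(n, c)] for n in nbrs[c]], scope)
            for c, scope in clusters.items()}

random.seed(1)
def random_factor(*scope):                # placeholder tables, not lecture data
    return (scope, {a: random.random()
                    for a in itertools.product((0, 1), repeat=len(scope))})

clusters = {"ABC": ("A", "B", "C"), "EFC": ("E", "F", "C"),
            "DBCF": ("D", "B", "C", "F"), "FBC": ("F", "B", "C"),
            "BC": ("B", "C"), "C": ("C",)}
tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"),
              ("FBC", "BC"), ("BC", "C")]   # assumed, derived from the buckets
assigned = {"ABC": [random_factor("A", "B"), random_factor("A", "C")],
            "EFC": [random_factor("C", "E"), random_factor("E", "F")],
            "DBCF": [random_factor("D", "F"), random_factor("B", "D"),
                     random_factor("C", "D")],
            "FBC": [], "BC": [], "C": []}
factors = [f for fs in assigned.values() for f in fs]
beliefs = junction_tree_beliefs(clusters, tree_edges, assigned, root="EFC")
z = sum(beliefs["C"].values())
p_c1 = beliefs["C"][(1,)] / z             # marginal probability that C = 1
```

Every cluster table sums to the same Z, so dividing any of them by `z` yields the marginal over that cluster after only two passes over the tree.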

Architectures
– Shenoy-Shafer architecture
– Hugin architecture
  – Associate one function with each cluster
  – Requires multiplication and division
  – Smaller time complexity
  – Higher space complexity