
Slide 1: CIS 530 / 730: Artificial Intelligence – Lecture 26 of 42, Friday, 31 October 2008
William H. Hsu, Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/v9v3
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for next class: Chapter 15, Russell & Norvig, 2nd edition
Topic: Inference and Software Tools 2
Discussion: MP5, Hugin, BNJ; term projects

Slide 2: Lecture Outline
- Wednesday's reading: Sections 14.6 – 14.8, R&N 2e; Chapter 15
- Friday: Sections 18.1 – 18.2, R&N 2e
- Today: Inference in Graphical Models
  - Bayesian networks and causality
  - Inference and learning
  - BNJ interface (http://bnj.sourceforge.net)
  - Causality

Slide 3: Bayes Net Application – Car Diagnosis
© 2003 D. Lin, University of Alberta, CS 366: Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/

Slide 4: Bayes Net Application – Interpreting Mammograms
© 2003 D. Lin, University of Alberta, CS 366: Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/

Slide 5: Bayes Net Application – Forecasting Oil Prices (ARCO1)
© 2003 D. Lin, University of Alberta, CS 366: Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/

Slide 6: Bayes Net Application – Forecasting Oil Prices (Real Network)
© 2003 D. Lin, University of Alberta, CS 366: Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/

Slide 7: Bayesian Network Inference
- Inference: computing P(X | Y) for some variables or sets of variables X and Y.
- Inference in Bayesian networks is #P-hard: counting satisfying assignments of a propositional formula (#SAT) reduces to it.
- Example: inputs I1, …, I5, each with prior probability 0.5, feed a deterministic output node O encoding the formula; then P(O = true) = (number of satisfying assignments) × 0.5^(number of inputs).
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
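To make the counting argument concrete, here is a minimal Python sketch (the Boolean formula below is an arbitrary illustration, not one from the lecture): with uniform 0.5 priors on the inputs and a deterministic output node, exactly computing P(O = true) amounts to counting satisfying assignments.

```python
from itertools import product

# Arbitrary example formula over five inputs (illustration only).
def formula(i1, i2, i3, i4, i5):
    return (i1 or i2) and (not i3 or i4) and i5

n = 5
sat_count = sum(formula(*bits) for bits in product([False, True], repeat=n))

# With fair-coin priors, P(O = true) = (#satisfying assignments) * 0.5^n,
# so exact inference for P(O) solves #SAT for this formula.
p_o_true = sat_count * 0.5 ** n
print(sat_count, p_o_true)   # 9 satisfying assignments -> P(O = true) = 9/32
```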

Slide 8: Bayesian Network Inference (continued)
- But inference is still tractable in some cases.
- Let's look at a special class of networks: trees / forests, in which each node has at most one parent.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 9: Decomposing the Probabilities
- Suppose we want P(X_i | E), where E is some set of evidence variables.
- Split E into two parts:
  - E_i^-: the part consisting of assignments to variables in the subtree rooted at X_i
  - E_i^+: the rest of the evidence
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 10: Decomposing the Probabilities (continued)
P(X_i | E) = α π(X_i) λ(X_i), where:
- α is a normalizing constant independent of X_i
- π(X_i) = P(X_i | E_i^+)
- λ(X_i) = P(E_i^- | X_i)
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
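As a small illustration of this decomposition (a sketch, not BNJ code; the numbers are made up), combining the two messages and normalizing recovers the posterior:

```python
import numpy as np

def posterior_from_messages(pi, lam):
    """Combine pi(X_i) = P(X_i | E_i^+) and lambda(X_i) = P(E_i^- | X_i)
    into P(X_i | E) = alpha * pi(X_i) * lambda(X_i); alpha is just the
    normalizer over the states of X_i."""
    unnormalized = np.asarray(pi) * np.asarray(lam)
    return unnormalized / unnormalized.sum()

# Binary variable with made-up messages, for illustration only:
print(posterior_from_messages([0.7, 0.3], [0.2, 0.9]))   # -> [~0.341, ~0.659]
```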

Slide 11: Using the Decomposition for Inference
- We can use this decomposition to do inference as follows. First, compute λ(X_i) = P(E_i^- | X_i) for all X_i recursively, using the leaves of the tree as the base case.
- If X_i is a leaf:
  - If X_i is in E: λ(x_i) = 1 if x_i matches the evidence value, 0 otherwise
  - If X_i is not in E: E_i^- is the empty set, so P(E_i^- | X_i) = 1 (constant)
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
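A minimal sketch of the leaf base case (assuming states are indexed 0, 1, …; this is not code from the lecture):

```python
import numpy as np

def lambda_leaf(n_states, evidence_state=None):
    """Lambda message at a leaf X_i.
    Observed leaf: lambda(x_i) = 1 for the observed state, 0 otherwise.
    Unobserved leaf: E_i^- is empty, so lambda(x_i) = 1 for every state."""
    if evidence_state is None:
        return np.ones(n_states)
    lam = np.zeros(n_states)
    lam[evidence_state] = 1.0
    return lam

print(lambda_leaf(3))      # unobserved leaf -> [1. 1. 1.]
print(lambda_leaf(3, 1))   # leaf observed in state 1 -> [0. 1. 0.]
```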

Slide 12: Quick Aside – "Virtual Evidence"
- For theoretical simplicity, but without loss of generality, assume that all variables in E (the evidence set) are leaves in the tree.
- Why this is WLOG: observing an internal node X_i is equivalent to adding a leaf child X_i' with P(X_i' | X_i) = 1 if X_i' = X_i and 0 otherwise, and observing X_i' instead.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 13: Calculating λ(X_i) for Non-Leaves
- Suppose X_i has one child, X_j. Then, summing over the states of the child:
  λ(x_i) = P(E_i^- | x_i) = Σ_{x_j} P(x_j | x_i) λ(x_j)
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 14: Calculating λ(X_i) for Non-Leaves (continued)
- Now suppose X_i has a set of children, C. Since X_i d-separates each of its subtrees, the contribution of each subtree to λ(X_i) is independent:
  λ(x_i) = Π_{X_j ∈ C} λ_j(x_i), where λ_j(x_i) = Σ_{x_j} P(x_j | x_i) λ(x_j)
  is the contribution to P(E_i^- | X_i) of the part of the evidence lying in the subtree rooted at X_i's child X_j.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
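A sketch of the non-leaf λ computation under the CPT layout assumed here (rows indexed by the parent state, columns by the child state); not code from the lecture:

```python
import numpy as np

def lambda_message_from_child(cpt_child_given_parent, lam_child):
    """lambda_j(x_i) = sum_{x_j} P(x_j | x_i) * lambda(x_j),
    with cpt_child_given_parent[x_i, x_j] = P(X_j = x_j | X_i = x_i)."""
    return np.asarray(cpt_child_given_parent) @ np.asarray(lam_child)

def lambda_nonleaf(child_cpts, child_lambdas):
    """lambda(x_i): product over children j of lambda_j(x_i)."""
    msgs = [lambda_message_from_child(cpt, lam)
            for cpt, lam in zip(child_cpts, child_lambdas)]
    return np.prod(msgs, axis=0)
```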

Slide 15: We Are Now λ-Happy
- So now we have a way to recursively compute all the λ(X_i)'s, starting the recursion at the root and using the leaves as the base case.
- If we want, we can think of each node in the network as an autonomous processor that passes a little "λ message" to its parent.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 16: The Other Half of the Problem
- Remember, P(X_i | E) = α π(X_i) λ(X_i). Now that we have all the λ(X_i)'s, what about the π(X_i)'s? Recall π(X_i) = P(X_i | E_i^+).
- What about the root of the tree, X_r? In that case E_r^+ is the empty set, so π(X_r) = P(X_r). No sweat. Since we also know λ(X_r), we can compute the final P(X_r | E).
- So for an arbitrary X_i with parent X_p, inductively assume we know π(X_p) and/or P(X_p | E). How do we get π(X_i)?
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 17: Computing π(X_i)
- For X_i with parent X_p:
  π(x_i) = Σ_{x_p} P(x_i | x_p) π_i(x_p),
  where π_i(x_p) is the message X_p sends to its child X_i: the parent's belief with the contribution of X_i's own subtree removed, i.e. π_i(x_p) ∝ π(x_p) Π_{j ≠ i} λ_j(x_p), equivalently P(x_p | E) / λ_i(x_p) when λ_i(x_p) > 0.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
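A hedged sketch of the downward π computation, assuming the parent's posterior and this child's λ message are available and strictly positive (dividing out the child's own contribution is one standard way to form π_i; the original slide's exact formula is not preserved in the transcript):

```python
import numpy as np

def pi_message_to_child(posterior_parent, lambda_from_this_child):
    """pi_i(x_p): the parent's posterior P(X_p | E) with this child's own
    lambda contribution divided out, then renormalized (assumes the lambda
    message has no zero entries)."""
    msg = np.asarray(posterior_parent) / np.asarray(lambda_from_this_child)
    return msg / msg.sum()

def pi_for_child(cpt_child_given_parent, pi_msg_from_parent):
    """pi(x_i) = sum_{x_p} P(x_i | x_p) * pi_i(x_p),
    with cpt_child_given_parent[x_p, x_i] = P(X_i = x_i | X_p = x_p)."""
    return np.asarray(pi_msg_from_parent) @ np.asarray(cpt_child_given_parent)
```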

Slide 18: We're Done. Yay!
- Thus we can compute all the π(X_i)'s and, in turn, all the P(X_i | E)'s.
- We can think of the nodes as autonomous processors passing λ and π messages to their neighbors.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
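Putting the pieces together on the smallest possible case, a two-node chain A → B with made-up CPTs (illustration only): message passing and brute-force enumeration give the same posterior.

```python
import numpy as np

p_a = np.array([0.6, 0.4])                  # P(A)
p_b_given_a = np.array([[0.9, 0.1],         # P(B | A); rows indexed by A
                        [0.3, 0.7]])

# Evidence: B = 1. Message passing on the chain:
lam_b = np.array([0.0, 1.0])                # lambda at the evidence leaf B
lam_a = p_b_given_a @ lam_b                 # lambda(a) = sum_b P(b | a) lambda(b)
pi_a = p_a                                  # A is the root, so pi(A) = P(A)
post_a = pi_a * lam_a
post_a /= post_a.sum()                      # P(A | B = 1)

# Brute-force check by direct enumeration of the joint:
joint = p_a[:, None] * p_b_given_a          # joint[a, b] = P(a) P(b | a)
direct = joint[:, 1] / joint[:, 1].sum()

print(post_a, direct)                       # both ~[0.176, 0.824]
```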

Slide 19: Conjunctive Queries
- What if we want, e.g., P(A, B | C) instead of just the marginal distributions P(A | C) and P(B | C)?
- Just use the chain rule:
  - P(A, B | C) = P(A | C) P(B | A, C)
  - Each of the latter probabilities can be computed using the technique just discussed.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
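A quick numerical check of the chain-rule identity on an arbitrary joint distribution (illustration only, not lecture code):

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))        # arbitrary joint over (A, B, C)
joint /= joint.sum()

c = 1                                # condition on C = 1
p_ab_given_c = joint[:, :, c] / joint[:, :, c].sum()

p_a_given_c = p_ab_given_c.sum(axis=1)
p_b_given_ac = p_ab_given_c / p_a_given_c[:, None]

# Chain rule: P(A, B | C) = P(A | C) * P(B | A, C)
reconstructed = p_a_given_c[:, None] * p_b_given_ac
assert np.allclose(reconstructed, p_ab_given_c)
```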

Slide 20: Polytrees
- The technique can be generalized to polytrees: the undirected version of the graph is still a tree, but nodes can have more than one parent.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 21: Dealing with Cycles
- We can deal with undirected cycles in the graph by:
  - clustering variables together (e.g., merging B and C into a single compound node BC, turning the loop A–B–D–C–A into the chain A → BC → D), or
  - conditioning (e.g., instantiating A to each of its values, "set to 0" and "set to 1", and solving the resulting singly-connected problems).
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 22: Join Trees
- An arbitrary Bayesian network can be transformed, via some evil graph-theoretic magic, into a join tree in which a similar message-passing method can be employed (cliques such as {A, B, C}, {B, C, D}, {D, F} become the nodes of the tree).
- In the worst case the join tree nodes must take on exponentially many combinations of values, but the method often works well in practice.
Source: www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt

Slide 23: BNJ Visualization – Pseudo-Code Annotation (Code Page)
© 2004 KSU BNJ Development Team

Slide 24: Graphical Models Overview [1] – Bayesian Networks
- Example BBN (nodes): X1 = Age, X2 = Gender, X3 = Exposure-To-Toxins, X4 = Smoking, X5 = Cancer, X6 = Serum Calcium, X7 = Lung Tumor
- Example joint probability: P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative) = P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
- Conditional independence:
  - X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
  - Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
- Bayesian (belief) network:
  - Acyclic directed graph model B = (V, E, Θ) representing CI assertions over the variables (Θ: the CPT parameters)
  - Vertices (nodes) V: denote events (each a random variable)
  - Edges (arcs, links) E: denote conditional dependencies
- Markov condition for BBNs (chain rule): P(X_1, …, X_n) = Π_{i=1}^{n} P(X_i | parents(X_i))
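The Markov condition reads directly as code. The sketch below uses a hypothetical three-node network Cloudy → Rain → WetGrass (not the seven-node example on the slide) and multiplies one CPT entry per node:

```python
# Chain rule for BBNs: P(x_1, ..., x_n) = prod_i P(x_i | parents(x_i)).
# Hypothetical CPTs, for illustration only.
cpts = {
    "Cloudy":   lambda a: {True: 0.5, False: 0.5}[a["Cloudy"]],
    "Rain":     lambda a: ({True: 0.8, False: 0.2} if a["Cloudy"]
                           else {True: 0.1, False: 0.9})[a["Rain"]],
    "WetGrass": lambda a: ({True: 0.9, False: 0.1} if a["Rain"]
                           else {True: 0.0, False: 1.0})[a["WetGrass"]],
}

def joint(assignment):
    p = 1.0
    for var, cpt in cpts.items():      # multiply P(x_i | parents(x_i))
        p *= cpt(assignment)
    return p

print(joint({"Cloudy": True, "Rain": True, "WetGrass": True}))   # 0.5*0.8*0.9 = 0.36
```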

Slide 25: Graphical Models Overview [3] – Inference Problem
- Multiply-connected case: exact and approximate inference are #P-complete.
Adapted from slides by S. Russell, UC Berkeley, http://aima.cs.berkeley.edu/

Slide 26: Polytrees and Tree-Dependent Distributions
- Polytrees (aka singly-connected Bayesian networks):
  - Definition: a Bayesian network with no undirected loops
  - Idea: restrict distributions (CPTs) to single nodes
  - Theorem: inference in a singly-connected BBN requires linear time (linear in network size, including CPT sizes), much better than for unrestricted (multiply-connected) BBNs
- Tree-dependent distributions:
  - Further restriction of polytrees: every node has at most one parent
  - Now only need to keep one prior, P(root), and n − 1 CPTs (one per non-root node)
  - All CPTs are 2-dimensional: P(child | parent)
- Independence assumptions:
  - As for a general BBN: x is independent of its non-descendants given its (single) parent z
  - A very strong assumption (applies in some domains but not most)

Slide 27: Propagation Algorithm in Singly-Connected Bayesian Networks – Pearl (1983)
- Message passing over a tree of clusters C_1, …, C_6 (figure omitted):
  - Upward (child-to-parent) λ messages: λ'(C_i') modified during the λ message-passing phase
  - Downward π messages: P'(C_i') computed during the π message-passing phase
- Multiply-connected case: exact and approximate inference are #P-complete (#P is the counting analogue of NP)
Adapted from Neapolitan (1990), Guo (2000)

Slide 28: Inference by Clustering [1] – Graph Operations (Moralization, Triangulation, Maximal Cliques)
- Start with the Bayesian network (an acyclic digraph) over nodes A, B, C, D, E, F, G, H.
- Moralize: connect ("marry") the parents of each node, then drop edge directions.
- Triangulate: add chords so every cycle of length ≥ 4 has a chord (node ordering A1, B2, E3, C4, G5, F6, H7, D8).
- Find the maximal cliques: Clq1 = {A, B}, Clq2 = {B, E, C}, Clq3 = {E, C, G}, Clq4 = {E, G, F}, Clq5 = {C, G, H}, Clq6 = {C, D}.
Adapted from Neapolitan (1990), Guo (2000)
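A minimal sketch of the moralization step (marry each pair of co-parents, then drop edge directions); the parent structure below is hypothetical, not the eight-node network from the slide:

```python
from itertools import combinations

def moralize(parents):
    """parents: dict mapping each node to the list of its parents.
    Returns the set of undirected edges of the moral graph."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                          # keep original arcs, now undirected
            edges.add(frozenset((p, child)))
        for p, q in combinations(pa, 2):      # marry each pair of co-parents
            edges.add(frozenset((p, q)))
    return edges

parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}   # hypothetical structure
print(sorted(tuple(sorted(e)) for e in moralize(parents)))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```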

Slide 29: Inference by Clustering [2] – Junction Tree (Lauritzen & Spiegelhalter, 1988)
- Input: list of cliques of the triangulated, moralized graph G_u
- Output: a tree of cliques, with separator nodes S_i, residual nodes R_i, and potential probability ψ(Clq_i) for every clique
- Algorithm:
  1. S_i = Clq_i ∩ (Clq_1 ∪ Clq_2 ∪ … ∪ Clq_{i−1})
  2. R_i = Clq_i − S_i
  3. If i > 1, identify a j < i such that Clq_j is a parent of Clq_i
  4. Assign each node v to a unique clique Clq_i such that {v} ∪ c(v) ⊆ Clq_i (where c(v) denotes v's parents)
  5. Compute ψ(Clq_i) = Π_{v assigned to Clq_i} P(v | c(v)), or 1 if no v is assigned to Clq_i
  6. Store Clq_i, R_i, S_i, and ψ(Clq_i) at each vertex in the tree of cliques
Adapted from Neapolitan (1990), Guo (2000)
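Steps 1–2 translate directly into code; run on the clique list from this example (a sketch, not the BNJ implementation), the output matches the separator/residual table on the next slide.

```python
def separators_and_residuals(cliques):
    """cliques: list of sets, in the order Clq_1, ..., Clq_k."""
    seen, result = set(), []
    for clq in cliques:
        s = clq & seen          # S_i = Clq_i intersected with the union of earlier cliques
        r = clq - s             # R_i = Clq_i minus S_i
        result.append((s, r))
        seen |= clq
    return result

cliques = [{"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
           {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"}]
for i, (s, r) in enumerate(separators_and_residuals(cliques), 1):
    print(f"Clq{i}: S = {sorted(s)}, R = {sorted(r)}")
# e.g. Clq3: S = ['C', 'E'], R = ['G']; Clq6: S = ['C'], R = ['D']
```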

Slide 30: Inference by Clustering [3] – Clique-Tree Operations
(R_i: residual nodes; S_i: separator nodes; ψ(Clq_i): potential probability of clique i)

Clique               Residual R_i   Separator S_i   Potential ψ(Clq_i)
Clq1 = {A, B}        {A, B}         {}              P(B|A) P(A)
Clq2 = {B, E, C}     {C, E}         {B}             P(C|B,E)
Clq3 = {E, C, G}     {G}            {E, C}          1
Clq4 = {E, G, F}     {F}            {E, G}          P(E|F) P(G|F) P(F)
Clq5 = {C, G, H}     {H}            {C, G}          P(H|C,G)
Clq6 = {C, D}        {D}            {C}             P(D|C)

Adapted from Neapolitan (1990), Guo (2000)

Slide 31: Inference by Loop Cutset Conditioning
- Split a vertex in an undirected cycle and condition upon each of its state values (e.g., Age split into X_{1,1}: Age ∈ [0, 10), X_{1,2}: Age ∈ [10, 20), …, X_{1,10}: Age ∈ [100, ∞); other nodes: Gender, Exposure-To-Toxins, Smoking, Cancer, Serum Calcium, Lung Tumor).
- Number of network instantiations: product of the arities of the nodes in the minimal loop cutset.
- Posterior: marginal conditioned upon the cutset variable values.
- Deciding the optimal cutset is NP-hard.
- Current open problems:
  - Bounded cutset conditioning: ordering heuristics
  - Finding randomized algorithms for loop cutset optimization
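A sketch of the conditioning step (not BNJ's implementation): run singly-connected inference once per cutset instantiation and mix the results, weighting each instantiation by its posterior probability given the evidence. The polytree_posterior callback is a placeholder for whatever exact polytree routine is available.

```python
import numpy as np

def cutset_conditioning(cutset_values, prior_over_cutset, polytree_posterior):
    """Mix per-instantiation results:
    P(X | E) = sum_c P(X | E, c) P(c | E), with P(c | E) proportional to P(c) P(E | c).
    polytree_posterior(c) runs singly-connected inference with the cutset clamped
    to c and returns (P(X | E, c), P(E | c))."""
    posteriors, weights = [], []
    for c, p_c in zip(cutset_values, prior_over_cutset):
        post_given_c, evidence_likelihood = polytree_posterior(c)
        posteriors.append(np.asarray(post_given_c))
        weights.append(p_c * evidence_likelihood)
    weights = np.asarray(weights)
    weights /= weights.sum()                       # = P(c | E)
    return np.tensordot(weights, np.stack(posteriors), axes=1)
```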

Slide 32: BNJ Visualization [2] – Pseudo-Code Annotation (Code Page), ALARM Network
© 2004 KSU BNJ Development Team

Slide 33: BNJ Visualization [3] – Network (Poker Network)
© 2004 KSU BNJ Development Team

Slide 34: Inference by Variable Elimination [1] – Intuition
Adapted from slides by S. Russell, UC Berkeley, http://aima.cs.berkeley.edu/

Slide 35: Inference by Variable Elimination [2] – Factoring Operations
Adapted from slides by S. Russell, UC Berkeley, http://aima.cs.berkeley.edu/
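The slide's figures are not preserved in the transcript, so the following is a generic sketch of the two factoring operations variable elimination relies on: pointwise product of factors and summing out a variable.

```python
import numpy as np

class Factor:
    """A factor: a table with one axis per named variable."""
    def __init__(self, variables, table):
        self.variables = list(variables)
        self.table = np.asarray(table, dtype=float)

def multiply(f, g):
    """Pointwise product of two factors, aligning shared variables by name."""
    variables = f.variables + [v for v in g.variables if v not in f.variables]
    def align(factor):
        # pad with trailing singleton axes, then move each named axis into place
        t = factor.table.reshape(
            factor.table.shape + (1,) * (len(variables) - len(factor.variables)))
        return np.moveaxis(t, list(range(len(factor.variables))),
                           [variables.index(v) for v in factor.variables])
    return Factor(variables, align(f) * align(g))

def sum_out(factor, var):
    """Eliminate a variable by summing over its axis."""
    axis = factor.variables.index(var)
    return Factor([v for v in factor.variables if v != var],
                  factor.table.sum(axis=axis))

# Example: factors for P(A, B) and P(C | B); eliminating B gives a factor over (A, C).
pab = Factor(["A", "B"], [[0.3, 0.3], [0.2, 0.2]])
pcb = Factor(["C", "B"], [[0.9, 0.4], [0.1, 0.6]])
print(sum_out(multiply(pab, pcb), "B").table)   # [[0.39 0.21], [0.26 0.14]]
```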

Slide 36: Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning
- [1] Genetic algorithm: searches over candidate representations α, guided by a representation fitness f(α), to produce an optimized representation.
- [2] Representation evaluator for learning problems: a genetic wrapper for change of representation and inductive-bias control; the training data D is split into D_train (inductive learning) and D_val (inference), with an inference specification supplied for evaluation.

Slide 37: Tools for Building Graphical Models
- Commercial tools: Ergo, Netica, TETRAD, Hugin
- Bayes Net Toolbox (BNT) – Murphy (1997–present)
  - Distribution page: http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
  - Development group: http://groups.yahoo.com/group/BayesNetToolbox
- Bayesian Network tools in Java (BNJ) – Hsu et al. (1999–present)
  - Distribution page: http://bnj.sourceforge.net
  - Development group: http://groups.yahoo.com/group/bndev
  - Current (re)implementation projects for KSU KDD Lab:
    - Continuous state: Minka (2002) – Hsu, Guo, Li
    - Formats: XML BNIF (MSBN), Netica – Barber, Guo
    - Space-efficient DBN inference – Meyer
    - Bounded cutset conditioning – Chandak

Slide 38: References [1] – Graphical Models and Inference Algorithms
- Graphical models:
  - Bayesian (belief) networks tutorial – Murphy (2001): http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
  - Learning Bayesian networks – Heckerman (1996, 1999): http://research.microsoft.com/~heckerman
- Inference algorithms:
  - Junction tree (join tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988), http://citeseer.nj.nec.com/huang94inference.html
  - (Bounded) loop cutset conditioning: Horvitz & Cooper (1989), http://citeseer.nj.nec.com/shachter94global.html
  - Variable elimination (bucket elimination, ElimBel): Dechter (1996), http://citeseer.nj.nec.com/dechter96bucket.html
- Recommended books:
  - Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
  - Castillo, Gutierrez & Hadi (1997)
  - Cowell, Dawid, Lauritzen & Spiegelhalter (1999)
- Stochastic approximation: http://citeseer.nj.nec.com/cheng00aisbn.html

Slide 39: References [2] – Machine Learning, KDD, and Bioinformatics
- Machine learning, data mining, and knowledge discovery:
  - K-State KDD Lab: literature survey and resource catalog (1999–present): http://www.kddresearch.org/Resources
  - Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer, Thornton (2002–present): http://bnj.sourceforge.net
  - Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002): http://mldev.sourceforge.net
- Bioinformatics:
  - European Bioinformatics Institute tutorial: Brazma et al. (2001): http://www.ebi.ac.uk/microarray/biology_intro.htm
  - Hebrew University: Friedman, Pe'er, et al. (1999, 2000, 2002): http://www.cs.huji.ac.il/labs/compbio/
  - K-State BMI Group: literature survey and resource catalog (2002–2005): http://www.kddresearch.org/Groups/Bioinformatics

Slide 40: Terminology
- Introduction to reasoning under uncertainty:
  - Probability foundations
  - Definitions: subjectivist, frequentist, logicist
  - The (3) Kolmogorov axioms
- Bayes's theorem:
  - Prior probability of an event
  - Joint probability of an event
  - Conditional (posterior) probability of an event
- Maximum a posteriori (MAP) and maximum likelihood (ML) hypotheses (see the definitions below):
  - MAP hypothesis: highest conditional probability given the observations (data)
  - ML hypothesis: highest likelihood of generating the observed data
  - ML estimation (MLE): estimating parameters to find the ML hypothesis
- Bayesian inference: computing conditional probabilities (CPs) in a model
- Bayesian learning: searching the model (hypothesis) space using CPs
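For reference, the standard MAP and ML definitions behind these terms (notation assumed, not taken from the transcript):

```latex
% h ranges over the hypothesis space H, D is the observed data.
\begin{align*}
h_{\mathrm{MAP}} &= \arg\max_{h \in H} P(h \mid D)
                  = \arg\max_{h \in H} P(D \mid h)\, P(h) \\
h_{\mathrm{ML}}  &= \arg\max_{h \in H} P(D \mid h)
\end{align*}
```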

Slide 41: Summary Points
- Introduction to probabilistic reasoning:
  - Framework: using probabilistic criteria to search the hypothesis space H
  - Probability foundations
  - Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
  - Kolmogorov axioms
- Bayes's theorem:
  - Definition of conditional (posterior) probability
  - Product rule
- Maximum a posteriori (MAP) and maximum likelihood (ML) hypotheses:
  - Bayes's rule and MAP
  - Uniform priors: allow use of MLE to generate MAP hypotheses
  - Relation to version spaces, candidate elimination
- Next week: Chapter 14, Russell & Norvig
- Later: Bayesian learning – MDL, BOC, Gibbs, Simple (Naïve) Bayes; categorizing text and documents, other applications
