# KSU Math Department Colloquium

Graphical Models of Probability for Causal Reasoning
Thursday 07 November 2002 (revised 09 December 2003)
William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University

Overview
- Graphical Models of Probability
  - Markov graphs
  - Bayesian (belief) networks
  - Causal semantics
  - Direction-dependent separation (d-separation) property
- Learning and Reasoning: Problems, Algorithms
  - Inference: exact and approximate
    - Junction tree – Lauritzen and Spiegelhalter (1988)
    - (Bounded) loop cutset conditioning – Horvitz and Cooper (1989)
    - Variable elimination – Dechter (1996)
  - Structure learning
    - K2 algorithm – Cooper and Herskovits (1992)
    - Variable ordering problem – Larrañaga (1996), Hsu et al. (2002)
- Probabilistic Reasoning in Machine Learning, Data Mining
- Current Research and Open Problems

Stages of Data Mining and Knowledge Discovery in Databases

Graphical Models Overview [1]: Bayesian Networks
Conditional Independence
- X is conditionally independent (CI) from Y given Z (sometimes written $X \perp Y \mid Z$) iff $P(X \mid Y, Z) = P(X \mid Z)$ for all values of X, Y, and Z
- Example: $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning}) \Leftrightarrow T \perp R \mid L$

Bayesian (Belief) Network
- Acyclic directed graph model $B = (V, E, \Theta)$ representing CI assertions over $\Omega$
- Vertices (nodes) V: denote events (each a random variable)
- Edges (arcs, links) E: denote conditional dependencies

Markov Condition for BBNs (Chain Rule):

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$$

Example BBN: X1 Age, X2 Gender, X3 Exposure-To-Toxins, X4 Smoking, X5 Cancer, X6 Serum Calcium, X7 Lung Tumor

P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative) = P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
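The chain-rule factorization for the example network can be sketched in a few lines of Python. The parent sets follow the factorization shown on the slide; every probability value below is a made-up placeholder (the slide gives no numbers), and the names `PARENTS`, `CPT`, and `joint` are illustrative only.

```python
# Parent sets read off the slide's factorization:
# P(T) P(F) P(L|T) P(N|T,F) P(N|L,N) P(N|N) P(N|N)
PARENTS = {
    "Age": (),
    "Gender": (),
    "Exposure": ("Age",),
    "Smoking": ("Age", "Gender"),
    "Cancer": ("Exposure", "Smoking"),
    "SerumCalcium": ("Cancer",),
    "LungTumor": ("Cancer",),
}

# Hypothetical CPT rows: only the entries needed for one query are filled in.
CPT = {
    "Age": {(): {"20s": 0.2}},
    "Gender": {(): {"Female": 0.5}},
    "Exposure": {("20s",): {"Low": 0.9}},
    "Smoking": {("20s", "Female"): {"Non-Smoker": 0.8}},
    "Cancer": {("Low", "Non-Smoker"): {"No-Cancer": 0.99}},
    "SerumCalcium": {("No-Cancer",): {"Negative": 0.95}},
    "LungTumor": {("No-Cancer",): {"Negative": 0.98}},
}


def joint(assignment):
    """Multiply P(X_i | parents(X_i)) over all nodes (Markov condition)."""
    p = 1.0
    for var, parents in PARENTS.items():
        parent_values = tuple(assignment[q] for q in parents)
        p *= CPT[var][parent_values][assignment[var]]
    return p


query = {
    "Age": "20s", "Gender": "Female", "Exposure": "Low",
    "Smoking": "Non-Smoker", "Cancer": "No-Cancer",
    "SerumCalcium": "Negative", "LungTumor": "Negative",
}
print(joint(query))
```

With seven factors, the joint needs only the local conditional tables rather than one table over all seven variables, which is the point of the factorization.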

Graphical Models Overview [2]: Markov Blankets and d-Separation Property
- Motivation: The conditional independence status of nodes within a BBN might change as the availability of evidence E changes. Direction-dependent separation (d-separation) is a technique used to determine conditional independence of nodes as evidence changes.
- Definition: A set of evidence nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E.
- A path is blocked if there is a node Z on the path for which one of three conditions holds:
  1. Z is in E and has one path arrow leading in and one leading out
  2. Z is in E and has both path arrows leading out
  3. Neither Z nor any descendant of Z is in E, and both path arrows lead in to Z

From S. Russell & P. Norvig (1995); adapted from J. Schlabach (1996)
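A d-separation query can be checked mechanically. The sketch below does not enumerate paths as in the definition above; it uses the equivalent moralized-ancestral-graph test (keep only ancestors of X, Y, and E; marry co-parents; drop directions; delete E; test connectivity). The graph is the cancer network from the earlier slide, with node names shortened; all function names are illustrative.

```python
from collections import deque

PARENTS = {
    "Age": [], "Gender": [], "Exposure": ["Age"],
    "Smoking": ["Age", "Gender"], "Cancer": ["Exposure", "Smoking"],
    "SerumCalcium": ["Cancer"], "LungTumor": ["Cancer"],
}


def ancestors(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen


def d_separated(x, y, evidence, parents):
    """True if evidence d-separates node sets x and y in the DAG."""
    keep = ancestors(set(x) | set(y) | set(evidence), parents)
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]          # all parents lie in keep (ancestral set)
        for p in ps:                 # undirected parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):   # moral edges between co-parents
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    blocked = set(evidence)
    frontier = deque(n for n in x if n not in blocked)
    seen = set(frontier)
    while frontier:                  # breadth-first search avoiding evidence
        n = frontier.popleft()
        if n in y:
            return False
        for m in adj[n] - seen - blocked:
            seen.add(m)
            frontier.append(m)
    return True


print(d_separated({"Exposure"}, {"Smoking"}, {"Age"}, PARENTS))
print(d_separated({"Exposure"}, {"Smoking"}, {"Age", "Cancer"}, PARENTS))
```

The second query illustrates condition 3: observing the common child Cancer (or its descendant Lung Tumor) unblocks the path between Exposure and Smoking.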

Graphical Models Overview [3]: Inference Problem
Multiply-connected case: exact inference is #P-complete, and approximate inference is also intractable in the worst case

Adapted from slides by S. Russell, UC Berkeley

Other Topics in Graphical Models [1]: Temporal Probabilistic Reasoning
Goal: Estimate $P(X_t \mid y_{1:r})$, the hidden state at time t given observations up to time r
- Filtering: r = t
  - Intuition: infer current state from observations
  - Applications: signal identification
  - Variation: Viterbi algorithm
- Prediction: r < t
  - Intuition: infer future state
  - Applications: prognostics
- Smoothing: r > t
  - Intuition: infer past hidden state
  - Applications: signal enhancement
- CF Tasks
  - Plan recognition by smoothing
  - Prediction cf. WebCANVAS – Cadez et al. (2000)

Adapted from Murphy (2001), Guo (2002)
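The filtering case (r = t) has a simple recursive form: push the current belief through the transition model, weight by the likelihood of the new observation, and renormalize. The sketch below assumes a discrete hidden Markov model; the two-state "umbrella" parameters are illustrative assumptions, not values from the talk, and `forward_filter` and `predict` are hypothetical names.

```python
def forward_filter(prior, transition, emission, observations):
    """Return P(X_t | y_1..t) for t = 1..T in a discrete HMM."""
    n = len(prior)
    belief = list(prior)
    history = []
    for obs in observations:
        # Time update: P(X_t | y_1..t-1)
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Measurement update: weight by P(y_t | X_t), then normalize
        unnorm = [predicted[j] * emission[j][obs] for j in range(n)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        history.append(belief)
    return history


def predict(belief, transition, steps=1):
    """Prediction (r < t): roll the belief forward with no new observations."""
    n = len(belief)
    for _ in range(steps):
        belief = [sum(belief[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
    return belief


# Assumed parameters: state 0 = rain, state 1 = no rain;
# observation 0 = umbrella seen, 1 = no umbrella.
PRIOR = [0.5, 0.5]
TRANSITION = [[0.7, 0.3], [0.3, 0.7]]
EMISSION = [[0.9, 0.1], [0.2, 0.8]]

history = forward_filter(PRIOR, TRANSITION, EMISSION, [0, 0])
print(history[-1], predict(history[-1], TRANSITION))
```

Smoothing (r > t) would add a backward pass over the later observations; it is omitted here to keep the sketch short.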

Other Topics in Graphical Models [2]: Learning Structure from Data
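General-Case BBN Structure Learning: Use Inference to Compute Scores
Optimal Strategy: Bayesian Model Averaging
- Assumption: models $h \in H$ are mutually exclusive and exhaustive
- Combine predictions of models in proportion to marginal likelihood
  - Compute conditional probability of hypothesis h given observed data D
  - i.e., compute expectation over unknown h for unseen cases
  - Let $h \equiv$ structure, parameters $\Theta \equiv$ CPTs

$$P(x \mid D) = \sum_{h \in H} P(x \mid D, h)\, P(h \mid D)$$

$$\underbrace{P(h \mid D)}_{\text{posterior score}} \propto \underbrace{P(D \mid h)}_{\text{marginal likelihood}} \cdot \underbrace{P(h)}_{\text{prior over structures}}, \qquad P(D \mid h) = \int \underbrace{P(D \mid h, \Theta)}_{\text{likelihood}}\; \underbrace{P(\Theta \mid h)}_{\text{prior over parameters}}\, d\Theta$$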
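A toy instance of Bayesian model averaging over structures: two binary variables, two candidate structures (X and Y independent, or X → Y), and a uniform Beta(1, 1) prior on every CPT row, so that each marginal likelihood has the closed form n0! n1! / (n0 + n1 + 1)!. The prior choice, the data, and all function names are assumptions made for this sketch.

```python
from collections import Counter
from math import exp, lgamma


def log_ml_binary(n0, n1):
    """log of n0! n1! / (n0 + n1 + 1)!: marginal likelihood, uniform prior."""
    return lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)


def log_score(data, y_has_parent_x):
    """log P(D | h) for h = 'X, Y independent' or h = 'X -> Y'."""
    xs = Counter(x for x, _ in data)
    score = log_ml_binary(xs[0], xs[1])
    if y_has_parent_x:
        for xv in (0, 1):            # one CPT row per parent value
            ys = Counter(y for x, y in data if x == xv)
            score += log_ml_binary(ys[0], ys[1])
    else:
        ys = Counter(y for _, y in data)
        score += log_ml_binary(ys[0], ys[1])
    return score


def structure_posterior(data, prior=(0.5, 0.5)):
    """P(h | D) for [independent, X -> Y], normalized in log space."""
    logs = [log_score(data, False), log_score(data, True)]
    m = max(logs)
    weights = [p * exp(l - m) for p, l in zip(prior, logs)]
    z = sum(weights)
    return [w / z for w in weights]


def predict_y1(data, xv):
    """Model-averaged P(Y=1 | X=xv, D) = sum_h P(Y=1 | X=xv, D, h) P(h | D)."""
    post = structure_posterior(data)
    ys_all = Counter(y for _, y in data)
    ys_x = Counter(y for x, y in data if x == xv)
    p_indep = (ys_all[1] + 1) / (ys_all[0] + ys_all[1] + 2)
    p_dep = (ys_x[1] + 1) / (ys_x[0] + ys_x[1] + 2)
    return post[0] * p_indep + post[1] * p_dep


correlated = [(0, 0)] * 10 + [(1, 1)] * 10
print(structure_posterior(correlated), predict_y1(correlated, 1))
```

Enumerating structures like this is only feasible for tiny networks; with more variables the sum over h is typically approximated by search or sampling.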

Propagation Algorithm in Singly-Connected Bayesian Networks – Pearl (1983)
- Upward (child-to-parent) $\lambda$ messages: $\lambda'(C_i')$ modified during $\lambda$ message-passing phase
- Downward $\pi$ messages: $P'(C_i')$ is computed during $\pi$ message-passing phase
- Multiply-connected case: exact inference is #P-complete (#P is the counting analogue of NP; the corresponding decision problem is NP-complete), and approximate inference is also intractable in the worst case

Adapted from Neapolitan (1990), Guo (2000)

Inference by Clustering [1]: Graph Operations (Moralization, Triangulation, Maximal Cliques)
[Figure: four panels. (1) Bayesian Network (Acyclic Digraph) over nodes A, B, C, D, E, F, G, H. (2) Moralize. (3) Triangulate, with nodes numbered A1, B2, E3, C4, G5, F6, H7, D8. (4) Find Maximal Cliques: Clq1 through Clq6.]

Adapted from Neapolitan (1990), Guo (2000)
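The three graph operations can be sketched as follows. The parent sets are read off the clique potentials listed on the clique-tree slide (P(A), P(B|A), P(C|B,E), P(D|C), P(E|F), P(F), P(G|F), P(H|C,G)). Triangulation here is done by node elimination in the reverse of the slide's node numbering, which is an assumption of this sketch rather than something the transcript states; function names are illustrative.

```python
from itertools import combinations

PARENTS = {
    "A": [], "B": ["A"], "C": ["B", "E"], "D": ["C"],
    "E": ["F"], "F": [], "G": ["F"], "H": ["C", "G"],
}


def moralize(parents):
    """Undirected edges of the moral graph: drop directions, marry co-parents."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))
        for p, q in combinations(ps, 2):
            edges.add(frozenset((p, q)))
    return edges


def triangulate(edges, elimination_order):
    """Eliminate nodes in order; return (fill-in edges, maximal cliques)."""
    adj = {}
    for u, v in (tuple(e) for e in edges):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    fill, cliques = set(), []
    for node in elimination_order:
        nbrs = adj.pop(node)
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:      # connect remaining neighbors pairwise
                adj[u].add(v)
                adj[v].add(u)
                fill.add(frozenset((u, v)))
        for n in nbrs:
            adj[n].discard(node)
        cliques.append(frozenset(nbrs | {node}))
    maximal = {c for c in cliques if not any(c < d for d in cliques)}
    return fill, maximal


moral = moralize(PARENTS)
fill, cliques = triangulate(moral, ["D", "H", "F", "G", "C", "E", "B", "A"])
print(sorted("".join(sorted(e)) for e in fill))
print(sorted("".join(sorted(c)) for c in cliques))
```

On this network, moralization adds B-E and C-G, and elimination adds the single chord E-G, giving the six cliques AB, BCE, CEG, EFG, CGH, and CD used on the following slides.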

Inference by Clustering [2]: Junction Tree – Lauritzen & Spiegelhalter (1988)
- Input: list of cliques of triangulated, moralized graph $G_u$
- Output: tree of cliques; separator nodes $S_i$, residual nodes $R_i$, and potential probability $\psi(Clq_i)$ for all cliques

Algorithm:
1. $S_i = Clq_i \cap (Clq_1 \cup Clq_2 \cup \ldots \cup Clq_{i-1})$
2. $R_i = Clq_i - S_i$
3. If $i > 1$ then identify a $j < i$ such that $Clq_j$ is a parent of $Clq_i$
4. Assign each node $v$ to a unique clique $Clq_i$ such that $\{v\} \cup c(v) \subseteq Clq_i$
5. Compute $\psi(Clq_i) = \prod_{f(v) = Clq_i} P(v \mid c(v))$ (1 if no $v$ is assigned to $Clq_i$)
6. Store $Clq_i$, $R_i$, $S_i$, and $\psi(Clq_i)$ at each vertex in the tree of cliques

Adapted from Neapolitan (1990), Guo (2000)
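Steps 1 to 4 of the algorithm can be sketched on the six cliques of the running example. Two assumptions are made here: the parent in step 3 is taken to be the first earlier clique that contains the separator, and step 4 assigns each node's family to the first clique that contains it. Names such as `clique_tree` and `assign_families` are hypothetical.

```python
CLIQUES = [
    frozenset("AB"), frozenset("BEC"), frozenset("ECG"),
    frozenset("EGF"), frozenset("CGH"), frozenset("CD"),
]

PARENTS = {
    "A": [], "B": ["A"], "C": ["B", "E"], "D": ["C"],
    "E": ["F"], "F": [], "G": ["F"], "H": ["C", "G"],
}


def clique_tree(cliques):
    """Return (separator, residual, parent index) for each clique in order."""
    seen = set()
    rows = []
    for i, clq in enumerate(cliques):
        sep = clq & seen                      # step 1
        res = clq - sep                       # step 2
        parent = None
        if i > 0:                             # step 3 (assumed: S_i within Clq_j)
            parent = next(j for j in range(i) if sep <= cliques[j])
        rows.append((sep, res, parent))
        seen |= clq
    return rows


def assign_families(cliques, parents):
    """Step 4: map clique index to the nodes whose family it is assigned."""
    assigned = {i: [] for i in range(len(cliques))}
    for v in sorted(parents):
        family = {v} | set(parents[v])
        home = next(i for i, c in enumerate(cliques) if family <= c)
        assigned[home].append(v)
    return assigned


for i, (sep, res, parent) in enumerate(clique_tree(CLIQUES), start=1):
    print(f"Clq{i}: S={sorted(sep)} R={sorted(res)} parent={parent}")
print(assign_families(CLIQUES, PARENTS))
```

A clique that receives no family, such as Clq3 here, gets potential 1 in step 5. The `next(...)` call in step 3 raises `StopIteration` if the clique ordering does not satisfy the running intersection property.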

Inference by Clustering [3]: Clique-Tree Operations
$R_i$: residual nodes; $S_i$: separator nodes; $\psi(Clq_i)$: potential probability of clique i

| Clique | Nodes | $R_i$ | $S_i$ | $\psi(Clq_i)$ |
|---|---|---|---|---|
| Clq1 | {A, B} | {A, B} | {} | $P(B \mid A)\,P(A)$ |
| Clq2 | {B, E, C} | {C, E} | {B} | $P(C \mid B, E)$ |
| Clq3 | {E, C, G} | {G} | {E, C} | 1 |
| Clq4 | {E, G, F} | {F} | {E, G} | $P(E \mid F)\,P(G \mid F)\,P(F)$ |
| Clq5 | {C, G, H} | {H} | {C, G} | $P(H \mid C, G)$ |
| Clq6 | {C, D} | {D} | {C} | $P(D \mid C)$ |

Clique tree: vertices AB, BEC, ECG, EGF, CGH, CD; separators B, EC, CG, EG, C.

Adapted from Neapolitan (1990), Guo (2000)

Inference by Loop Cutset Conditioning
- Split vertex in undirected cycle; condition upon each of its state values
- Number of network instantiations: product of arity of nodes in minimal loop cutset
- Posterior: marginal conditioned upon cutset variable values
- Deciding optimal cutset: NP-hard
- Current open problems
  - Bounded cutset conditioning: ordering heuristics
  - Finding randomized algorithms for loop cutset optimization

[Figure: the Age node X1 is split into X1,1 (Age = [0, 10)), X1,2 (Age = [10, 20)), ..., X1,10 (Age = [100, ∞)); the remaining nodes are X2 Gender, X3 Exposure-To-Toxins, X4 Smoking, X5 Cancer, X6 Serum Calcium, X7 Lung Tumor.]

Inference by Variable Elimination [1]: Intuition
Adapted from slides by S. Russell, UC Berkeley

Inference by Variable Elimination [2]: Factoring Operations
Adapted from slides by S. Russell, UC Berkeley

Inference by Variable Elimination [3]: Example
- Variables: A Season, B Sprinkler, C Rain, D Manual Watering, F Wet, G Slippery
- Factors: P(A), P(B|A), P(C|A), P(D|B,A), P(F|B,C), P(G|F)
- Query: P(A|G=1) = ?
- Ordering: d = < A, C, B, F, D, G >
- Buckets:
  - G: P(G|F), G=1
  - D: P(D|B,A)
  - F: P(F|B,C)
  - B: P(B|A)
  - C: P(C|A)
  - A: P(A)
- $\lambda_G(f) = \sum_{G=1} P(G \mid F)$

Adapted from Dechter (1996), Joehanes (2002)
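The query P(A | G=1) can be worked through with a small variable-elimination sketch. Buckets are processed from the end of the ordering d toward the front (G, D, F, B, C), leaving A. All variables are treated as binary and every probability below is a made-up placeholder; the factor representation (a tuple of variable names plus a table keyed by value tuples) and all function names are choices of this sketch, not the talk's implementation. A brute-force enumeration is included as a cross-check.

```python
from itertools import product


def make_factor(variables, fn):
    variables = tuple(variables)
    return (variables, {vals: fn(dict(zip(variables, vals)))
                        for vals in product((0, 1), repeat=len(variables))})


def multiply(f, g):
    (fv, ft), (gv, gt) = f, g
    vs = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return (vs, table)


def sum_out(f, var):
    fv, ft = f
    i = fv.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (fv[:i] + fv[i + 1:], table)


def restrict(f, var, value):
    fv, ft = f
    if var not in fv:
        return f
    i = fv.index(var)
    return (fv[:i] + fv[i + 1:],
            {v[:i] + v[i + 1:]: p for v, p in ft.items() if v[i] == value})


def eliminate(factors, order, query, evidence):
    for var, val in evidence.items():          # e.g. lambda_G(f) = P(G=1 | F)
        factors = [restrict(f, var, val) for f in factors]
    for var in order:
        if var == query or var in evidence:
            continue
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if bucket:
            prod = bucket[0]
            for f in bucket[1:]:
                prod = multiply(prod, f)
            factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result[1].values())
    return {vals[0]: p / z for vals, p in result[1].items()}


def brute_force(factors, query, evidence):
    names = sorted({v for f in factors for v in f[0]})
    totals = {0: 0.0, 1: 0.0}
    for vals in product((0, 1), repeat=len(names)):
        a = dict(zip(names, vals))
        if all(a[k] == v for k, v in evidence.items()):
            p = 1.0
            for fv, ft in factors:
                p *= ft[tuple(a[v] for v in fv)]
            totals[a[query]] += p
    z = sum(totals.values())
    return {k: v / z for k, v in totals.items()}


def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1


P_F = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.95}  # hypothetical
FACTORS = [
    make_factor("A", lambda a: bern(0.6, a["A"])),
    make_factor("BA", lambda a: bern(0.1 if a["A"] else 0.7, a["B"])),
    make_factor("CA", lambda a: bern(0.8 if a["A"] else 0.1, a["C"])),
    make_factor("DBA", lambda a: bern(0.05 if a["B"] else 0.4, a["D"])),
    make_factor("FBC", lambda a: bern(P_F[(a["B"], a["C"])], a["F"])),
    make_factor("GF", lambda a: bern(0.7 if a["F"] else 0.1, a["G"])),
]

posterior = eliminate(FACTORS, ["G", "D", "F", "B", "C", "A"], "A", {"G": 1})
print(posterior, brute_force(FACTORS, "A", {"G": 1}))
```

The largest intermediate factor here has three variables, whereas brute force touches all 2^6 joint assignments; the gap grows quickly with network size and depends on the ordering d.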

Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning
[Figure: Genetic Wrapper for Change of Representation and Inductive Bias Control. Inputs: D (Training Data) and an Inference Specification. Components: [1] Genetic Algorithm and [2] Representation Evaluator for Learning Problems, which uses Dtrain (Inductive Learning) and Dval (Inference). Labels between the components: α (Candidate Representation), f(α) (Fitness), Optimized.]

Computational Genomics and Microarray Gene Expression Modeling
[Figure: microarray pipeline and learning environment. Treatment 1 (Control) and Treatment 2 (Pathogen) yield Messenger RNA (mRNA) Extract 1 and Extract 2, followed by cDNA, DNA Hybridization, and Microarray (under LASER). D: Data (User, Microarray) enters the Learning Environment: [A] Structure Learning over genes G1 to G5, giving G = (V, E), and [B] Parameter Estimation over G1 to G5, giving B = (V, E, Θ). Other labels: Specification, Fitness (Inferential Loss), Dval (Model Validation by Inference).]

Adapted from Friedman et al. (2000)

DESCRIBER: An Experimental Intelligent Filter
- Domain-Specific Workflow Repositories
  - Workflows: Transactional, Objective Views
  - Workflow Components: Data Sources, Transformations; Other Services
- Data Entity, Service, and Component Repository Index for Bioinformatics Experimental Research
- Learning over Workflow Instances and Use Cases (Historical User Requirements)
- Use Case & Query/Evaluation Data
- Personalized Interface
- Domain-Specific Collaborative Recommendation
- User Queries & Evaluations
- Decision Support Models
- Users of Scientific Workflow Repository
- Interface(s) to Distributed Repository

Example Queries:
- What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
- What codes and microarray data were used? How and why?

Relational Graphical Models in DESCRIBER
- Module 1: Collaborative Recommendation Front-End (Personalized Interface)
- Module 2: Learning & Validation of Relational Graphical Models (RGMs) for Experimental Workflows and Components
- Module 3: Estimation of RGM Parameters from Workflow and Component Database
- Module 4: Learning & Validation of RGMs for User Requirements
- Module 5: RGM Parameters from User Query Data
- Data flow labels: RGMs of Queries; Complete RGMs of User Queries; Workflows; Complete RGMs of Workflows (Data-Oriented); Recommendations/Evaluations (Before and After Use); User; Workflow Logs, Instances, Templates, Components (Services, Data Sources); Training Data; Structure & Data

Tools for Building Graphical Models
- Commercial Tools: Ergo, Netica, TETRAD, Hugin
- Bayes Net Toolbox (BNT) – Murphy (1997-present)
- Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
- Current (re)implementation projects for KSU KDD Lab
  - Continuous state: Minka (2002) – Hsu, Guo, Perry, Boddhireddy
  - Formats: XML BNIF (MSBN), Netica – Guo, Hsu
  - Space-efficient DBN inference – Joehanes
  - Bounded cutset conditioning – Chandak

References [1]: Graphical Models and Inference Algorithms
- Bayesian (Belief) Networks tutorial – Murphy (2001)
- Learning Bayesian Networks – Heckerman (1996, 1999)
- Inference Algorithms
  - Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988)
  - (Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989)
  - Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996)
- Recommended Books
  - Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
  - Castillo, Gutierrez, Hadi (1997)
  - Cowell, Dawid, Lauritzen, Spiegelhalter (1999)
- Stochastic Approximation

References [2]: Machine Learning, KDD, and Bioinformatics
- Machine Learning, Data Mining, and Knowledge Discovery
  - K-State KDD Lab: literature survey and resource catalog (2002)
  - Bayesian Network tools in Java (BNJ): Hsu, Guo, Joehanes, Perry, Thornton (2002)
  - Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002)
  - NCSA Data to Knowledge (D2K): Welge, Redman, Auvil, Tcheng, Hsu
- Bioinformatics
  - European Bioinformatics Institute Tutorial: Brazma et al. (2001)
  - Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002)
  - K-State BMI Group: literature survey and resource catalog (2002)

Acknowledgements
- Kansas State University Lab for Knowledge Discovery in Databases
  - Graduate research assistants: Haipeng Guo, Roby Joehanes
  - Other grad students: Prashanth Boddhireddy, Siddharth Chandak, Ben B. Perry, Rengakrishnan Subramanian
  - Undergraduate programmers: James W. Plummer, Julie A. Thornton
- Joint Work with
  - KSU Bioinformatics and Medical Informatics (BMI) group: Sanjoy Das (EECE), Judith L. Roe (Biology), Stephen M. Welch (Agronomy)
  - KSU Microarray group: Scot Hulbert (Plant Pathology), J. Clare Nelson (Plant Pathology), Jan Leach (Plant Pathology)
  - Kansas Geological Survey, Kansas Biological Survey, KU EECS
- Other Research Partners
  - NCSA Automated Learning Group (Michael Welge, Tom Redman, David Clutter, Lisa Gatzke)
  - The Institute for Genomic Research (John Quackenbush, Alex Saeed)
  - University of Manchester (Carole Goble, Robert Stevens)
  - International Rice Research Institute (Richard Bruskiewich)