The IMAP Hybrid Method for Learning Gaussian Bayes Nets Oliver Schulte School of Computing Science Simon Fraser University Vancouver, Canada with Gustavo Frigo, Hassan Khosravi (Simon Fraser) and Russ Greiner (U of Alberta) 1/19

Outline (2/19)
- Brief Intro to Bayes Nets (BN)
- BN structure learning: Score-based vs. Constraint-based
- Hybrid Design: Combining Dependency (Correlation) Information with Model Selection
- Evaluation by Simulations

Bayes Nets: Overview (3/19)
- Bayes net structure = directed acyclic graph: nodes = variables of interest, arcs = direct "influence" or "association".
- Parameters = CP tables = probability of a child given its parents.
- The structure represents (in)dependencies; structure + parameters together represent the joint probability distribution over the variables.
- Many applications: gene expression, finance, medicine, ...
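
The last point is the standard Bayes net factorization; stated generically (this formula is textbook background, not taken from the slide):

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)
```

where Pa(X_i) denotes the parents of X_i in the DAG.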

Example: Alienation Model (4/19)
[Figure: DAG over Anomia67, Powerless67, Anomia71, Powerless71, Alienation67, Alienation71, SES, Education, SEI (Wheaton et al.)]
- The structure entails (in)dependencies: e.g., Anomia67 is dependent on SES, while Education is independent of every other node given SES.
- The model can predict any node, e.g., "What is the probability that Powerless67 = x, given SES = y?"

Gaussian Bayes Nets (5/19)
- Also known as (recursive) structural equation models.
- Continuous variables.
- Each child is a linear combination of its parents, plus a normally distributed error ε with mean 0, e.g. Alienation71 = 0.61·Alienation67 + β·SES + ε.
[Figure: Alienation71 with parents Alienation67 and SES and error term ε.]
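
To make the linear-Gaussian semantics concrete, here is a minimal forward-sampling sketch in Python. The 0.61 weight is from the slide; the SES weight and the error variances are illustrative assumptions, not fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_alienation_fragment(n):
    """Forward-sample the fragment SES -> Alienation67 -> Alienation71,
    with SES also a direct parent of Alienation71 (weights partly assumed)."""
    ses = rng.normal(0.0, 1.0, n)                  # root node: no parents
    alien67 = 0.5 * ses + rng.normal(0.0, 1.0, n)  # assumed weight 0.5
    # 0.61 is the slide's coefficient; 0.2 for SES is an assumption:
    alien71 = 0.61 * alien67 + 0.2 * ses + rng.normal(0.0, 1.0, n)
    return np.column_stack([ses, alien67, alien71])

data = sample_alienation_fragment(1000)
```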

Two Approaches to Learning Bayes Net Structure (6/19)
Input: a random sample.
- Search and score: select a graph G as the model, with parameters to be estimated; a score (BIC, AIC) balances data fit with model complexity.
- Constraint-based (CB): use a statistical correlation test (e.g., Fisher's z) to find (in)dependencies; choose a G that entails the (in)dependencies found in the sample.
[Figure: candidate graphs over A and B with their scores; the graph whose edge covers the correlation between A and B scores higher.]
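
As a concrete instance of the search-and-score side, here is a minimal sketch of the decomposable BIC score for a Gaussian network, where each node is scored by an OLS regression on its parents. This is textbook BIC under my conventions, not the authors' implementation:

```python
import numpy as np

def bic_score(data, parents):
    """BIC of a Gaussian Bayes net: per-node log-likelihood minus
    (number of parameters / 2) * log(sample size).
    `parents` maps each column index to the list of its parents' indices."""
    n = data.shape[0]
    score = 0.0
    for child, pa in parents.items():
        y = data[:, child]
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in pa])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit on parents
        resid = y - X @ beta
        sigma2 = resid @ resid / n                     # ML error variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        k = len(pa) + 2                                # weights + intercept + variance
        score += loglik - 0.5 * k * np.log(n)
    return score

# e.g. compare A -> B against the empty graph on two-column data:
# bic_score(data, {0: [], 1: [0]})  vs.  bic_score(data, {0: [], 1: []})
```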

Overfitting with Score-based Search (7/19)
[Figure: the target alienation model side by side with the denser model selected by BIC over the same nine variables.]

Overfitting with Score-based Search (8/19)
- Generate random parametrized graphs of different sizes, 30 graphs for each size.
- Generate random samples of various sizes.
- Use the Bayes Information Criterion (BIC) to learn a structure.

Our Hybrid Approach (9/19)
- Constraint-based: the weakness is type II error, false acceptance of independencies ⇒ use only dependencies (Schulte et al., IEEE CIDM).
- Score-based: produces overly dense graphs ⇒ cross-check the score against the observed dependencies.
New idea: let Dep be a set of conditional dependencies extracted from the sample d, and let S(G,d) be the score function. Move from graph G to G' only if
1. the score increases: S(G',d) > S(G,d), and
2. G' entails strictly more dependencies from Dep than G.
(See the code sketch below.)
[Figure: three cases for graphs over A, B, C with their scores S on sample d.]
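
A minimal sketch of this acceptance rule in Python. `score` and `entailed_deps` (which dependencies in Dep a graph entails, via a d-connection check) are assumed helpers, and reading "strictly more" as a proper superset is my interpretation of the slide:

```python
def accept_move(G, G_new, data, dep_set, score, entailed_deps):
    """Hybrid criterion: accept the move G -> G_new only if the score improves
    AND G_new covers a strict superset of the observed dependencies."""
    covered_old = entailed_deps(G, dep_set)      # dependencies G entails
    covered_new = entailed_deps(G_new, dep_set)  # dependencies G_new entails
    return score(G_new, data) > score(G, data) and covered_old < covered_new
```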

Example for Hybrid Criterion (10/19)
[Figure: two candidate graphs over A, B, C; tests on the sample find Dep(A,B) and Dep(A,B|C).]
- The score prefers the denser graph.
- But the correlation between B and C is not statistically significant, so the hybrid criterion prefers the simpler graph.

Adapting Local Search (11/19)
- Test the n² Markov blanket dependencies (n = number of nodes): is A independent of B given the Markov blanket of A?
- Apply a local search operator to move from the current graph to a new graph, maximizing the score while covering additional dependencies (sketched below).
- Any local search can be used; the method inherits the convergence properties of the local search if the test is correct in the sample-size limit.
[Figure: dependencies such as Dep(A,B), Dep(A,B|C), Dep(B,C), extracted from the sample by the test, constrain the move from the current graph to the new graph.]
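
Putting the criterion into a greedy hill-climbing loop might look like this; it reuses `accept_move` from the sketch above, and `neighbors` (graphs one edge operation away) is an assumed helper, not the paper's GES implementation:

```python
def hybrid_local_search(G0, data, dep_set, neighbors, score, entailed_deps):
    """Greedy local search constrained by the hybrid criterion (sketch only)."""
    G, improved = G0, True
    while improved:
        improved = False
        for G_new in neighbors(G):                # one add/delete/reverse away
            if accept_move(G, G_new, data, dep_set, score, entailed_deps):
                G, improved = G_new, True         # take the first admissible move
                break
    return G
```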

Simulation Setup (key results) (12/19)
- Statistical test: Fisher z-statistic, α = 5%.
- Score S: BIC (Bayes information criterion).
- Search method: GES [Meek, Chickering 2003].
- Random Gaussian DAGs, varying the number of nodes and the sample size; several random graphs for each combination of node number and sample size, with results averaged.
- Graphs generated with Tetrad's random DAG utility (CMU).
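
For reference, the Fisher z test named here decides (in)dependence from a sample partial correlation. A self-contained sketch (the helper itself is mine; only the test is standard):

```python
import numpy as np
from scipy import stats

def fisher_z_dependent(data, i, j, cond=(), alpha=0.05):
    """Fisher z test of the partial correlation rho(X_i, X_j | X_cond).
    Returns True iff independence is rejected at level alpha,
    i.e., a dependency that would be added to Dep."""
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.inv(np.cov(sub, rowvar=False))     # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    n = data.shape[0]
    z = 0.5 * np.sqrt(n - len(cond) - 3) * np.log((1 + r) / (1 - r))
    return 2 * stats.norm.sf(abs(z)) < alpha            # two-sided test
```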

Simulation Results: Number of Edges (13/19)
- An edge ratio (learned edges / target edges) of 1 is ideal.
- GES = standard score-based search; IGES = the hybrid search.
- Improvement = ratio(GES) - ratio(IGES).

Simulation Results: f-measure (14/19)
- The f-measure on correctly placed links combines false positives and false negatives: 2·correct / (2·correct + incorrectly placed + missed edges).
- Structure fit improves most for graphs with more than 10 nodes.
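
The slide's formula is the usual F1 score on edges; a small sketch (representing edges as unordered pairs so direction is ignored is my choice):

```python
def edge_f_measure(true_edges, learned_edges):
    """F-measure on edges: 2*correct / (2*correct + extra + missed).
    Edges are given as frozensets {u, v}."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    correct = len(true_edges & learned_edges)   # correctly placed
    extra = len(learned_edges - true_edges)     # incorrectly placed
    missed = len(true_edges - learned_edges)    # missing from the learned graph
    denom = 2 * correct + extra + missed
    return 2 * correct / denom if denom else 1.0
```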

Real-World Structures: Insurance and Alarm (15/19)
- Alarm (Beinlich et al. 1989) has 37 nodes; Insurance (Binder et al.) has 25 nodes.
- Results are averaged over 5 samples per sample size.
- Metric: f-measure; higher is better.

False Dependency Rate (16/19)
Type I errors (false correlations) are rare: frequency below 3%.

False Independency/Acceptance Rate (17/19)
Type II errors (false independencies) are quite frequent: e.g., 20% at sample size 1,000.

Conclusion (18/19)
- In Gaussian BNs, score-based methods tend to produce overly dense graphs for more than 10 variables.
- Key new idea: constrain the search so that adding an edge is justified by fitting more statistically significant correlations.
- Also: use only dependency information, not independencies (rejections of the null hypothesis, not acceptances).
- On synthetic and real-world target graphs, the hybrid method finds less complex graphs with a better fit to the target graph.

The End (19/19)
Thank you! [Link: full paper]