Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks
Nir Friedman and Daphne Koller
CS673, 04/21/2005

Roadmap
- Bayesian learning of Bayesian networks: exact vs. approximate learning
- Markov chain Monte Carlo methods: MCMC over structures, MCMC over orderings
- Experimental results
- Conclusions

Bayesian Networks
A compact representation of probability distributions via conditional independence.
- Qualitative part: a directed acyclic graph (DAG); nodes are random variables, edges are direct influences.
- Quantitative part: a set of conditional probability distributions (CPDs), e.g. a table P(A|E,B) with a row for each of e b, e !b, !e b, !e !b.
Together they define a unique distribution in factored form. For the example network over B, E, A, C, R:
P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
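To make the factorization concrete, here is a minimal Python sketch of the example network; the CPD values are hypothetical placeholders, not taken from the slides. The joint probability of any full assignment is just the product of one local CPD term per node.

```python
# Hypothetical CPD values for the example network over B, E, A, C, R.
P_B = {True: 0.01, False: 0.99}                      # P(B)
P_E = {True: 0.02, False: 0.98}                      # P(E)
P_A1 = {(True, True): 0.95, (True, False): 0.94,     # P(A=True | B, E)
        (False, True): 0.29, (False, False): 0.001}
P_R1 = {True: 0.90, False: 0.01}                     # P(R=True | E)
P_C1 = {True: 0.70, False: 0.05}                     # P(C=True | A)

def joint(b, e, a, c, r):
    """P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)."""
    p_a = P_A1[(b, e)] if a else 1 - P_A1[(b, e)]
    p_r = P_R1[e] if r else 1 - P_R1[e]
    p_c = P_C1[a] if c else 1 - P_C1[a]
    return P_B[b] * P_E[e] * p_a * p_r * p_c

print(joint(b=True, e=False, a=True, c=True, r=False))
```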

Why Learn Bayesian Networks?
- Conditional independencies and the graphical representation capture the structure of many real-world distributions, providing insight into the domain.
- The graph structure allows "knowledge discovery": Is there a direct connection between X and Y? Does X separate two "subsystems"? Does X causally affect Y?
- Bayesian networks can be used for many tasks: inference, causality, etc.
- Examples from scientific data mining: disease properties and symptoms; interactions between the expression of genes.

Learning Bayesian Networks
Data + prior information → Inducer → a Bayesian network (structure and CPDs, e.g. P(A|E,B)).
The inducer needs a prior probability distribution P(B) over networks; using Bayesian conditioning, it updates the prior P(B) to the posterior P(B|D).

Why Struggle for Accurate Structure?
Compare a learned structure to the "true" structure:
Adding an arc:
- Increases the number of parameters to be fitted
- Encodes wrong assumptions about causality and domain structure
Missing an arc:
- Cannot be compensated for by accurate fitting of parameters
- Also misses causality and domain structure

Score-Based Learning
Define a scoring function that evaluates how well a structure matches the data (the slide compares candidate structures over E, B, A), then search for a structure that maximizes the score.

Bayesian Score of a Model
Score(G : D) = log P(D|G) + log P(G)
where P(G) is the prior over structures and P(D|G) is the marginal likelihood, which integrates the likelihood against the prior over parameters:
P(D|G) = ∫ P(D|G, θ_G) P(θ_G|G) dθ_G
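For a single binary variable with a Beta parameter prior, this integral has a closed form; the following is my own one-node illustration (not from the slides) of how a marginal-likelihood term is computed:

```python
from math import lgamma

def log_marglik_binary(n1, n0, alpha=1.0):
    """log of ∫ θ^n1 (1-θ)^n0 Beta(θ; alpha, alpha) dθ
    = log [B(alpha + n1, alpha + n0) / B(alpha, alpha)]."""
    return (lgamma(2 * alpha) - 2 * lgamma(alpha)
            + lgamma(alpha + n1) + lgamma(alpha + n0)
            - lgamma(2 * alpha + n1 + n0))

# With a decomposable score, one such term appears per node and parent
# configuration, and the per-node log terms add up across the network.
print(log_marglik_binary(7, 3))
```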

Discovering Structure: Model Selection
Current practice is model selection: pick a single model with high posterior P(G|D) and use that model to infer domain structure.

Discovering Structure: Model Averaging
Problem: with a small sample size there are many high-scoring models, so an answer based on any one model is often useless. We want features common to many models, each weighted by its posterior P(G|D).

Bayesian Approach
Estimate the probability of structural features, such as:
- an edge X → Y
- a Markov edge X -- Y
- a path X → … → Y
- ...
P(f|D) = Σ_G f(G) P(G|D)
where f(G) is the indicator function for the feature in G (e.g., X → Y) and P(G|D) is the Bayesian posterior (score) of G.
The number of networks G is huge (super-exponential, 2^Θ(n²)), so exact learning is intractable.
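A toy sketch of this model averaging, with hypothetical log scores (real ones would come from data): over an enumerated set of graphs, the feature posterior is the score-weighted fraction of graphs containing the feature.

```python
import math

def feature_posterior(log_scores, has_feature):
    """P(f|D) = sum_G f(G) P(G|D): the score-weighted fraction of graphs
    whose indicator f(G) is true, normalized via log-sum-exp for safety."""
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return sum(w for w, f in zip(weights, has_feature) if f) / z

# Three candidate graphs; the first two contain the edge X -> Y.
print(feature_posterior([-10.2, -11.0, -13.5], [True, True, False]))
```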

Approximate Bayesian Learning
Restrict the search space to G_k, the set of graphs with indegree bounded by k; this space is still super-exponential.
Find a set G of high-scoring structures and estimate P(f|D) by summing only over that set.
Hill-climbing search yields a biased sample of structures (see the sketch below).
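A generic greedy hill-climbing skeleton, my own illustration rather than the authors' implementation; for structure search, `neighbors` would enumerate single edge additions, deletions, and reversals (keeping the graph acyclic), and random restarts would collect the biased sample of high-scoring structures mentioned above.

```python
def hill_climb(g0, score, neighbors):
    """Greedy local search: move to the best-scoring neighbor until
    no neighbor improves the score; returns a local optimum."""
    g, s = g0, score(g0)
    while True:
        best = max(neighbors(g), key=score, default=None)
        if best is None or score(best) <= s:
            return g, s
        g, s = best, score(best)

# Toy usage on integers as a stand-in for graphs:
print(hill_climb(17, lambda v: -(v - 5) ** 2, lambda v: [v - 1, v + 1]))
```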

Markov Chain Monte Carlo over Networks
MCMC sampling: define a Markov chain over Bayesian networks and perform a walk through the chain; the empirical distribution of the sampled graphs G converges to the posterior P(G|D).
Possible pitfalls:
- Still a super-exponential number of networks
- The time for the chain to converge to the posterior is unknown
- Islands of high posterior connected only by low-probability bridges
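A minimal Metropolis-Hastings sketch of such a walk (my own illustration, not the authors' implementation); `log_score` and `propose` are assumed callbacks, and the proposal is assumed symmetric, e.g. a single edge change drawn uniformly.

```python
import math, random

def mcmc(x0, log_score, propose, n_steps, rng=random.Random(0)):
    """Symmetric-proposal Metropolis-Hastings: accept each proposed
    move with probability min(1, exp(new_score - old_score))."""
    x, lx = x0, log_score(x0)
    samples = []
    for _ in range(n_steps):
        y = propose(x, rng)
        ly = log_score(y)
        if ly >= lx or rng.random() < math.exp(ly - lx):
            x, lx = y, ly
        samples.append(x)
    return samples

# Toy sanity check on a 1-D target (not a graph): samples concentrate
# where exp(log_score) is large, here around 0.
xs = mcmc(0, lambda v: -abs(v), lambda v, r: v + r.choice([-1, 1]), 5000)
print(sum(xs) / len(xs))
```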

A Better Approach to Approximate Learning
Further constrain the search space: perform model averaging over the structures consistent with some known (fixed) total ordering ≺ of the variables:
X_1 ≺ X_2 ≺ … ≺ X_n, so the parents of X_i must come from {X_1, …, X_{i-1}}.
Intuition: an ordering decouples the choices of parents; the choice of Pa(X_7) does not restrict the choice of Pa(X_12).
As a result, both the likelihood P(D|≺) and feature probabilities P(f|D,≺) can be computed efficiently in closed form.
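The decoupling means P(D|≺) factors into a per-node sum over candidate parent sets drawn from that node's predecessors (of size at most k). A sketch with a hypothetical local score standing in for the real per-family marginal likelihood:

```python
from itertools import combinations
from math import log, exp

def log_P_D_given_order(order, local_log_score, k):
    """log P(D|order) = sum_i log sum over parent sets Pa of X_i
    (Pa a subset of X_i's predecessors, |Pa| <= k) of exp(score)."""
    total = 0.0
    for i, x in enumerate(order):
        preds = order[:i]
        terms = [local_log_score(x, pa)
                 for size in range(min(k, len(preds)) + 1)
                 for pa in combinations(preds, size)]
        m = max(terms)
        total += m + log(sum(exp(t - m) for t in terms))  # log-sum-exp
    return total

# Hypothetical local score favoring smaller parent sets:
print(log_P_D_given_order(("X1", "X2", "X3"), lambda x, pa: -len(pa), k=2))
```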

Sample Orderings
We can write P(f|D) = Σ_≺ P(f|≺, D) P(≺|D). Sample orderings ≺_1, …, ≺_m and approximate
P(f|D) ≈ (1/m) Σ_i P(f|≺_i, D).
MCMC sampling: define a Markov chain over orderings and run the chain to get samples from the posterior P(≺|D).
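Order-MCMC can reuse the same Metropolis-Hastings loop sketched earlier, with orderings as states, log P(D|≺) as the score, and a symmetric "flip two positions" proposal (one of the moves the paper considers). A sketch of the proposal, tying the pieces together:

```python
import random

def flip_proposal(order, rng):
    """Swap two randomly chosen positions in the ordering."""
    order = list(order)
    i, j = rng.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]
    return tuple(order)

# e.g., combined with the sketches above (hypothetical local score):
# samples = mcmc(("X1", "X2", "X3"),
#                lambda o: log_P_D_given_order(o, lambda x, pa: -len(pa), k=2),
#                flip_proposal, n_steps=5000)
```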

Experiments: exact posterior over orders versus order-MCMC

Experiments: convergence

Experiments: structure-MCMC, posterior correlation between two different runs

Experiments: order-MCMC, posterior correlation between two different runs

Conclusion
Order-MCMC converges faster and yields posterior estimates that agree across runs, outperforming structure-MCMC.

References
- N. Friedman and D. Koller. Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning, 50(1-2):95-125, 2003.
- N. Friedman and D. Koller. Learning Bayesian Networks from Data. NIPS 2001 tutorial.
- N. Friedman and M. Goldszmidt. Learning Bayesian Networks from Data. AAAI-98 tutorial.
- D. Heckerman. A Tutorial on Learning with Bayesian Networks. In M. Jordan, ed., Learning in Graphical Models. MIT Press, Cambridge, MA, 1998. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995. An earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79-119, 1997.
- C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 50:5-43, 2003.
- S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.