
04/21/2005 CS673. Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Nir Friedman and Daphne Koller.




1 Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks
Nir Friedman and Daphne Koller

2 Roadmap
Bayesian learning of Bayesian networks
– Exact vs. approximate learning
Markov chain Monte Carlo (MCMC) method
– MCMC over structures
– MCMC over orderings
Experimental results
Conclusions

3 Bayesian Networks
Compact representation of probability distributions via conditional independence.
Qualitative part: a directed acyclic graph (DAG)
– Nodes: random variables
– Edges: direct influence
Quantitative part: a set of conditional probability distributions, e.g. for variables B, E, A, C, R:

E   B  | P(A|E,B)  P(!A|E,B)
e   b  |   0.9       0.1
e   !b |   0.2       0.8
!e  b  |   0.9       0.1
!e  !b |   0.01      0.99

Together they define a unique distribution in factored form:
P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
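As a concrete illustration of the factored form, here is a minimal Python sketch of this joint distribution. The P(A|E,B) entries come from the slide's table; the remaining CPD values (for B, E, R, C) are hypothetical numbers chosen only for illustration.

```python
from itertools import product

# Joint in factored form: P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
# P(A|E,B) entries are from the slide's table; the other values are hypothetical.
p_b = 0.01                                         # hypothetical P(B = true)
p_e = 0.02                                         # hypothetical P(E = true)
p_a = {(True, True): 0.9, (True, False): 0.2,      # P(A = true | E, B)
       (False, True): 0.9, (False, False): 0.01}
p_r = {True: 0.95, False: 0.001}                   # hypothetical P(R = true | E)
p_c = {True: 0.7, False: 0.05}                     # hypothetical P(C = true | A)

def bern(p, x):
    """Probability of outcome x under a Bernoulli(p)."""
    return p if x else 1.0 - p

def joint(b, e, a, c, r):
    return (bern(p_b, b) * bern(p_e, e) * bern(p_a[(e, b)], a)
            * bern(p_r[e], r) * bern(p_c[a], c))

# The factorization defines a proper distribution:
# it sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([True, False], repeat=5))
```

The point of the factored form is visible in `joint`: five small local factors replace a full table of 2^5 joint probabilities.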

4 Why Learn Bayesian Networks?
Conditional independencies and the graphical representation capture the structure of many real-world distributions
– Provides insight into the domain
The graph structure allows "knowledge discovery":
– Is there a direct connection between X and Y?
– Does X separate two "subsystems"?
– Does X causally affect Y?
Bayesian networks can be used for many tasks
– Inference, causality, etc.
Examples: scientific data mining
– Disease properties and symptoms
– Interactions between the expression of genes

5 Learning Bayesian Networks
Data + prior information → Inducer → Bayesian network (structure and CPDs).
The inducer needs a prior probability distribution P(B) over networks; using Bayesian conditioning, it updates the prior P(B) to the posterior P(B|D).

6 Why Struggle for Accurate Structure?
[Figure: the "true" structure vs. a structure with an added arc vs. one with a missing arc]
Adding an arc:
– Increases the number of parameters to be fitted
– Makes wrong assumptions about causality and domain structure
Missing an arc:
– Cannot be compensated for by accurate fitting of parameters
– Also misses causality and domain structure

7 Score-Based Learning
[Figure: candidate structures over E, B, A]
Define a scoring function that evaluates how well a structure matches the data, then search for a structure that maximizes the score.

8 Bayesian Score of a Model
P(G|D) ∝ P(D|G) P(G)
where the marginal likelihood
P(D|G) = ∫ P(D|G,θ) P(θ|G) dθ
averages the likelihood P(D|G,θ) over the prior over parameters P(θ|G).
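The marginal-likelihood integral above has a closed form for discrete CPDs with Dirichlet parameter priors. A one-parameter sketch for a single binary variable with a Beta prior (the prior choice here is mine, not from the slides):

```python
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function B(a, b) = Γ(a)Γ(b) / Γ(a+b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(heads, tails, alpha=1.0, beta=1.0):
    # ∫ θ^heads (1-θ)^tails · Beta(θ; alpha, beta) dθ
    #   = B(alpha + heads, beta + tails) / B(alpha, beta)
    # i.e. the likelihood integrated over the parameter prior, which is
    # what the marginal likelihood P(D|G) does for each family.
    return log_beta(alpha + heads, beta + tails) - log_beta(alpha, beta)

# With a uniform Beta(1,1) prior, one head and one tail has
# marginal probability B(2,2)/B(1,1) = 1/6.
p = exp(log_marginal_likelihood(1, 1))
```

Because the integral is done per family, the full score of a structure is a product of such closed-form terms, one per (variable, parent set) pair.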

9 Discovering Structure: Model Selection
Current practice: model selection
– Pick a single high-scoring model
– Use that model to infer domain structure

10 Discovering Structure: Model Averaging
Problem:
– With a small sample size, there are many high-scoring models
– An answer based on one model is often useless
– We want features common to many models

11 Bayesian Approach
Estimate the probability of features:
– Edge X → Y
– Markov edge X – Y
– Path X → … → Y
– ...
P(f|D) = Σ_G f(G) P(G|D)
where f(G) is the indicator function for feature f and P(G|D) is the Bayesian score of G.
There is a huge (super-exponential, 2^Θ(n²)) number of networks G, so exact learning is intractable.
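In a space small enough to enumerate, the feature probability is just a score-weighted vote over graphs. A toy sketch with made-up log scores for three candidate structures over two variables X and Y:

```python
from math import exp

# Hypothetical unnormalized log scores log P(D|G) + log P(G) for three
# candidate graphs, plus whether each contains the feature "edge X -> Y".
candidates = [
    {"name": "X->Y",    "log_score": -10.0, "has_edge": True},
    {"name": "Y->X",    "log_score": -10.5, "has_edge": False},
    {"name": "no edge", "log_score": -12.0, "has_edge": False},
]

def feature_probability(graphs):
    # P(f|D) = sum_G f(G) P(G|D); normalize in log space for stability.
    m = max(g["log_score"] for g in graphs)
    weights = [exp(g["log_score"] - m) for g in graphs]
    z = sum(weights)
    return sum(w for g, w in zip(graphs, weights) if g["has_edge"]) / z

p_edge = feature_probability(candidates)  # posterior P(X -> Y present | D)
```

Exact learning is intractable precisely because the real `candidates` list would have super-exponentially many entries; the rest of the talk is about approximating this sum.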

12 Approximate Bayesian Learning
Restrict the search space to G_k, the set of graphs with indegree bounded by k
– The space is still super-exponential
Find a set G of high-scoring structures and estimate
P(f|D) ≈ Σ_{G ∈ G} f(G) P(D|G) P(G) / Σ_{G ∈ G} P(D|G) P(G)
– Hill-climbing yields a biased sample of structures

13 Markov Chain Monte Carlo over Networks
MCMC sampling:
– Define a Markov chain over Bayesian networks
– Perform a walk through the chain to obtain samples G whose distribution converges to the posterior P(G|D)
Possible pitfalls:
– Still a super-exponential number of networks
– The time for the chain to converge to the posterior is unknown
– Islands of high posterior, connected by low bridges
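One way to realize such a chain is a Metropolis walk that toggles a single edge per step. A minimal sketch (not the paper's exact proposal distribution; `log_score` is a placeholder for log P(D|G) + log P(G)):

```python
import random
from math import exp

def is_dag(edges, n):
    # Kahn's algorithm: the graph is acyclic iff every node can be peeled off.
    indeg = {v: 0 for v in range(n)}
    for _, v in edges:
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return seen == n

def propose(edges, n, rng):
    # Toggle one random ordered pair: add the edge if absent, else delete it.
    u, v = rng.sample(range(n), 2)
    return frozenset(edges ^ {(u, v)})

def structure_mcmc(log_score, n, steps, rng):
    g = frozenset()                        # start from the empty graph
    samples = []
    for _ in range(steps):
        g2 = propose(g, n, rng)
        # The toggle proposal is symmetric, so the acceptance ratio is just
        # the posterior ratio; proposals that create a cycle are rejected.
        if is_dag(g2, n) and rng.random() < min(1.0, exp(log_score(g2) - log_score(g))):
            g = g2
        samples.append(g)
    return samples
```

On a two-node toy problem whose score favors the edge 0 → 1, the chain spends most of its time on graphs containing that edge; the "islands" pitfall appears when high-scoring graphs are separated by many low-scoring single-edge toggles.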

14 Better Approach to Approximate Learning
Place further constraints on the search space:
– Perform model averaging over the structures consistent with some known (fixed) total ordering ‹
Ordering of variables: X_1 ‹ X_2 ‹ … ‹ X_n means the parents of X_i must come from X_1, …, X_{i-1}.
Intuition: the order decouples the choices of parents
– The choice of Pa(X_7) does not restrict the choice of Pa(X_12)
We can compute efficiently in closed form:
– The likelihood P(D|‹)
– The feature probability P(f|D,‹)
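The closed form exists because, given the ordering, each variable's parent set is chosen independently, so P(D|‹) factors into per-variable sums over candidate parent sets. A sketch with bounded indegree k and a placeholder family score (a real computation would plug in, e.g., a Bayesian family score here):

```python
from itertools import combinations
from math import exp, log

def log_p_data_given_order(order, family_log_score, k):
    # P(D | <) = prod_i [ sum over parent sets U subset of the
    #                     predecessors of X_i with |U| <= k of score(X_i, U) ]
    # family_log_score(x, parents) is a placeholder for the local
    # log marginal likelihood of X_i with parent set `parents`.
    total = 0.0
    for i, x in enumerate(order):
        preds = order[:i]
        terms = [family_log_score(x, parents)
                 for size in range(min(k, len(preds)) + 1)
                 for parents in combinations(preds, size)]
        m = max(terms)
        total += m + log(sum(exp(t - m) for t in terms))  # log-sum-exp
    return total
```

With indegree bound k, each inner sum has only O(n^k) terms, which is what makes this tractable where the sum over all structures is not.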

15 Sample Orderings
We can write
P(f|D) = Σ_‹ P(f|‹,D) P(‹|D)
Sample orderings ‹_1, …, ‹_k and approximate
P(f|D) ≈ (1/k) Σ_i P(f|‹_i,D)
MCMC sampling:
– Define a Markov chain over orderings
– Run the chain to get samples from the posterior P(‹|D)
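A Markov chain over orderings can use a simple swap move. A minimal Metropolis sketch, with `order_log_score` as a placeholder for log P(D|‹) + log P(‹):

```python
import random
from math import exp

def order_mcmc(order_log_score, n, steps, rng):
    # Walk over permutations of 0..n-1 by swapping two positions per step.
    order = list(range(n))
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        # The swap proposal is symmetric, so the Hastings ratio reduces
        # to the ratio of (unnormalized) posteriors.
        if rng.random() < min(1.0, exp(order_log_score(proposal) - order_log_score(order))):
            order = proposal
        samples.append(tuple(order))
    return samples
```

The space of orderings (n! states) is far smaller and smoother than the space of structures, which is the intuition behind order-MCMC mixing better than structure-MCMC.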

16 Experiments: Exact posterior over orders versus order-MCMC

17 Experiments: Convergence

18 Experiments: structure-MCMC – posterior correlation for two different runs

19 Experiments: order-MCMC – posterior correlation for two different runs

20 Conclusion
Order-MCMC performs better than structure-MCMC.

21 References
N. Friedman and D. Koller. Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning, 2002.
N. Friedman and D. Koller. Tutorial on Learning Bayesian Networks from Data. NIPS 2001.
N. Friedman and M. Goldszmidt. Tutorial on Learning Bayesian Networks from Data. AAAI-98.
D. Heckerman. A Tutorial on Learning with Bayesian Networks. In M. Jordan, ed., Learning in Graphical Models. MIT Press, Cambridge, MA, 1999. Also Technical Report MSR-TR-95-06, Microsoft Research, March 1995; an earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79-119, 1997.
C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.

