Learning Bayesian Networks

Dimensions of Learning

  Model:     Bayes net / Markov net
  Data:      Complete / Incomplete
  Structure: Known / Unknown
  Objective: Generative / Discriminative

Learning Bayes nets from data

[Figure: observed data (a table of cases over X1…Xn) together with prior/expert information feed into a Bayes-net learner, which outputs one or more candidate Bayes nets.]

From thumbtacks to Bayes nets

The thumbtack problem can be viewed as learning the probability for a very simple BN: a single binary variable X (heads/tails) with parameter θ, where each toss X1, X2, …, XN is an independent draw governed by the same θ.
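In symbols (standard Bernoulli-model results, stated here for concreteness; #h and #t denote the observed counts of heads and tails):

  p(d | θ) = θ^#h · (1 − θ)^#t,   so the maximum-likelihood estimate is θ̂ = #h / (#h + #t).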

The next simplest Bayes net

Two binary variables X and Y (each heads/tails), with no arc between them.

The next simplest Bayes net

[Figure: parameters θX and θY; each case i = 1, …, N contributes observations Xi and Yi. Question: are θX and θY dependent?]

The next simplest Bayes net

[Same figure, with the answer: there is no dependency between θX and θY — "parameter independence".]

The next simplest Bayes net

Parameter independence means the problem decomposes into two separate thumbtack-like learning problems, one for θX and one for θY, as the factorization below shows.
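In symbols (a standard consequence of parameter independence with complete data; the notation is mine):

  p(θX, θY | d) = p(θX | x1, …, xN) · p(θY | y1, …, yN)

so each parameter is learned from its own data alone.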

A bit more difficult…

Now add an arc X → Y (X, Y binary, heads/tails). There are three probabilities to learn:
  θX=heads
  θY=heads|X=heads
  θY=heads|X=tails

A bit more difficult…

[Figure, built up over several slides: parameters θX, θY|X=heads, θY|X=tails over cases 1 and 2; when case i has Xi = heads, Yi is governed by θY|X=heads, and when Xi = tails, by θY|X=tails. With parameter independence, learning reduces to 3 separate thumbtack-like problems.]
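In symbols (again a standard decomposition; the notation is mine): with complete data and parameter independence, the posterior factors as

  p(θX, θY|X=heads, θY|X=tails | d) = p(θX | d) · p(θY|X=heads | d) · p(θY|X=tails | d),

where p(θY|X=heads | d) depends only on the Y values of the cases in which X = heads (and analogously for X = tails).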

In general…

Learning probabilities in a Bayes net is straightforward if we have:
- complete data,
- local distributions from the exponential family (binomial, Poisson, gamma, …),
- parameter independence, and
- conjugate priors.

(A small code sketch of this complete-data case follows.)
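As a concrete illustration, here is a minimal Python sketch, not from the slides, that estimates the parameters of the X → Y network above from complete data; the function name and the smoothing constant alpha (imaginary counts from a Beta prior) are my choices:

from collections import Counter

def estimate_cpts(cases, alpha=1.0):
    # MAP-style estimates for the two-node net X -> Y.
    # cases: list of (x, y) pairs, each value 'heads' or 'tails'.
    # alpha: imaginary count added to every cell (a Beta(alpha, alpha) prior).
    n = len(cases)
    x_heads = Counter(x for x, _ in cases)['heads']
    p_x_heads = (x_heads + alpha) / (n + 2 * alpha)

    p_y_heads_given_x = {}
    for xv in ('heads', 'tails'):
        # Use only the cases whose X value matches: the problem decomposes.
        ys = [y for x, y in cases if x == xv]
        heads = sum(1 for y in ys if y == 'heads')
        p_y_heads_given_x[xv] = (heads + alpha) / (len(ys) + 2 * alpha)
    return p_x_heads, p_y_heads_given_x

# Example: three complete cases.
print(estimate_cpts([('heads', 'heads'), ('heads', 'tails'), ('tails', 'tails')]))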

Incomplete data makes parameters dependent

[Figure: the same X → Y meta-network, but with some observations missing; a missing value couples θX, θY|X=heads, and θY|X=tails, so the posterior no longer factorizes.]

Solution: use EM
- Initialize the parameters, ignoring the missing data.
- E step: infer the missing values using the current parameters.
- M step: re-estimate the parameters using the completed data.
- Iterate to convergence. (Gradient methods can be used instead; a sketch of EM for the two-node net follows.)
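A minimal Python sketch of EM for the X → Y network, assuming Y is always observed and X is sometimes missing; the function name, representation, and smoothing are mine, not from the slides:

def em_two_node(cases, iters=50):
    # EM for the X -> Y net when some X values are missing.
    # cases: list of (x, y) with x in {'h', 't', None} and y in {'h', 't'}.
    p_x, p_y = 0.5, {'h': 0.5, 't': 0.5}    # start from uniform parameters
    for _ in range(iters):
        eps = 1e-3                           # tiny imaginary counts avoid zeros
        n, n_x_h = 2 * eps, eps
        n_xv = {'h': 2 * eps, 't': 2 * eps}
        n_y_h = {'h': eps, 't': eps}
        for x, y in cases:
            if x is not None:
                w = {x: 1.0}                 # X observed: all weight on its value
            else:
                # E step: posterior P(X = 'h' | y) under current parameters.
                lh = p_x * (p_y['h'] if y == 'h' else 1 - p_y['h'])
                lt = (1 - p_x) * (p_y['t'] if y == 'h' else 1 - p_y['t'])
                w = {'h': lh / (lh + lt), 't': lt / (lh + lt)}
            for xv, wv in w.items():         # accumulate expected counts
                n += wv
                n_xv[xv] += wv
                if xv == 'h':
                    n_x_h += wv
                if y == 'h':
                    n_y_h[xv] += wv
        # M step: re-estimate parameters from the expected counts.
        p_x = n_x_h / n
        p_y = {xv: n_y_h[xv] / n_xv[xv] for xv in ('h', 't')}
    return p_x, p_y

# Example: the second case has X missing.
print(em_two_node([('h', 'h'), (None, 'h'), ('t', 't')]))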

Learning Bayes-net structure

Given data, which model is correct?
  model 1: X   Y   (no arc)
  model 2: X → Y

Bayesian approach

Given data d, "which model is correct?" becomes "which model is more likely?": compute a posterior over the candidate models from the data.

Bayesian approach: model averaging

Rather than committing to one structure, weight each model's predictions by its posterior given d and average them.

Bayesian approach: model selection

Keep only the best (most probable) model, for the sake of:
- explanation
- understanding
- tractability

To score a model, use Bayes' theorem

Given data d, the model score is the posterior

  p(m | d) ∝ p(m) · p(d | m),

where p(d | m) = ∫ p(d | θ, m) p(θ | m) dθ, the likelihood integrated over the parameters, is the "marginal likelihood".

Thumbtack example

With the conjugate Beta(αh, αt) prior on θ for X (heads/tails), the marginal likelihood has a closed form:

  p(d | m) = [Γ(α) / Γ(α + N)] · [Γ(αh + #h) / Γ(αh)] · [Γ(αt + #t) / Γ(αt)],

where α = αh + αt and N = #h + #t.

More complicated graphs

For X → Y, the marginal likelihood again decomposes into 3 separate thumbtack-like learning problems, one for each of θX, θY|X=heads, θY|X=tails.

Model score for a discrete Bayes net
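The formula itself did not survive the transcript; the standard closed form here (the Bayesian-Dirichlet score of Heckerman, Geiger, and Chickering, under the assumptions above) is

  p(d | m) = ∏i ∏j [Γ(αij) / Γ(αij + Nij)] · ∏k [Γ(αijk + Nijk) / Γ(αijk)],

where i ranges over variables, j over configurations of Xi's parents, and k over values of Xi; Nijk is the number of cases with Xi = k and parents in configuration j, Nij = Σk Nijk, and the αijk are Dirichlet imaginary counts with αij = Σk αijk.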

Computation of marginal likelihood

An efficient closed form exists if we have:
- local distributions from the exponential family (binomial, Poisson, gamma, …),
- parameter independence,
- conjugate priors, and
- no missing data (including no hidden variables).

Structure search

Finding the BN structure with the highest score among structures with at most k parents is NP-hard for k > 1 (Chickering, 1995). Heuristic methods:
- greedy search
- greedy search with restarts
- MCMC methods

[Flowchart: initialize a structure; score all possible single changes; if any change improves the score, perform the best one and repeat; otherwise return the saved structure. A code sketch of this loop follows.]
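A minimal Python sketch of the greedy loop in the flowchart, assuming a score function is supplied by the caller; all names here are mine, and arc reversals and restarts are omitted for brevity:

import itertools

def greedy_search(variables, score, max_parents=2):
    # Greedy hill-climbing over single-arc additions and deletions.
    # variables: list of variable names.
    # score: callable taking {var: set of parents} and returning a number.
    parents = {v: set() for v in variables}           # start from the empty graph

    def acyclic(p):
        # Repeatedly strip parentless nodes; anything left over means a cycle.
        remaining = {v: set(ps) for v, ps in p.items()}
        while remaining:
            roots = [v for v, ps in remaining.items() if not ps]
            if not roots:
                return False
            for v in roots:
                del remaining[v]
            for ps in remaining.values():
                ps.difference_update(roots)
        return True

    current = score(parents)
    while True:
        best, best_change = current, None
        # Score all possible single changes (add or delete one arc x -> y).
        for x, y in itertools.permutations(variables, 2):
            cand = {v: set(ps) for v, ps in parents.items()}
            if x in cand[y]:
                cand[y].discard(x)
            elif len(cand[y]) < max_parents:
                cand[y].add(x)
            else:
                continue
            if acyclic(cand):
                s = score(cand)
                if s > best:
                    best, best_change = s, cand
        if best_change is None:                       # no improvement: stop
            return parents, current
        parents, current = best_change, best          # perform the best change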

Structure priors

1. All possible structures equally likely.
2. A partial ordering on the variables, plus required / prohibited arcs.
3. Prior(m) ∝ Similarity(m, prior BN).

Parameter priors

- All uniform: Beta(1,1).
- Or derive them from a prior Bayes net.

Parameter priors

Recall the intuition behind the Beta prior for the thumbtack:
- The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance".
- Equivalent sample size = αh + αt.
- The larger the equivalent sample size, the more confident we are about the long-run fraction.
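A worked instance of this intuition (standard Beta–Bernoulli algebra, not from the slides): a Beta(αh, αt) prior updates on data d with #h heads and #t tails (N = #h + #t) to

  θ | d ~ Beta(αh + #h, αt + #t),   P(heads | d) = (αh + #h) / (αh + αt + N).

For example, with Beta(2, 2) (equivalent sample size 4) and 3 heads in 10 tosses, P(heads | d) = (2 + 3) / (4 + 10) = 5/14 ≈ 0.36.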

Parameter priors

[Figure: a prior network over X1…X9 plus an equivalent sample size give an imaginary count for any variable configuration, and hence parameter priors for any Bayes net structure over X1…Xn: "parameter modularity".]
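The construction this slide alludes to (the BDe prior of Heckerman et al.; the notation is mine) sets each imaginary count by querying the prior network: with equivalent sample size α′,

  αijk = α′ · p(Xi = k, Pai = j | prior network),

so one prior network yields consistent Dirichlet priors for every candidate structure.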

Combining knowledge & data

[Figure: a prior network over X1…X9 with an equivalent sample size, combined with data (a table of cases over x1, x2, x3, …), yields improved network(s).]