Graphical Models - Learning - Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel, Albert-Ludwigs University Freiburg, Germany. [Figure: the Alarm Bayesian network] Advanced I WS 06/07. Based on J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR, U.C. Berkeley, April 1998; G. J. McLachlan, T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; N. Friedman & D. Koller's NIPS'99. Parameter Estimation

Bayesian Networks Advanced I WS 06/07 Outline Introduction Reminder: Probability theory Basics of Bayesian Networks Modeling Bayesian networks Inference (VE, Junction tree) [Excursus: Markov Networks] Learning Bayesian networks Relational Models

Bayesian Networks Advanced I WS 06/07 What is Learning? Agents are said to learn if they improve their performance over time based on experience. "The problem of understanding intelligence is said to be the greatest problem in science today and 'the' problem for this century – as deciphering the genetic code was for the second half of the last one… the problem of learning represents a gateway to understanding intelligence in man and machines." -- Tomaso Poggio and Steve Smale - Learning

Bayesian Networks Advanced I WS 06/07 Why bother with learning? Bottleneck of knowledge acquisition – expensive, difficult – normally, no expert is around. Data is cheap! – Huge amounts of data available, e.g. clinical tests, web mining (e.g. log files) - Learning

Bayesian Networks Advanced I WS 06/07 Why Learning Bayesian Networks? Conditional independencies & graphical language capture structure of many real-world distributions Graph structure provides much insight into domain –Allows “knowledge discovery” Learned model can be used for many tasks Supports all the features of probabilistic learning –Model selection criteria –Dealing with missing data & hidden variables

Bayesian Networks Advanced I WS 06/07 Learning With Bayesian Networks Data + prior information → (learning algorithm) → Bayesian network. [Figure: example network E, B → A with CPT P(A | E, B)] - Learning

Bayesian Networks Advanced I WS 06/07 What does the data look like? A complete data set: a table whose columns are the attributes/variables A1, ..., A6 and whose rows are the data cases X1, ..., XM; every entry is observed (true/false). - Learning

Bayesian Networks Advanced I WS 06/07 What does the data look like? An incomplete data set: the same table, but some entries are '?' (missing values), and some variables may be entirely unobserved (hidden/latent). Real-world data: states of some random variables are missing – e.g. medical diagnosis: not all patients are subject to all tests – parameter reduction, e.g. clustering, ... - Learning

Bayesian Networks Advanced I WS 06/07 Hidden variable – Examples 1. Parameter reduction: [Figure: a fully connected network X1, X2, X3 → Y1, Y2, Y3 versus the same variables connected through a hidden node H: X1, X2, X3 → H → Y1, Y2, Y3] - Learning

Bayesian Networks Advanced I WS 06/07 Hidden variable – Examples 2. Clustering: [Figure: hidden Cluster node with observed attributes X1, ..., Xn] The hidden Cluster variable is the cluster assignment; the Xi are the attributes. Hidden variables also appear in clustering. Autoclass model: – the hidden variable assigns class labels – observed attributes are independent given the class - Learning
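
The Autoclass factorization this refers to can be written out explicitly; a standard reconstruction (with the hidden cluster variable C and observed attributes X1, ..., Xn as in the slide's figure):

```latex
P(X_1,\dots,X_n) \;=\; \sum_{c} P(C = c)\,\prod_{i=1}^{n} P(X_i \mid C = c)
```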

Bayesian Networks Advanced I WS 06/07 [Slides due to Eamonn Keogh] - Learning

Bayesian Networks Advanced I WS 06/07 Iteration 1 The cluster means are randomly assigned [Slides due to Eamonn Keogh] - Learning

Bayesian Networks Advanced I WS 06/07 Iteration 2 [Slides due to Eamonn Keogh] - Learning

Bayesian Networks Advanced I WS 06/07 Iteration 5 [Slides due to Eamonn Keogh] - Learning

Bayesian Networks Advanced I WS 06/07 Iteration 25 [Slides due to Eamonn Keogh] - Learning
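
The iteration snapshots above come from a k-means-style clustering demo; the figures themselves are not reproduced here. A minimal sketch of such a loop on made-up 2-D data (the data, k = 3, and the iteration count are illustrative assumptions, not the original example):

```python
import numpy as np

rng = np.random.default_rng(0)
# made-up 2-D data drawn around three offsets; purely illustrative
X = rng.normal(size=(300, 2)) + rng.choice([-4.0, 0.0, 4.0], size=(300, 1))
k = 3
means = X[rng.choice(len(X), size=k, replace=False)]   # "Iteration 1: the cluster means are randomly assigned"

for iteration in range(25):                            # the slides show iterations 1, 2, 5, 25
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)                      # assign each point to its nearest mean
    new_means = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else means[j]
                          for j in range(k)])          # recompute each mean from its points
    if np.allclose(new_means, means):                  # stop once the means no longer move
        break
    means = new_means
```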

What is a natural grouping among these objects? [Slides due to Eamonn Keogh]

Clustering is subjective. What is a natural grouping among these objects? [Figure: the same objects grouped as school employees, the Simpson family, males, and females] [Slides due to Eamonn Keogh]

Bayesian Networks Advanced I WS 06/07 Learning With Bayesian Networks – the space of learning problems: - Learning
Fully observed data, fixed structure: easiest problem – counting.
Fully observed data, fixed variables (structure unknown): selection of arcs; new domain with no domain expert; data mining.
Partially observed data, fixed structure: numerical, nonlinear optimization; multiple calls to BN inference; difficult for large networks.
Partially observed data, fixed variables: encompasses the two difficult subproblems; "only" Structural EM is known.
Hidden variables: scientific discovery.

Bayesian Networks Advanced I WS 06/07 Parameter Estimation Let the data D = {d_1, ..., d_M} be a set of samples over the m RVs; each d_i is called a data case. i.i.d. assumption: all data cases are independently sampled from identical distributions. Find: the parameters of the CPDs which match the data best. - Learning

Bayesian Networks Advanced I WS 06/07 Maximum Likelihood - Parameter Estimation What does "best matching" mean? Find the parameters which have most likely produced the data. 1. The MAP parameters maximize P(θ | D) ∝ P(D | θ) P(θ). 2. If the data is equally likely for all parameters and 3. all parameters are a priori equally likely, this reduces to finding the ML parameters: those maximizing the likelihood of the parameters given the data – or, equivalently, the log-likelihood of the parameters given the data. - Learning
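
The formulas on these slides were figures; a standard reconstruction of the quantities named here, for parameters θ and data D = {d_1, ..., d_M}:

```latex
L(\theta : D) = P(D \mid \theta) \stackrel{\text{i.i.d.}}{=} \prod_{m=1}^{M} P(d_m \mid \theta),
\qquad
LL(\theta : D) = \log L(\theta : D) = \sum_{m=1}^{M} \log P(d_m \mid \theta),
\qquad
\theta^{ML} = \arg\max_{\theta} L(\theta : D).
```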

Bayesian Networks Advanced I WS 06/07 Maximum Likelihood This is one of the most commonly used estimators in statistics. Intuitively appealing. Consistent: the estimate converges to the best possible value as the number of examples grows. Asymptotic efficiency: the estimate is as close to the true value as possible given a particular training set. - Learning

Bayesian Networks Advanced I WS 06/07 Learning With Bayesian Networks (overview of learning problems, as before) – next case: fixed structure, fully observed data. - Learning

Bayesian Networks Advanced I WS 06/07 Known Structure, Complete Data [Figure: data over E, B, A plus the network E, B → A with an unknown CPT P(A | E, B); the learning algorithm fills in the table entries] Network structure is specified – the learner needs to estimate parameters. Data does not contain missing values. - Learning

Bayesian Networks Advanced I WS 06/07 ML Parameter Estimation [Figure: complete data table over attributes A1, ..., A6] Built up over several slides: the likelihood factorizes over the data cases (i.i.d.) and, by the BN semantics, over the families of the network; only the local parameters of the family of Aj are involved in each factor, so each factor can be maximized individually – the decomposability of the likelihood. - Learning

Bayesian Networks Advanced I WS 06/07 Decomposability of Likelihood If the data set is complete (no question marks), we can maximize each local likelihood function independently and then combine the solutions to get an MLE solution – a decomposition of the global problem into independent, local sub-problems. This allows efficient solutions to the MLE problem.
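
The derivation on the preceding slides was shown as images; a standard reconstruction of the decomposition, for a network over variables A_1, ..., A_n with parameters θ_j for the family of A_j and data cases d_1, ..., d_M:

```latex
LL(\theta : D)
  = \sum_{m=1}^{M} \log P(d_m \mid \theta)
  = \sum_{m=1}^{M} \sum_{j=1}^{n} \log P\big(a_j^{m} \mid \mathrm{pa}_j^{m}, \theta_j\big)
  = \sum_{j=1}^{n} \underbrace{\sum_{m=1}^{M} \log P\big(a_j^{m} \mid \mathrm{pa}_j^{m}, \theta_j\big)}_{LL_j(\theta_j : D)}
```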

Bayesian Networks Advanced I WS 06/07 Likelihood for Multinomials Random variable V with values 1, ..., K: L(θ : D) = ∏_k θ_k^{N_k}, where N_k is the count of state k in the data and ∑_k θ_k = 1. This constraint implies that the choice of θ_i influences the choice of θ_j (i ≠ j). - Learning

Bayesian Networks Advanced I WS 06/07 Likelihood for Binomials (2 states only) Under the constraint θ_1 + θ_2 = 1: compute the partial derivative of the log-likelihood, set the partial derivative to zero, and solve – the MLE is θ_1 = N_1 / (N_1 + N_2). In general, for multinomials (> 2 states), the MLE is θ_k = N_k / ∑_l N_l. - Learning

Bayesian Networks Advanced I WS 06/07 Likelihood for Conditional Multinomials One multinomial for each joint state pa of the parents of V; the MLE is θ_{v|pa} = N(v, pa) / ∑_{v'} N(v', pa). - Learning
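
Both MLEs amount to counting; a minimal sketch for binary variables E, B, A where A's parents are {E, B} (the variable names and the toy data are illustrative assumptions, not from the slides):

```python
from collections import Counter

# toy complete data cases over (E, B, A)
data = [(0, 0, 0), (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 1)]

# ML for a root multinomial, e.g. P(E):  theta_k = N_k / sum_l N_l
n_e = Counter(e for e, b, a in data)
p_e = {k: n_e[k] / len(data) for k in (0, 1)}

# ML for a conditional multinomial (CPT), e.g. P(A | E, B):
#   theta_{a|e,b} = N(a, e, b) / N(e, b)
n_aeb = Counter((e, b, a) for e, b, a in data)
n_eb = Counter((e, b) for e, b, a in data)
p_a_given_eb = {(e, b, a): n_aeb[(e, b, a)] / n_eb[(e, b)]
                for (e, b) in n_eb for a in (0, 1)}

print(p_e)
print(p_a_given_eb)
```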

Bayesian Networks Advanced I WS 06/07 Learning With Bayesian Networks (overview of learning problems, as before) – next case: fixed structure, partially observed data. - Learning

Bayesian Networks Advanced I WS 06/07 Known Structure, Incomplete Data [Figure: data over E, B, A with missing values plus the network E, B → A with an unknown CPT P(A | E, B); the learning algorithm fills in the table entries] Network structure is specified. Data contains missing values – need to consider assignments to the missing values. - Learning

Bayesian Networks Advanced I WS 06/07 EM Idea In the case of complete data, ML parameter estimation is easy: simply counting (1 iteration). Incomplete data? 1. Complete the data (imputation: most probable value?, average?, ...) 2. Count 3. Iterate - Learning

Bayesian Networks Advanced I WS 06/07 EM Idea: complete the data [Figure: network A → B] Incomplete data over A and B, with some values of B missing ('?'). Complete the data: each fully observed case contributes a count of 1.0; each case with a missing value is split according to the current model, giving expected counts, e.g.:
A=true,  B=true:                 1.0
A=true,  B=?, completed true:    0.5
A=true,  B=?, completed false:   0.5
A=false, B=true:                 1.0
A=true,  B=false:                1.0
A=false, B=?, completed true:    0.5
A=false, B=?, completed false:   0.5
Maximize the parameters on the expected counts, then iterate. - Learning
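
A sketch of this completion step on a small A → B example with some B values missing. The data and the starting parameters (e.g. P(B=true | A) = 0.5, which produces the 0.5/0.5 split shown on the slide) are illustrative assumptions:

```python
# current parameters of the network A -> B (assumed starting values)
p_b_true_given_a = {True: 0.5, False: 0.5}

# toy incomplete data; None marks a missing value for B
data = [(True, True), (True, None), (False, True), (True, False), (False, None)]

expected_counts = {}  # (a, b) -> expected count
for a, b in data:
    if b is not None:                                   # observed case: count 1.0
        expected_counts[(a, b)] = expected_counts.get((a, b), 0.0) + 1.0
    else:                                               # missing B: distribute the case
        p = p_b_true_given_a[a]                         #   according to P(B | A=a, current params)
        expected_counts[(a, True)] = expected_counts.get((a, True), 0.0) + p
        expected_counts[(a, False)] = expected_counts.get((a, False), 0.0) + (1 - p)

# maximization on the expected counts, then iterate with the new parameters
p_b_true_given_a = {a: expected_counts.get((a, True), 0.0) /
                       (expected_counts.get((a, True), 0.0) + expected_counts.get((a, False), 0.0))
                    for a in (True, False)}
```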

Bayesian Networks Advanced I WS 06/07 Complete-data likelihood [Figure: incomplete data table over attributes A1, ..., A6] The observed data defines the incomplete-data likelihood. Assume complete data exists, with a corresponding complete-data likelihood. - Learning

Bayesian Networks Advanced I WS 06/07 EM Algorithm - Abstract Expectation Step; Maximization Step (formulas reconstructed below). - Learning
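
The slide's formulas were figures; the standard abstract form of the two steps, with observed data D, hidden part Z, and current parameters θ^k (a common reconstruction, not taken verbatim from the slide):

```latex
\text{E-step:}\quad Q(\theta \mid \theta^{k}) \;=\; \mathbb{E}_{Z \sim P(\,\cdot\, \mid D,\, \theta^{k})}\big[\log P(D, Z \mid \theta)\big]
\qquad
\text{M-step:}\quad \theta^{k+1} \;=\; \arg\max_{\theta}\; Q(\theta \mid \theta^{k})
```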

Bayesian Networks Advanced I WS 06/07 EM Algorithm - Principle [Figure: plot of the incomplete-data likelihood P(Y | θ) over θ, showing the current point θ^k, the new function Q(θ, θ^k) constructed at θ^k, and its maximum θ^{k+1}] Expectation Maximization (EM): construct a new function based on the "current point" (which "behaves well"). Property: the maximum of the new function has a better score than the current point. - Learning

Bayesian Networks Advanced I WS 06/07 EM for Multinomials Random variable V with values 1, ..., K, where EN_k is the expected count of state k in the data, i.e. EN_k = ∑_m P(V=k | d_m, θ). The "MLE" on the expected counts is θ_k = EN_k / ∑_l EN_l. - Learning

Bayesian Networks Advanced I WS 06/07 EM for Conditional Multinomials One multinomial for each joint state pa of the parents of V; the "MLE" uses expected counts: θ_{v|pa} = EN(v, pa) / ∑_{v'} EN(v', pa). - Learning

Bayesian Networks Advanced I WS 06/07 Learning Parameters: incomplete data EM-algorithm: iterate until convergence. The likelihood is non-decomposable (missing values, hidden nodes). [Figure: network over S, X, D, C, B with a partially observed data table] Start from initial parameters (the current model). Expectation: inference, e.g. P(S | X=0, D=1, C=0, B=1), yields expected counts. Maximization: update the parameters (ML, MAP) from the expected counts, then repeat. - Learning

Bayesian Networks Advanced I WS 06/07 Learning Parameters: incomplete data 1. Initialize parameters 2. Compute pseudo counts (expected counts) for each variable, using inference, e.g. the junction tree algorithm 3. Set parameters to the (completed) ML estimates 4. If not converged, go to 2 - Learning
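
Putting steps 1–4 together for the same toy A → B network as above (exact inference by enumeration stands in for the junction tree algorithm here; the data and the initialization are illustrative assumptions):

```python
import math

data = [(True, True), (True, None), (False, True), (True, False), (False, None)]
theta = {"A": 0.6, ("B", True): 0.7, ("B", False): 0.3}   # 1. initialize parameters

def loglik(theta):
    """Incomplete-data log-likelihood; a missing B is marginalized out."""
    ll = 0.0
    for a, b in data:
        pa = theta["A"] if a else 1 - theta["A"]
        if b is None:
            ll += math.log(pa)                 # sum over B of P(B | A=a) is 1
        else:
            pb = theta[("B", a)] if b else 1 - theta[("B", a)]
            ll += math.log(pa * pb)
    return ll

prev = -math.inf
while True:
    # 2. expected (pseudo) counts via inference P(B | A=a, theta)
    en = {}
    for a, b in data:
        completions = ([(b, 1.0)] if b is not None else
                       [(True, theta[("B", a)]), (False, 1 - theta[("B", a)])])
        for val, w in completions:
            en[(a, val)] = en.get((a, val), 0.0) + w
    # 3. set parameters to the (completed) ML estimates
    theta["A"] = sum(1.0 for a, _ in data if a) / len(data)
    for a in (True, False):
        tot = en.get((a, True), 0.0) + en.get((a, False), 0.0)
        theta[("B", a)] = en.get((a, True), 0.0) / tot
    # 4. if not converged, iterate
    cur = loglik(theta)
    if cur - prev < 1e-6:
        break
    prev = cur
```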

Bayesian Networks Advanced I WS 06/07 Monotonicity (Dempster, Laird, Rubin '77): the incomplete-data likelihood function is not decreased after an EM iteration. (Discrete) Bayesian networks: for any initial, non-uniform value the EM algorithm converges to a (local or global) maximum. - Learning
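
Stated as a formula (a standard paraphrase of the result, with θ^k the parameters after iteration k and D the observed data):

```latex
L(\theta^{k+1} : D) \;\geq\; L(\theta^{k} : D) \qquad \text{for every EM iteration } k .
```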

Bayesian Networks Advanced I WS 06/07 LL on training set (Alarm) Experiment by Bauer, Koller and Singer [UAI97] - Learning

Bayesian Networks Advanced I WS 06/07 Parameter value (Alarm) Experiment by Bauer, Koller and Singer [UAI97] - Learning

Bayesian Networks Advanced I WS 06/07 EM in Practice - Learning Initial parameters: random parameter setting, or a "best" guess from another source. Stopping criteria: small change in the likelihood of the data, or small change in parameter values. Avoiding bad local maxima: multiple restarts; early "pruning" of unpromising ones. Speed-up: various methods to speed convergence.

Bayesian Networks Advanced I WS 06/07 Gradient Ascent Main result: the gradient of the log-likelihood (reconstructed below) requires the same BN inference computations as EM. Pros: – flexible – closely related to methods in neural network training. Cons: – need to project the gradient onto the space of legal parameters – to get reasonable convergence we need to combine it with "smart" optimization techniques. - Learning
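
The "main result" referred to here is usually stated as follows; a standard reconstruction (not taken verbatim from the slide) of the gradient with respect to a table entry θ_{ijk} = P(X_i = k | Pa_i = j). Each summand is exactly the posterior that an inference call also computes for EM:

```latex
\frac{\partial\, LL(\theta : D)}{\partial\, \theta_{ijk}}
  \;=\; \sum_{m=1}^{M} \frac{P\big(X_i = k,\; \mathrm{Pa}_i = j \mid d_m,\, \theta\big)}{\theta_{ijk}}
```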

Bayesian Networks Advanced I WS 06/07 Parameter Estimation: Summary Parameter estimation is a basic task for learning with Bayesian networks. With missing values it becomes a non-linear optimization problem – EM, gradient ascent, ... EM for multinomial random variables: – fully observed data: counting – partially observed data: pseudo (expected) counts. The junction tree algorithm is used for the many required inference calls. - Learning