Learning In Bayesian Networks

Learning Problem

Set of random variables X = {W, X, Y, Z, …}

Training set D = {x_1, x_2, …, x_N}. Each observation specifies the values of a subset of the variables:
  x_1 = {w_1, x_1, ?, z_1, …}
  x_2 = {w_2, x_2, y_2, z_2, …}
  x_3 = {?, x_3, y_3, z_3, …}

Goal: predict the joint distribution over some variables given other variables, e.g., P(W, Y | Z, X).
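As a concrete illustration of this setup (not from the slides), the training set could be stored as a list of dictionaries, with None marking variables that a given observation does not specify; the variable names and values shown are invented:

```python
# Hypothetical representation of the training set D from the slide.
# Each observation assigns values to a subset of the variables;
# None marks a variable that was not observed in that example.
D = [
    {"W": 1, "X": 0, "Y": None, "Z": 1},   # x_1: Y unobserved
    {"W": 0, "X": 1, "Y": 1,    "Z": 0},   # x_2: fully observed
    {"W": None, "X": 1, "Y": 0, "Z": 1},   # x_3: W unobserved
]

# The learning goal is to use D to estimate a joint distribution,
# so that queries such as P(W, Y | Z, X) can be answered.
observed_vars = [{k for k, v in obs.items() if v is not None} for obs in D]
print(observed_vars)
```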

Classes Of Graphical Model Learning Problems

- Network structure known, all variables observed
- Network structure known, some missing data (or latent variables)
- Network structure not known, all variables observed
- Network structure not known, some missing data (or latent variables)

The structure-known cases are covered today and next class; the structure-not-known cases we are going to skip (not too relevant for the papers we'll read; see the optional readings for more info).

Learning CPDs When All Variables Are Observed And Network Structure Is Known

Trivial problem?

[Figure: a three-node network X → Z ← Y, together with a table of training data with columns X, Y, Z, and conditional probability tables whose entries are to be filled in:]

  P(X) = ?    P(Y) = ?

  X  Y  P(Z | X, Y)
  0  0  ?
  0  1  ?
  1  0  ?
  1  1  ?
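The slide poses this as a (nearly) trivial problem: with complete data and known structure, each table entry can be estimated by counting. A minimal sketch of that idea, assuming the training data are rows of binary values for X, Y, Z (the data values below are invented for illustration):

```python
from collections import Counter

# Illustrative complete training data for the network X -> Z <- Y.
data = [
    (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1),
    (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1),
]  # rows are (x, y, z)

# Maximum-likelihood estimate of P(Z=1 | X=x, Y=y): count and normalize.
pair_counts = Counter((x, y) for x, y, _ in data)
z1_counts = Counter((x, y) for x, y, z in data if z == 1)

for (x, y), n in sorted(pair_counts.items()):
    p = z1_counts[(x, y)] / n
    print(f"P(Z=1 | X={x}, Y={y}) = {p:.2f}")

# P(X=1) and P(Y=1) are estimated the same way, from marginal counts.
p_x1 = sum(x for x, _, _ in data) / len(data)
p_y1 = sum(y for _, y, _ in data) / len(data)
print(f"P(X=1) = {p_x1:.2f}, P(Y=1) = {p_y1:.2f}")
```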

Recasting Learning As Inference

We've already encountered probabilistic models that have latent (a.k.a. hidden, unobservable) variables that must be estimated from data:
- E.g., Weiss model: the direction of motion
- E.g., Gaussian mixture model: to which cluster each data point belongs

Why not treat the unknown entries in the conditional probability tables the same way?

Recasting Learning As Inference

Suppose you have a coin with an unknown bias, θ ≡ P(head). You flip the coin multiple times and observe the outcomes. From the observations, you can infer the bias of the coin. This is learning; it is also inference.
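To make the coin example concrete, here is a minimal sketch (not from the slides) of inferring the bias with a conjugate Beta prior; the prior parameters and the flip sequence are invented for illustration:

```python
from scipy.stats import beta

# Illustrative flips: 1 = head, 0 = tail (made-up data).
flips = [1, 0, 1, 1, 0, 1, 1, 1]
heads, tails = sum(flips), len(flips) - sum(flips)

# Beta(a0, b0) prior over theta = P(head); a0 = b0 = 1 is the uniform prior.
a0, b0 = 1.0, 1.0

# Conjugacy: the posterior over theta is Beta(a0 + heads, b0 + tails).
posterior = beta(a0 + heads, b0 + tails)

print("Posterior mean of theta:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
# Posterior predictive probability that the next flip is a head:
print("P(next = head):", (a0 + heads) / (a0 + b0 + heads + tails))
```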

Treating Conditional Probabilities As Latent Variables

Graphical model probabilities (priors and conditional distributions) can also be cast as random variables, e.g., in a Gaussian mixture model. Remove the knowledge "built into" the links (conditional distributions) and into the nodes (prior distributions), and create new random variables to represent that knowledge: hierarchical Bayesian inference.

[Figure: graphical models in which the latent variable z, the observation x, and the distribution parameters λ and q all appear as explicit random-variable nodes.]

Slides stolen from David Heckerman's tutorial

[Figure, from the Heckerman tutorial slides: the network replicated for training example 1 and training example 2, with each parameter node connected to both replicas, i.e., the same parameters generate every training example.]

Parameters might not be independent.

[Figure, from the Heckerman tutorial slides: the same replication over training example 1 and training example 2, but with links among the parameter nodes themselves, so the parameters are no longer mutually independent.]

General Approach: Learning Probabilities in a Bayes Net

If the network structure S^h is known and there is no missing data:
- We can express the joint distribution over the variables X in terms of a model parameter vector θ_s.
- Given a random sample D = {x_1, x_2, ..., x_N}, compute the posterior distribution p(θ_s | D, S^h).
- Given the posterior distribution, marginals and conditionals on the nodes of the network can be determined.

This is a probabilistic formulation of all supervised and unsupervised learning problems.
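The posterior referred to above follows from Bayes' rule; the equation itself is not in the transcript, but written out in the same notation (and assuming the N cases are i.i.d. given θ_s) it is:

```latex
p(\theta_s \mid D, S^h)
  = \frac{p(D \mid \theta_s, S^h)\, p(\theta_s \mid S^h)}{p(D \mid S^h)}
  \;\propto\; \prod_{n=1}^{N} p(\mathbf{x}_n \mid \theta_s, S^h)\; p(\theta_s \mid S^h)
```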

Computing Parameter Posteriors E.g., net structure X→Y
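The equations on this slide did not survive the transcript. For the two-node structure X → Y, the parameter vector splits into θ_x (for P(X)) and θ_y|x (for P(Y | X)), and a standard way to write the posterior (following Heckerman's tutorial, with i.i.d. cases) is:

```latex
p(\theta_x, \theta_{y \mid x} \mid D, S^h)
  \;\propto\; p(\theta_x, \theta_{y \mid x} \mid S^h)
  \prod_{n=1}^{N} p(x_n \mid \theta_x)\, p(y_n \mid x_n, \theta_{y \mid x})
```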

Computing Parameter Posteriors

Given complete data (all X, Y observed) and no direct dependencies among the parameters, the posterior factorizes:

  p(θ_x, θ_y|x | D, S^h) = p(θ_x | D, S^h) p(θ_y|x | D, S^h)

Explanation: given complete data, each set of parameters is disconnected (d-separated) from every other set of parameters in the graph, and d-separation implies parameter independence.

[Figure: the X → Y network with parameter nodes θ_x and θ_y|x attached to X and Y, respectively.]

Posterior Predictive Distribution

Given the parameter posteriors (what we just discussed): what is the prediction of the next observation x_{N+1}? How can this be used for the unsupervised and supervised learning problems we talked about over the past three classes?
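The predictive equation itself is missing from the transcript; the standard form, marginalizing the parameters against their posterior, is:

```latex
p(\mathbf{x}_{N+1} \mid D, S^h)
  = \int p(\mathbf{x}_{N+1} \mid \theta_s, S^h)\, p(\theta_s \mid D, S^h)\, d\theta_s
```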

Prediction Directly From Data

In many cases, predictions can be made without explicitly computing posteriors over the parameters. E.g., in the coin toss example from an earlier class, with a Beta(α, β) prior and N_h heads and N_t tails observed, the posterior distribution is Beta(α + N_h, β + N_t), and the prediction of the next coin outcome is

  P(head | D) = (α + N_h) / (α + β + N_h + N_t),

which can be read off directly from the counts.
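As a quick worked example (the numbers are invented here, not from the slide): with a uniform Beta(1, 1) prior and 7 heads in 10 flips,

```latex
P(\text{head} \mid D) = \frac{1 + 7}{1 + 1 + 7 + 3} = \frac{8}{12} = \frac{2}{3}
```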

Generalizing To Multinomial RVs In A Bayes Net

Variable X_i is discrete, with values x_i^1, ..., x_i^{r_i}.

Index conventions:
- i: index of the multinomial RV (the node)
- j: index over configurations of the parents of node i
- k: index over values of node i

Unrestricted distribution: one parameter per probability, θ_ijk = p(X_i = x_i^k | parents of X_i in configuration j).

[Figure: node X_i with parents X_a and X_b.]
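With that notation, the likelihood of a complete dataset factorizes over nodes, parent configurations, and values. Writing N_ijk for the number of cases in D with X_i = x_i^k and the parents of X_i in configuration j, the factorization (following standard Dirichlet-multinomial treatments such as Heckerman's tutorial, not the transcript itself) is:

```latex
p(D \mid \theta_s, S^h) \;=\; \prod_{i} \prod_{j} \prod_{k} \theta_{ijk}^{\,N_{ijk}}
```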

Prediction Directly From Data: Multinomial Random Variables

With a Dirichlet prior on each parameter vector θ_ij, the prior distribution is Dir(θ_ij | α_ij1, ..., α_ijr_i), the posterior distribution is Dir(θ_ij | α_ij1 + N_ij1, ..., α_ijr_i + N_ijr_i), and the posterior predictive distribution is

  p(X_i = x_i^k | parents in configuration j, D, S^h) = (α_ijk + N_ijk) / Σ_k' (α_ijk' + N_ijk'),

where i indexes the nodes, j indexes the configurations of the parents of node i, and k indexes the values of node i.
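A minimal sketch of these update rules in code (the network, data, and pseudo-count values below are all invented for illustration): keep Dirichlet pseudo-counts per parent configuration and value, add the observed counts N_ijk, and normalize to get the posterior predictive CPT.

```python
from collections import defaultdict

# Illustrative complete data for a tiny network A -> B, with A, B in {0, 1}.
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]  # rows are (a, b)

alpha = 1.0  # symmetric Dirichlet pseudo-count per (parent config, value)

# Counts N[j][k] for node B: j = value of parent A, k = value of B.
counts = defaultdict(lambda: defaultdict(float))
for a, b in data:
    counts[a][b] += 1.0

# Posterior predictive: P(B=k | A=j, D) = (alpha + N_jk) / sum_k' (alpha + N_jk')
values = [0, 1]
for j in values:
    total = sum(alpha + counts[j][k] for k in values)
    for k in values:
        p = (alpha + counts[j][k]) / total
        print(f"P(B={k} | A={j}, D) = {p:.3f}")
```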

Other Easy Cases

- Members of the exponential family (see Barber text, Section 8.5)
- Linear regression with Gaussian noise (see Barber text, Section 18.1)