Ch 8. Graphical Models
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Revised by M.-O. Heo. Summarized by J.W. Nam.
Biointelligence Laboratory, Seoul National University

Slide 2. Outline
8.1 Bayesian networks
- Example of a Bayesian network: polynomial regression
- Generative models
- Discrete variables
- Linear-Gaussian models
8.2 Conditional independence
- Three example graphs
- d-separation

Slide 3.
Probability plays a central role in modern pattern recognition. All probabilistic inference and learning amounts to repeated application of the sum rule and the product rule.
Probabilistic graphical models offer several useful properties:
- A simple way to visualize the structure of a probabilistic model
- Insights into the properties of the model
- Complex computations for inference and learning can be expressed in terms of graphical manipulations of the underlying mathematical expressions

Slide 4. Probabilistic graphical models
- Nodes (vertices): each node represents a random variable (or a group of random variables).
- Links (edges or arcs): probabilistic relationships between two variables.
In a probabilistic graphical model, the joint distribution over all random variables decomposes into a product of factors, reflecting conditional independence structure.
- Bayesian networks: directed graphical models; the links have a particular direction and can express causal relationships between random variables.
- Markov random fields: undirected graphical models; the links express soft constraints between random variables.

Slide 5. Bayesian networks (1/2)
Given three variables a, b, c:
- By the product rule, p(a,b,c) = p(c|a,b) p(a,b) = p(c|a,b) p(b|a) p(a).
- A different ordering of a, b, c gives a different decomposition and hence a different graphical representation: the representation is NOT unique!
For K variables, repeated application of the product rule (the chain rule) gives p(x_1,...,x_K) = p(x_K | x_1,...,x_{K-1}) ... p(x_2 | x_1) p(x_1), which corresponds to a fully connected graph.
Directed acyclic graphs (DAGs): the graphs we consider are restricted to have no directed cycles.

Slide 6. Bayesian networks (2/2)
The joint distribution defined by a graph is the product, over all nodes, of a conditional distribution for each node conditioned on its parents:
p(x) = ∏_{k=1}^{K} p(x_k | pa_k), where pa_k denotes the set of parents of x_k.
Example) For the 7-node graph in the text, the joint distribution of all 7 variables is
p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5).
(A small evaluation sketch follows.)
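To make the factorization concrete, here is a minimal Python sketch (not from the slides) that evaluates a joint probability as the product of per-node conditionals p(x_k | pa_k). The toy graph, the CPT values, and all names are illustrative assumptions.

```python
# Minimal sketch: evaluate p(x) = prod_k p(x_k | pa_k) for a toy DAG of
# binary variables. Structure, CPT values and names are illustrative
# assumptions, not taken from the slides.

# Parents of each node (a -> c <- b).
parents = {'a': [], 'b': [], 'c': ['a', 'b']}

# Conditional probability tables: cpt[node][parent_values] = p(node=1 | parents).
cpt = {
    'a': {(): 0.3},
    'b': {(): 0.6},
    'c': {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(assignment):
    """p(assignment) as the product of p(x_k | pa_k) over all nodes."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p1 = cpt[node][pa_vals]                     # p(node = 1 | parents)
        p *= p1 if assignment[node] == 1 else 1 - p1
    return p

print(joint({'a': 1, 'b': 0, 'c': 1}))   # 0.3 * 0.4 * 0.4 = 0.048
```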

Slide 7. Example: polynomial regression
(Figure: the directed graphical model for Bayesian polynomial regression, using plate notation for the N observed target variables; deterministic parameters are shown as small solid nodes and observed variables as shaded nodes.)

Slide 8. Generative models
Goal: draw samples from a given probability distribution. Given a joint distribution p(x_1,...,x_K) represented by a directed graph, we wish to draw a sample x̂_1,...,x̂_K.
Ancestral sampling (sketched below):
- Start with the lowest-numbered node and work through the nodes in order.
- At each node, draw a sample from the conditional distribution p(x_k | pa_k), using the values already sampled for its parents.
Generative models:
- A Bayesian network can capture the causal process by which the observed data were generated.
- By contrast, the polynomial regression model is not generative, because the input variable x has no probability distribution, so the model cannot produce synthetic samples of x.
- If we introduce a suitable prior distribution over x, the model becomes generative and x can be sampled as well.
- Hidden (latent) variables allow a complicated distribution over the observed variables to be built from simpler components.
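A minimal sketch of ancestral sampling on the same kind of toy binary network; the structure, conditional probability tables, and names are illustrative assumptions, not from the slides.

```python
# Minimal sketch of ancestral sampling on a toy binary DAG (a -> c <- b).
import random

parents = {'a': [], 'b': [], 'c': ['a', 'b']}          # topologically ordered
cpt = {
    'a': {(): 0.3},
    'b': {(): 0.6},
    'c': {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def ancestral_sample():
    """Visit nodes in topological order, sampling each from p(x_k | pa_k)."""
    x = {}
    for node, pa in parents.items():                    # lowest-numbered node first
        p1 = cpt[node][tuple(x[q] for q in pa)]         # parents already sampled
        x[node] = 1 if random.random() < p1 else 0
    return x

samples = [ancestral_sample() for _ in range(10000)]
print(sum(s['c'] for s in samples) / len(samples))      # Monte Carlo estimate of p(c=1)
```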

Slide 9. Nodes as building blocks in graphical models
The framework of graphical models is very useful for expressing how “building blocks” are linked together. If the relationship between each parent-child pair in a directed graph is chosen to be conjugate, the following cases can be extended hierarchically to construct arbitrarily complex DAGs:
- Discrete variables
- Gaussian variables

Slide 10. Discrete variables
- A single node with K discrete states needs K-1 parameters (one probability per state, less one for the sum-to-one constraint).
- Two nodes x_1, x_2, each with K states:
  - If x_1 and x_2 are linked (dependent), p(x_1, x_2) needs K^2 - 1 parameters; for M fully connected variables, K^M - 1.
  - If x_1 and x_2 are independent, p(x_1, x_2) = p(x_1) p(x_2) needs 2(K-1) parameters; for M variables, M(K-1).
- A chain of M discrete nodes (not fully connected), each with K states, requires the specification of K-1 + (M-1)K(K-1) parameters (worked numbers below).
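As a quick sanity check of these counts (not from the slides), the numbers for, say, K = 3 states and M = 5 nodes work out as follows:

```python
# Quick check of the parameter counts on slide 10 for K = 3 states, M = 5 nodes.
K, M = 3, 5
fully_connected = K**M - 1                          # general joint: 242
independent     = M * (K - 1)                       # fully factorized: 10
chain           = (K - 1) + (M - 1) * K * (K - 1)   # Markov chain: 2 + 4*6 = 26
print(fully_connected, independent, chain)
```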

Slide 11. Discrete variables (continued)
- Sharing parameters: for the chain, if all of the conditional distributions p(x_i | x_{i-1}), i = 2,...,M, are governed by the same set of K(K-1) parameters, the total number of parameters drops to K-1 + K(K-1) = K^2 - 1.
- A graph over discrete variables can be turned into a Bayesian model by introducing Dirichlet priors over the parameters: each node acquires an additional parent representing the Dirichlet distribution over its parameters.
- Parameterized models for the conditional distributions: a general table for p(y=1 | x_1,...,x_M) over M binary parents requires 2^M parameters, whereas a parameterized form (e.g. a logistic sigmoid acting on a linear combination of the parents) grows only linearly with M (see the sketch below).
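A small sketch of the last point, assuming M binary parents and a logistic-sigmoid parameterization of p(y=1 | x); the weight values and names are illustrative assumptions.

```python
# Sketch of a parameterized conditional p(y=1 | x_1,...,x_M) for M binary
# parents: a logistic sigmoid of a linear combination needs only M + 1
# parameters (w_0, ..., w_M) instead of the 2^M entries of a full table.
# Weight values are arbitrary illustrative assumptions.
import math

def p_y1(x, w):
    """p(y=1 | x) = sigma(w_0 + sum_i w_i * x_i)."""
    a = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-a))

M = 4
w = [-1.0, 0.5, 0.5, 2.0, -0.3]       # M + 1 = 5 parameters
print(p_y1([1, 0, 1, 1], w))          # vs. 2**M = 16 table entries
```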

Slide 12. Linear-Gaussian models (continuous variables)
A multivariate Gaussian can be expressed as a directed graph corresponding to a linear-Gaussian model over the component variables.
- Graph G over D variables X = {x_1,...,x_D}, where each continuous random variable x_i has a Gaussian distribution whose mean is a linear combination of the states of its parent nodes:
  p(x_i | pa_i) = N(x_i | Σ_{j ∈ pa_i} w_ij x_j + b_i, v_i)
- The log of the joint distribution, ln p(x) = Σ_i ln p(x_i | pa_i), is a quadratic form in the components of X, so the joint distribution is a multivariate Gaussian.

Slide 13. Linear-Gaussian models (continuous variables)
The mean and covariance of the joint distribution can be evaluated recursively, working through the nodes in order (a small implementation is sketched below):
- E[x_i] = Σ_{j ∈ pa_i} w_ij E[x_j] + b_i
- cov[x_i, x_j] = Σ_{k ∈ pa_j} w_jk cov[x_i, x_k] + I_ij v_j
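A minimal sketch of this recursion for a small chain-structured linear-Gaussian model; the graph, weights, and names are assumptions made for the example.

```python
# Minimal sketch of the recursive mean/covariance evaluation for a
# linear-Gaussian DAG, where p(x_i | pa_i) = N(sum_j w[i][j]*x_j + b[i], v[i]).
# Graph, weights and names are illustrative assumptions.

# Chain x1 -> x2 -> x3 (indices 0, 1, 2), topologically ordered.
parents = {0: [], 1: [0], 2: [1]}
w = {1: {0: 2.0}, 2: {1: 0.5}}        # edge weights w[i][j] for j in pa_i
b = {0: 1.0, 1: 0.0, 2: -1.0}         # biases
v = {0: 1.0, 1: 0.5, 2: 2.0}          # conditional variances

n = len(parents)
mean = [0.0] * n
cov = [[0.0] * n for _ in range(n)]

for i in sorted(parents):                              # topological order
    mean[i] = b[i] + sum(w[i][j] * mean[j] for j in parents[i])
    for k in range(i + 1):
        # cov[k][i] = sum_{j in pa_i} w_ij cov[k][j] + delta_{k,i} v_i
        c = sum(w[i][j] * cov[k][j] for j in parents[i])
        if k == i:
            c += v[i]
        cov[k][i] = cov[i][k] = c

print(mean)   # [1.0, 2.0, 0.0]
print(cov)    # e.g. var(x2) = 2^2 * 1 + 0.5 = 4.5, var(x3) = 0.25 * 4.5 + 2 = 3.125
```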

Slide 14. Linear-Gaussian models (continuous variables)
Consider two cases:
- No links: the mean of p(x) is given by (b_1,...,b_D)^T and the covariance matrix is diagonal, of the form diag(v_1,...,v_D); the joint distribution represents a set of D independent univariate Gaussian distributions.
- A graph with one missing link: the joint distribution is still a multivariate Gaussian, but with a constrained covariance structure.

Slide 15. Conditional independence
Conditional independence simplifies both the structure of a model and the computations needed for inference and learning. (Recall: a is conditionally independent of b given c when p(a, b | c) = p(a | c) p(b | c).)
An important feature of graphical models is that conditional independence properties of the joint distribution can be read directly from the graph, without performing any analytical manipulations. The general framework for this is called d-separation.

Slide 16. Three example graphs: 1st case (node c is tail-to-tail)
- None of the variables observed: the path from a to b via c is unblocked, so a and b are in general dependent.
- The variable c is observed: the conditioned node ‘blocks’ the path from a to b, causing a and b to become conditionally independent given c.

Slide 17. Three example graphs: 2nd case (node c is head-to-tail)
- None of the variables observed: the path from a to b via c is unblocked, so a and b are in general dependent.
- The variable c is observed: the conditioned node ‘blocks’ the path from a to b, causing a and b to become conditionally independent given c.

Slide 18. Three example graphs: 3rd case (node c is head-to-head)
- None of the variables observed: the unobserved node c ‘blocks’ the path, and the variables a and b are independent.
- The variable c is observed: conditioning on c ‘unblocks’ the path and renders a and b dependent.

Slide 19. Three example graphs: fuel gauge example
- B = battery, F = fuel tank, G = electric fuel gauge (a head-to-head node with parents B and F).
- What does checking the fuel gauge tell us? Observing that the gauge reads empty makes an empty tank more likely.
- What does additionally checking the battery tell us? Observing that the battery is flat makes an empty tank less likely than when only the fuel gauge was observed: the flat battery ‘explains away’ the gauge reading (a numerical check follows).
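The explaining-away effect can be checked numerically by brute-force enumeration of the joint distribution. The sketch below uses conditional probability tables chosen for illustration; treat the exact numbers as assumptions rather than the book's.

```python
# Numerical illustration of "explaining away" for the fuel-gauge example
# (B = battery ok, F = tank full, G = gauge reads full). CPT values are
# illustrative assumptions.
from itertools import product

pB = {1: 0.9, 0: 0.1}                     # p(B)
pF = {1: 0.9, 0: 0.1}                     # p(F)
pG = {(1, 1): 0.8, (1, 0): 0.2,           # p(G=1 | B, F)
      (0, 1): 0.2, (0, 0): 0.1}

def joint(b, f, g):
    pg1 = pG[(b, f)]
    return pB[b] * pF[f] * (pg1 if g == 1 else 1 - pg1)

# p(F=0 | G=0): observing an empty gauge makes an empty tank more likely.
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b, f in product((0, 1), repeat=2))
print("p(F=0 | G=0)      =", num / den)

# p(F=0 | G=0, B=0): also observing a flat battery "explains away" the
# empty gauge, so the probability of an empty tank drops back down.
print("p(F=0 | G=0, B=0) =", joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1)))
```

With these values, observing G = 0 raises p(F = 0) well above its prior, and additionally observing B = 0 lowers it again, as the slide describes.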

Slide 20. d-separation
Consider a path between two nodes, with a set of observed (conditioning) nodes:
- A tail-to-tail or head-to-tail node blocks the path if it is observed; otherwise the path is unblocked at that node.
- A head-to-head node blocks the path if it is unobserved; but if the node itself, and/or at least one of its descendants, is observed, the path becomes unblocked at that node.
- d-separation: if all paths between the two sets of variables are blocked, they are d-separated, and the joint distribution satisfies the corresponding conditional independence property.

Slide 21. d-separation: example
(a) a is dependent on b given c:
- The head-to-head node e is unblocked, because its descendant c is in the conditioning set.
- The tail-to-tail node f is unblocked (it is not observed).
(b) a is independent of b given f:
- The head-to-head node e is blocked (neither e nor any descendant of e is observed).
- The tail-to-tail node f is blocked (it is observed).
(A small code check of this example is sketched below.)
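Here is a small, definition-based d-separation checker applied to a graph consistent with this example (a → e ← f → b, e → c). The graph encoding and all function names are my own assumptions, and the path-enumeration approach is meant only as an illustration for small graphs.

```python
# A small, definition-based d-separation checker for toy DAGs (a sketch, not
# an efficient algorithm). Graphs are encoded as {node: set of parents};
# this encoding and the function names are assumptions for the example.

def descendants(parents, node):
    """All descendants of `node` (children, grandchildren, ...)."""
    children = {c for c, ps in parents.items() if node in ps}
    out = set(children)
    for c in children:
        out |= descendants(parents, c)
    return out

def all_paths(parents, a, b):
    """All simple paths between a and b in the undirected skeleton."""
    neigh = {n: set(parents[n]) | {m for m, ps in parents.items() if n in ps}
             for n in parents}
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            paths.append(path)
            continue
        for n in neigh[path[-1]] - set(path):
            stack.append(path + [n])
    return paths

def d_separated(parents, a, b, observed):
    """True if every path between a and b is blocked given `observed`."""
    for path in all_paths(parents, a, b):
        blocked = False
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            head_to_head = prev in parents[node] and nxt in parents[node]
            if head_to_head:
                # Blocks unless the node or one of its descendants is observed.
                if not ({node} | descendants(parents, node)) & observed:
                    blocked = True
            elif node in observed:
                # Tail-to-tail or head-to-tail: blocks when observed.
                blocked = True
        if not blocked:
            return False        # found an unblocked path
    return True                 # every path is blocked

# Graph consistent with the slide's example: a -> e <- f -> b, e -> c.
g = {'a': set(), 'f': set(), 'e': {'a', 'f'}, 'b': {'f'}, 'c': {'e'}}
print(d_separated(g, 'a', 'b', {'c'}))   # False: conditioning on c unblocks e
print(d_separated(g, 'a', 'b', {'f'}))   # True: both e and f block the path
```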

Slide 22. d-separation: another example of conditional independence
- Problem: finding the posterior distribution for the mean μ of a univariate Gaussian.
- Every path between observations passes through μ and is blocked when μ is observed, so the observations D = {x_1,...,x_N} are independent given μ: p(D | μ) = ∏_n p(x_n | μ).
- If we integrate out μ, the observations are in general no longer independent (illustrated numerically below).
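The point can be illustrated numerically: sampling pairs of observations with μ resampled each time (i.e. integrated out) yields correlated x's, while sampling them for a fixed μ does not. The distributional choices below are assumptions made for the illustration.

```python
# Sketch: the observations x_n share the common parent mu, so they are
# conditionally independent given mu but marginally dependent.
import random

mu0, s0 = 0.0, 2.0        # prior       p(mu)     = N(mu0, s0^2)
s = 1.0                   # likelihood  p(x | mu) = N(mu, s^2)
N = 100000

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Marginal: mu is integrated out, so x1 and x2 become correlated (cov ~ s0^2 = 4).
x1, x2 = [], []
for _ in range(N):
    mu = random.gauss(mu0, s0)
    x1.append(random.gauss(mu, s))
    x2.append(random.gauss(mu, s))
print("marginal cov   ", cov(x1, x2))

# Conditional: for a fixed mu the samples are independent (cov ~ 0).
mu = 1.0
x1 = [random.gauss(mu, s) for _ in range(N)]
x2 = [random.gauss(mu, s) for _ in range(N)]
print("conditional cov", cov(x1, x2))
```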

Slide 23. d-separation: the naïve Bayes model
- Key assumption: conditioned on the class z, the distributions of the input variables x_1,...,x_D are independent.
- Given inputs {x_1,...,x_N} together with their class labels, we can fit the naïve Bayes model to the training data by maximum likelihood, assuming the data are drawn independently from the model (a fitting sketch follows).
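A minimal maximum-likelihood fit of a naïve Bayes model with binary features, using a made-up toy data set; all data, labels, and names are illustrative assumptions.

```python
# Minimal sketch of fitting naive Bayes with binary features by maximum
# likelihood: count class frequencies and per-class feature frequencies.
from collections import Counter, defaultdict

data = [((1, 0, 1), 'spam'), ((1, 1, 1), 'spam'), ((0, 0, 1), 'ham'),
        ((0, 1, 0), 'ham'), ((1, 0, 0), 'ham')]

class_counts = Counter(z for _, z in data)
feat_counts = defaultdict(lambda: [0, 0, 0])    # per class: counts of x_d = 1
for x, z in data:
    for d, xd in enumerate(x):
        feat_counts[z][d] += xd

def predict(x):
    """argmax_z p(z) * prod_d p(x_d | z), using the ML estimates above."""
    best, best_p = None, -1.0
    for z, nz in class_counts.items():
        p = nz / len(data)                            # p(z)
        for d, xd in enumerate(x):
            pd = feat_counts[z][d] / nz               # p(x_d = 1 | z)
            p *= pd if xd == 1 else 1 - pd
        if p > best_p:
            best, best_p = z, p
    return best

print(predict((1, 0, 1)))   # 'spam' for this toy data set
```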

Slide 24. d-separation: directed factorization
- Think of the graph as a filter: does a distribution p(x) factorize as implied by the graph?
- If we present to the filter the set of all possible distributions p(x) over the variables X, the subset of distributions that pass through is denoted DF (directed factorization).
- Fully connected graph: the set DF contains all possible distributions.
- Fully disconnected graph (no links): DF contains only the joint distributions that factorize into the product of the marginal distributions of the variables.

Slide 25. d-separation: the Markov blanket
- For the conditional distribution of x_i given all the other variables, consider the minimal set of nodes that isolates x_i from the rest of the graph.
- The set of nodes comprising the parents, the children, and the co-parents (the other parents of the children) of x_i is called the Markov blanket of x_i (computed in the sketch below).
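A small sketch that reads off the Markov blanket of a node from a DAG stored as a parents dictionary; the encoding and names are assumptions made for the example.

```python
# Sketch: the Markov blanket of a node in a DAG is the union of its parents,
# its children, and its children's other parents (co-parents).

def markov_blanket(parents, node):
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = set().union(*(parents[c] for c in children)) if children else set()
    return (parents[node] | children | co_parents) - {node}

# Toy DAG: a -> c <- b, c -> d, e -> d.
g = {'a': set(), 'b': set(), 'c': {'a', 'b'}, 'd': {'c', 'e'}, 'e': set()}
print(markov_blanket(g, 'c'))   # {'a', 'b', 'd', 'e'}
```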