A Brief Introduction to Graphical Models

A Brief Introduction to Graphical Models Presenter: Yijuan Lu

Outline: Application, Definition, Representation, Inference and Learning, Conclusion.

Application: probabilistic expert systems for medical diagnosis. Widely adopted by Microsoft, e.g. the Answer Wizard of Office 95, the Office Assistant of Office 97, and over 30 technical support troubleshooters.

Application: Machine Learning, Statistics, Pattern Recognition, Natural Language Processing, Computer Vision, Image Processing, Bio-informatics, …

What caused the wet grass? Mr. Holmes leaves his house and finds the grass in front of it wet. Two explanations are possible: either it rained, or his sprinkler was on during the night. Then Mr. Holmes looks at the sky and sees that it is cloudy. Since the sprinkler is usually off when it is cloudy, and rain is then more likely, he concludes that rain is the more likely cause of the wet grass.

What caused the wet grass? (Network diagram: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass; the evidence is used to compare P(S=T|C=T) with P(R=T|C=T).)

Earthquake or burglary? Mr. Holmes is in his office when he receives a call from his neighbor: the alarm of his house went off. He thinks that somebody broke into his house. Afterwards he hears a radio announcement that a small earthquake has just happened. Since an earthquake also tends to set off the alarm, he concludes that the earthquake is the more likely cause of the alarm.

Earthquake or burglary? (Network diagram: Burglary → Alarm, Earthquake → Alarm, Alarm → Call, Earthquake → Newscast.)

Graphical Model = Probability Theory + Graph Theory. Provides a natural tool for handling two problems: uncertainty and complexity. Plays an important role in the design and analysis of machine learning algorithms.

Graphical Model. Modularity: a complex system is built by combining simpler parts. Probability theory: ensures consistency and provides ways to interface models to data. Graph theory: an intuitively appealing interface for humans, and efficient general-purpose algorithms.

Graphical Model. Many of the classical multivariate probabilistic systems are special cases of the general graphical model formalism: mixture models, factor analysis, hidden Markov models, Kalman filters. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism.

Representation: a graphical representation of the probabilistic relationships between a set of random variables. Variables are represented by nodes (binary events, discrete variables, or continuous variables). Conditional (in)dependence is represented by (missing) edges. Directed graphical model: Bayesian network. Undirected graphical model: Markov random field. Combined: chain graph.

Bayesian Network: a directed acyclic graph (DAG). A directed edge represents a causal dependency. For each variable X with parents pa(X) there exists a conditional probability P(X|pa(X)). Joint distribution: P(X1, …, Xn) = Π_i P(Xi | pa(Xi)).

Simple Case (A → B). That means the value of B depends on A. The dependency is described by the conditional probability P(B|A). Knowledge about A: the prior probability P(A). Thus the joint probability of A and B is P(A,B) = P(B|A)P(A).

Simple Case. From the joint probability, we can derive all other probabilities. Marginalization (sum rule): P(B) = Σ_A P(A,B). Conditional probabilities (Bayes' rule): P(A|B) = P(A,B)/P(B) = P(B|A)P(A)/P(B).
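
To make the sum rule and Bayes' rule concrete, here is a minimal Python sketch for the two-variable case; the probability values are made up for illustration and are not from the slides.

```python
# Two binary variables A -> B, with illustrative numbers.
P_A = {True: 0.3, False: 0.7}                     # prior P(A)
P_B_given_A = {True:  {True: 0.9, False: 0.1},    # P(B | A=T)
               False: {True: 0.2, False: 0.8}}    # P(B | A=F)

# Joint: P(A,B) = P(B|A) P(A)
P_AB = {(a, b): P_B_given_A[a][b] * P_A[a]
        for a in (True, False) for b in (True, False)}

# Marginalization (sum rule): P(B) = sum_a P(A=a, B)
P_B = {b: sum(P_AB[(a, b)] for a in (True, False)) for b in (True, False)}

# Bayes' rule: P(A|B) = P(A,B) / P(B)
P_A_given_B = {(a, b): P_AB[(a, b)] / P_B[b]
               for a in (True, False) for b in (True, False)}

print(P_B[True], P_A_given_B[(True, True)])
```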

Simple Example: the sprinkler network (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass).

Bayesian Network. Variables: U = {X1, …, Xn}. The joint probability P(U) can always be written by the chain rule as P(X1, …, Xn) = P(X1) P(X2|X1) ⋯ P(Xn|X1, …, Xn−1). If the variables are binary, we need O(2^n) parameters to describe P. Can we do better? Key idea: use properties of independence.

Independent Random Variables. X is independent of Y iff P(X=x|Y=y) = P(X=x) for all values x, y. If X and Y are independent, then P(X,Y) = P(X)P(Y). Unfortunately, most random variables of interest are not independent of each other.

Conditional Independence. A more suitable notion is that of conditional independence. X and Y are conditionally independent given Z iff P(X=x|Y=y,Z=z) = P(X=x|Z=z) for all values x, y, z. Notation: I(X,Y|Z). Then P(X,Y,Z) = P(X|Y,Z)P(Y|Z)P(Z) = P(X|Z)P(Y|Z)P(Z).
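
A small sketch (values hypothetical, not from the slides) that builds a joint distribution of the form P(X|Z)P(Y|Z)P(Z) and numerically verifies that P(X|Y,Z) = P(X|Z) holds for it:

```python
import itertools

P_Z = {0: 0.4, 1: 0.6}
P_X_given_Z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_Y_given_Z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

# P(X,Y,Z) built from the conditional-independence factorization
joint = {(x, y, z): P_X_given_Z[z][x] * P_Y_given_Z[z][y] * P_Z[z]
         for x, y, z in itertools.product((0, 1), repeat=3)}

for x, y, z in itertools.product((0, 1), repeat=3):
    p_x_given_yz = joint[(x, y, z)] / sum(joint[(xp, y, z)] for xp in (0, 1))
    p_x_given_z = (sum(joint[(x, yp, z)] for yp in (0, 1)) /
                   sum(joint[(xp, yp, z)] for xp in (0, 1) for yp in (0, 1)))
    assert abs(p_x_given_yz - p_x_given_z) < 1e-9   # P(X|Y,Z) == P(X|Z)
```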

Bayesian Network. Directed Markov property: each random variable X is conditionally independent of its non-descendants, given its parents Pa(X). Formally, P(X | NonDesc(X), Pa(X)) = P(X | Pa(X)). Notation: I(X, NonDesc(X) | Pa(X)).

Bayesian Network: factored representation of the joint probability. Variables: U = {X1, …, Xn}. The joint probability P(U) is given by P(X1, …, Xn) = Π_i P(Xi | pa(Xi)): the joint probability is the product of all conditional probabilities.
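
As a rough illustration of the factored form, here is a minimal sketch that evaluates P(U) as a product of per-node conditional tables; the network (a single edge A → B) and the numbers are hypothetical, not taken from the slides.

```python
# Each entry: variable -> (list of parents, CPT keyed by parent values then own value)
cpts = {
    "A": ([],    {(): {1: 0.3, 0: 0.7}}),            # P(A)
    "B": (["A"], {(1,): {1: 0.9, 0: 0.1},
                  (0,): {1: 0.2, 0: 0.8}}),           # P(B|A)
}

def joint_probability(assignment, cpts):
    """P(U = assignment) as the product of each node's conditional given its parents."""
    p = 1.0
    for var, (parents, table) in cpts.items():
        parent_vals = tuple(assignment[q] for q in parents)
        p *= table[parent_vals][assignment[var]]
    return p

print(joint_probability({"A": 1, "B": 1}, cpts))   # 0.3 * 0.9 = 0.27
```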

Bayesian Network: complexity reduction. Joint probability of n binary variables: O(2^n) parameters. Factorized form: O(n·2^k), where k is the maximal number of parents of a node.
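
For instance (an illustrative calculation, not taken from the slide): with n = 20 binary variables and at most k = 3 parents per node, the full joint needs 2^20 − 1 ≈ 10^6 independent parameters, while the factored form needs at most 20 · 2^3 = 160.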

Simple Case (A → B). The dependency is described by the conditional probability P(B|A). Knowledge about A: the prior probability P(A). The joint probability of A and B is P(A,B) = P(B|A)P(A).

Serial Connection (A → B → C). Calculate as before: P(A,B) = P(B|A)P(A); P(A,B,C) = P(C|A,B)P(A,B) = P(C|B)P(B|A)P(A). Hence I(C,A|B).

Converging Connection (B → A ← C). The value of A depends on B and C: P(A|B,C). P(A,B,C) = P(A|B,C)P(B)P(C).

Diverging Connection (B ← A → C). B and C depend on A: P(B|A) and P(C|A). P(A,B,C) = P(B|A)P(C|A)P(A). Hence I(B,C|A).

WetGrass (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass), with tables P(C), P(S|C), P(R|C), P(W|S,R). Factored joint: P(C,S,R,W) = P(W|S,R)P(R|C)P(S|C)P(C), versus the full chain rule P(C,S,R,W) = P(W|C,S,R)P(R|C,S)P(S|C)P(C).
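
To make the "versus" concrete, here is a small sketch counting the parameters each factorization needs when all variables are binary; the parent sets below mirror the two factorizations above and are an illustration, not code from the presentation.

```python
factored   = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}            # uses the graph
chain_rule = {"C": [], "S": ["C"], "R": ["C", "S"], "W": ["C", "S", "R"]}  # no independence used

def n_params(model):
    # one probability of X=1 per configuration of the parents
    return sum(2 ** len(parents) for parents in model.values())

print(n_params(factored), n_params(chain_rule))   # 9 versus 15
```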

Markov Random Fields. Links represent symmetric probabilistic dependencies; a direct link between A and B represents a conditional dependency. Weakness of MRFs: inability to represent induced dependencies.

Markov Random Fields (example graph on nodes A, B, C, D, E). Global Markov property: X is independent of Y given Z iff all paths between X and Y are blocked by Z (here: A is independent of E, given C). Local Markov property: X is independent of all other nodes given its neighbors (here: A is independent of D and E, given C and B).

Inference: computation of the conditional probability distribution of one set of nodes, given a model and another set of (observed) nodes. Bottom-up: observation at the leaves (e.g. wet grass); the probabilities of the causes (rain, sprinkler) can be calculated accordingly: "diagnosis" from effects to causes. Top-down: knowledge (e.g. "it is cloudy") influences the probability of "wet grass": predict the effects.

Inference. Observe wet grass (denoted by W=1). Two possible causes: rain or sprinkler. Which is more likely? Use Bayes' rule to compute the posterior probabilities of the causes (rain and sprinkler).
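
A brute-force sketch of this computation by enumeration over the sprinkler network; the conditional probability tables below are illustrative stand-ins, since the slide's actual numbers are not reproduced in this transcript.

```python
import itertools

P_C = {1: 0.5, 0: 0.5}
P_S_given_C = {1: {1: 0.1, 0: 0.9}, 0: {1: 0.5, 0: 0.5}}   # P(S|C)
P_R_given_C = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}   # P(R|C)
P_W_given_SR = {(0, 0): {1: 0.0, 0: 1.0}, (0, 1): {1: 0.9, 0: 0.1},
                (1, 0): {1: 0.9, 0: 0.1}, (1, 1): {1: 0.99, 0: 0.01}}

def joint(c, s, r, w):
    # Factored joint: P(C,S,R,W) = P(W|S,R) P(R|C) P(S|C) P(C)
    return P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r] * P_W_given_SR[(s, r)][w]

# P(W=1) by summing out the other variables
p_w1 = sum(joint(c, s, r, 1) for c, s, r in itertools.product((0, 1), repeat=3))

# Posterior of each candidate cause given W=1
p_s1_given_w1 = sum(joint(c, 1, r, 1) for c, r in itertools.product((0, 1), repeat=2)) / p_w1
p_r1_given_w1 = sum(joint(c, s, 1, 1) for c, s in itertools.product((0, 1), repeat=2)) / p_w1

print(p_s1_given_w1, p_r1_given_w1)   # with these numbers, rain comes out more probable
```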

Inference: P(S=1|W=1) = P(S=1,W=1)/P(W=1) and P(R=1|W=1) = P(R=1,W=1)/P(W=1), where each numerator and P(W=1) are obtained by summing the joint distribution over the remaining variables.

Learning

Learning: learn parameters or structure from data. Parameter learning: find maximum-likelihood estimates of the parameters of each conditional probability distribution. Structure learning: find the correct connectivity between existing nodes.
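
A minimal sketch of maximum-likelihood parameter learning with fully observed data, where each CPT entry is just a normalized count; the graph and the four data records below are toy assumptions, not material from the slides.

```python
from collections import Counter

structure = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}   # hypothetical graph
data = [  # each record is a full assignment (toy data)
    {"C": 1, "S": 0, "R": 1, "W": 1},
    {"C": 1, "S": 0, "R": 1, "W": 1},
    {"C": 0, "S": 1, "R": 0, "W": 1},
    {"C": 0, "S": 0, "R": 0, "W": 0},
]

def ml_cpt(var, parents, data):
    """P_hat(var=1 | parent config) = count(var=1, config) / count(config)."""
    num, den = Counter(), Counter()
    for record in data:
        config = tuple(record[p] for p in parents)
        den[config] += 1
        num[config] += record[var]
    return {config: num[config] / den[config] for config in den}

cpts = {v: ml_cpt(v, pa, data) for v, pa in structure.items()}
print(cpts["R"])   # e.g. {(1,): 1.0, (0,): 0.0} for this toy data
```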

Learning (structure / observation → method): structure known, full observation → maximum likelihood (ML) estimation; structure known, partial observation → expectation maximization (EM) algorithm; structure unknown, full observation → model selection; structure unknown, partial observation → EM + model selection.

Model Selection Method. Select a "good" model from all possible models and use it as if it were the correct model. Having defined a scoring function, a search algorithm is then used to find a network structure that receives the highest score given the prior knowledge and data. Unfortunately, the number of DAGs on n variables is super-exponential in n; the usual approach is therefore to use local search algorithms (e.g., greedy hill climbing) to search through the space of graphs.

Conclusion. A graphical model is a graphical representation of the probabilistic structure of a set of random variables, along with functions that can be used to derive the joint probability distribution. It offers an intuitive interface for modeling, is modular (a useful tool for managing complexity), and provides a common formalism for many models.