Machine Learning CUNY Graduate Center Lecture 21: Graphical Models.

Today: Graphical Models – representing conditional dependence graphically.

Graphical Models and Conditional Independence. Graphical models are about probabilities in general, but are used in classification and clustering. Both Linear Regression and Logistic Regression use probabilistic models. Graphical models allow us to structure and visualize probabilistic models and the relationships between their variables.

(Joint) Probability Tables. Represent multinomial joint probabilities over K variables as a K-dimensional table. Assuming D binary variables, how big is this table? (2^D entries.) What if we had multinomials with M entries each? (M^K entries.)
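A quick sketch of the table-size question (the function name here is mine, not from the lecture):

```python
# A full joint probability table over K variables with M values each has
# M ** K entries; with D binary variables that is 2 ** D.
def joint_table_size(num_vars, values_per_var=2):
    """Number of entries in the full joint probability table."""
    return values_per_var ** num_vars

print(joint_table_size(10))      # 10 binary variables: 1024 entries
print(joint_table_size(10, 4))   # 10 four-valued variables: 1048576 entries
```

The exponential growth is the whole problem: even 30 binary variables already need over a billion entries.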

Probability Models. What if the variables are independent? If x and y are independent, p(x, y) = p(x) p(y): the original distribution can be factored into separate tables. How big are these tables if each variable is binary? (Two entries each, so 2D numbers for D variables instead of 2^D.)
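A minimal sketch of the factored representation, using made-up marginals: the joint over two independent binary variables is just the outer product of their marginal tables.

```python
import numpy as np

# If x and y are independent, p(x, y) = p(x) * p(y): the joint table is
# the outer product of the two marginal tables.
px = np.array([0.3, 0.7])    # p(x) for binary x (illustrative values)
py = np.array([0.6, 0.4])    # p(y) for binary y

joint = np.outer(px, py)     # factored joint p(x, y)

# Under full independence, D binary variables need D tables of 2 entries
# each (2*D numbers) instead of one table of 2**D entries.
print(joint)
print(joint.sum())           # a valid joint sums to 1
```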

Conditional Independence. Independence assumptions are convenient (Naïve Bayes), but rarely true. More often, some groups of variables are dependent while others are independent. Still others are conditionally independent.

Conditional Independence. Two variables x and z are conditionally independent given y if p(x, z | y) = p(x | y) p(z | y). E.g. y = flu?, x = achiness?, z = headache?: both symptoms depend on the flu, but once we know the flu status, neither tells us anything more about the other.
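The definition can be checked numerically. The sketch below builds a toy joint from the factored form and verifies the identity; all probability values are made up for illustration.

```python
import numpy as np

# Numeric check of the definition p(x, z | y) = p(x|y) * p(z|y) on a toy
# distribution constructed so x and z are conditionally independent given y.
py = np.array([0.1, 0.9])            # p(y), e.g. y = flu?
px_given_y = np.array([[0.8, 0.2],   # p(x | y): rows indexed by y
                       [0.1, 0.9]])
pz_given_y = np.array([[0.7, 0.3],   # p(z | y): rows indexed by y
                       [0.2, 0.8]])

# Build the joint p(x, y, z) = p(y) p(x|y) p(z|y), indexed [x, y, z].
joint = np.einsum('y,yx,yz->xyz', py, px_given_y, pz_given_y)

# Condition on y and compare p(x, z | y) against p(x|y) p(z|y).
p_xz_given_y = joint / joint.sum(axis=(0, 2), keepdims=True)
factored = np.einsum('yx,yz->xyz', px_given_y, pz_given_y)
print(np.allclose(p_xz_given_y, factored))   # True
```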

Factorization of a joint. Assume x and z are conditionally independent given y. How do you factorize p(x, y, z)? As p(x, y, z) = p(y) p(x | y) p(z | y).

Factorization of a joint. What if there is no conditional independence? The chain rule always applies: p(x, y, z) = p(x) p(y | x) p(z | x, y). Without independence assumptions, no factorization saves us anything.

Structure of Graphical Models. Graphical models allow us to represent dependence relationships between variables visually.
– Graphical models are directed acyclic graphs (DAGs).
– Nodes: random variables.
– Edges: dependence relationships.
– No edge: independent variables.
– Direction of the edge indicates a parent-child relationship.
– Parent: source, trigger.
– Child: destination, response.

Example Graphical Models. The parents of a node i are denoted π_i. Factorization of the joint in a graphical model: p(x_1, …, x_n) = ∏_i p(x_i | x_{π_i}). [Figures: two two-node graphs over x and y, one with no edge and one with an edge from x to y.]
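As a sketch of reading a joint off a graph, here is the simplest non-trivial case, a hypothetical two-node network y → x with made-up tables. The joint is the product of each node's conditional given its parents.

```python
import itertools

# DAG factorization p(x_1..x_n) = prod_i p(x_i | x_{parents(i)}),
# illustrated on a tiny network y -> x (y is the only parent of x).
p_y = {0: 0.4, 1: 0.6}                 # p(y)
p_x_given_y = {0: {0: 0.9, 1: 0.1},    # p(x | y), outer key is y
               1: {0: 0.3, 1: 0.7}}

def joint(x, y):
    """Joint probability from the factorization p(x, y) = p(y) p(x | y)."""
    return p_y[y] * p_x_given_y[y][x]

# A valid factorization sums to 1 over all assignments.
total = sum(joint(x, y) for x, y in itertools.product((0, 1), repeat=2))
print(total)   # 1.0
```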

Basic Graphical Models: independent variables and observations. When we observe a variable (fix its value from data) we color its node grey. Observing a variable allows us to condition on it, e.g. p(x, z | y). Given an observation, we can generate pdfs for the other variables. [Figures: three independent nodes x, y, z; the same graph with y shaded as observed.]

Example Graphical Models: a Markov chain x → y → z, with x = cloudy?, y = raining?, z = wet ground?

Markov Chain. Are x and z conditionally independent given y?

Markov Chain. Yes: the chain factorizes as p(x, y, z) = p(x) p(y | x) p(z | y), so p(x, z | y) = p(x, y, z) / p(y) = p(x | y) p(z | y) — x and z are conditionally independent given y.
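This can be verified numerically. The sketch below builds the chain's joint from made-up conditional tables, then derives p(x, z | y), p(x | y), and p(z | y) from the joint and confirms the factorization.

```python
import numpy as np

# In the chain x -> y -> z, the joint factors as p(x) p(y|x) p(z|y),
# so x and z should be conditionally independent given y.
px = np.array([0.5, 0.5])                    # p(x), illustrative values
py_x = np.array([[0.9, 0.1], [0.2, 0.8]])    # p(y | x), rows indexed by x
pz_y = np.array([[0.7, 0.3], [0.4, 0.6]])    # p(z | y), rows indexed by y

joint = np.einsum('x,xy,yz->xyz', px, py_x, pz_y)   # p(x, y, z)

p_y = joint.sum(axis=(0, 2))                        # p(y)
p_xz_given_y = joint / p_y[None, :, None]           # p(x, z | y)
p_x_given_y = joint.sum(axis=2) / p_y               # p(x | y), indexed [x, y]
p_z_given_y = joint.sum(axis=0) / p_y[:, None]      # p(z | y), indexed [y, z]

factored = np.einsum('xy,yz->xyz', p_x_given_y, p_z_given_y)
print(np.allclose(p_xz_given_y, factored))          # True
```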

One Trigger, Two Responses: y → x and y → z, with x = achiness?, y = flu?, z = fever?

Are x and z conditionally independent given y?

Yes: here p(x, y, z) = p(y) p(x | y) p(z | y), so again p(x, z | y) = p(x | y) p(z | y). Once the trigger y is observed, the two responses carry no information about each other.

Two Triggers, One Response: x → y and z → y, with x = rain?, y = wet sidewalk?, z = spilled coffee?

Are x and z conditionally independent given y?

No: here p(x, y, z) = p(x) p(z) p(y | x, z). The triggers x and z are marginally independent, but observing the response y makes them dependent — learning that coffee was spilled "explains away" the wet sidewalk and lowers our belief in rain.
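The explaining-away effect can be demonstrated numerically on made-up tables: marginally p(x, z) = p(x) p(z), but conditioning on y breaks the factorization.

```python
import numpy as np

# Two triggers, one response: x -> y <- z, joint p(x) p(z) p(y | x, z).
px = np.array([0.3, 0.7])    # p(x), illustrative values
pz = np.array([0.6, 0.4])    # p(z)
py_xz = np.array([[[0.99, 0.01], [0.2, 0.8]],    # p(y | x, z),
                  [[0.3, 0.7], [0.05, 0.95]]])   # indexed [x, z, y]

joint = np.einsum('x,z,xzy->xzy', px, pz, py_xz)   # p(x, z, y)

# Marginally, p(x, z) = p(x) p(z): independent.
p_xz = joint.sum(axis=2)
print(np.allclose(p_xz, np.outer(px, pz)))          # True

# Given y, p(x, z | y) != p(x | y) p(z | y): dependent.
p_y = joint.sum(axis=(0, 1))
p_xz_given_y = joint / p_y[None, None, :]
p_x_given_y = joint.sum(axis=1) / p_y               # indexed [x, y]
p_z_given_y = joint.sum(axis=0) / p_y               # indexed [z, y]
factored = np.einsum('xy,zy->xzy', p_x_given_y, p_z_given_y)
print(np.allclose(p_xz_given_y, factored))          # False
```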

Factorization. [Figures: a six-node DAG over x0 … x5, and the joint factorization p(x0, …, x5) = ∏_i p(x_i | x_{π_i}) read off the graph.]

How large are the probability tables? With the factorization, each node x_i needs only a table of size M^(1 + |π_i|) — its own values times its parents' values — instead of one table of size M^n for the full joint.
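A sketch of the savings, using a hypothetical parent structure for the six-node example (the actual edges in the lecture's figure may differ):

```python
# Per-node table sizes under the factorization vs. the full joint table.
# Hypothetical parent sets for a 6-node binary network:
parents = {'x0': [], 'x1': ['x0'], 'x2': ['x0'],
           'x3': ['x1', 'x2'], 'x4': ['x2'], 'x5': ['x3']}
M = 2   # values per variable (binary)

# Each node's table covers its own value and its parents' values.
factored = sum(M ** (1 + len(p)) for p in parents.values())
full = M ** len(parents)   # one table over all 6 variables

print(factored, full)      # 26 vs 64 entries
```

The gap widens dramatically as the network grows, as long as each node keeps few parents.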

Model Parameters as Nodes. Treating model parameters as random variables, we can include them in the graphical model. Multivariate Bernoulli: each variable x_i has its own parameter µ_i. [Figure: µ_i → x_i for i = 0, 1, 2.]

Model Parameters as Nodes. Multinomial: a single parameter vector µ is shared across the variables. [Figure: µ → x0, x1, x2.]

Naïve Bayes Classification. Observed variables x_i are independent given the class variable y. The distribution can be optimized using maximum likelihood on each variable separately, and different types of distributions can easily be combined. [Figure: y → x0, x1, x2.]
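A minimal sketch of naive Bayes with binary features on a tiny made-up dataset: because the x_i are independent given y, maximum likelihood reduces to per-feature counting within each class.

```python
import numpy as np

# Naive Bayes with binary features: given the class y, each x_i is
# independent, so ML fitting is just per-class, per-feature averaging.
X = np.array([[1, 0, 1],    # made-up data: rows are samples,
              [1, 1, 1],    # columns are binary features
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 0]])
y = np.array([1, 1, 1, 0, 0])

classes = np.unique(y)
prior = {c: np.mean(y == c) for c in classes}            # p(y = c)
theta = {c: X[y == c].mean(axis=0) for c in classes}     # p(x_i = 1 | y = c)

def predict(x):
    """Pick the class maximizing p(y) * prod_i p(x_i | y)."""
    def score(c):
        p = theta[c]
        return prior[c] * np.prod(np.where(x == 1, p, 1 - p))
    return max(classes, key=score)

print(predict(np.array([1, 0, 1])))   # most likely class for this sample
```

Note that theta for class 0 contains zero probabilities for features never seen in that class, which motivates the MAP training discussed below.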

Graphical Models. Graphical representation of dependency relationships: directed acyclic graphs, nodes as random variables, edges defining dependence relations. What can we do with graphical models?
– Learn parameters to fit data.
– Understand independence relationships between variables.
– Perform inference (marginals and conditionals).
– Compute likelihoods for classification.

Plate Notation. To indicate a repeated variable, draw a plate around it. [Figures: y → x0, x1, …, xn drawn explicitly, and the equivalent plate notation with y → x_i repeated n times.]

Completely Observed Graphical Model. Observations for every node. Simplest (least general) graph: assume every variable is independent.

Completely Observed Graphical Model. Observations for every node. Second simplest graph: assume complete dependence.

Maximum Likelihood. Each node has a conditional probability table θ. Given the tables, we can construct the pdf, and use maximum likelihood to find the best settings of θ.

Maximum Likelihood. Maximize the log likelihood of the fully observed data under the factorization: ℓ(θ) = ∑_n ∑_i log p(x_i^(n) | x_{π_i}^(n); θ), which decomposes into a separate term per node.

Count Functions. Count the number of times a configuration appears in the data: e.g. count(x_i = x, x_{π_i} = u) is the number of samples in which node i takes value x while its parents take values u.

Maximum Likelihood. Define the log likelihood as a function of the table entries θ_{x|u}. Constraint: each conditional table must sum to one, ∑_x θ_{x|u} = 1.

Maximum Likelihood. Use Lagrange multipliers to enforce the constraint; the solution is the normalized count, θ_{x|u} = count(x, u) / count(u).
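The Lagrange-multiplier result can be sketched directly: the ML estimate of each conditional table entry is just a normalized count over the data. The (child, parent) observations below are made up for illustration.

```python
from collections import Counter

# ML estimate of a conditional table: theta_{x|u} = count(x, u) / count(u).
# Illustrative data: (child value, parent value) observations.
data = [(1, 0), (0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 1)]

pair_counts = Counter(data)                    # count(x, parent)
parent_counts = Counter(p for _, p in data)    # count(parent)

theta = {(x, p): pair_counts[(x, p)] / parent_counts[p]
         for (x, p) in pair_counts}

print(theta[(1, 0)])   # 2/3: two of the three parent=0 cases had x=1
```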

Maximum A Posteriori Training. Bayesians would never do that: the θs need a prior. With a Dirichlet prior, MAP training amounts to adding pseudo-counts to each table entry before normalizing.
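A sketch of MAP training with a symmetric Dirichlet prior (alpha = 1 gives Laplace smoothing): pseudo-counts keep events never seen in the data from getting zero probability. The data values are illustrative.

```python
from collections import Counter

# MAP estimate with Dirichlet pseudo-counts: observed values of a
# 3-valued variable, where value 2 never occurs in the data.
data = [0, 0, 1, 0, 1]
num_values, alpha = 3, 1   # alpha = 1 is Laplace smoothing

counts = Counter(data)
total = len(data) + alpha * num_values
theta_map = [(counts[v] + alpha) / total for v in range(num_values)]

print(theta_map)   # [0.5, 0.375, 0.125]; the unseen value keeps mass
```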

Conditional Dependence Test. We can check conditional independence in a graphical model:
– "Is achiness (x3) independent of the flu (x0) given fever (x1)?"
– "Is achiness (x3) independent of sinus infections (x2) given fever (x1)?"

D-Separation and Bayes Ball. Intuition: nodes are separated, or blocked, by sets of nodes. E.g. nodes x1 and x2 block the path from x0 to x5, so x0 is conditionally independent of x5 given x1 and x2.

Bayes Ball Algorithm. To test whether x_a is conditionally independent of x_b given x_c: shade the nodes in x_c, place a "ball" at each node in x_a, and bounce the balls around the graph according to the rules. If no ball reaches x_b, then x_a is conditionally independent of x_b given x_c.

Ten Rules of the Bayes Ball Theorem. [Figure: the ten pass-through and bounce rules for chains, forks, and v-structures, with and without observed nodes.]

Bayes Ball Examples. [Figures: worked examples of running the Bayes Ball algorithm on small networks.]

Undirected Graphs. What if we allow undirected graphs? What do they correspond to? Not cause/effect or trigger/response, but general dependence. Example: image pixels, where each pixel is a Bernoulli variable – p(x11, …, x1M, …, xM1, …, xMM) – and bright pixels have bright neighbors. No parents, just probabilities. Grid models like this are called Markov Random Fields.

Undirected Graphs. Undirected separability is easy: to check conditional independence of A and B given C, check graph reachability of A and B without going through nodes in C. [Figure: a four-node graph over A, B, C, D.]
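The reachability test can be sketched as a BFS that never enters the conditioning set. The four-node graph below is hypothetical (the lecture's figure shows nodes A, B, C, D, but its exact edges are not recoverable here).

```python
from collections import deque

# Undirected separation: A is conditionally independent of B given C
# iff every path from A to B passes through a node in C.
def separated(graph, a, b, blocked):
    """True if no path from a to b avoids the blocked set."""
    if a in blocked or b in blocked:
        return True
    seen, frontier = {a}, deque([a])
    while frontier:
        node = frontier.popleft()
        if node == b:
            return False
        for nbr in graph[node]:
            if nbr not in seen and nbr not in blocked:
                seen.add(nbr)
                frontier.append(nbr)
    return True

# Hypothetical graph: A - C - B, with D hanging off C.
graph = {'A': {'C'}, 'B': {'C'}, 'C': {'A', 'B', 'D'}, 'D': {'C'}}
print(separated(graph, 'A', 'B', {'C'}))   # True: C blocks the only path
print(separated(graph, 'A', 'B', set()))   # False: A-C-B is open
```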

Next Time: Inference in Graphical Models – Belief Propagation – Junction Tree Algorithm.