
CPSC 422, Lecture 18, Slide 1. Intelligent Systems (AI-2), Computer Science cpsc422, Lecture 18, Feb 25, 2015. Slide sources: Raymond J. Mooney, University of Texas at Austin; D. Koller, Stanford CS - Probabilistic Graphical Models.

Lecture Overview. Probabilistic Graphical models: Recap Markov Networks; Applications of Markov Networks; Inference in Markov Networks (Exact and Approx.); Conditional Random Fields. (CPSC 422, Lecture 17, Slide 2)

Parameterization of Markov Networks. Factors define the local interactions (like CPTs in Bnets). What about the global model? What do you do with Bnets? (CPSC 422, Lecture 17, Slide 3)

How do we combine local models? As in Bnets, by multiplying them! (CPSC 422, Lecture 17, Slide 4)

Step back…. From structure to factors/potentials. In a Bnet the joint is factorized…. In a Markov Network you have one factor for each maximal clique. (CPSC 422, Lecture 17, Slide 5)
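For reference, the two factorizations being contrasted are the standard ones (these formulas are not printed on the slide itself):

```latex
% Bayesian network: one factor (CPT) per node
P(X_1,\dots,X_n) \;=\; \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)

% Markov network: one factor per maximal clique C_k, normalized by the partition function Z
P(X_1,\dots,X_n) \;=\; \frac{1}{Z}\prod_{k} \phi_k(C_k),
\qquad
Z \;=\; \sum_{X_1,\dots,X_n}\;\prod_{k} \phi_k(C_k)
```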

General definitions. Two nodes in a Markov network are independent if and only if every path between them is cut off by evidence. So the Markov blanket of a node is…? (e.g., for C; e.g., for A) (CPSC 422, Lecture 17, Slide 6)

Lecture Overview. Probabilistic Graphical models: Recap Markov Networks; Applications of Markov Networks; Inference in Markov Networks (Exact and Approx.); Conditional Random Fields. (Slide 7)

Markov Networks Applications (1): Computer Vision, where they are called Markov Random Fields. Stereo reconstruction, image segmentation, object recognition. Typically a pairwise MRF: each variable corresponds to a pixel (or superpixel), and edges (factors) correspond to interactions between adjacent pixels in the image. E.g., in segmentation, the factors range from generically penalizing discontinuities to encoding specific relations such as "road under car". (CPSC 422, Lecture 17, Slide 8)
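As an illustration of the pairwise structure (a minimal sketch; the grid size, potentials, and Potts-style smoothness penalty are assumptions for the example, not taken from the slides):

```python
import numpy as np

def segmentation_log_score(labels, unary, smoothness=1.0):
    """Unnormalized log-score of a binary labelling under a pairwise (Potts-style) MRF.

    labels : (H, W) array of {0, 1} pixel labels
    unary  : (H, W, 2) array, unary[i, j, k] = log-potential for pixel (i, j) taking label k
    smoothness : penalty paid whenever two adjacent pixels disagree
    """
    H, W = labels.shape
    score = 0.0
    # Unary factors: one per pixel (e.g., from a colour/appearance model).
    for i in range(H):
        for j in range(W):
            score += unary[i, j, labels[i, j]]
    # Pairwise factors: one per pair of adjacent pixels; generically penalize discontinuities.
    for i in range(H):
        for j in range(W):
            if i + 1 < H and labels[i, j] != labels[i + 1, j]:
                score -= smoothness
            if j + 1 < W and labels[i, j] != labels[i, j + 1]:
                score -= smoothness
    return score  # P(labels | image) is proportional to exp(score)

# Toy usage: a 2x2 image whose unaries prefer the left column to be foreground (label 1).
unary = np.array([[[0.0, 2.0], [1.0, 0.0]],
                  [[0.0, 2.0], [1.0, 0.0]]])
labels = np.array([[1, 0], [1, 0]])
print(segmentation_log_score(labels, unary))
```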

Image segmentation. (CPSC 422, Lecture 17, Slide 9)

Markov Networks Applications (2): Sequence Labeling in NLP and Bioinformatics: Conditional Random Fields. (CPSC 422, Lecture 17, Slide 10)

Lecture Overview. Probabilistic Graphical models: Recap Markov Networks; Applications of Markov Networks; Inference in Markov Networks (Exact and Approx.); Conditional Random Fields. (CPSC 422, Lecture 17, Slide 11)

Variable elimination algorithm for Bnets. To compute P(Z | Y1=v1, …, Yj=vj):
1. Construct a factor for each conditional probability.
2. Set the observed variables to their observed values.
3. Given an elimination ordering, simplify/decompose the sum of products.
4. Perform products and sum out each Zi.
5. Multiply the remaining factors.
6. Normalize: divide the resulting factor f(Z) by ∑Z f(Z).
Variable elimination algorithm for Markov Networks….. (CPSC 422, Lecture 17, Slide 12)
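The same steps apply to a Markov network if the initial factors are the clique potentials rather than CPTs; the final normalization step then also takes care of the partition function. A minimal sketch (the factor representation, variable names, and example numbers are illustrative assumptions, not from the slides):

```python
from itertools import product
from collections import defaultdict

class Factor:
    """A factor over binary variables: maps assignments (tuples of 0/1) to non-negative reals."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)   # e.g. ('A', 'B')
        self.table = dict(table)            # e.g. {(0, 0): 3.0, (0, 1): 1.0, ...}

    def restrict(self, var, value):
        """Set an observed variable to its observed value (step 2)."""
        if var not in self.variables:
            return self
        i = self.variables.index(var)
        keep = {k[:i] + k[i+1:]: v for k, v in self.table.items() if k[i] == value}
        return Factor(self.variables[:i] + self.variables[i+1:], keep)

    def multiply(self, other):
        """Pointwise product of two factors (steps 4 and 5)."""
        new_vars = self.variables + tuple(v for v in other.variables if v not in self.variables)
        new_table = {}
        for assn in product((0, 1), repeat=len(new_vars)):
            env = dict(zip(new_vars, assn))
            new_table[assn] = (self.table[tuple(env[v] for v in self.variables)]
                               * other.table[tuple(env[v] for v in other.variables)])
        return Factor(new_vars, new_table)

    def sum_out(self, var):
        """Marginalize a variable out of the factor (step 4)."""
        i = self.variables.index(var)
        new_table = defaultdict(float)
        for k, v in self.table.items():
            new_table[k[:i] + k[i+1:]] += v
        return Factor(self.variables[:i] + self.variables[i+1:], dict(new_table))

def variable_elimination(factors, evidence, ordering):
    """Return the conditional over whatever variables are left after eliminating `ordering`."""
    # Step 2: set the observed variables to their observed values.
    for var, val in evidence.items():
        factors = [f.restrict(var, val) for f in factors]
    # Steps 3-4: for each variable in the ordering, multiply the factors mentioning it, sum it out.
    for var in ordering:
        relevant = [f for f in factors if var in f.variables]
        if not relevant:
            continue
        prod = relevant[0]
        for f in relevant[1:]:
            prod = prod.multiply(f)
        factors = [f for f in factors if var not in f.variables] + [prod.sum_out(var)]
    # Step 5: multiply the remaining factors.
    result = factors[0]
    for f in factors[1:]:
        result = result.multiply(f)
    # Step 6: normalize. For a Markov network this also absorbs the partition function Z.
    total = sum(result.table.values())
    return {k: v / total for k, v in result.table.items()}

# Markov-network example: P(A | C=0) on a chain A-B-C with pairwise potentials.
phi_ab = Factor(('A', 'B'), {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 5.0})
phi_bc = Factor(('B', 'C'), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0})
print(variable_elimination([phi_ab, phi_bc], evidence={'C': 0}, ordering=['B']))
```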

Gibbs sampling for Markov Networks. Example: P(D | C=0). Resample the non-evidence variables in a pre-defined order or a random order (note: never change the evidence!). The slide tracks the current assignment to the variables A, B, C, D, E, F. Suppose we begin with A: what do we need to sample? A. P(A | B=0)  B. P(A | B=0, C=0)  C. P(B=0, C=0 | A). (CPSC 422, Lecture 17, Slide 13)
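A minimal sketch of the procedure on a toy network (the chain structure over A…F and the factor values below are invented stand-ins, since the slide's tables are not reproduced here):

```python
import random

# Pairwise factors phi[(U, V)][(u, v)] on a chain A-B-C-D-E-F (all values are made up).
phi = {
    ('A', 'B'): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 5.0},
    ('B', 'C'): {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0},
    ('C', 'D'): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    ('D', 'E'): {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0},
    ('E', 'F'): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
}

def conditional(var, state):
    """P(var | its Markov blanket): only the factors touching var matter."""
    weights = []
    for value in (0, 1):
        w = 1.0
        for (u, v), table in phi.items():
            if var == u:
                w *= table[(value, state[v])]
            elif var == v:
                w *= table[(state[u], value)]
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def gibbs(evidence, n_samples=10000):
    """Estimate P(D=1 | evidence) by Gibbs sampling; evidence variables are never changed."""
    state = {v: 0 for v in 'ABCDEF'}
    state.update(evidence)
    hidden = [v for v in 'ABCDEF' if v not in evidence]
    count = 0
    for _ in range(n_samples):
        for var in hidden:   # resample the non-evidence variables in a fixed order
            p1 = conditional(var, state)[1]
            state[var] = 1 if random.random() < p1 else 0
        count += state['D']
    return count / n_samples

print(gibbs({'C': 0}))   # Monte Carlo estimate of P(D=1 | C=0)
```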

Example: Gibbs sampling. Resample A from P(A | B, C): multiply the rows of the factors over (A, C) and (A, B) that are consistent with the current values of B and C, then normalize the result. (CPSC 422, Lecture 17, Slide 14; the numeric factor tables shown on the slide are omitted here.)

Example: Gibbs sampling (continued). Resample B from its conditional given A and D: multiply the rows of the factors over (B, D) and (A, B) that are consistent with the current values of A and D, then normalize the result. (CPSC 422, Lecture 17, Slide 15; the numeric factor tables shown on the slide are omitted here.)

Lecture Overview. Probabilistic Graphical models: Recap Markov Networks; Applications of Markov Networks; Inference in Markov Networks (Exact and Approx.); Conditional Random Fields. (CPSC 422, Lecture 17, Slide 16)

We want to model P(Y1 | X1 … Xn), where all the Xi are always observed. Which model is simpler, MN or BN? Which one naturally aggregates the influence of the different parents? (CPSC 422, Lecture 18, Slide 17; the figure shows Y1 connected to X1 … Xn, once as a BN and once as an MN.)

Conditional Random Fields (CRFs). Model P(Y1 … Yk | X1 … Xn): a special case of Markov Networks where all the Xi are always observed. Simple case: P(Y1 | X1 … Xn). (CPSC 422, Lecture 18, Slide 18)
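In symbols, the standard definition (consistent with, but not printed on, the slide):

```latex
P(Y_1,\dots,Y_k \mid X_1,\dots,X_n)
  \;=\; \frac{1}{Z(X_1,\dots,X_n)} \prod_{j} \phi_j(\mathbf{D}_j),
\qquad
Z(X_1,\dots,X_n) \;=\; \sum_{Y_1,\dots,Y_k} \prod_{j} \phi_j(\mathbf{D}_j)
```

Because the X's are always observed, the normalization sums only over the Y's, so Z is a function of the observed values.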

What are the parameters? (CPSC 422, Lecture 18, Slide 19)
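The answer is worked out in class; a sketch of the usual parameterization of this naïve Markov model (following Koller's treatment, so the exact symbols are an assumption) uses one log-linear factor per observed feature, built from indicator functions:

```latex
\phi_i(Y_1, X_i) \;=\; \exp\!\big( w_i \,\mathbf{1}\{X_i = 1,\; Y_1 = 1\} \big),
\quad i = 1,\dots,n,
\qquad
\phi_0(Y_1) \;=\; \exp\!\big( w_0 \,\mathbf{1}\{Y_1 = 1\} \big)
```

So the parameters are just the n+1 real weights w0, w1, …, wn.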

Let’s derive the probabilities we need. (CPSC 422, Lecture 18, Slides 20-24; the figure shows Y1 connected to X1 … Xn, and the step-by-step derivation on these slides is not captured in this transcript.)
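A sketch of where that derivation ends up, using the indicator-function factors above (standard algebra; the notation is mine rather than the slides'):

```latex
P(Y_1 = 1 \mid x_1,\dots,x_n)
 \;=\; \frac{\phi_0(1)\prod_i \phi_i(1, x_i)}
            {\phi_0(1)\prod_i \phi_i(1, x_i) \;+\; \phi_0(0)\prod_i \phi_i(0, x_i)}
 \;=\; \frac{\exp\!\big(w_0 + \sum_i w_i x_i\big)}
            {\exp\!\big(w_0 + \sum_i w_i x_i\big) + 1}
 \;=\; \sigma\!\Big(w_0 + \sum_i w_i x_i\Big)
```

Since every factor evaluates to exp(0) = 1 when Y1 = 0, the second term in the denominator collapses to 1, leaving the sigmoid sigma(z) = 1 / (1 + e^{-z}).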

Sigmoid function used in Logistic Regression. Of great practical interest: the number of parameters wi is linear instead of exponential in the number of parents; it is a natural model for many real-world applications; and it naturally aggregates the influence of the different parents. (CPSC 422, Lecture 18, Slide 25)

Logistic Regression as a Markov Net (CRF). Logistic regression is a simple Markov Net (a CRF), aka a naïve Markov model, with Y connected to X1 … Xn. But it only models the conditional distribution P(Y | X), not the full joint P(X, Y). (CPSC 422, Lecture 18, Slide 26)
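A small numeric sanity check of this equivalence (the weights and inputs are arbitrary made-up values):

```python
import math

def naive_markov_posterior(w0, w, x):
    """P(Y=1 | x) computed directly from the indicator-function factors of the naive Markov model."""
    def unnorm(y):
        p = math.exp(w0 if y == 1 else 0.0)                       # phi_0(y)
        for wi, xi in zip(w, x):                                  # phi_i(y, x_i)
            p *= math.exp(wi if (y == 1 and xi == 1) else 0.0)
        return p
    return unnorm(1) / (unnorm(1) + unnorm(0))

def logistic_regression(w0, w, x):
    """The same quantity as the sigmoid of a linear function of the inputs."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

w0, w, x = -1.5, [2.0, -0.7, 0.3], [1, 0, 1]
print(naive_markov_posterior(w0, w, x))   # the two printed values agree
print(logistic_regression(w0, w, x))
```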

Naïve Bayes vs. Logistic Regression (Naïve Markov). Naïve Bayes is generative; Logistic Regression (Naïve Markov) is discriminative and models the conditional. In both, Y is connected to the features X1 … Xn. (CPSC 422, Lecture 18, Slide 27)

Learning Goals for today’s class. You can: perform exact and approximate inference in Markov Networks; describe a few applications of Markov Networks; describe a natural parameterization for a Naïve Markov model (which is a simple CRF); derive how P(Y|X) can be computed for a Naïve Markov model; explain the discriminative vs. generative distinction and its implications. (CPSC 422, Lecture 18, Slide 28)

Next class: revise generative temporal models (HMM). To Do: linear-chain CRFs. Midterm will be marked by Fri. (CPSC 422, Lecture 18, Slide 29)

Generative vs. Discriminative Models. Generative models (like Naïve Bayes) are not directly designed to maximize performance on classification: they model the joint distribution P(X, Y), and classification is then done using Bayesian inference. But a generative model can also be used to perform any other inference task, e.g. P(X1 | X2, … Xn): "Jack of all trades, master of none." Discriminative models (like CRFs) are specifically designed and trained to maximize performance on classification: they only model the conditional distribution P(Y | X). By focusing on modeling the conditional distribution, they generally perform better on classification than generative models when given a reasonable amount of training data. (CPSC 422, Lecture 18, Slide 30)

On Fri: Sequence Labeling. HMM (generative) vs. linear-chain CRF (discriminative, models the conditional); both are over labels Y1 … YT and observations X1 … XT. (CPSC 422, Lecture 18, Slide 31)

Lecture Overview: indicator function; P(X,Y) vs. P(Y|X) and Naïve Bayes; modeling P(Y|X) explicitly with Markov Networks (parameterization, inference); Generative vs. Discriminative models. (CPSC 422, Lecture 18, Slide 32)

P(X,Y) vs. P(Y|X). Assume that you always observe a set of variables X = {X1 … Xn} and you want to predict one or more variables Y = {Y1 … Ym}. You can model P(X,Y) and then infer P(Y|X). (CPSC 422, Lecture 18, Slide 33)
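Concretely, the inference step is just conditioning (not shown on the slide):

```latex
P(\mathbf{Y} \mid \mathbf{X}) \;=\; \frac{P(\mathbf{X}, \mathbf{Y})}{P(\mathbf{X})}
 \;=\; \frac{P(\mathbf{X}, \mathbf{Y})}{\sum_{\mathbf{y}} P(\mathbf{X}, \mathbf{Y} = \mathbf{y})}
```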

P(X,Y) vs. P(Y|X). With a Bnet we can represent a joint as the product of conditional probabilities. With a Markov Network we can represent a joint as the product of factors. We will see that Markov Networks are also suitable for representing the conditional probability P(Y|X) directly. (CPSC 422, Lecture 18, Slide 34)

Directed vs. Undirected: Factorization. (CPSC 422, Lecture 18, Slide 35)

Naïve Bayesian Classifier P(Y,X). A very simple and successful Bnet that allows us to classify entities into a set of classes Y1, given a set of features (X1 … Xn). Example: determine whether an e-mail is spam (only two classes, spam=T and spam=F). Useful attributes of an e-mail? Assumptions: the value of each attribute depends on the classification; (naïve) the attributes are independent of each other given the classification, e.g. P("bank" | "account", spam=T) = P("bank" | spam=T). (CPSC 422, Lecture 18, Slide 36)
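A minimal sketch of such a classifier in code (the word list and all probabilities below are invented for illustration, not estimated from data):

```python
# Naive Bayes spam classifier: P(Y, X1..Xn) = P(Y) * prod_i P(Xi | Y),
# where Y = spam and each Xi = "message contains word i".
prior_spam = 0.4
p_word_given_class = {            # (P(contains word | spam), P(contains word | not spam))
    'free':    (0.60, 0.05),
    'money':   (0.50, 0.10),
    'ubc':     (0.02, 0.40),
    'midterm': (0.01, 0.30),
}

def p_spam_given_words(present):
    """P(spam | which words are present), by Bayes rule under the naive independence assumption."""
    joint_spam, joint_ham = prior_spam, 1.0 - prior_spam
    for word, (p_if_spam, p_if_ham) in p_word_given_class.items():
        if word in present:
            joint_spam *= p_if_spam
            joint_ham *= p_if_ham
        else:
            joint_spam *= 1.0 - p_if_spam
            joint_ham *= 1.0 - p_if_ham
    return joint_spam / (joint_spam + joint_ham)

print(p_spam_given_words({'free', 'money'}))   # e.g. for "free money for you now"
```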

Naïve Bayesian Classifier for Spam: what is the structure? Class variable: Spam. Feature variables: contains "free", contains "money", contains "ubc", contains "midterm". The corresponding Bnet represents P(Y1, X1 … Xn). (CPSC 422, Lecture 18, Slide 37)

NB Classifier for Spam: Usage. Can we derive P(Y1 | X1 … Xn) for any x1 … xn, e.g., "free money for you now"? But you can also perform any other inference, e.g., P(X1 | X3). (CPSC 422, Lecture 18, Slide 38; the figure is the network above: Spam with the "contains word" features.)
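The inference behind that usage, written out as standard Bayes-rule algebra (not printed on the slide):

```latex
P(Y_1 \mid x_1,\dots,x_n)
 \;=\; \frac{P(Y_1)\,\prod_{i=1}^{n} P(x_i \mid Y_1)}
            {\sum_{y} P(Y_1 = y)\,\prod_{i=1}^{n} P(x_i \mid Y_1 = y)}
 \;\propto\; P(Y_1)\prod_{i=1}^{n} P(x_i \mid Y_1)
```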


How to improve this model? We need more features: words aren't enough! Have you e-mailed the sender before? Have 1K other people just gotten the same e-mail? Is the sending information consistent? Is the e-mail in ALL CAPS? Do inline URLs point where they say they point? Does the e-mail address you by (your) name? We can add these information sources as new variables in the Naïve Bayes model. (CPSC 422, Lecture 18, Slide 41)