CPSC 422, Lecture 19: Intelligent Systems (AI-2), Computer Science, Oct 23, 2015

Presentation transcript:

Slide 1: Intelligent Systems (AI-2), Computer Science, CPSC 422, Lecture 19, Oct 23, 2015
Slide sources:
- Raymond J. Mooney, University of Texas at Austin
- D. Koller, Stanford CS, Probabilistic Graphical Models
- D. Page, Whitehead Institute, MIT
- Several figures from "Probabilistic Graphical Models: Principles and Techniques", D. Koller and N. Friedman, 2009

Slide 2: Lecture Overview
- Recap: Naïve Markov, logistic regression (simple CRF)
- CRFs: high-level definition
- CRFs applied to sequence labeling
- NLP examples: Named Entity Recognition, joint POS tagging and NP segmentation

Slide 3: Let's derive the probabilities we need
[Figure: graph over the target variable Y1 and the observed variables X1, X2, …, Xn]

Slide 4: Naïve Markov Parameters and Inference
[Figure: graph over the target variable Y1 and the observed variables X1, X2, …, Xn]
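As a hedged illustration of this recap, the sketch below assumes the usual Naïve Markov parameterization with binary variables: a factor ϕ_i(X_i, Y) equal to exp(w_i) when X_i = 1 and Y = 1 (and 1 otherwise), plus a bias factor ϕ_0(Y) equal to exp(w_0) when Y = 1. The weights, the observation, and the function names are invented for the example and are not from the slides. Multiplying the factors and normalizing over Y gives the same value as the logistic (sigmoid) function.

```python
import math

def naive_markov_posterior(weights, bias, x):
    """P(Y=1 | x), computed by multiplying factors and normalizing over Y."""
    def unnormalized(y):
        score = math.exp(bias * y)            # phi_0(Y)
        for w_i, x_i in zip(weights, x):
            score *= math.exp(w_i * x_i * y)  # phi_i(X_i, Y)
        return score
    z = unnormalized(0) + unnormalized(1)     # partition function Z(x)
    return unnormalized(1) / z

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical weights and observation, chosen only for illustration.
weights = [1.5, -0.7, 0.3]
bias = -0.2
x = [1, 0, 1]

p_factors = naive_markov_posterior(weights, bias, x)
p_logistic = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

Under these assumptions the two quantities agree up to floating-point error, which is why the Naïve Markov model is described as logistic regression.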

Slide 5: Let's generalize…
- Assume that you always observe a set of variables X = {X1, …, Xn} and you want to predict one or more variables Y = {Y1, …, Yk}.
- A CRF is an undirected graphical model whose nodes correspond to X ∪ Y.
- ϕ1(D1), …, ϕm(Dm) are the factors that annotate the network (but we disallow factors involving only variables in X. Why?)
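The definition above can be written out as a formula. The symbols \tilde{P} and Z(X) do not appear in the transcript; they follow the standard textbook notation for an unnormalized measure and its partition function, and are used here as an assumption about what the slide intends.

```latex
P(Y \mid X) = \frac{1}{Z(X)} \, \tilde{P}(Y, X), \qquad
\tilde{P}(Y, X) = \prod_{i=1}^{m} \phi_i(D_i), \qquad
Z(X) = \sum_{Y} \tilde{P}(Y, X)
```

One way to read the question on the slide: for a fixed observation, a factor that involves only variables in X takes the same value for every assignment to Y, so it multiplies both \tilde{P}(Y, X) and Z(X) by the same constant and cancels out of P(Y | X).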

Slide 6: Lecture Overview
- Recap: Naïve Markov, logistic regression (simple CRF)
- CRFs: high-level definition
- CRFs applied to sequence labeling
- NLP examples: Named Entity Recognition, joint POS tagging and NP segmentation

Slide 7: Sequence Labeling: Linear-chain CRF
[Figure: linear-chain CRF over the target variables Y1, Y2, …, YT and the observed variables X1, X2, …, XT]

Slide 8: Increase Representational Complexity: Adding Features to a CRF
Instead of a single observed variable Xi we can model multiple features Xij of that observation.
[Figure: chain of target variables Y1, Y2, …, YT, each connected to its own feature variables Xt,1, …, Xt,m]

Slide 9: CRFs in Natural Language Processing
- One target variable Y for each word X, encoding the possible labels for X.
- Each target variable is connected to a set of feature variables that capture properties relevant to the target distinction.
[Figure: chain of target variables Y1, Y2, …, YT, each connected to feature variables Xt,1, …, Xt,m. Example feature questions shown for X1,2 and X2,1: Does the word end in "ing"? Is the word capitalized?]
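The two feature questions on this slide can be sketched as a small function that maps each word to binary feature variables. The function name, the feature names, and the example sentence are hypothetical and only meant to illustrate the idea.

```python
def word_features(word):
    """Map one word to binary feature variables (hypothetical names)."""
    return {
        "ends_in_ing": int(word.lower().endswith("ing")),
        "is_capitalized": int(word[:1].isupper()),
    }

# Made-up example sentence: one row of features per word.
sentence = ["Students", "love", "hiking", "near", "Vancouver"]
feature_rows = [word_features(w) for w in sentence]
```

Each entry of `feature_rows` plays the role of the feature variables Xt,1, …, Xt,m attached to the target variable for word t.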

Slide 10: Named Entity Recognition Task
- Entities often span multiple words, e.g. "British Columbia".
- The type of an entity may not be apparent from individual words, e.g. "University of British Columbia".
- Let's assume three categories: Person, Location, Organization.
- BIO notation (for sequence labeling).
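BIO notation labels each word as the Beginning of an entity, Inside an entity, or Outside any entity. The sketch below converts entity spans into BIO tags; the abbreviations PER and ORG, the span format, and the example sentence are assumptions made for illustration.

```python
def to_bio(tokens, spans):
    """Convert entity spans (start, end_exclusive, type) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        tags[start] = "B-" + entity_type      # first word of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + entity_type      # remaining words of the entity
    return tags

tokens = ["Mary", "studies", "at", "University", "of", "British", "Columbia"]
spans = [(0, 1, "PER"), (3, 7, "ORG")]
tags = to_bio(tokens, spans)
```

Here "British" and "Columbia" receive I-ORG rather than a location tag, which is the slide's point that the type of an entity may not be apparent from individual words.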

Slide 11: Linear-chain CRF parameters
With two factor "types" for each word:
- Dependency between neighboring target variables.
- Dependency between a target variable and its context in the word sequence, which can also include features of the words (capitalized, appears in an atlas of location names, etc.).
Factors are similar to the ones for the Naïve Markov model (logistic regression).
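A minimal sketch of the two factor types, assuming a two-label problem and hand-picked log-potentials (all weights, labels, and names below are invented for illustration). The conditional probability of a label sequence is its exponentiated score divided by the sum over all label sequences, computed here by brute force, which is only feasible for very short sentences.

```python
import itertools
import math

LABELS = ["O", "LOC"]

# Hypothetical log-potentials between neighboring target variables.
TRANSITION = {("O", "O"): 0.5, ("O", "LOC"): 0.0,
              ("LOC", "O"): 0.0, ("LOC", "LOC"): 1.0}

def emission(label, word):
    """Hypothetical log-potential linking a label to one feature of its word."""
    capitalized = word[:1].isupper()
    if label == "LOC":
        return 2.0 if capitalized else -2.0
    return 0.0

def log_score(labels, words):
    """Sum of log-factors: one per word plus one per adjacent label pair."""
    total = sum(emission(y, w) for y, w in zip(labels, words))
    total += sum(TRANSITION[pair] for pair in zip(labels, labels[1:]))
    return total

def conditional_probability(labels, words):
    """P(labels | words), normalizing over all label sequences (brute force)."""
    z = sum(math.exp(log_score(seq, words))
            for seq in itertools.product(LABELS, repeat=len(words)))
    return math.exp(log_score(labels, words)) / z

words = ["in", "British", "Columbia"]
p = conditional_probability(("O", "LOC", "LOC"), words)
```

With these made-up weights the labeling O, LOC, LOC gets the highest score, because both the capitalization factor and the LOC-to-LOC transition factor favor it.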

Slide 12: Features can also be
- The word
- Following word
- Previous word

Slide 13: More on features
- The total number of features can be [number missing in the transcript].
- However, features are sparse, i.e. most features are 0 for most words.
- Including features that are conjunctions of simple features increases accuracy.
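One common way to exploit sparsity, shown here as a hedged sketch, is to store only the features that are active (equal to 1) at each position. The feature-name strings and the example sentence are hypothetical; the last feature is a conjunction of two simple ones.

```python
def sparse_features(words, t):
    """Return the set of active (non-zero) binary features at position t."""
    word = words[t]
    active = {"word=" + word.lower()}
    if t > 0:
        active.add("prev=" + words[t - 1].lower())
    if t + 1 < len(words):
        active.add("next=" + words[t + 1].lower())
    if word[:1].isupper():
        active.add("capitalized")
        # Conjunction of two simple features: capitalized AND previous word.
        if t > 0:
            active.add("capitalized&prev=" + words[t - 1].lower())
    return active

words = ["flights", "to", "Vancouver", "today"]
active = sparse_features(words, 2)
```

Every other word identity, neighbor, and conjunction in the vocabulary is implicitly 0 at this position, so the representation stays small even when the total number of features is large.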

Slide 14: Linear-Chain Performance
- Per-token/word accuracy is in the high 90% range for many natural datasets.
- Per-field precision and recall are more often around [value missing in the transcript]%, depending on the dataset. The entire named-entity phrase must be correct.
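The gap between the two measures can be seen on a toy example. The sketch below assumes BIO tags and exact-match scoring of whole entities; the gold and predicted sequences are made up, and the helper name is hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end_exclusive, type) spans in a BIO sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel closes a final entity
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                entities.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):         # a stray I- tag is ignored
                start, etype = i, tag[2:]
    return entities

gold = ["O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O"]
pred = ["O", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "O"]

token_accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
gold_entities, pred_entities = extract_entities(gold), extract_entities(pred)
correct = len(gold_entities & pred_entities)
precision = correct / len(pred_entities) if pred_entities else 0.0
recall = correct / len(gold_entities) if gold_entities else 0.0
```

In this example four of six tokens are tagged correctly, yet no predicted entity matches the gold entity exactly, so entity-level precision and recall are both zero.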

Slide 15: Skip-Chain CRFs
- Include additional factors that connect non-adjacent target variables, e.g. when a word occurs multiple times in the same document.
- The graphical structure over Y can depend on the values of the Xs!
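A small sketch of how the extra edges could be derived from the observed words. Restricting skip edges to repeated capitalized words is an assumption made here for illustration, as are the function name and the example document.

```python
from collections import defaultdict

def skip_edges(words):
    """Pairs (i, j) of non-adjacent positions holding the same capitalized word."""
    positions = defaultdict(list)
    for i, word in enumerate(words):
        if word[:1].isupper():
            positions[word].append(i)
    edges = []
    for indices in positions.values():
        for a in range(len(indices)):
            for b in range(a + 1, len(indices)):
                if indices[b] - indices[a] > 1:   # skip adjacent positions
                    edges.append((indices[a], indices[b]))
    return sorted(edges)

words = ["Green", "spoke", "on", "Monday", ".", "Later", "Green", "left", "."]
edges = skip_edges(words)
```

Each returned pair would receive an additional factor between the corresponding target variables. A different document yields different pairs, which is the sense in which the structure over Y depends on the values of the Xs.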

Slide 16: Coupled linear-chain CRFs
- Linear-chain CRFs can be combined to perform multiple tasks simultaneously.
- The example performs part-of-speech labeling and noun-phrase segmentation.


Slide 18: Inference in CRFs (just intuition)
- Forward / Backward / Smoothing and Viterbi can be rewritten (not trivial!) using these factors.
- Then you plug in the factors of the CRFs and all the algorithms work fine with CRFs!
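To make the intuition concrete, here is a hedged sketch of Viterbi written directly in terms of log-factors rather than HMM probabilities. The `emission` and `transition` functions and their weights are invented for illustration; for an HMM they would be log-probabilities, while for a CRF they can be arbitrary log-potentials.

```python
def viterbi(words, labels, emission, transition):
    """Most likely label sequence under a linear-chain model of log-factors."""
    # best[t][y]: best log-score of any labeling of words[:t+1] ending in y
    best = [{y: emission(y, words[0]) for y in labels}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transition(p, y))
            scores[y] = best[-1][prev] + transition(prev, y) + emission(y, word)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

def emission(label, word):
    """Hypothetical log-potential between a label and its word."""
    if label == "LOC":
        return 2.0 if word[:1].isupper() else -2.0
    return 0.0

def transition(prev_label, label):
    """Hypothetical log-potential between neighboring labels."""
    return {("O", "O"): 0.5, ("LOC", "LOC"): 1.0}.get((prev_label, label), 0.0)

path = viterbi(["in", "British", "Columbia"], ["O", "LOC"], emission, transition)
```

The recursion is the same as in the HMM case; only the source of the local scores changes. The partition function is not needed to find the best sequence, since it is the same for every labeling of a given sentence.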

Slide 19: CRFs Summary
- Ability to relax strong independence assumptions
- Ability to incorporate arbitrary overlapping local and global features
- Graphical structure over Y can depend on the values of the Xs
- Can perform multiple tasks simultaneously
- Standard inference algorithms for HMMs can be applied
- Practical learning algorithms exist
- State of the art on many labeling tasks (deep learning recently shown to be often better … ensemble them?)
- See the MALLET package

Slide 20: Probabilistic Graphical Models
[Figure from "Probabilistic Graphical Models: Principles and Techniques", D. Koller and N. Friedman, 2009]

Slide 21: 422 big picture: Where are we?
[Figure: course map organized by Representation and Reasoning Technique, for Query and Planning tasks, in Deterministic, Stochastic, and Hybrid (Det + Sto) settings. Labels in the figure: Logics; First Order Logics; Ontologies; Temporal rep.; Full Resolution; SAT; Belief Nets; Approx. Inference; Approx.: Gibbs; Markov Chains and HMMs; Forward, Viterbi…; Approx.: Particle Filtering; Undirected Graphical Models: Markov Networks, Conditional Random Fields; Markov Decision Processes and Partially Observable MDP; Value Iteration; Reinforcement Learning; Prob CFG; Prob Relational Models; Markov Logics; Applications of AI]

Slide 22: Learning Goals for today's class
You can:
- Provide a general definition of a CRF
- Apply CRFs to sequence labeling
- Describe and justify features for CRFs applied to natural language processing tasks
- Explain the benefits of CRFs

Slide 23: Midterm, Mon, Oct 26, we will start at 9am sharp
How to prepare…
- Work on practice material posted on Connect
- Learning Goals (look at the end of the slides for each lecture, or the complete list on Connect)
- Revise all the clicker questions and practice exercises
Extra Office Hours TODAY 11:00am - 12:30pm in the DLC
Next class (Wed): Start Logics. Revise Logics from 322!

Slide 24: Announcements
- Midterm: Avg 73.5, Max 105, Min 30
- If your score is below 70, you need to very seriously revise all the material covered so far.
- You can pick up a printout of the solutions along with your midterm.

Slide 25: Generative vs. Discriminative Models
- Generative models (like Naïve Bayes): not directly designed to maximize performance on classification. They model the joint distribution P(X, Y). Classification is then done using Bayesian inference. But a generative model can also be used to perform any other inference task, e.g. P(X1 | X2, …, Xn). "Jack of all trades, master of none."
- Discriminative models (like CRFs): specifically designed and trained to maximize performance of classification. They only model the conditional distribution P(Y | X). By focusing on modeling the conditional distribution, they generally perform better on classification than generative models when given a reasonable amount of training data.
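The contrast can be written out for the Naïve Bayes case. The factorization below is the standard Naïve Bayes assumption and the symbol y' is introduced here only as a summation variable; neither is spelled out in the transcript.

```latex
\text{Generative (Naïve Bayes):}\quad
P(X, Y) = P(Y) \prod_{i=1}^{n} P(X_i \mid Y), \qquad
P(Y \mid X) = \frac{P(X, Y)}{\sum_{y'} P(X, y')}

\text{Discriminative:}\quad
P(Y \mid X) \text{ is parameterized and trained directly; } P(X) \text{ is never modeled.}
```

In the generative case the classifier is obtained indirectly from the joint distribution, which is why the same model can also answer queries such as P(X1 | X2, …, Xn).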

Slide 26: Naïve Bayes vs. Logistic Regression
[Figure: two graphs over Y1 and X1, X2, …, Xn: Naïve Bayes (Generative) and Logistic Regression / Naïve Markov (Discriminative), related by the label "Conditional"]

Slide 27: Sequence Labeling
[Figure: two chain models over Y1, Y2, …, YT and X1, X2, …, XT: HMM (Generative) and Linear-chain CRF (Discriminative), related by the label "Conditional"]