Intelligent Systems (AI-2)
Computer Science cpsc422, Lecture 16
Feb 11, 2015
(CPSC 422, Lecture 16, Slide 1)

Lecture Overview
Probabilistic temporal inferences:
- Filtering
- Prediction
- Smoothing (forward-backward)
- Most Likely Sequence of States (Viterbi)
- Approximate Inference in Temporal Models (Particle Filtering)
(CPSC 422, Lecture 16, Slide 2)

Part-of-Speech (PoS) Tagging
Given a text in natural language, label (tag) each word with its syntactic category, e.g., noun, verb, pronoun, preposition, adjective, adverb, article, conjunction.
Input: Brainpower, not physical plant, is now a firm's chief asset.
Output: Brainpower_NNP ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
Tag meanings: NNP (proper noun, singular), RB (adverb), JJ (adjective), NN (noun, singular or mass), VBZ (verb, 3rd person singular present), DT (determiner), POS (possessive ending), . (sentence-final punctuation)
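For a hands-on illustration (not part of the original slides), here is a minimal PoS-tagging sketch in Python using NLTK's pre-trained Penn Treebank tagger; resource names can vary across NLTK versions, so treat the download calls as an assumption:

```python
# Minimal PoS-tagging sketch with NLTK (one of several off-the-shelf taggers).
import nltk

# Resource names below are for common NLTK versions; newer releases may differ.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Brainpower, not physical plant, is now a firm's chief asset."
tokens = nltk.word_tokenize(sentence)   # split into words and punctuation
tagged = nltk.pos_tag(tokens)           # list of (word, tag) pairs
print(tagged)
# Tags should roughly match the slide, e.g. ('Brainpower', 'NNP'), ('not', 'RB'), ...
```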

POS Tagging is very useful:
- As a basis for parsing in NL understanding.
- Information retrieval: quickly finding names or other phrases for information extraction; selecting important words from documents (e.g., nouns).
- Speech synthesis: knowing the PoS produces more natural pronunciations, e.g., CONtent (noun) vs. conTENT (adjective); OBject (noun) vs. obJECT (verb), which differ in stress.

Most Likely Sequence (Explanation)
Most Likely Sequence: $\arg\max_{x_{1:T}} P(x_{1:T} \mid e_{1:T})$
Idea: find the most likely path to each state in $X_T$. As for filtering etc., let's try to develop a recursive solution.
(CPSC 422, Lecture 16, Slide 5)

Joint vs. Conditional Prob
You have two binary random variables X and Y with joint distribution:

X  Y  P(X, Y)
t  t  0.4
f  t  0.2
t  f  0.1
f  f  0.3

$\arg\max_x P(X \mid Y{=}t)$ ? $\arg\max_x P(X, Y{=}t)$
A. >   B. =   C. <   D. It depends

Most Likely Sequence: Formal Derivation
Suppose we want to find the most likely path to state $x_{t+1}$ given $e_{1:t+1}$:
$$\max_{x_1 \ldots x_t} P(x_1, \ldots, x_t, x_{t+1} \mid e_{1:t+1})$$
but this is proportional to the joint, so we can maximize:
$$
\begin{aligned}
\max_{x_1 \ldots x_t} P(x_1, \ldots, x_t, x_{t+1}, e_{1:t+1})
&= \max_{x_1 \ldots x_t} P(x_1, \ldots, x_t, x_{t+1}, e_{1:t}, e_{t+1}) \\
&= \max_{x_1 \ldots x_t} P(e_{t+1} \mid e_{1:t}, x_1, \ldots, x_t, x_{t+1})\, P(x_1, \ldots, x_t, x_{t+1}, e_{1:t}) \\
&= \max_{x_1 \ldots x_t} P(e_{t+1} \mid x_{t+1})\, P(x_1, \ldots, x_t, x_{t+1}, e_{1:t}) && \text{(Markov assumption)} \\
&= \max_{x_1 \ldots x_t} P(e_{t+1} \mid x_{t+1})\, P(x_{t+1} \mid x_1, \ldots, x_t, e_{1:t})\, P(x_1, \ldots, x_t, e_{1:t}) && \text{(cond. prob.)} \\
&= \max_{x_1 \ldots x_t} P(e_{t+1} \mid x_{t+1})\, P(x_{t+1} \mid x_t)\, P(x_1, \ldots, x_{t-1}, x_t, e_{1:t}) && \text{(Markov assumption)} \\
&= P(e_{t+1} \mid x_{t+1}) \max_{x_t} \Big( P(x_{t+1} \mid x_t) \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, x_t, e_{1:t}) \Big) && \text{(move outside the max)}
\end{aligned}
$$
(CPSC 422, Lecture 16, Slide 7)

Intuition behind solution
$$P(e_{t+1} \mid x_{t+1}) \max_{x_t} \Big( P(x_{t+1} \mid x_t) \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, x_t, e_{1:t}) \Big)$$
(CPSC 422, Lecture 16, Slide 8)

The probability of the most likely path to state $S_2$ at time $t+1$ is:
$$P(e_{t+1} \mid x_{t+1}) \max_{x_t} \Big( P(x_{t+1} \mid x_t) \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, x_t, e_{1:t}) \Big)$$
(CPSC 422, Lecture 16, Slide 9)

Most Likely Sequence
The recursion is essentially identical to filtering (notation warning: it is expressed for $X_{t+1}$ instead of $X_t$; this makes no difference):
Filtering: $P(X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})$
Most likely sequence: $\max_{x_1 \ldots x_t} P(x_1, \ldots, x_t, X_{t+1}, e_{1:t+1}) = P(e_{t+1} \mid X_{t+1}) \max_{x_t} \big( P(X_{t+1} \mid x_t) \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, x_t, e_{1:t}) \big)$, where the inner max is the recursive call.
The forward message $f_{1:t} = P(X_t \mid e_{1:t})$ is replaced by $m_{1:t} = \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, X_t, e_{1:t})$ (*), and the summation in the filtering equations is replaced by maximization.
(CPSC 422, Lecture 16, Slide 10)

Rain Example
$$\max_{x_1 \ldots x_t} P(x_1, \ldots, x_t, X_{t+1}, e_{1:t+1}) = P(e_{t+1} \mid X_{t+1}) \max_{x_t} \big[ P(X_{t+1} \mid x_t)\, m_{1:t} \big], \qquad m_{1:t} = \max_{x_1 \ldots x_{t-1}} P(x_1, \ldots, x_{t-1}, X_t, e_{1:t})$$
$m_{1:1}$ is just $P(R_1 \mid u_1) = \langle 0.818, 0.182 \rangle$
$m_{1:2} = P(u_2 \mid R_2)\, \langle \max[P(r_2 \mid r_1) \cdot 0.818,\ P(r_2 \mid \lnot r_1) \cdot 0.182],\ \max[P(\lnot r_2 \mid r_1) \cdot 0.818,\ P(\lnot r_2 \mid \lnot r_1) \cdot 0.182] \rangle$
$= \langle 0.9, 0.2 \rangle\, \langle \max(0.7 \cdot 0.818,\ 0.3 \cdot 0.182),\ \max(0.3 \cdot 0.818,\ 0.7 \cdot 0.182) \rangle = \langle 0.9, 0.2 \rangle \cdot \langle 0.573, 0.245 \rangle = \langle 0.515, 0.049 \rangle$
(CPSC 422, Lecture 16, Slide 11)

Rain Example (cont.)
$m_{1:3} = P(\lnot u_3 \mid R_3)\, \langle \max[P(r_3 \mid r_2) \cdot 0.515,\ P(r_3 \mid \lnot r_2) \cdot 0.049],\ \max[P(\lnot r_3 \mid r_2) \cdot 0.515,\ P(\lnot r_3 \mid \lnot r_2) \cdot 0.049] \rangle$
$= \langle 0.1, 0.8 \rangle\, \langle \max(0.7 \cdot 0.515,\ 0.3 \cdot 0.049),\ \max(0.3 \cdot 0.515,\ 0.7 \cdot 0.049) \rangle = \langle 0.1, 0.8 \rangle \cdot \langle 0.361, 0.155 \rangle = \langle 0.036, 0.124 \rangle$
(CPSC 422, Lecture 16, Slide 12)
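As a quick check of these numbers, here is a small Python sketch (mine, not the slides') that computes the $m$ messages, assuming the standard umbrella model: $P(r_{t+1} \mid r_t) = 0.7$, $P(r_{t+1} \mid \lnot r_t) = 0.3$, $P(u \mid r) = 0.9$, $P(u \mid \lnot r) = 0.2$:

```python
# Viterbi m-messages for the rain/umbrella example (a sketch under the
# standard transition and sensor models used in the slides above).
trans = {True: {True: 0.7, False: 0.3},   # trans[r_t][r_{t+1}] = P(r_{t+1} | r_t)
         False: {True: 0.3, False: 0.7}}
sensor = {True: 0.9, False: 0.2}          # P(u_t | R_t)

def obs_prob(umbrella, rain):
    """P(e_t | R_t) for an umbrella / no-umbrella observation."""
    return sensor[rain] if umbrella else 1.0 - sensor[rain]

m = {True: 0.818, False: 0.182}           # m_{1:1} = P(R_1 | u_1), from the slide
for umbrella in [True, False]:            # observations u_2, then not-u_3
    m = {r1: obs_prob(umbrella, r1) *
             max(trans[r0][r1] * m[r0] for r0 in [True, False])
         for r1 in [True, False]}
    print(m)
# First line  ~ {True: 0.515, False: 0.049}   (m_{1:2})
# Second line ~ {True: 0.036, False: 0.124}   (m_{1:3})
```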

Viterbi Algorithm
Computes the most likely sequence to $X_{t+1}$ by running forward along the sequence, computing the $m$ message at each time step:
- Keep back pointers to the states that maximize the function.
- In the end, the message holds the probability of the most likely sequence to each of the final states; we can pick the most likely one and build the path by retracing the back pointers.
(CPSC 422, Lecture 16, Slide 13)
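To make this concrete, here is a minimal, self-contained Viterbi sketch in Python (an illustration, not the course's official code); the model dictionaries and state names are placeholders:

```python
# Viterbi: most likely state sequence in an HMM.
# prior[s]     = P(X_0 = s)
# trans[s][s'] = P(X_{t+1} = s' | X_t = s)
# emit[s][e]   = P(e_t = e | X_t = s)
def viterbi(states, prior, trans, emit, observations):
    # m[s] = probability of the most likely path ending in state s
    m = {s: prior[s] * emit[s][observations[0]] for s in states}
    backptrs = []                      # backptrs[t][s] = best predecessor of s
    for e in observations[1:]:
        bp, new_m = {}, {}
        for s in states:
            # maximize over predecessors, as in the recursive derivation above
            best = max(states, key=lambda s0: m[s0] * trans[s0][s])
            bp[s] = best
            new_m[s] = m[best] * trans[best][s] * emit[s][e]
        backptrs.append(bp)
        m = new_m
    # pick the most likely final state and retrace the back pointers
    last = max(states, key=lambda s: m[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, m[last]

# Usage: the rain/umbrella model, observations u_1, u_2, not-u_3.
states = ["rain", "sunny"]
prior = {"rain": 0.5, "sunny": 0.5}
trans = {"rain": {"rain": 0.7, "sunny": 0.3}, "sunny": {"rain": 0.3, "sunny": 0.7}}
emit = {"rain": {"u": 0.9, "no_u": 0.1}, "sunny": {"u": 0.2, "no_u": 0.8}}
print(viterbi(states, prior, trans, emit, ["u", "u", "no_u"]))
# -> (['rain', 'rain', 'sunny'], ...), matching the m-message computation above.
```

In practice one works with log probabilities instead of raw products to avoid numerical underflow on long sequences.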

Viterbi Algorithm: Complexity
Let T = number of time slices, S = number of states.
Time complexity? A. O(T² S)   B. O(T S²)   C. O(T² S²)
Space complexity? A. O(T S)   B. O(T² S)   C. O(T² S²)
(CPSC 422, Lecture 16, Slide 14)

Lecture Overview
Probabilistic temporal inferences:
- Filtering
- Prediction
- Smoothing (forward-backward)
- Most Likely Sequence of States (Viterbi)
- Approximate Inference in Temporal Models (Particle Filtering)
(CPSC 422, Lecture 16, Slide 15)

Limitations of Exact Algorithms
- HMMs can have a very large number of states.
- Our temporal model is a Dynamic Belief Network with several "state" variables.
- Exact algorithms do not scale up.
What to do?

Approximate Inference
Basic idea:
- Draw N samples from known probability distributions.
- Use those samples to estimate unknown probability distributions.
Why sample? Inference: getting N samples is faster than computing the right answer (e.g., with filtering).
(CPSC 422, Lecture 11, Slide 17)
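As a toy illustration of this basic idea (mine, not the slide's): estimating an unknown probability by sampling from known distributions:

```python
import random

# Estimate P(X + Y >= 1) for X, Y ~ Uniform(0, 1): draw N samples from the
# known distributions and use the empirical fraction as the estimate.
N = 100_000
hits = sum(random.random() + random.random() >= 1.0 for _ in range(N))
print(hits / N)   # should be close to the exact answer, 0.5
```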

Simple but Powerful Approach: Particle Filtering
- Idea from exact filtering: we should be able to compute $P(X_{t+1} \mid e_{1:t+1})$ from $P(X_t \mid e_{1:t})$ ("...one slice from the previous slice...").
- Idea from likelihood weighting: samples should be weighted by the probability of the evidence given the parents.
- New idea: run multiple samples simultaneously through the network.
(CPSC 422, Lecture 11, Slide 18)

Particle Filtering
Run all N samples together through the network, one slice at a time.
STEP 0: Generate a population of N initial-state samples by sampling from the initial state distribution $P(X_0)$. (Example: N = 10.)

Particle Filtering
STEP 1: Propagate each sample for $x_t$ forward by sampling the next state value $x_{t+1}$ based on $P(X_{t+1} \mid X_t)$.
Transition model (rain example, values as used earlier): $P(r_t \mid r_{t-1}) = 0.7$, $P(r_t \mid \lnot r_{t-1}) = 0.3$.

Particle Filtering
STEP 2: Weight each sample by the likelihood it assigns to the evidence, e.g., assume we observe not umbrella at t+1.
Sensor model (rain example, values as used earlier): $P(u_t \mid r_t) = 0.9$, $P(\lnot u_t \mid r_t) = 0.1$; $P(u_t \mid \lnot r_t) = 0.2$, $P(\lnot u_t \mid \lnot r_t) = 0.8$.

Particle Filtering
STEP 3: Create a new population from the samples at $X_{t+1}$, i.e., resample the population so that the probability that each sample is selected is proportional to its weight.
Then start the particle filtering cycle again from the new population.
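Putting the three steps together, here is a compact particle-filter sketch in Python for the rain/umbrella model (an illustration under the transition and sensor models above, not code from the course):

```python
import random

# Particle filter for the rain/umbrella HMM: propagate, weight, resample.
TRANS = {True: 0.7, False: 0.3}    # P(r_{t+1} | R_t)
SENSOR = {True: 0.9, False: 0.2}   # P(u_{t+1} | R_{t+1})

def particle_filter_step(particles, umbrella):
    # STEP 1: propagate each particle through the transition model
    particles = [random.random() < TRANS[r] for r in particles]
    # STEP 2: weight each particle by the likelihood of the evidence
    weights = [SENSOR[r] if umbrella else 1.0 - SENSOR[r] for r in particles]
    # STEP 3: resample N particles with probability proportional to weight
    return random.choices(particles, weights=weights, k=len(particles))

# Usage: N = 1000 particles, initial states sampled from P(X_0) = <0.5, 0.5>.
particles = [random.random() < 0.5 for _ in range(1000)]
for obs in [True, True, False]:    # observations u_1, u_2, not-u_3
    particles = particle_filter_step(particles, obs)
    print("P(rain) ~", sum(particles) / len(particles))
```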

Is PF Efficient?
In practice, the approximation error of particle filtering remains bounded over time. It is also possible to prove that the approximation maintains bounded error with high probability (under specific assumptions).

422 big picture: Where are we?
(Course map, flattened from the slide diagram; axes are Representation vs. Reasoning Technique, and Deterministic vs. Stochastic vs. Hybrid: Det + Sto.)
- Query, Deterministic: Logics (First Order Logics, Ontologies, Temporal rep.); reasoning via Full Resolution, SAT.
- Query, Stochastic: Belief Nets (Approx.: Gibbs); Markov Chains and HMMs (Forward, Viterbi...; Approx.: Particle Filtering); Undirected Graphical Models (Markov Networks, Conditional Random Fields); Prob CFG; Prob Relational Models; Markov Logics.
- Planning, Stochastic: Markov Decision Processes and Partially Observable MDPs (Value Iteration, Approx. Inference); Reinforcement Learning.
- Applications of AI.
(CPSC 322, Lecture 34, Slide 24)

Learning Goals for today's class
You can:
- Describe the problem of finding the most likely sequence of states (given a sequence of observations), derive its solution (the Viterbi algorithm) by manipulating probabilities, and apply it to a temporal model.
- Describe and apply Particle Filtering for approximate inference in temporal models.
(CPSC 422, Lecture 16, Slide 25)

TODO for Fri
Keep working on Assignment 2: RL, Approx. Inference in BNs, Temporal Models - due Feb 18 (it may take longer than the first one).
(CPSC 422, Lecture 16, Slide 26)

Likelihood Weighting
Network: Cloudy -> Sprinkler, Cloudy -> Rain, {Sprinkler, Rain} -> WetGrass.

P(C):
  +c 0.5, -c 0.5
P(S | C):
  +c: +s 0.1, -s 0.9
  -c: +s 0.5, -s 0.5
P(R | C):
  +c: +r 0.8, -r 0.2
  -c: +r 0.2, -r 0.8
P(W | S, R):
  +s, +r: +w 0.99, -w 0.01
  +s, -r: +w 0.90, -w 0.10
  -s, +r: +w 0.90, -w 0.10
  -s, -r: +w 0.01, -w 0.99

Samples: +c, +s, +r, +w ...
Evidence influences the choice of downstream variables, but not upstream ones.
(CPSC 422, Lecture 12, Slide 27)
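For reference, a small Python sketch of likelihood weighting on this sprinkler network (my illustration under the CPTs above): evidence variables are fixed rather than sampled, and their likelihood goes into the sample's weight:

```python
import random

# Likelihood weighting on Cloudy -> {Sprinkler, Rain} -> WetGrass.
# CPTs from the slide, encoded as P(var = True | parents).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(+s | C)
P_R = {True: 0.8, False: 0.2}                       # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,     # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def weighted_sample(evidence):
    """One sample; evidence vars are fixed and contribute to the weight."""
    weight = 1.0
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    if "W" in evidence:                             # observed: weight, don't sample
        w = evidence["W"]
        weight *= P_W[(s, r)] if w else 1.0 - P_W[(s, r)]
    else:
        w = random.random() < P_W[(s, r)]
    return (c, s, r, w), weight

# Estimate P(+r | +w) from N weighted samples.
N, num, den = 100_000, 0.0, 0.0
for _ in range(N):
    (c, s, r, w), wt = weighted_sample({"W": True})
    den += wt
    num += wt * r
print("P(+r | +w) ~", num / den)
```

Note how the evidence on W (the last node) never influences how C, S, and R are sampled; this is exactly the weakness the next slides discuss.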

Why not Likelihood Weighting?
Two problems:
1) The standard algorithm runs each sample in turn, all the way through the network: time and space requirements grow with t!

Why not Likelihood Weighting? (cont.)
2) It does not work well when evidence comes "late" in the network topology:
- Samples are largely independent of the evidence.
- State variables never have evidence as parents: in the Rain network, I could observe umbrella every day and still get samples that are sequences of sunny days.
- The number of samples necessary to get good approximations increases exponentially with t!
