Introduction to Computational Natural Language Learning. Linguistics 79400 (Under: Topics in Natural Language Processing); Computer Science 83000 (Under: Topics in Artificial Intelligence)

1 Introduction to Computational Natural Language Learning. Linguistics (Under: Topics in Natural Language Processing); Computer Science (Under: Topics in Artificial Intelligence). The Graduate School of the City University of New York, Fall 2001. William Gregory Sakas, Hunter College, Department of Computer Science; Graduate Center, PhD Programs in Computer Science and Linguistics, The City University of New York.

2 Lecture 9: Learning the "best" parse from corpora and tree-banks

Someone shot the servant of the actress who was on the balcony. Who was on the balcony, and how did they get there?

READING: Charniak, E. (1997) Statistical techniques for natural language parsing, AI Magazine. This is a wonderfully easy-to-read introduction to how simple patterns in corpora can be used to resolve ambiguities in tagging and parsing. (This is a must read.)
Costa et al. (2001) Wide-coverage incremental parsing by learning attachment preferences. A novel approach to learning parsing preferences that incorporates an artificial neural network. Read Charniak first, but try to get started on this before the meeting after Thanksgiving.

3 Review: Context-free Grammars
1. S  -> NP VP
2. VP -> V NP
3. VP -> V NP NP
4. NP -> det N
5. NP -> N
6. NP -> det N N
7. NP -> NP NP

Example sentences: The dog ate. The diner ate seafood. The boy ate the fish.

Order of rules in a top-down parse with one word of look-ahead: (see blackboard). Just like our first language models, ambiguity plays an important role: 10 dollars a share.

4 Salespeople sold the dog biscuits. At least three parses (see Charniak; see blackboard). Wide-coverage parsers can generate hundreds of parses for every sentence; see Costa et al. for some numbers from the Penn tree-bank. Most are pretty senseless. Traditionally, non-statistically-minded NLP engineering types thought of disambiguation as a post-parsing problem. Statistically-minded NLP engineering folk think more of a continuum: parsing and disambiguation go together; it's just that some parses are more reasonable than others.
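As a concrete illustration (not from the lecture), here is a minimal sketch that encodes the toy grammar of slide 3 with NLTK and enumerates the parses of the ambiguous sentence; the lexical rules (Det, N, V entries) are assumptions added so the example runs.

```python
import nltk

# Toy CFG from slide 3, written in NLTK notation. The lexical rules
# (Det, N, V) are assumptions added here so the sketch is runnable.
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    VP -> V NP | V NP NP
    NP -> Det N | N | Det N N | NP NP
    Det -> 'the'
    N  -> 'salespeople' | 'dog' | 'biscuits'
    V  -> 'sold'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("salespeople sold the dog biscuits".split()):
    print(tree)   # prints the bracketing of each distinct parse (three here)
```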

5 POS tagging: The can will rust.
  the:  det
  can:  aux, noun, verb
  will: aux, noun, verb
  rust: noun, verb

Learning algorithm I. Input for training: a pre-tagged training corpus.
1) Record the frequencies, for each word, of its parts of speech. E.g.:
   the: det 1,230
   can: aux 534, noun 56, verb 6
   etc.
2) On an unseen corpus, apply, for each word, the most frequent POS observed in step (1). For words with a frequency of 0 (they didn't appear in the training corpus), guess proper-noun.
ACHIEVES 90% (in English)!
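A minimal sketch of Learning Algorithm I, assuming the training corpus is simply a list of (word, tag) pairs; the function names and the toy corpus are illustrative, and only the recipe itself (count, pick the most frequent tag, guess proper-noun for unseen words) comes from the slide.

```python
from collections import Counter, defaultdict

def train(tagged_corpus):
    """Step 1: record, for each word, the frequency of each part of speech.
    tagged_corpus is an iterable of (word, tag) pairs from a pre-tagged corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    return counts

def tag(words, counts):
    """Step 2: give each word its most frequent training tag;
    words with a training frequency of 0 are guessed to be proper nouns."""
    return [(w, counts[w.lower()].most_common(1)[0][0])
            if w.lower() in counts else (w, 'proper-noun')
            for w in words]

# Toy illustration (the counts loosely echo the slide's example).
corpus = [('the', 'det'), ('can', 'aux'), ('can', 'aux'), ('can', 'noun'), ('will', 'aux')]
model = train(corpus)
print(tag(['The', 'can', 'will', 'rust'], model))
# [('The', 'det'), ('can', 'aux'), ('will', 'aux'), ('rust', 'proper-noun')]
```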

6 Learning algorithm I. Input for training: a pre-tagged training corpus.
1) Record the frequencies, for each word, of its parts of speech:
   the: det 1,230
   can: aux 534, noun 56, verb 6
   etc.
2) On an unseen corpus, apply, for each word, the most frequent POS observed in step (1). For words with a frequency of 0 (they didn't appear in the training corpus), guess proper-noun.
ACHIEVES 90%! (But remember that totally unambiguous words like "the" are relatively frequent in English, which pushes the number way up.)

It is easy to turn the frequencies into approximations of the probability that a POS tag is correct given the word, p(t | w). For can these would be:
  p(aux | can)  = 534 / (534 + 56 + 6) = .90
  p(noun | can) = 56 / (534 + 56 + 6)  = .09
  p(verb | can) = 6 / (534 + 56 + 6)   = .01

7 Notation for: the tag t, out of all possible t's, that maximizes the probability of t given the current word w_i under consideration:

  t_i = argmax_t p(t | w_i)

i.e., the tag t that generates the maximum p. For can:
  p(aux | can)  = 534 / (534 + 56 + 6) = .90
  p(noun | can) = 56 / (534 + 56 + 6)  = .09
  p(verb | can) = 6 / (534 + 56 + 6)   = .01
  argmax_t p(t | can) = aux

Extending the notation to a sequence of tags:

  t_1,n = argmax_{t_1,n} ∏_{i=1..n} p(t_i | w_i)

This means: give the sequence of tags t_1,n that maximizes the product of the probabilities that a tag is correct for each word.
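The same notation in code, reusing the p(t | w) estimates for can from above; because each factor of the product depends only on its own word, the argmax over tag sequences decomposes word by word (the helper names here are illustrative).

```python
# p(t | w) estimates for "can", as computed above
p_tag_given_word = {'can': {'aux': 0.90, 'noun': 0.09, 'verb': 0.01}}

def best_tag(word):
    """argmax over t of p(t | word)."""
    dist = p_tag_given_word[word]
    return max(dist, key=dist.get)

def best_tag_sequence(words):
    """argmax over tag sequences of the product of p(t_i | w_i);
    since each factor involves only its own word, the best sequence
    is just the best tag chosen independently for each word."""
    return [best_tag(w) for w in words]

print(best_tag('can'))   # 'aux'
```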

8 Hidden Markov Models (HMMs). (Figure: a state diagram with one state per tag; each state emits words with some probability, e.g. det: a .245, the .586; adj: large .004, small .005; noun: house .001, stock ...)

Transition probabilities between tags: p(t_i | t_i-1), e.g. p(adj | det).
Emission probabilities of words given tags: p(w_i | t_i), e.g. p(large | adj), p(small | adj).
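In code, an HMM tagger is just these two tables. The dictionaries below are a sketch: the emission values are the ones shown on this slide, while the transition values are made-up placeholders.

```python
# Transition probabilities p(t_i | t_i-1); these numbers are illustrative only.
transition = {
    'det': {'adj': 0.35, 'noun': 0.60},     # e.g. p(adj | det) and p(noun | det)
}

# Emission probabilities p(w_i | t_i); values taken from the slide where shown.
emission = {
    'det':  {'a': 0.245, 'the': 0.586},
    'adj':  {'large': 0.004, 'small': 0.005},
    'noun': {'house': 0.001},
}
```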

9 Secretariat is expected to race tomorrow. People continue to inquire the reason for the race for outer space.

Consider: to/TO race/???? vs. the/Det race/????

The naive Learning Algorithm I would simply assign the most probable tag, ignoring the preceding word, which would obviously be wrong for one of the two sentences above.

10 Consider the first sentence, with "to race." The (simple bigram) HMM model would choose the greater of these two products:

  p(Verb | TO) p(race | Verb)
  p(Noun | TO) p(race | Noun)

Let's look at the first factor: how likely are we to find a verb given that the previous tag is TO? We can calculate from a training corpus (or corpora) that a verb following the tag TO is roughly 15 times more likely:
  p(Verb | TO) = .34
  p(Noun | TO) = .021

The second factor: given each tag (Verb and Noun), ask "if we were expecting the tag Verb, would the lexical item be race?" and "if we were expecting the tag Noun, would the lexical item be race?" I.e., we want the likelihoods:
  p(race | Verb) = .00003
  p(race | Noun) = .00041

11 Putting them together: the bigram HMM correctly predicts that race should be a Verb, despite the fact that race as a Noun is more common:

  p(Verb | TO) p(race | Verb) = .34 × .00003 ≈ .00001
  p(Noun | TO) p(race | Noun) = .021 × .00041 ≈ .000009

So a bigram HMM tagger chooses the tag sequence that maximizes (it's easy to increase the number of tags looked at):

  p(word | tag) p(tag | previous tag)

or, a bit more formally:

  t_i = argmax_j P(t_j | t_j-1) P(w_i | t_j)
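A quick sketch of that comparison; the probabilities are the ones quoted above (following the Jurafsky & Martin example this discussion is based on), and the function name is illustrative.

```python
# Corpus-estimated probabilities for the to/TO race/?? decision
p_tag_given_prev = {('Verb', 'TO'): 0.34, ('Noun', 'TO'): 0.021}           # p(tag | previous tag)
p_word_given_tag = {('race', 'Verb'): 0.00003, ('race', 'Noun'): 0.00041}  # p(word | tag)

def bigram_score(tag, prev_tag, word):
    """p(tag | previous tag) * p(word | tag)."""
    return p_tag_given_prev[(tag, prev_tag)] * p_word_given_tag[(word, tag)]

for t in ('Verb', 'Noun'):
    print(t, bigram_score(t, 'TO', 'race'))
# Verb ≈ 1.0e-05 beats Noun ≈ 8.6e-06, so race after "to" is tagged Verb.
```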

12 After some math (an application of Bayes' theorem and the chain rule) and two important simplifying assumptions (the probability of a word depends only on its own tag, and the previous tag alone is enough to approximate the current tag), we have, for a whole sequence of tags t_1,n, an HMM bigram model for the predicted tag sequence given words w_1 ... w_n:

  t_1,n = argmax_{t_1,n} ∏_{i=1..n} P(w_i | t_i) P(t_i | t_i-1)

13 (Figure: a tag lattice for the sentence "the can will rust", with one node for each possible tag of each word: DT, aux, noun, verb.) We want the most likely path through this graph.

14 This is done by the Viterbi algorithm. The accuracy of this method is around 96%. But what if there is no training data from which to calculate the likelihoods of tags and of words given tags? They can be estimated using the forward-backward algorithm, but that doesn't work too well without at least a small training set to get it started.
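A compact sketch of the Viterbi algorithm for a bigram HMM tagger, assuming transition, emission, and start probabilities are stored in plain dictionaries as in the earlier sketch; unseen events simply get probability zero here (a real tagger would smooth, and would work in log space to avoid underflow).

```python
def viterbi(words, tags, p_trans, p_emit, p_start):
    """Most likely tag sequence for `words` under a bigram HMM.
    p_trans[(t, prev)] = p(t | prev), p_emit[(w, t)] = p(w | t),
    p_start[t] = p(t at the start of a sentence). Unseen events get 0."""
    # best[i][t]: probability of the best tag path ending in tag t at word i
    best = [{t: p_start.get(t, 0.0) * p_emit.get((words[0], t), 0.0) for t in tags}]
    back = [{}]                                   # back[i][t]: best previous tag
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * p_trans.get((t, p), 0.0)
                     * p_emit.get((words[i], t), 0.0)) for p in tags),
                key=lambda x: x[1])
            best[i][t], back[i][t] = score, prev
    # Read off the best final tag, then follow the back-pointers.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

Called as, e.g., viterbi("the can will rust".split(), tags, transition, emission, start) with tables of the kind sketched under slide 8, it returns one tag per word along the highest-probability path through the lattice of slide 13.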

15 PCFGs. Given that there can be many, many parses of a sentence under a typical, large CFG, we should pick the most probable parse, as defined by: probability of a parse of a sentence = the product of the probabilities of all the rules that were applied to expand each constituent. In symbols:

  p(π) = ∏_{c in π} p(r(c))

where π is a parse of sentence s, the product runs over all constituents c in the parse π, and p(r(c)) is the probability of expanding constituent c by context-free rule r.

16 Some example "toy" probabilities attached to CF rules:
1. S  -> NP VP      (1.0)
2. VP -> V NP       (0.8)
3. VP -> V NP NP    (0.2)
4. NP -> det N      (0.5)
5. NP -> N          (0.3)
6. NP -> det N N    (0.15)
7. NP -> NP NP      (0.05)

How do we come by these probabilities? Simply count the number of times each rule is applied in a tree-bank. For example, if the rule NP -> det N is used 1,000 times, and overall NP -> X (i.e. any NP rule) is applied 2,000 times, then the probability of NP -> det N is 1,000 / 2,000 = 0.5.

Apply these to Salespeople sold the dog biscuits (see Charniak and blackboard).
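A sketch of exactly that counting recipe, plus the parse-probability product from slide 15, using NLTK tree objects; the two-tree "treebank" is an illustrative stand-in for real Penn tree-bank data.

```python
import nltk
from collections import Counter

def estimate_rule_probs(trees):
    """Relative-frequency estimate: p(LHS -> RHS) = count(LHS -> RHS) / count(LHS -> anything)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in trees:
        for prod in tree.productions():
            rule_counts[prod] += 1
            lhs_counts[prod.lhs()] += 1
    return {r: c / lhs_counts[r.lhs()] for r, c in rule_counts.items()}

def parse_prob(tree, rule_probs):
    """Probability of a parse = product of the probabilities of all rules used in it."""
    p = 1.0
    for prod in tree.productions():
        p *= rule_probs[prod]
    return p

# Illustrative two-tree "treebank".
treebank = [nltk.Tree.fromstring(s) for s in [
    "(S (NP (N dog)) (VP (V ate)))",
    "(S (NP (det the) (N dog)) (VP (V ate) (NP (N seafood))))",
]]
probs = estimate_rule_probs(treebank)
print(parse_prob(treebank[1], probs))
```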

17 Basic tree-bank grammars parse surprisingly well (at around 75%), but often mis-predict the correct parse (according to humans).

The most troublesome report may be the August merchandise trade deficit due out tomorrow.

The tree-bank grammar gets the (incorrect) reading: His worst nightmare may be [the telephone bill due] over $200. (Tree fragment from the slide: an NP over [N deficit] [ADJ due] [PP out tomorrow].) Trees on the blackboard, or they can be constructed from Charniak.

18 Preferences depend on many factors, e.g. on the type of verb:
The women kept the dogs on the beach.
The women discussed the dogs on the beach.

Kept:
(1) kept the dogs which were on the beach
(2) kept them (the dogs), while on the beach

Discussed:
(1) discussed the dogs which were on the beach
(2) discussed them (the dogs), while on the beach

19 Verb-argument relations: subcategorization. But the verb also selects for the type of the Prepositional Phrase (Hindle and Rooth). Or, even more deeply, the preference seems to depend on the frequency of semantic associations:

The actress delivered flowers threw them in the trash.
The postman delivered flowers threw them in the trash.

20 On ‘selectional’ restrictions: “walking on air”, “skating on ice”, vs. “eating on ice”. A verb takes a certain kind of argument, and the subject sometimes must be of a certain type: John admires honesty vs. ?? Honesty admires John.