LING 388 Language and Computers Lecture 22 11/25/03 Sandiway FONG.


Administrivia
- No more homeworks until the final
  - Final will also cover the material after Homework 4
  - Take-home final
  - Handed out on Tuesday, December 9th
  - Discussed in class that day
  - One-week strict deadline
- No class on Thursday
  - Happy Turkey Day!

Relative Clauses
- From Lecture 14, we have examples like:
  - The cat that John saw (object)
  - The cat_i that John saw e_i
  - The cat that saw John (subject)
  - The cat_i that e_i saw John
- From Homework 4 (review), we saw that we can have multiply embedded relative clauses

Relative Clauses
- Classwork Question (do it now)
  - Rank the following sentences in order of the difficulty of comprehension:
    1. I hate the man that the cat that Mary saw hissed at
    2. I hate the man that saw the cat that hissed at John
    3. I hate the man that the cat that hissed at John saw
    4. I hate the man that hissed at the cat that John saw
  - Note: 1 = most difficult. If two (or more) are about the same level, give them the same rank.

Today’s Lecture
- In Lecture 21, we looked at stemming
  - … the (morphological) process of going from a fully inflected word form to a root
- In today’s lecture, we’ll discuss part-of-speech (POS) tagging
  - … the process of identifying the part of speech of a fully inflected word form

Part-of-Speech (POS) Tagging
- Example of a lightweight NLP task
  - Useful when complete syntactic analysis is not needed, or…
  - When used as a first stage towards a more complete analysis
  - POS taggers are practical and do well
  - 95%+ accuracy claimed in the literature

Parts of Speech: Problem
- Example:
  - walk: noun, verb
  - The walk (noun) I took …
  - I walk (verb) 2 miles every day
- Correct tag determined by syntax
- POS taggers try to assign the correct tag without actually parsing the sentence

Components of a Tagger
- Dictionary of words
  - Exhaustive list of closed-class items
  - Examples:
    - the, a, an: determiner
    - from, to, of, by: preposition
    - and, or: coordinating conjunction
  - Large set of open-class items (e.g. nouns, verbs, adjectives) with frequency information
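As a rough illustration of the dictionary component, the sketch below keeps closed-class items in an exhaustive table and open-class items with per-tag frequency counts. The names `CLOSED_CLASS`, `OPEN_CLASS`, and `possible_tags` are hypothetical, and the counts are invented for illustration; they are not taken from any corpus.

```python
from collections import Counter

# Closed-class items can be listed exhaustively (examples from the slide).
CLOSED_CLASS = {
    "the": "determiner", "a": "determiner", "an": "determiner",
    "from": "preposition", "to": "preposition",
    "of": "preposition", "by": "preposition",
    "and": "coordinating conjunction", "or": "coordinating conjunction",
}

# Open-class items carry per-tag frequency counts.
# ASSUMPTION: these numbers are made up purely for illustration.
OPEN_CLASS = {
    "walk": Counter({"verb": 30, "noun": 12}),
    "still": Counter({"adverb": 50, "adjective": 9, "noun": 3, "verb": 2}),
}

def possible_tags(word):
    """Return the tags the dictionary lists for a word, most frequent first.

    Unknown words yield an empty list.
    """
    word = word.lower()
    if word in CLOSED_CLASS:
        return [CLOSED_CLASS[word]]
    return [tag for tag, _ in OPEN_CLASS.get(word, Counter()).most_common()]
```

For example, `possible_tags("walk")` lists both the verb and the noun reading, which is exactly the ambiguity a tagger has to resolve.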

Components of a Tagger
- Mechanism to assign tags
  - Context-free: by frequency
  - Context: bigram, trigram, hand-coded rules
  - Example:
    - Det Noun/*Verb: the walk…
- Mechanism to handle unknown words (extra-dictionary)
  - Capitalization
  - Morphology: -ed, -tion
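The three mechanisms on this slide can be sketched in a few lines. This is a minimal toy, not the tagger used in the course: the lexicon counts are invented, the tag names are simplified, and mapping -ed to Verb and -tion to Noun is an assumed reading of the slide's morphology cues.

```python
# ASSUMPTION: toy lexicon with invented tag counts.
LEXICON = {
    "the": {"Det": 100},
    "i": {"Pron": 100},
    "walk": {"Verb": 30, "Noun": 12},
}

def tag_by_frequency(word):
    """Context-free choice: the word's most frequent tag."""
    counts = LEXICON[word.lower()]
    return max(counts, key=counts.get)

def guess_unknown(word, sentence_initial):
    """Extra-dictionary words: use capitalization and suffix cues."""
    if word[0].isupper() and not sentence_initial:
        return "ProperNoun"
    if word.endswith("ed"):
        return "Verb"   # assumed: -ed suggests a past-tense verb
    if word.endswith("tion"):
        return "Noun"   # assumed: -tion suggests a noun
    return "Noun"       # assumed default for everything else

def tag(sentence):
    """Tag a list of words left to right."""
    tags = []
    for i, word in enumerate(sentence):
        key = word.lower()
        if key not in LEXICON:
            tags.append(guess_unknown(word, i == 0))
            continue
        choice = tag_by_frequency(word)
        # Hand-coded context rule from the slide: Det Noun, not *Det Verb.
        if tags and tags[-1] == "Det" and choice == "Verb" and "Noun" in LEXICON[key]:
            choice = "Noun"
        tags.append(choice)
    return tags
```

With this toy lexicon, frequency alone would tag "walk" as a verb everywhere; the context rule overrides that choice after a determiner, so "the walk" and "I walk" come out differently.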

How Hard is Tagging?
- Brown Corpus (Francis & Kucera, 1982):
  - 1 million words
  - 39K distinct words
  - 35K words with only 1 tag, 4K with multiple tags (DeRose, 1988)
- Easy task to do well on:
  - 90% accuracy for naïve algorithm (Charniak et al., 1993)
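The slide does not spell out the naïve algorithm. A common baseline, assumed here, is to give every word its most frequent tag from training data. The sketch below first works out the type-level ambiguity from the slide's rounded figures, then runs that assumed baseline on a tiny invented corpus; the function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict

# Type-level ambiguity from the slide's rounded figures.
distinct_words, ambiguous_words = 39_000, 4_000
print(f"ambiguous word types: {ambiguous_words / distinct_words:.1%}")  # 10.3%

def train_most_frequent(tagged):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def accuracy(model, tagged, default="NN"):
    """Fraction of tokens tagged correctly; unseen words get `default`."""
    correct = sum(model.get(w.lower(), default) == t for w, t in tagged)
    return correct / len(tagged)

# ASSUMPTION: tiny invented corpus, for illustration only.
train = [("the", "DT"), ("walk", "NN"), ("I", "PRP"), ("walk", "VBP"),
         ("we", "PRP"), ("walk", "VBP")]
held_out = [("the", "DT"), ("walk", "NN"), ("I", "PRP"), ("walk", "VBP")]

model = train_most_frequent(train)
print(accuracy(model, held_out))  # 0.75
```

Note that the 10.3% figure counts word types, whereas tagging accuracy is measured over tokens, so the two numbers are not directly comparable.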

How Hard is Tagging?
- Multiple POS
  - Example:
  - still: noun, adjective, adverb, verb
    - the still of the night, a glass still
    - still waters
    - stand still
    - still struggling
    - Still, I didn’t give way
    - still your fear of the dark (transitive)
    - the bubbling waters stilled (intransitive)

Penn TreeBank Tagset
- 48-tag simplification of Brown Corpus tagset
- Examples:
   1. CC   Coordinating conjunction
   3. DT   Determiner
   7. JJ   Adjective
  11. MD   Modal
  12. NN   Noun (singular, mass)
  13. NNS  Noun (plural)
  27. VB   Verb (base form)
  28. VBD  Verb (past)

Penn TreeBank Tagset
- How many tags?
  - Tag criterion
  - Distinctness with respect to grammatical behavior?
  - Make tagging easier?
- Punctuation tags
  - Penn Treebank numbers
  - Trivial computational task

Penn TreeBank Tagset
- Simplifications:
  - TO: infinitival marker, preposition
    - I want to win
    - I went to the store
  - IN (preposition): that, when, although
    - I know that I should have stopped, although…
    - I stopped when I saw Bill
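To make the TO simplification concrete, the sketch below writes the slide's two sentences in word/TAG notation and checks that "to" receives the same tag in both. The helper names `parse_tagged` and `tags_of` are hypothetical; the tags follow the Penn Treebank conventions as I understand them.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]

def tags_of(word, tagged):
    """Collect the set of tags a word carries in a tagged sentence."""
    return {tag for w, tag in tagged if w == word}

# Infinitival "to" and prepositional "to" are both tagged TO.
infinitival = parse_tagged("I/PRP want/VBP to/TO win/VB")
prepositional = parse_tagged("I/PRP went/VBD to/TO the/DT store/NN")

print(tags_of("to", infinitival), tags_of("to", prepositional))
```

The cost of the simplification is that the tag alone no longer says which use of "to" is present; that has to be recovered from the neighboring tags (VB versus DT here).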

Penn TreeBank Tagset
- Simplifications:
  - DT (determiner): any, some, these, those
    - any man
    - these *man/men
  - VBP (verb, present): am, are, walk
    - Am I here?
    - *Walked I here?/Did I walk here?

Hard to Tag Items
- Syntactic function
  - Example:
    - I saw the man tired from running
- Examples from Brown Corpus Manual
  - Hyphenation:
    - long-range, high-energy
    - shirt-sleeved
    - signal-to-noise
  - Foreign words:
    - mens sana in corpore sano