LING 388: Language and Computers Sandiway Fong Lecture 23: 11/15.

Part-of-Speech (POS) Tagging
Basic Idea:
– assign the right part-of-speech tag, e.g. noun, verb, conjunction, to a word
– useful for shallow parsing
– or as the first stage of a deeper/more sophisticated system
Question:
– Is it a hard task? i.e. can’t we just look the words up in a dictionary?
Answer:
– Yes. Ambiguity.
– No. POS tagging programs typically claim 95%+ accuracy

POS Tagging
Task:
– assign the right part-of-speech tag to a word in context
– not always easy
Example: walk
– the walk: noun (I took …)
– I walk: verb (2 miles every day)
Example: still (noun, adjective, adverb, verb)
– the still of the night, a glass still
– still waters
– stand still
– still struggling
– Still, I didn’t give way
– still your fear of the dark (transitive)
– the bubbling waters stilled (intransitive)
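A dictionary lookup alone returns every possible tag for a word, which is why context is needed. The toy lexicon below is a minimal sketch of that point; the entries come from the walk/still examples, while the names `TOY_LEXICON`, `possible_tags` and `is_ambiguous` are hypothetical.

```python
# Toy lexicon: each word maps to the set of parts of speech it can take.
# Entries follow the slide's examples; the data structure is an assumption.
TOY_LEXICON = {
    "walk": {"noun", "verb"},
    "still": {"noun", "adjective", "adverb", "verb"},
    "the": {"determiner"},
}


def possible_tags(word):
    """Return the sorted candidate tags for a word (empty list if unknown)."""
    return sorted(TOY_LEXICON.get(word.lower(), set()))


def is_ambiguous(word):
    """A word is ambiguous if the lexicon lists more than one tag for it."""
    return len(possible_tags(word)) > 1


if __name__ == "__main__":
    for w in ("the", "walk", "still"):
        print(w, possible_tags(w), "ambiguous" if is_ambiguous(w) else "unambiguous")
```

The lookup cannot say which of the four readings of still is intended in a given sentence; that choice is the tagging task.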

POS Tagging
Issues/Questions:
– What are the parts of speech and subclasses that we might want to tag?
– What does a typical tagset look like?
– What methods can we use to assign tags?

Parts-of-Speech
Divide words into classes based on grammatical function
– nouns (open-class: unlimited set)
  referential items (denoting objects/concepts etc.)
  – proper nouns: John
  – pronouns: he, him, she, her, it
  – anaphors: himself, herself (reflexives)
  – common nouns: dog, dogs, water
    » number: dog (singular), dogs (plural)
    » count-mass distinction: many dogs, *many waters
  – eventive nouns: dismissal, concert, playback, destruction (deverbal)
  nonreferential items
  – it as in it is important to study
  – there as in there seems to be a problem
  – some languages don’t have these: e.g. Japanese
  open-class
  – factoid, , bush-ism

Parts-of-Speech
Pronouns:
1. it  2. I  3. he  4. you  5. his  6. they  7. this  8. that  9. she  10. her  11. we  12. all  13. which  14. their  15. what

Parts-of-Speech
Divide words into classes based on grammatical function
– verbs (closed-class: fixed set)
  auxiliaries
  – be (passive, progressive)
  – have (perfect, including the pluperfect)
  – do (What did John buy?, Did Mary win?)
  – modals: can, could, would, will, may
  Irregular:
  – is, was, were, does, did

Parts-of-Speech
Divide words into classes based on grammatical function
– verbs (open-class: unlimited set)
  Intransitive
  – unaccusatives: arrive (achievement)
  – unergatives: run, jog (activities)
  Transitive
  – actions: hit (semelfactive: hit the ball for an hour)
  – actions: eat, destroy (accomplishment)
  – psych verbs: frighten (x frightens y), fear (y fears x)
  Ditransitive
  – put (x put y on z, *x put y)
  – give (x gave y z, *x gave y, x gave z to y)
  – load (x loaded y (on z), x loaded z (with y))
  Open-class
  – reaganize, , fax

Parts-of-Speech
Divide words into classes based on grammatical function
– adjectives (open-class: unlimited set)
  modify nouns
  – black, white, open, closed, sick, well
  – attributive: black (black car, car is black), main (main street, *street is main), atomic
  – predicative: afraid (*afraid child, the child is afraid)
  – stage-level: drunk (there is a man drunk in the pub)
  – individual-level: clever, short, tall (*there is a man tall in the bar)
  – object-taking: proud (proud of him, *well of him)
  – intersective: red (red car: intersection of the set of red things and the set of cars)
  – non-intersective: former (former architect), atomic (atomic scientist)
  – comparative, superlative: blacker, blackest, *opener, *openest
  open-class
  – hackable, spammable

Parts-of-Speech
Divide words into classes based on grammatical function
– adverbs (open-class: unlimited set)
  modify verbs (also adjectives and other adverbs)
  – manner: slowly (moved slowly)
  – degree: slightly, more (more clearly), very (very bad), almost
  – sentential: unfortunately, suddenly
  – question: how
  – temporal: when, soon, yesterday (noun?)
  – location: sideways, here (John is here)
  open-class
  – spam-wise

Parts-of-Speech
Divide words into classes based on grammatical function
– prepositions (closed-class: fixed set)
  – come before an object and assign it a semantic function (from Mars, *Mars from)
    head-final languages: postpositions (Japanese: amerika-kara)
  – location: on, in, by
  – temporal: by, until

POS Tagging
Task:
– assign the right part-of-speech tag, e.g. noun, verb, conjunction, to a word in context
POS taggers
– need to be fast in order to process large corpora
  should take no more than time linear in the size of the corpus
– full parsing is slow
  e.g. context-free grammar parsing: O(n^3), where n is the length of the sentence
– POS taggers try to assign the correct tag without actually parsing the sentence

POS Tagging
Components:
– Dictionary of words
  Exhaustive list of closed-class items
  – Examples:
    » the, a, an: determiner
    » from, to, of, by: preposition
    » and, or: coordinating conjunction
  Large set of open-class items (e.g. nouns, verbs, adjectives) with frequency information

POS Tagging
Components:
– Mechanism to assign tags
  Context-free: by frequency
  Context: bigram, trigram, HMM, hand-coded rules
  – Example:
    » Det Noun/*Verb: the walk…
– Mechanism to handle unknown words (extra-dictionary)
  Capitalization
  Morphology: -ed, -tion
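The two mechanisms above can be sketched in a few lines. This is a minimal illustration, not a description of any particular tagger: the function names, the coarse tag names, and the fallback to noun are all assumptions.

```python
def guess_unknown(word):
    """Guess a coarse tag for a word that is missing from the dictionary."""
    if word[:1].isupper():
        return "proper noun"   # capitalization cue
    if word.endswith("ed"):
        return "verb"          # -ed: past tense or participle
    if word.endswith("tion"):
        return "noun"          # -tion: deverbal noun
    return "noun"              # assumed fallback: nouns are the largest open class


def apply_det_rule(prev_tag, candidates):
    """Hand-coded context rule (Det Noun/*Verb): after a determiner,
    drop the verb reading when at least one other reading remains."""
    if prev_tag == "determiner" and len(candidates) > 1:
        return [t for t in candidates if t != "verb"]
    return candidates


if __name__ == "__main__":
    print(guess_unknown("faxed"))                           # verb
    print(apply_det_rule("determiner", ["noun", "verb"]))   # ['noun']
```

The capitalization cue misfires on sentence-initial words, so a fuller version would need to treat the first token of a sentence separately.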

How Hard is Tagging?
Brown Corpus (Francis & Kucera, 1982):
– 1 million words
– 39K distinct words
– 35K words with only 1 tag
– 4K with multiple tags (DeRose, 1988)
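As a quick check on these figures, the share of distinct words that carry more than one tag can be worked out directly. The counts below are the rounded values from the slide, so the result is only approximate; the variable names are hypothetical.

```python
# Rounded counts from the slide (Brown Corpus, DeRose 1988).
distinct_words = 39_000
one_tag = 35_000
multi_tag = 4_000

# The two groups partition the vocabulary.
assert one_tag + multi_tag == distinct_words

# Share of distinct words (types) that are ambiguous.
ambiguous_share = multi_tag / distinct_words
print(f"{ambiguous_share:.1%} of distinct words have multiple tags")
```

This counts distinct words, not running words: an ambiguous word that occurs often contributes many ambiguous tokens to the corpus.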

How Hard is Tagging?
Easy task to do well on:
– naïve algorithm: assign each word its most frequent tag
– 90% accuracy (Charniak et al., 1993)
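A minimal sketch of the naïve frequency-based algorithm follows, assuming a tiny hand-made training list rather than a real corpus. The function names, the `TRAIN` data and the `NN` default for unseen words are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_words):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def tag_baseline(words, model, default="NN"):
    """Tag every word with its most frequent tag, ignoring context entirely."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Hypothetical toy training data using Penn Treebank tag names.
TRAIN = [
    ("the", "DT"), ("walk", "NN"),
    ("I", "PRP"), ("walk", "VBP"),
    ("we", "PRP"), ("walk", "VBP"),
    ("the", "DT"), ("dog", "NN"),
]
MODEL = train_baseline(TRAIN)

if __name__ == "__main__":
    print(tag_baseline(["the", "walk"], MODEL))
```

Because walk is a verb twice and a noun once in the toy data, the baseline tags it VBP even after the, which is the kind of error that context-sensitive methods are meant to fix.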

Penn TreeBank Tagset
48-tag simplification of Brown Corpus tagset
Examples:
 1. CC   Coordinating conjunction
 3. DT   Determiner
 7. JJ   Adjective
11. MD   Modal
12. NN   Noun (singular, mass)
13. NNS  Noun (plural)
27. VB   Verb (base form)
28. VBD  Verb (past)

Penn TreeBank Tagset
How many tags?
– Tag criterion
  Distinctness with respect to grammatical behavior?
– Make tagging easier?
Punctuation tags
– Penn Treebank numbers
  Trivial computational task

Penn TreeBank Tagset
Simplifications:
– Tag TO: infinitival marker, preposition
  I want to win
  I went to the store
– Tag IN: preposition (also used for subordinating conjunctions): that, when, although
  I know that I should have stopped, although…
  I stopped when I saw Bill

Penn TreeBank Tagset
Simplifications:
– Tag DT: determiner: any, some, these, those
  any man
  these *man/men
– Tag VBP: verb, present: am, are, walk
  Am I here?
  *Walked I here?/Did I walk here?

Hard to Tag Items
Syntactic Function
– Example: resultative
  I saw the man tired from running
Examples (from Brown Corpus Manual)
– Hyphenation: long-range, high-energy, shirt-sleeved, signal-to-noise
– Foreign words: mens sana in corpore sano

Rule-Based POS Tagging
Example Systems
– ENGCG (1,100 rules)
– ENGCG-2 (4,000 rules)
Core Components
– English morphological analyzer based on two-level morphology (see last lecture)
– 56K word stems
– processing
  apply morphological engine
  get all possible tags for each word
  apply rules

Rule-Based POS Tagging
Example:
– Pavlov had shown that salivation can be a conditioned reflex

Rule-Based POS Tagging
Examples of tags:
– PCP2: past participle
– SV: subject verb
– SVOO: subject verb object object

Rule-Based POS Tagging
Example:
– it isn’t that:adv odd
Rule:
– given input “that”
– if (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
– then eliminate non-ADV tags
– else eliminate ADV tag
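The rule above can be read as a constraint that removes candidate tags from the word "that". The sketch below is one way to express it, not ENGCG's own rule formalism. The token representation (a dict with a `tags` set), the function name `adverbial_that`, and the simplified tags `PRON`, `DET`, `CS` and `V` are assumptions; positions off either end of the sentence are treated as sentence limits.

```python
def adverbial_that(tokens, i):
    """Return the tags that survive the rule for tokens[i] (the word 'that').

    tokens: list of dicts, each with a 'tags' set of candidate tags.
    """
    def tags_at(j):
        # Assumption: positions outside the sentence count as SENT-LIM.
        if j < 0 or j >= len(tokens):
            return {"SENT-LIM"}
        return tokens[j]["tags"]

    next_is_modifier = bool(tags_at(i + 1) & {"A", "ADV", "QUANT"})  # (+1 A/ADV/QUANT)
    then_sentence_end = "SENT-LIM" in tags_at(i + 2)                 # (+2 SENT-LIM)
    prev_not_svoc_a = "SVOC/A" not in tags_at(i - 1)                 # (NOT -1 SVOC/A)

    tags = tokens[i]["tags"]
    if next_is_modifier and then_sentence_end and prev_not_svoc_a:
        return tags & {"ADV"}   # eliminate non-ADV tags
    return tags - {"ADV"}       # eliminate ADV tag


THAT_TAGS = {"ADV", "PRON", "DET", "CS"}

# "it isn't that odd": the adverb reading survives.
ODD = [
    {"word": "it", "tags": {"PRON"}},
    {"word": "isn't", "tags": {"V"}},
    {"word": "that", "tags": set(THAT_TAGS)},
    {"word": "odd", "tags": {"A"}},
]

# "I consider that odd": consider carries SVOC/A, so the adverb reading is removed.
CONSIDER = [
    {"word": "I", "tags": {"PRON"}},
    {"word": "consider", "tags": {"V", "SVOC/A"}},
    {"word": "that", "tags": set(THAT_TAGS)},
    {"word": "odd", "tags": {"A"}},
]

if __name__ == "__main__":
    print(adverbial_that(ODD, 2))
    print(sorted(adverbial_that(CONSIDER, 2)))
```

The function returns a new set rather than editing the token, so several rules could be applied in sequence by intersecting their results.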

Rule-Based POS Tagging Now ENGCG-2 (4000 rules) –

Rule-Based POS Tagging
– Best performance of all systems: 99.7%

Next Time
– Look at statistical techniques …