Linguistics Essentials Instructor: Rada Mihalcea Note: most of the material in this slide set was adapted from an NLP course taught by J. Hajic at Johns.

Slides:



Advertisements
Similar presentations
What you’ll need to know for Freshman DGP
Advertisements

The NOUN 1 General characteristics and classification
Statistical NLP: Lecture 3
Chapter 4 Basics of English Grammar
Ana Bertha Camargo Mejía
The Eight Parts of Speech
1 Words and the Lexicon September 10th 2009 Lecture #3.
 Christel Kemke 2007/08 COMP 4060 Natural Language Processing Word Classes and English Grammar.
Syntax Lecture 4.
NLP and Speech 2004 English Grammar
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 2.
Linguisitics Levels of description. Speech and language Language as communication Speech vs. text –Speech primary –Text is derived –Text is not “written.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Outline of English Syntax.
Parts of Speech. Eight parts of speech Nouns Verbs Adjectives Adverbs Pronouns Prepositions Conjunctions Interjections.
Chapter 2 A rapid overview.
PARTS OF SPEECH General Survey. The problem of parts of speech causes great controversies both in general linguistic theory and in the analysis of separate.
Phrases & Clauses.
Syntax The number of words in a language is finite
Embedded Clauses in TAG
Context Free Grammars Reading: Chap 12-13, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Paul Tarau, based on Rada.
Linguistics Essentials Instructor: Paul Tarau based on Rada Mihalcea’s original slides Note: most of the material in this slide set was adapted from an.
 Noun  Person, place, thing, idea  Common: begins with lower case letter (city)  Proper: begins with capital letter (Detroit)  Possessive: shows ownership.
Linguistic Essentials
Parts of Speech.
Chapter 4 Basics of English Grammar Business Communication Copyright 2010 South-Western Cengage Learning.
THE PARTS OF SPEECH. PART OF SPEECH  All words serve a particular function in a sentence.  A word’s function is determined by what “part of speech”
The 8 Principal Parts of Speech
Introduction to English Syntax Level 1 Course Ron Kuzar Department of English Language and Literature University of Haifa Chapter 2 Sentences: From Lexicon.
Linguistic levels of structure Sound Phoneme Morpheme Word Phrase Clause Sentence Meaning ð iː z b juː t ə f ʊ l w ɪ m ɪ n s ɛ d w iː w ɜː t r uː m ɛ n.
Lecture 6 Verb and verb phrase
ASPECTS OF LINGUISTIC COMPETENCE 4 SEPT 09, 2013 – DAY 6 Brain & Language LING NSCI Harry Howard Tulane University.
Dr. Monira Al-Mohizea MORPHOLOGY & SYNTAX WEEK 11.
1 Introduction to Natural Language Processing ( ) Linguistic Essentials: Syntax AI-lab
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 3 27 July 2005.
DGP MONDAY NOTES (Parts of Speech) NOUNPRONOUNADVERB ADJECTIVE PREPOSITIONS CONJUNCTION VERB VERBAL.
English Review for Final These are the chapters to review. In Textbook: Chapter 1 Nouns Chapter 2 Pronouns Chapter 3 Adjectives Chapter 4 Verbs Chapter.
Chapter 5 Syntax English Linguistics: An Introduction.
English Review for Final These are the chapters to review. In Textbook: Chapter 1 Nouns Chapter 2 Pronouns Chapter 3 Adjectives Chapter 4 Verbs Chapter.
Parts of Speech Notes. Part of Speech: Nouns  A naming word  Names a person, place, thing, idea, living creature, quality, or idea Examples: cowboy,
Parts of Speech A Brief Review. Noun Person, Place, Thing, or Idea Common: begins with lower case letter (city) Proper: begins with capital letter (Detroit)
Linguistic Essentials
Grammar Parts of Speech Eight Parts of Speech Nouns Pronouns Adjectives Adverbs Conjunctions Prepositions Verbs Interjections.
Parsing and Translating
Parts of Speech A Brief Review. Noun Person, Place, Thing, or Idea Common: begins with lower case letter (city) Proper: begins with capital letter (Detroit)
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 2.
Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences
SYNTAX.
◦ Process of describing the structure of phrases and sentences Chapter 8 - Phrases and sentences: grammar1.
English Grammar PARTS OF SPEECH.
SYNTAX 1 NOV 9, 2015 – DAY 31 Brain & Language LING NSCI Fall 2015.
NLP 2 주차 강의 Linguistic Essentials (Ch 3). Competence and Performance Innate  Learning, Categorical  Statistical –CFG (Context free grammar) Performance.
Unit 1 Language Parts of Speech. Nouns A noun is a word that names a person, place, thing, or idea Common noun - general name Proper noun – specific name.
Parts of speech. Eight parts of speech: Noun pronoun verb adjective adverb preposition conjunction and interjection.
The Eight Parts of Speech Yes!! Awesome!! Finally!! English is so much fun!!
 Nouns name persons, places, things, or ideas. 1. Proper: CAPITAL LETTERS  Montana, Sally, United States of America 2. Common: no capital.
Writing 2 ENG 221 Norah AlFayez. Lecture Contents Revision of Writing 1. Introduction to basic grammar. Parts of speech. Parts of sentences. Subordinate.
Lecture 1 Sentences Verbs.
SPAG Parent Workshop April Agenda English and the new SPaG curriculum How to help your children at home How we teach SPaG Sample questions from.
Linguistics Essentials
The theory of word classes in modern grammar studies
Introduction to Linguistics
Non-finite forms of the verb
Statistical NLP: Lecture 3
Revision Outcome 1, Unit 1 The Nature and Functions of Language
What is a sentence? A sentence is a group of words that expresses a complete thought. Ex. This gift is for you. Every sentence has two parts: Subject Predicate.
The Eight Parts of Speech
Introduction to Linguistics
DGP TUESDAY NOTES (Parts of Speech)
Linguistic Essentials
Vocabulary/Lexis LEXIS: n., collective, uncountable
Presentation transcript:

Linguistics Essentials Instructor: Rada Mihalcea Note: most of the material in this slide set was adapted from an NLP course taught by J. Hajic at Johns Hopkins University

Slide 1 The Description of Language Language = Words and Rules  Dictionary (vocabulary) + Grammar Dictionary set of words defined in the language open (dynamic) Traditional paper based Electronic machine readable dictionaries; can be obtained from paper-based Grammar set of rules which describe what is allowable in a language Classic Grammars meant for humans who know the language definitions and rules are mainly supported by examples no (or almost no) formal description tools; cannot be programmed Explicit Grammar (CFG, Dependency Grammars, Link Grammars,...) formal description can be programmed & tested on data (texts)

Slide 2 Levels of (Formal) Description 6 basic levels (more or less explicitly present in most theories) : and beyond (pragmatics/logic/...) meaning (semantics) (surface) syntax morphology phonology phonetics/orthography Each level has an input and output representation output from one level is the input to the next (upper) level sometimes levels might be skipped (merged) or split

Slide 3 Phonetics/Orthography Input: acoustic signal (phonetics) / text (orthography) Output: phonetic alphabet (phonetics) / text (orthography) Deals with: Phonetics: consonant & vowel (& others) formation in the vocal tract classification of consonants, vowels,... in relation to frequencies, shape & position of the tongue and various muscles intonation Orthography: normalization, punctuation, etc.

Slide 4 Phonology Input: sequence of phones/sounds (in a phonetic alphabet); or “normalized” text (sequence of (surface) letters in one language’s alphabet) [NB: phones vs. phonemes] Output: sequence of phonemes (~ (lexical) letters; in an abstract alphabet) Deals with: relation between sounds and phonemes (units which might have some function on the upper level) e.g.: [u] ~ oo (as in book), [æ] ~ a (cat); i ~ y (flies)

Slide 5 Morphology Input: sequence of phonemes (~ (lexical) letters) Output: sequence of pairs (lemma, (morphological) tag) Deals with: composition of phonemes into word forms and their underlying lemmas (lexical units) + morphological categories (inflection, derivation, compounding) e.g. quotations ~ quote/V + -ation(der.V->N) + NNS.

Slide 6 (Surface) Syntax Input: sequence of pairs (lemma, (morphological) tag) Output: sentence structure (tree) with annotated nodes (all lemmas, (morphosyntactic) tags, functions), of various forms Deals with: the relation between lemmas & morphological categories and the sentence structure uses syntactic categories such as Subject, Verb, Object,... e.g.: I/PP1 see/VB a/DT dog/NN ~ ((I/sg)SB ((see/pres)V (a/ind dog/sg)OBJ)VP)S

Slide 7 Meaning (semantics) Input: sentence structure (tree) with annotated nodes (lemmas, (morphosyntactic) tags, surface functions) Output: sentence structure (tree) with annotated nodes (semantic lemmas, (morpho- syntactic) tags, deep functions) Deals with: relation between categories such as “Subject”, “Object” and (deep) categories such as “Agent”, “Effect”; adds other cat’s e.g. ((I)SB ((was seen)V (by Tom)OBJ)VP)S ~ (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)

Slide 8...and Beyond Input: sentence structure (tree): annotated nodes (autosemantic lemmas, (morphosyntactic) tags, deep functions) Output: logical form, which can be evaluated (true/false) Deals with: assignment of objects from the real world to the nodes of the sentence structure e.g.: (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f) ~ see( Mark-Twain[SSN:...],Tom-Sawyer[SSN:...] ) [Time:bef 99/9/27/14:15][Place:39ş19’40”N76ş37’10”W]

Slide 9 Phonology (Surface « Lexical) Correspondence “symbol-based” (no complex structures) Ex.: (stem-final change) lexical: b a b y + s (+ denotes start of ending) surface: b a b i e s (phonetic-related: b é b ì 0s ) Arabic: (interfixing, inside-stem doubling) lexical: kTb+uu+CVCCVC (CVCC...vowel/consonant pattern) surface: kuttub

Slide 10 Phonology Examples German (umlaut) ( satz ~ sentence ) lexical: s A t z + e (A denotes “umlautable” a) surface: s ä t z e (phonetic: z æ c e, vs. zac ) Turkish (vowel harmony) lexical: e v + l A r (~house) surface: e v l e r

Slide 11 Morphology: Morphemes & Order Handles what is an isolated form in written text Grouping of phonemes into morphemes sequence deliverables  deliver, able and s (3 units) could as well be some “ID” numbers: e.g. deliver ~ 23987, s ~ 12, able ~ 3456 Morpheme Combination certain combinations/sequencing possible, other not: deliver+able+s, but not able+derive+s; noun+s, but not noun+ing typically fixed (in any given language)

Slide 12 Morphology: From Morphemes to Lemmas & Categories Lemma: lexical unit, “pointer” to lexicon might as well be a number, but typically is represented as the “base form”, or “dictionary headword” possibly indexed when ambiguous/polysemous: state 1 (verb), state 2 (state-of-the-art), state 3 (government) from one or more morphemes (“root”, “stem”, “root+derivation”,...) Categories: non-lexical small number of possible values (< 100, often < 5-10)

Slide 13 Morphology Level: The Mapping Formally: A +    2 (L,C 1,C 2,...,Cn) A is the alphabet of phonemes (A + denotes any non-empty sequence of phonemes) L is the set of possible lemmas, uniquely identified C i are morphological categories, such as: grammatical number, gender, case person, tense, negation, degree of comparison, voice, aspect,... tone, politeness,... part of speech (not quite morphological category, but...) 2 (L,C1,C2,...,Cn) denotes the power set of (L,C 1,C 2,...,C n ) A, L and C i are obviously language-dependent

Slide 14 The Dictionary (or Lexicon) Repository of information about words: Morphological: description of morphological “behavior”: inflection patterns/classes Syntactic: Part of Speech relations to other words: subcategorization (or “surface valency frames”) Semantic: semantic features frames...and any other! (e.g., translation)

Slide 15 The Categories: Part of Speech: Open and Closed Categories Part of Speech - POS (pretty much stable set across languages) morphological “behavior” is typically consistent within a POS category Open categories: (“open” to additions) verb, noun, pronoun, adjective, numeral, adverb subject to inflection (in general); subject to cross-category derivations newly coined words always belong to open POS categories potentially unlimited number of words Closed categories: preposition, conjunction, article, interjection, clitic, particle not a base for derivation (possibly only by compounding) finite and (very) small number of words

Slide 16 The Categories: Part of Speech, Open Categories: Nouns Nouns: typically refer to entities Inflection: numbersingular, plural genderfeminine, masculine, neuter casenominative, genitive, accusative, dative, vocative semantic classification: human/animal/(non-living) things: driver/bird/stone concrete/abstract: computer/thought common/proper: table/Microsoft syntactic classification: countable/uncountable: book, water morphological classification: pluralia/singularia tantum: data (is), police (are) “adverbial” nouns: afternoon, home, east (no inflection)

Slide 17 The Categories: Part of Speech, Open Categories: Verbs Verbs: Inflectional: subject numbersingular, plural subject personfirst (I read), second (you read), … tensepresent tense, past tense … aspectprogressive, perfect modalitypossibility, … voiceactive, passive syntactic/semantic: classification: ordinary: (to) speak, (to) write auxiliaries: be, have, will, would, do, go (going) modals: can, could, may, should, must, want phasal: begin, end, start morphological classification conjugation type: regular/irregular, (Ge.: weak/strong/irregular) conjugation class: (e.g. Italian: -are, -ere, -ire …)

Slide 18 The Categories: Part of Speech, Open Categories: Pronouns Pronouns: Inflectional : number, person, gender, case much like nouns (syntactic usage also similar) (pro)noun ~ “stands for” a noun classification (mostly syntactic/semantic): personal: I, you, she, she, it, we, you, they demonstrative: this, that possessive: my, your, her, his, its, our, their; mine, yours, ours,... reflexive: myself, yourself, herself,..., oneself interrogative: what, which, who, whom, whose, that indefinite (“nominal”): somebody, something, one

Slide 19 The Categories: Part of Speech, Open Categories: Adjectives Adjectives: describe properties of nouns Inflectional : degree of comparison (comparative/superlative), number, gender, case classification: ordinary: new, interesting, [test (equipment)] possessive: John’s, driver’s proper: Appalachian (Mountains) often derived from verbs/nouns: teaching (assistant), trendy, stylish morphological classification degrees of comparison (En.: big, bigger, biggest) usually requires agreement with the noun

Slide 20 The Categories: Part of Speech, Open Categories: Adverbs Adverbs: modify a verb, and specify place, time, manner, degree Inflectional: degree of comparison derivation from adjectives is common: new   newly, interesting   interestingly non-derived adverbs: ordinary: so, well, just, too, then, often, there wh-adverbs (interrogative): why, when, where, how degree adverbs/qualifiers: very, too morphological classification (not much, really...) degree of comparison: well, better, best soon, sooner

Slide 21 The Categories: Part of Speech, Open Categories: Numerals Numerals: used to indicate numbers inflectional : number, gender, case, negation open (infinite?) category: compounding (Ge.: einundzwanzig, 21) classification: cardinals: one, five, hundred NB: million etc. often considered noun ordinals: first, second, thirtieth quantifiers: all, many, some, none multiplicative: times, twice multilateral: single, triple, twofold morphological classification: as nouns/adjectives; many irregulars

Slide 22 The Categories: Part of Speech, Closed Categories Closed categories: preposition, conjunction, article, interjection, clitic, particle Morphological behavior: indeclinable preposition: of, without, by, to; conjunction: coordinating: and, but, or, however subordinating: that, if, because, before, after, although, as Article (determiner): a, an, the interjection: wow, eh, hello; clitic: ‘s; may be attached to whole phrases (at the end) particle: yes, no, not; to (+verb); many (otherwise) prepositions if part of phrasal verbs, e.g. (look) up

Slide 23 The Categories: Number and Gender Grammatical Number: Singular, Plural nouns, pronouns, verbs, adjectives, numerals computer / computers; (he) goes / (they) go In some languages (Arabic): Dual (nouns, pronouns, adjectives) Grammatical Gender: Masculine, Feminine, Neuter nouns, pronouns, verbs, adjectives, numerals he/she/it; nouns: (mostly) do not change gender for a single lexical unit Also: animate/inanimate (gram., some genders), etc. Mädchen (Ge.; girl, neuter); děti (Cz.; children, masc. inanim.)

Slide 24 The Categories: Case Case English: only personal pronouns/possessives, 2 forms other languages: 4 (German), 6 (Russian), 7 (Czech,Slovak,...), 5 (Romanian) nouns, pronouns, adjectives, numerals most common cases (forms in singular/plural) nominative I/we (work) eu/noi (Ro) genitive (picture of) me/us a mea/al meu dative (give to) me/us mie accusative (see) me/us pe mine vocative you! tu! locative (about) me/us(Czech) instrumental (by) me/us (Czech)

Slide 25 The Categories: Person, Tense Person verbs, personal pronouns 1st, 2nd, 3rd: (I) go, (you) go, (he) goes; (we) go, (you) go, (they) go merg, mergi, merge mergem mergeti merg (Ro) Tense (Ro) (Pol.: go) past: (you) went ai mersszliście present: (you pl.) gomergeti idziecie future (!if not “analytical”) (you) will goveti merge - concurrent (gerund) going mergind idąc

Slide 26 Note on Tense Examples of (traditional) tense ( synthetical and analytical ): infinitive: (to) write (tenseless, personless,..., except negation (Cz.)) simple present/past: (I) write/(she) writes; (I,she) wrote progressive present/past: (I) am writing; (I) was writing perfect present/past: (I) have written; (I) had written all in passive voice (cf. later), too: (the book) is being/has been/had been written etc. all in conditional mood, too (mood: in Eng. not a morph. category!) (the book) would have been written

Slide 27 The Categories: Voice & Aspect Voice active vs. passive (I) drive / (I am being) driven (Ich) setzte (mich) / (Ich bin) gesetzt (Ge.: to sit down) Aspect imperfective vs. perfective: пoкупал / купил (Ru.: I used to buy, I was buying) / I (have) bought) imperfective continuous vs. iterative (repeating) spal / spával (Cz.: I was sleeping / I used to sleep (every...))

Slide 28 The Categories: Negation, Degree of Comparison Negation: even in English: impossible (~ not possible) Cz: every verb, adjective, adverb, some nouns; prefix ne- It: some adjectives: irregular negation (s-, non ) Degree of Comparison (non-analytical): adjectives, adverbs: positive (big), comparative (bigger), superlative (biggest) Pol.: (new) nowy, nowszy, najnowszy Combination (by prefixing): order? both possible: (neg.: Cz./Pol.: ne-/nie-, sup.: nej-/naj-) Cz.: nejnemožnější (the most impossible) Pol.: nienajwierniejszy (the most unfaithful)

Slide 29 Typology of Languages By morphological features Analytical: using (function) words to express categories English, also French, Italian,..., Japanese, Chinese I would have been going ~ (Pol.) szłabym Inflective: using prefix/suffix/infix, combines several categories Slavic: Czech, Russian, Polish,... (not Bulgarian); also French, German; Arabic Latin/Slavic: Romanian (Cz. new(acc.)) novou (Adj, Fem., Sg., Acc., Non-neg., Pos.) Agglutinative: one category per (non-lexical) morpheme Finnish, Turkish, Hungarian (Fin. plural): -i-

Slide 30 Categories & Tags Tagset: list of all possible combinations of category values for a given language typically string of letters & digits: compact system: short idiosyncratic abbreviations: NNS (gen. noun, plural) positional system: each position i corresponds to C i : AAMP3----2A---- (gen. Adj., Masc., Pl., 3rd case (dative), comparative (2nd degree of comparison), Affirmative (no negation)) tense, person, variant, etc.: N/A (marked by “empty position”, or ‘-’) Famous tagsets: Brown, Penn, Multext[-East],...

Slide 31 Words, Phrases, Clauses, Sentences Words smallest units on the syntax level function/semantic Phrases consist of words and/or phrases; “constituents” Clauses have predicative meaning (single predicate) Sentences consist of clauses (one or more)

Slide 32 Words lexical units auxiliary (function) words: have grammatical function have meaning idioms fixed phrases (non-compositional) “hot dog”, “kick the bucket” Relate to other words dictionary: repository of information for each words about its (idiosyncratic) relations to other words

Slide 33 Phrases sequences of words and/or phrases (i.e. of constituents) may be discontinuous, sometimes Types of Phrases: Simple/Clausal (i.e. clauses, which consist of phrases, behave like phrases... recursively!) According to head type: Noun phrase: a new book Adjective phrase: brand new Adverbial phrase: so much Prepositional phrase: in a class Verb phrase: catch a ball

Slide 34 Noun Phrases Head: noun water a book new ideas that small village The greatest rise of interest rates since W.W.II within a single year an operating system which, despite great efforts on the part of our administrators, fails all too often

Slide 35 Adjective Phrases Head: adjective Simple APs very common, complex APs rare old very old really very old five times older than the oldest elephant in our ZOO (was) sure, as far as I know, to be there first

Slide 36 Adverbial and Numerical Phrases Head: adverb three times as much quickly really (... speaks) more loudly than anybody could imagine yesterday Numerical Phrases (... lasted) three hours twenty-two

Slide 37 Prepositional Phrases Head: preposition In fact, play the role of Adverbial Phrases often in the City at five o’clock to a brightest future without a glitch to the point where neither of them could get out of it up to five points instead of Charles

Slide 38 Verb Phrases Head: verb (It) rains... could ever see a large Unidentified Flying Object..., why (we) have got so much rain Please! On Sunday, (he) was driven to the hospital (It) began to snow (...) prohibits smoking in this area

Slide 39 Coordination of Phrases “Head”: conjunction, punctuation and, or, but cats and dogs new or even newer quickly and precisely he came to the conclusion that it makes no sense to hide himself anymore and therefore we could hear him today (flights) from and to Dallas eat your lunch now or at the picnic table

Slide 40 Ellipsis Word or Phrase missing where one would normally expect one; often happens in dialogues Whom did you see there? Peter. ?? verb ?? Most common in coordination (written text) Pittsburgh leads 4-0 but Detroit only 3-1. ??verb in 2 nd part?? Systematic in many languages: pro-drop (leave out a personal pronoun in the Subject position – it’s already indicated by the verb) [She] Passed the exam easily.

Slide 41 Clauses Predicative function: some activity of some subjects/objects, somewhere in time, under certain circumstances Main clause not part of a greater clause Embedded clause part of other clause, having some function (like a phrase) A tile falling from the roof nearly killed him. He fell asleep while listening to the news. Function of a Clause same as for phrase, plus some (direct speech etc.)

Slide 42 Sentences Consist of a single or several main clauses If several main clauses: coordination, much like coordinated phrases more coordinating conjunctions: and, or, but, (and) therefore,... In written text, starts with a capital letter Ends by period/question mark/exclamation mark not all periods end a sentence! – example? Sometimes even semicolon (;) might be a sentence break (...vague)

Slide 43 Syntax: Representation Tree structure (“tree” in the sense of graph theory) one tree per sentence Two main ideas for the shape of the tree: phrase structure (~ derivation tree, cf. parsing later) using bracketed grouping brackets annotated by phrase type heads (often) explicitly marked dependency structure (lexical relations “local”, functions) basic relation: head (governor) - dependent links (edges) annotated by syntactic function (Sb, Obj,...) phrase structure: implicitly present

Slide 44 Phrase Structure Tree Example: ((DaimlerChrysler’s shares) NP (rose (three eights) NUMP (to 22) PP-NUM ) VP ) S

Slide 45 Dependency Tree Example: rose Pred (shares Sb (DaimlerChrysler’s Atr ),eights Adv (three Atr ),to AuxP (22 Adv ))