

CIS 8590 – Fall 2008 NLP 1 Introduction to Natural Language Processing (aka, Computational Linguistics) Slides by me, Martha Palmer, Eleni Miltsakaki, Dan Jurafsky, Tarkan Kacmaz, and others

Overview
General Methods in NLP (3 weeks)
– Low-level NLP problems and techniques
– Graphical Models for NLP
– Text Mining basics
Information Retrieval Overview (2 weeks)
Information Extraction Overview (2 weeks)
Selected topics in Information Extraction (4 weeks)

Practical Matters
Prereqs: General understanding of probability and statistics
Grading:
– 20% In-class participation, paper presentations
– 30% Course projects
– 50% Midterm (and possibly quizzes)
Course projects
– Please start finding project partners soon!
– I will supply some ideas for projects later

WHAT IS LANGUAGE?

When we study human language, we are approaching what some might call the “human essence”, the distinctive qualities of mind that are, so far as we know, unique to man. Noam Chomsky

WHAT IS LANGUAGE?
Definition with respect to form: Language is a system of speech symbols. It is realized acoustically (sound waves), visually-spatially (sign language) and in written form.
Definition with respect to function: Language is the most important means of human communication. It is used to convey and exchange information (informative function).
Multiplicity of languages: We know of about 7000 languages, which is about 1% of all the languages that ever existed.

LANGUAGE AND THE BRAIN

THEORIES OF LANGUAGE Noam Chomsky claims that language is innate. B. F. Skinner claims that language is learned; it is basically a stimulus-response mechanism.

WHAT IS GRAMMAR?
When we learn a language we also learn the rules that govern how language elements, such as words, are combined to produce meaningful language. These elements and rules constitute the Grammar of a language.
The Grammar is “what we know”.
Grammar represents our linguistic competence.

DESCRIPTIVE vs PRESCRIPTIVE GRAMMAR
Prescriptive: how language should be used
Descriptive: how language is used

Areas of Linguistics
phonetics - the study of speech sounds
phonology - the study of sound systems
morphology - the rules of word formation
syntax - the rules of sentence formation
semantics - the study of word meanings
pragmatics - the study of discourse meanings
sociolinguistics - the study of language in society
applied linguistics - the application of the methods and results of linguistics to such areas as language teaching, national language policies, lexicography, translation, language in politics, etc.

What is phonetics? Phonetics is the science of speech. We all speak. But how many of us know how we speak? Or what speech is like? Phonetics seeks to answer those questions.

Orthography and Sounds
The English language is not phonetic: words are not spelled as they are pronounced. There is no one-to-one correspondence between the letters and the sounds or phonemes.

Orthography and Sounds
Did he believe that Caesar could see the people seize the seas?
The silly amoeba stole the key to the machine.

Articulatory Phonetics The production of any speech sound involves the movement of an air stream. Most speech sounds are produced by pushing the air out of the lungs through the mouth (oral) and sometimes through the nose (nasal).

SPEECH ORGANS

Phonology
Phonology deals with the system and pattern of speech sounds in a language. The phonology of a language is its system and pattern of speech sounds.

Phonology
Phonological knowledge permits us to:
– produce sounds which form meaningful utterances
– recognize a “foreign” accent
– make up new words
– know what is or is not a sound in one’s language
– know what different sound strings may represent

Phonetics vs Phonology
Phonetics: The study of speech sounds.
Phonology: The study of the way speech sounds form patterns.

Sequences of Phonemes
[Table: arrangements of the phonemes b, l, ı, k, sorted into “possible” and “impossible” sequences for English]
“I just bought a beautiful new blick.” What is a blick?
“I just bought a beautiful new bkli.” WHAT!!

Sequences of Phonemes Your knowledge of English “tells” you that certain strings of phonemes are permissible and others are not. That’s why /bkli/ does not sound like an English word. It violates the restrictions on the sequencing of phonemes; i.e. it violates the phonological rules of English.

Rules of Phonology
Delete a word-final /b/ when it occurs after an /m/, as in: bomb, crumb, lamb, tomb
But not in: bombard, crumble, limber, tumble
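The deletion rule can be sketched in a few lines of Python. This is only an illustration: the function name `delete_final_b` and the rough phoneme lists are assumptions made for the example, not part of the original slides.

```python
def delete_final_b(phonemes):
    """Delete a word-final /b/ when it directly follows /m/."""
    if len(phonemes) >= 2 and phonemes[-1] == "b" and phonemes[-2] == "m":
        return phonemes[:-1]
    return phonemes

# Underlying forms as rough phoneme lists (illustrative, not careful transcriptions).
bomb = ["b", "a", "m", "b"]
bombard = ["b", "a", "m", "b", "a", "r", "d"]

print(delete_final_b(bomb))     # the /b/ is word-final after /m/, so it is dropped
print(delete_final_b(bombard))  # the /b/ is word-internal, so nothing changes
```

The rule only looks at the last two segments, which is why "bombard" and "crumble" keep their /b/.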

Morphology & Syntax Morphology deals with the combination of morphemes into words. Syntax deals with the combination of words into sentences.

What is the meaning of ‘meaning’?
Learning a language includes learning the “agreed upon” meanings of certain strings of sounds, and learning how to combine these meaningful units into larger units which also convey meaning.

Morphemes
A morpheme is the smallest linguistic unit that has meaning. It is a grammatical unit in which there is an arbitrary union of a sound and a meaning, and which cannot be further analysed.

Morphemes A morpheme may be represented by a single sound: e.g. the plural morpheme [s] in cat+s A morpheme may be represented by a syllable (monosyllabic): e.g. child+ish

Morphemes
A morpheme may be represented by more than one syllable (polysyllabic):
– two syllables: e.g. lady, water
– three syllables: e.g. crocodile
– four syllables: e.g. salamander

Words
Two basic ways to form words
– Inflectional (e.g. English verbs)
  Open + ed = opened
  Open + ing = opening
– Derivational (e.g. adverbs from adjectives, nouns from adjectives)
  Happy → happily (adverbs from adjectives)
  Happy → happiness (nouns from adjectives)
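Both kinds of word formation can be approximated as "stem + suffix" with a small spelling adjustment. The sketch below is a simplification: the function `add_suffix` and its single y-to-i spelling rule are assumptions introduced for illustration, and real English morphology needs many more rules.

```python
VOWELS = set("aeiou")

def add_suffix(stem, suffix):
    """Attach a suffix, changing a final consonant+y to i (happy + ly -> happily)."""
    ends_in_consonant_y = (
        len(stem) >= 2 and stem.endswith("y") and stem[-2] not in VOWELS
    )
    if ends_in_consonant_y and not suffix.startswith("i"):
        stem = stem[:-1] + "i"
    return stem + suffix

# Inflection: same word class, different grammatical form.
print(add_suffix("open", "ed"), add_suffix("open", "ing"))
# Derivation: a new word, often in a different class.
print(add_suffix("happy", "ly"), add_suffix("happy", "ness"))
```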

Syntax
The study of classes of words and the rules that govern how the words can combine to make phrases and sentences.

Basic classes of words
Classes of words, aka parts of speech (POS)
– Nouns
– Verbs
– Adjectives
– Adverbs
The classes above are open-class words
We also have closed-class words
– Articles, pronouns, prepositions, particles, quantifiers, conjunctions

Basic phrases
A word from an open class can be used to form the basis of a phrase
The basis of a phrase is called the head

Examples of phrases
Noun phrases
– The manager of the institute
– Her worry to pass the exams
– Several students from the English Department
Adjective phrases
– easy to understand
– mad as a dog
– glad that he passed the exam

Examples of phrases
Adverb phrases
– fast like the wind
– outside the building
Verb phrases
– ate her sandwich
– went to the doctor
– believed what I told him

“Complements”
Notice that, to be meaningful, the verb “go”, for example, requires a phrase for “location”
– *John went
– John went home
Such phrases “complete” the meaning of the verb (or other type of head) and are called complements

Inside the noun phrase
NPs are used to refer to things: objects, places, concepts, events, qualities, etc.
NPs may consist of:
– A single pronoun (he, she, etc.)
– A name or proper noun (John, Athens, etc.)
– A specifier and a noun
– A qualifier and a noun
– A specifier and a qualifier and a noun (e.g., the first three winners)

Specifiers
Specifiers indicate how many objects are described and also how these objects relate to the speaker
Basic types of specifiers
– Ordinals (e.g., first, second)
– Cardinals (e.g., one, two)
– Determiners (see next slide)

Determiners
Basic types of determiners
– Articles (the, a, an)
– Demonstratives (this, that, these, those)
– Possessives (‘s, her, my, whose, etc.)
– Wh-determiners (which, what; in questions)
– Quantifying determiners (some, every, most, no, any, etc.)

Qualifiers
Basic types of qualifiers
– Adjectives
  Happy cat
  Angry feelings
– Noun modifiers
  Cook book
  University hospitals

Inside the verb phrase
A simple VP
– Adverbial modifier + head verb + complements
Types of verbs
– Auxiliary (be, do, have)
– Modal (will, can, could)
– Main (eat, work, think)

Types of verb complements
Intransitive verbs do not require complements
Transitive verbs require an object as a complement (e.g. find a key)
Transitive verbs allow passive forms (e.g. a key was found)
Ditransitive verbs require one direct and one indirect object (e.g. give Mary a book)

Other verb complements
Clausal complements
– Some verbs require clausal complements
  Mary knows that John left
Prepositional phrase complements
– Some verbs require specific PP complements
  Mary gave the book to John
– Others require any PP complement
  John put the book on the shelf/in the room/under the table

Adjective phrases
Simple
– Angry, easy, etc.
Complex
– Pleased with the prize
– Angry at the committee
– Willing to read the book
Complex AdjPs normally do not precede nouns; they are used as complements of verbs such as be or seem

Adverbial phrases
Indicators of
– Degree
– Location
– Manner
– The time of something (now, yesterday, etc.)
– Frequency
– Duration
Location in the sentence
– Initial
– Medial
– Final

Grammars and parsing
What is syntactic parsing?
– Determining the syntactic structure of a sentence
Basic steps
– Identify sentence boundaries
– Identify the part of speech of each word
– Identify syntactic relations

Context Free Grammar
S -> NP VP
NP -> det (adj) N
NP -> Proper N
NP -> N
VP -> V
VP -> V PP
VP -> V NP
VP -> V NP PP
VP -> V NP NP
PP -> Prep NP
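One way to make such a grammar concrete is to store the rules in a dictionary and check whether a sequence of part-of-speech tags can be derived from S. The sketch below is an illustration under stated assumptions: the optional "(adj)" is expanded into two NP rules, the tag names (`det`, `adj`, `N`, `ProperN`, `V`, `Prep`) and the function names are made up for the example, and the simple top-down search only works because this grammar has no left recursion.

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "N"], ["det", "adj", "N"], ["ProperN"], ["N"]],
    "VP": [["V"], ["V", "PP"], ["V", "NP"], ["V", "NP", "PP"], ["V", "NP", "NP"]],
    "PP": [["Prep", "NP"]],
}

def ends(symbol, tags, start):
    """Return every position where `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:  # terminal symbol: a part-of-speech tag
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:  # thread the possible positions through the rule body
            positions = {e for p in positions for e in ends(part, tags, p)}
        result |= positions
    return result

def is_sentence(tags):
    """True if the whole tag sequence can be derived from S."""
    return len(tags) in ends("S", tags, 0)

# "The cat sat on the mat", written as a tag sequence
print(is_sentence(["det", "N", "V", "Prep", "det", "N"]))
# A sequence the grammar does not license
print(is_sentence(["det", "V", "N"]))
```

This recognizer re-explores the same spans repeatedly, which is the inefficiency that chart-based methods such as CKY avoid.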

Parses
The cat sat on the mat
[S [NP [Det the] [N cat]] [VP [V sat] [PP [Prep on] [NP [Det the] [N mat]]]]]

Parses
Time flies like an arrow.
[S [NP [N time]] [VP [V flies] [PP [Prep like] [NP [Det an] [N arrow]]]]]

Parses
Time flies like an arrow.
[S [NP [N time] [N flies]] [VP [V like] [NP [Det an] [N arrow]]]]

Features
C for Case, Subjective/Objective
– She visited her.
P for Person agreement (1st, 2nd, 3rd)
– I like him, You like him, He likes him
N for Number agreement, Subject/Verb
– He likes him, They like him.
G for Gender agreement, Subject/Verb
– English, reflexive pronouns: He washed himself.
– Romance languages, det/noun
T for Tense
– auxiliaries, sentential complements, etc.
– *will finished is bad
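Feature agreement can be sketched as a lookup of feature bundles followed by a compatibility check. Everything in the example below is hypothetical: the tiny lexicon, the feature names (`person`, `number`, `case`) and the function `agrees` are assumptions chosen to mirror the person, number and case examples, not a real feature grammar.

```python
# Hypothetical mini-lexicon; feature values are illustrative assumptions.
PRONOUNS = {
    "I":    {"person": 1, "number": "sg", "case": "subj"},
    "he":   {"person": 3, "number": "sg", "case": "subj"},
    "they": {"person": 3, "number": "pl", "case": "subj"},
    "him":  {"person": 3, "number": "sg", "case": "obj"},
}

def is_third_singular(features):
    return features["person"] == 3 and features["number"] == "sg"

# Each verb form states which subjects it is compatible with.
VERBS = {
    "likes": is_third_singular,
    "like":  lambda features: not is_third_singular(features),
}

def agrees(subject, verb):
    """True if the subject is in subjective case and matches the verb form."""
    features = PRONOUNS[subject]
    return features["case"] == "subj" and VERBS[verb](features)

print(agrees("he", "likes"))   # person and number match
print(agrees("he", "like"))    # number/person mismatch
print(agrees("him", "likes"))  # wrong case for a subject
```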

Probabilistic Context Free Grammars
Adding probabilities
Often, lexicalizing the probabilities

A PCFG
S -> NP VP (0.5)
S -> ADVP NP VP (0.5)
NP -> det (adj) N (0.7)
NP -> Proper N (0.15)
NP -> N (0.15)
VP -> V (0.1)
VP -> V PP (0.1)
VP -> V NP (0.4)
VP -> V NP PP (0.4)
PP -> Prep NP (1)
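Under a PCFG, the probability of a parse tree is the product of the probabilities of the rules used to build it. The sketch below scores the tree for "The cat sat on the mat" with the rule probabilities from this slide. Two assumptions are made for illustration: the slide does not say how the 0.7 for "NP -> det (adj) N" is shared between the variants with and without an adjective, so the sketch uses 0.7 for "det N" as-is; and since no word-level probabilities are given, leaves contribute a factor of 1. The names `RULE_PROB` and `tree_prob` are invented for the example.

```python
RULE_PROB = {
    ("S", ("NP", "VP")): 0.5,
    ("NP", ("det", "N")): 0.7,   # assumption: full 0.7 used for the no-adjective variant
    ("VP", ("V", "PP")): 0.1,
    ("PP", ("Prep", "NP")): 1.0,
}

def tree_prob(tree):
    """Score a tree given as (label, child, ...); a leaf is a bare POS tag string."""
    if isinstance(tree, str):
        return 1.0  # no lexical probabilities are given, so leaves contribute 1
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

# [S [NP det N] [VP V [PP Prep [NP det N]]]]
cat_sat = ("S", ("NP", "det", "N"),
                ("VP", "V", ("PP", "Prep", ("NP", "det", "N"))))
print(tree_prob(cat_sat))  # 0.5 * 0.7 * 0.1 * 1.0 * 0.7, up to floating-point rounding
```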

A Lexicalized PCFG
Sample rules:
S_give -> NP VP_give (1.0)
NP_friend -> det (adj) N_friend (1.0)
NP_Sally -> ProperN_Sally (1.0)
VP_give -> V_give NP NP (0.3)
VP_give -> V_give NP PP_to (0.7)

Parsing
Computational task: Given a set of grammar rules and a sentence, find a valid parse of the sentence (efficiently).
Naively, you could try all possible combinations of rules until you get to a parse tree that has “S” at the root and the right words at the leaves. But that takes exponential time in the number of words.
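To get a feel for the blow-up, consider only binary-branching trees: the number of distinct tree shapes over n words is the Catalan number C(n-1), which grows exponentially with n. This is a standard combinatorial fact rather than something stated on the slide, and the function name below is made up for illustration. The count ignores node labels, so the real search space is larger still.

```python
from math import comb

def count_binary_bracketings(n_words):
    """Number of binary tree shapes over n_words leaves: Catalan(n_words - 1)."""
    if n_words < 1:
        return 0
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)

for n in (2, 5, 10, 20):
    print(n, count_binary_bracketings(n))
```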

CKY Parsing (aka, CYK)
CKY parsing is a dynamic programming solution.
I bring it up now because dynamic programming shows up all the time in NLP.
Dynamic programming: simplifying a complicated problem by breaking it down into simpler subproblems in a recursive manner.

CKY – Basic Idea
Let the input be a string S consisting of n words: a_1 ... a_n.
Let the grammar contain r nonterminal symbols R_1 ... R_r. This grammar contains the subset R_s, which is the set of start symbols.
Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
At each step, the algorithm sets P[i,j,k] to true if the subsequence of words (span) starting from i of length j can be generated from R_k.
We will start with spans of length 1 (individual words), and then proceed to increasingly larger spans, determining which ones are valid given the smaller spans that have already been processed.

CKY Algorithm
For each i = 1 to n
    For each unit production R_j -> a_i, set P[i,1,j] = true.
For each i = 2 to n -- Length of span
    For each j = 1 to n-i+1 -- Start of span
        For each k = 1 to i-1 -- Partition of span
            For each production R_A -> R_B R_C
                If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
If any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for R_s)
    Then S is member of language
    Else S is not member of language
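The CKY recognizer translates fairly directly into Python. The version below is a sketch under a few assumptions: the grammar is already in Chomsky normal form (only rules of the form A -> word or A -> B C), indices are 0-based rather than 1-based, and each table cell holds a set of nonterminals instead of one boolean per nonterminal. The toy grammar and all names are invented for the example.

```python
def cky_recognize(words, unit_rules, binary_rules, start_symbols):
    """Return True if `words` can be derived from one of `start_symbols`."""
    n = len(words)
    # P[start][length] = nonterminals that generate words[start:start+length]
    P = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        for lhs, terminal in unit_rules:
            if terminal == word:
                P[i][1].add(lhs)
    for length in range(2, n + 1):               # length of span
        for start in range(n - length + 1):      # start of span
            for split in range(1, length):       # partition of span
                for lhs, left, right in binary_rules:
                    if (left in P[start][split]
                            and right in P[start + split][length - split]):
                        P[start][length].add(lhs)
    return n > 0 and any(s in P[0][n] for s in start_symbols)

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
UNIT_RULES = [("Det", "the"), ("N", "cat"), ("N", "mat"), ("V", "sat"), ("Prep", "on")]
BINARY_RULES = [("S", "NP", "VP"), ("NP", "Det", "N"),
                ("VP", "V", "PP"), ("PP", "Prep", "NP")]

print(cky_recognize("the cat sat on the mat".split(), UNIT_RULES, BINARY_RULES, {"S"}))
print(cky_recognize("cat the sat".split(), UNIT_RULES, BINARY_RULES, {"S"}))
```

The three nested loops over length, start and split give a running time cubic in the number of words (times the number of rules), in contrast to the exponential cost of enumerating trees.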

CKY In Action
tuebingen.de/student/martin.lazarov/demos/cky.html

Finding the Best Parse
With a PCFG (or lexicalized PCFG), it’s possible to score the trees to find the best (highest probability) parse.
Instead of a boolean array P, you would need to store weights (or probabilities) in the array; for the rest, the algorithm is almost identical.
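A minimal sketch of that change: each table cell maps a nonterminal to the highest probability found so far for its span, and combining two sub-spans multiplies their scores by the rule probability. The grammar is assumed to be in Chomsky normal form, and all rule probabilities below are invented for illustration (they are not taken from the slides). The example sentence is the ambiguous "time flies like an arrow".

```python
def best_parse_prob(words, lexical, binary, start="S"):
    """Probability of the best parse of `words` rooted in `start` (0.0 if none)."""
    n = len(words)
    # best[i][length] maps a nonterminal to its best score over words[i:i+length]
    best = [[{} for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        for (lhs, terminal), p in lexical.items():
            if terminal == word:
                best[i][1][lhs] = max(p, best[i][1].get(lhs, 0.0))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left_cell = best[i][split]
                right_cell = best[i + split][length - split]
                for (lhs, left, right), p in binary.items():
                    if left in left_cell and right in right_cell:
                        score = p * left_cell[left] * right_cell[right]
                        if score > best[i][length].get(lhs, 0.0):
                            best[i][length][lhs] = score
    return best[0][n].get(start, 0.0) if n else 0.0

# Hypothetical probabilities, chosen only to make the example concrete.
LEXICAL = {
    ("NP", "time"): 0.3, ("N", "time"): 0.3, ("N", "flies"): 0.3, ("N", "arrow"): 0.4,
    ("V", "flies"): 0.5, ("V", "like"): 0.5, ("Prep", "like"): 1.0, ("Det", "an"): 1.0,
}
BINARY = {
    ("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 0.5, ("NP", "N", "N"): 0.2,
    ("VP", "V", "PP"): 0.4, ("VP", "V", "NP"): 0.6, ("PP", "Prep", "NP"): 1.0,
}

print(best_parse_prob("time flies like an arrow".split(), LEXICAL, BINARY))
```

With these made-up numbers, the reading in which "flies" is the verb outscores the one in which "time flies" is a noun compound. To recover the tree itself, each cell would also need to store a back-pointer to the split and rule that produced its best score.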