 Christel Kemke 2007/08 COMP 4060 Natural Language Processing Introduction And Overview.

Evolution of Human Language
- communication for "work"
- social interaction
- basis of cognition and thinking (Sapir & Whorf)

Communication
"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs." [Russell & Norvig, p. 651]

Natural Language - General
Natural Language is characterized by
- a common or shared set of signs: alphabet; lexicon
- a systematic procedure to produce combinations of signs: syntax
- a shared meaning of signs and combinations of signs: (constructive) semantics

Natural Language Processing Overview
- Speech Recognition
- Natural Language Processing: Syntax, Semantics, Pragmatics
- Spoken Language

Natural Language and Speech
- Speech Recognition
  - acoustic signal as input
  - conversion into phonemes and written words
- Natural Language Processing
  - written text as input; sentences (or 'utterances')
  - syntactic analysis: parsing; grammar
  - semantic analysis: "meaning", semantic representation
  - pragmatics: dialogue; discourse; metaphors
- Spoken Language Processing
  - transcribed utterances
  - phenomena of spontaneous speech

Words

Morphology
A morphological analyzer determines (at least)
- the stem + ending of a word,
and usually delivers related information, like
- the word class,
- the number, and
- the person of the word.
The morphology can be part of the lexicon or implemented as a separate component, for example as a rule-based system.
  eats → eat + s   verb, singular, 3rd person
  dog  → dog       noun, singular
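A rule-based analyzer of this kind can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the stem table `STEMS`, the function name `analyze`, and the single "-s" rule are assumptions made for the example.

```python
# Toy stem lexicon (hypothetical entries, chosen to match the slide examples).
STEMS = {"eat": "verb", "dog": "noun", "bone": "noun"}


def analyze(word):
    """Return (stem, ending, word_class, features), or None if the word is unknown."""
    word = word.lower()
    if word in STEMS:
        word_class = STEMS[word]
        # A bare noun stem is singular; a bare verb stem is left underspecified.
        features = {"number": "singular"} if word_class == "noun" else {}
        return (word, "", word_class, features)
    # Single suffix rule: strip "-s" and look up the remaining stem.
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        word_class = STEMS[stem]
        if word_class == "verb":
            features = {"number": "singular", "person": 3}
        else:
            features = {"number": "plural"}
        return (stem, "s", word_class, features)
    return None


print(analyze("eats"))
print(analyze("dog"))
```

A real analyzer would need many more rules (irregular forms such as "goes", spelling changes, other endings); the sketch only shows the stem + ending decomposition and the attached features.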

Lexicon
The Lexicon contains information on words, as
- inflected forms (e.g. goes, eats) or
- word-stems (e.g. go, eat).
The Lexicon usually assigns a syntactic category, the word class or Part-of-Speech category. Sometimes it also holds
- further syntactic information (see Morphology);
- semantic information (e.g. semantic classifications like ‘agent’);
- syntactic-semantic information, e.g. on verb complements, like ‘give’ requires a direct object.

Lexicon
Example contents:
  eats → verb; singular, 3rd person; can have direct object
  dog  → dog, noun, singular; animal (semantic annotation)

POS (Part-of-Speech) Tagging
POS Tagging determines the word class or ‘part-of-speech’ category (basic syntactic categories) of single words or word-stems.
  The        det (determiner)
  dog        noun
  eat, eats  verb (eats: 3rd singular)
  the        det
  bone       noun
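The lexicon contents and the tagging step can be illustrated together. The sketch below assumes a full-form lexicon stored as a dictionary; the field names (`pos`, `number`, `person`, `sem`, `direct_object`) and the function `pos_tag` are hypothetical and only mirror the example entries on the slides.

```python
# Full-form lexicon: each word maps to its syntactic category plus optional
# syntactic and semantic annotations (field names are assumptions).
LEXICON = {
    "the": {"pos": "det"},
    "dog": {"pos": "noun", "number": "singular", "sem": "animal"},
    "bone": {"pos": "noun", "number": "singular"},
    "eat": {"pos": "verb"},
    "eats": {"pos": "verb", "number": "singular", "person": 3, "direct_object": True},
}


def pos_tag(sentence):
    """Tag each word by lexicon lookup; unknown words are tagged 'unknown'."""
    tagged = []
    for word in sentence.split():
        entry = LEXICON.get(word.lower())
        tagged.append((word, entry["pos"] if entry else "unknown"))
    return tagged


print(pos_tag("The dog eats the bone"))
```

Lookup-based tagging cannot resolve words with several categories; that case is the lexical ambiguity discussed later in the slides.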

Syntax

NLP - Syntactic Analysis
- Morphological Analyzer: eat + s (3rd sing)
- Lexicon: eat – verb
- Part-of-Speech (POS) Tagging: Verb
- Grammar Rules: VP → Verb Noun
- Parser: VP recognized; parse tree with VP over Verb and Noun

Language and Grammar
Natural Language described as Formal Language L using a Formal Grammar G:
- start-symbol S ≡ sentence
- non-terminals NT ≡ syntactic constituents
- terminals T ≡ lexical entries / words
- production rules P ≡ grammar rules
Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules from G.
Overgeneration / undergeneration: the grammar accepts or generates sentences that are not in L / fails to cover all sentences of L.

Grammar
Terminals can be words, part-of-speech categories, or more complex lexical items (including additional syntactic/semantic information related to the word).
  dog: noun, singular; animal
Non-Terminals represent (higher level) ‘syntactic categories’.
  Noun, NP (noun phrase), S (sentence)

Grammar
Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence).
  det  → the
  noun → dog | bone
  verb → eat | eats
  NP   → det noun     (NP: noun phrase)
  VP   → verb         (VP: verb phrase)
  VP   → verb NP
  S    → NP VP        (S: sentence)
Here, POS Tagging is included in the grammar.
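To see what "generating sentences of L" means for this small grammar, the rules can be written as a dictionary and expanded exhaustively. This is a sketch under the assumption that the grammar is non-recursive (true for the rules above); the names `GRAMMAR` and `expand` are introduced only for illustration.

```python
from itertools import product

# The context-free grammar from the slide: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"], ["eats"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`.

    Terminates only because this grammar has no recursive rules.
    """
    if symbol not in GRAMMAR:  # terminal symbol, i.e. a word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        alternatives = [expand(part) for part in rhs]
        for combination in product(*alternatives):
            results.append([word for seq in combination for word in seq])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

The grammar generates "the dog eats the bone", but also "the dog eat the bone": it has no subject-verb agreement, so it overgenerates with respect to English.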

Parsing (here: LR, bottom-up)
Determine the syntactic structure of the sentence: “the dog eats the bone”
  the → det            (POS Tagging)
  dog → noun
  det noun → NP        (Rule application)
  eats → verb
  the → det
  bone → noun
  det noun → NP
  verb NP → VP
  NP VP → S
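The bottom-up derivation above can be reproduced with a small shift-reduce recognizer. Note the swap: the slide labels the method LR, while the sketch below is a plain backtracking shift-reduce procedure, not a table-driven LR parser. The names `RULES`, `POS`, `parse`, and `parse_sentence` are assumptions for the example.

```python
# Grammar rules as (left-hand side, right-hand side); POS tagging via a word table.
RULES = [
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]
POS = {"the": "det", "dog": "noun", "bone": "noun", "eat": "verb", "eats": "verb"}


def parse(stack, words):
    """Backtracking shift-reduce: return the list of steps, or None on failure."""
    if not words and stack == ("S",):
        return []
    # Try every reduction whose right-hand side matches the top of the stack.
    for lhs, rhs in RULES:
        n = len(rhs)
        if stack[-n:] == rhs:
            rest = parse(stack[:-n] + (lhs,), words)
            if rest is not None:
                return [f"{' '.join(rhs)} -> {lhs}"] + rest
    # Otherwise shift the next word, replacing it by its POS tag.
    if words:
        tag = POS.get(words[0])
        if tag is not None:
            rest = parse(stack + (tag,), words[1:])
            if rest is not None:
                return [f"{words[0]} -> {tag}"] + rest
    return None


def parse_sentence(sentence):
    return parse((), tuple(sentence.lower().split()))


for step in parse_sentence("the dog eats the bone"):
    print(step)
```

Backtracking is needed because of the two VP rules: reducing "verb" to VP too early leaves "the bone" unattached, so the recognizer has to undo that choice and shift further before reducing.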

Syntax Analysis / Parsing
Syntactic Structure is often represented as a Parse Tree. Connect symbols according to the applied grammar rules (like Rewrite Systems).

Parse Tree
  S
    NP
      det
      noun
    VP
      verb
      NP
        det
        noun

Lexical Ambiguity
Several word senses or word categories:
- e.g. chase – noun or verb
- e.g. plant – ????

Syntactic Ambiguity
Several parse trees:
1) “The dog eats the bone in the park.”
2) “The dog eats the bone in the package.”
Who/what is in the park and who/what is in the package?
Syntactically speaking: How do I bind the Prepositional Phrase "in the ..."?
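The attachment problem can be made concrete by counting parse trees. The sketch below extends the slide grammar with three prepositional-phrase rules (PP → prep NP, NP → NP PP, VP → VP PP); these rules, the word table, and the function `count_parses` are assumptions added for illustration, not part of the original grammar.

```python
from functools import lru_cache

# Binary rules (lhs, left child, right child). The three PP rules are assumed.
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "det", "noun"),
    ("NP", "NP", "PP"),
    ("VP", "verb", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "prep", "NP"),
]
POS = {
    "the": "det", "dog": "noun", "bone": "noun", "park": "noun",
    "package": "noun", "eats": "verb", "in": "prep",
}


def count_parses(sentence, goal="S"):
    """Count the distinct parse trees of `sentence` (CKY-style span recursion)."""
    words = tuple(sentence.lower().rstrip(".").split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if j - i == 1:  # single word: matches only its POS category
            return 1 if POS.get(words[i]) == symbol else 0
        total = 0
        for lhs, left, right in BINARY:
            if lhs == symbol:
                for k in range(i + 1, j):  # every split point of the span
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(goal, 0, len(words))


print(count_parses("The dog eats the bone in the park."))
print(count_parses("The dog eats the bone."))
```

Both example sentences receive two trees from this grammar: one attaches the PP to the noun phrase ("the bone in the package"), the other to the verb phrase (eating takes place "in the park"). Syntax alone cannot choose between them; that takes semantic or world knowledge.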

Semantics

Semantic Representation
Represent the meaning of a sentence. Generate, e.g.,
- a logic-based representation or
- a frame-based representation (Fillmore’s case frames)
based on the syntactic structure, lexical entries, and particularly the head-verb, which determines how to arrange parts of the sentence and relate them to each other in the semantic representation.

Semantic Representation: Frame Representation with Case Frames
Verb-centered representation: the verb (action, head) is regarded as the center of the verbal expression and determines the case frame with its possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots (roles). (cf. also Schank’s CD Theory)
Typing of case roles is possible (e.g. 'agent' refers to a specific sort or concept, like “humans”).

General Frame for eat
  Agent:    animate
  Action:   eat
  Patient:  food
  Manner:   {e.g. fast}
  Location: {e.g. in the yard}
  Time:     {e.g. at noon}

Frame with Fillers
  Agent:    the dog
  Action:   eat
  Patient:  the bone / the bone in the package
  Location: in the park
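A case frame with typed roles maps naturally onto a dictionary plus a type check on the fillers. The following is a minimal sketch: the sort table `SORTS`, the split into required and optional roles, and the function `fill_frame` are assumptions for illustration.

```python
# General frame for "eat": required role -> sort the filler must belong to.
EAT_FRAME = {"agent": "animate", "patient": "food"}
OPTIONAL_ROLES = ("manner", "location", "time")

# Hypothetical sort assignments for a few phrases.
SORTS = {"the dog": "animate", "the bone": "food", "the park": "place"}


def fill_frame(action, frame, **fillers):
    """Build a filled frame, checking each required role against its sort."""
    filled = {"action": action}
    for role, sort in frame.items():
        filler = fillers.get(role)
        if filler is None:
            raise ValueError(f"missing filler for role '{role}'")
        if SORTS.get(filler) != sort:
            raise ValueError(f"'{filler}' is not of sort '{sort}' required for '{role}'")
        filled[role] = filler
    for role in OPTIONAL_ROLES:  # optional roles are copied without a sort check
        if role in fillers:
            filled[role] = fillers[role]
    return filled


print(fill_frame("eat", EAT_FRAME, agent="the dog", patient="the bone",
                 location="in the park"))
```

The sort check is what rejects readings such as a bone acting as the agent of "eat"; a fuller system would use a concept hierarchy rather than a flat table.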

Role          General Frame for drive   Frame with fillers
Agent         animate                   she
Action        drive                     drives
Patient       vehicle                   the convertible
Manner        {how}                     fast
Location      Loc-spec                  [in the] Rocky Mountains
Source        Loc-spec                  [from] home
Destination   Loc-spec                  [to the] ASIC conference
Time          Time-spec                 [in] August

Pragmatics

Pragmatics
Pragmatics includes context-related aspects of NL expressions (utterances). These are in particular anaphoric references, elliptic expressions, deictic expressions, …
- anaphoric references – refer to items mentioned earlier
- deictic expressions – simulate pointing gestures
- elliptic expressions – incomplete expressions; have to be completed with reference to an item mentioned earlier

Pragmatics
“I put the box on the top shelf.”
“I know that. But I can’t find it there.”
“The candy-box?”
Phenomena illustrated: deictic expression; anaphoric reference; elliptic expression

Intentions
One philosophical assumption is that natural language is used to achieve something: “Do things with words.”
The meaning of an utterance is essentially determined by the intention of the speaker.

Intentionality - Examples
What was said → what was meant:
- “There is a terrible draft here.” → “Can you please close the window.”
- “How does it look here?” → “I am really mad; clean up your room.”
- “Will this ever end?” → “I would prefer to be with my friends than to sit in class now.”

Metaphors
The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings.
Metaphors transfer concepts and relations from one area of discourse into another area. For example, seeing time as a line (in space) or seeing life as a journey.

Metaphors - Examples
- “This car eats a lot of gas.”
- “She devoured the book.”
- “He was tied up with his clients.”
- “Marriage is like a journey.”
- “Their marriage was a one-way road into hell.”
see George Lakoff, Women, Fire, and Dangerous Things

Dialogue and Discourse

Discourse / Dialogue Structure
Grammar for various sentence types (speech acts) = dialogue, discourse, story grammar
Distinguish e.g. questions, commands, and statements:
- Where is the remote-control?
- Bring the remote-control!
- The remote-control is on the brown table.
Dialogue Grammars describe possible sequences of speech acts in communication, e.g. that a question is followed by an answer/statement. Similar for Discourse (like continuous texts).
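A dialogue grammar in this sense can be approximated by a table of allowed follow-up speech acts. The sketch below is deliberately crude: classifying speech acts by final punctuation and the particular transition table `FOLLOWS` are assumptions made for illustration, not a description of an actual dialogue system.

```python
def speech_act(utterance):
    """Classify an utterance by its final punctuation (a rough surface heuristic)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "command"
    return "statement"


# Hypothetical dialogue grammar: which speech acts may follow which.
FOLLOWS = {
    "question": {"statement"},
    "command": {"statement", "question"},
    "statement": {"statement", "question", "command"},
}


def well_formed(dialogue):
    """Check that every adjacent pair of speech acts is allowed by FOLLOWS."""
    acts = [speech_act(utterance) for utterance in dialogue]
    return all(nxt in FOLLOWS[cur] for cur, nxt in zip(acts, acts[1:]))


print(well_formed(["Where is the remote-control?",
                   "The remote-control is on the brown table."]))
print(well_formed(["Where is the remote-control?",
                   "Bring the remote-control!"]))
```

Real dialogues are less rigid (a question may be answered by a counter-question), so such a table is better read as a preference than as a hard constraint.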

Spoken Language Interfaces

Speech

Speech Processing Systems: Types and Characteristics
- Speech Recognition vs. Speaker Recognition (Voice Recognition; Speaker Identification)
- speaker-dependent vs. speaker-independent
- training?
- unlimited vs. large vs. small vocabulary
- single word vs. continuous speech

Speech Recognition Phases
- acoustic signal as input
- signal analysis - spectrogram
- feature extraction
- phoneme recognition
- word recognition
- conversion into written words

Video of glottis and speech signal in lingWAVES (

Speech Signal Analysis
Analog-Digital Conversion of Acoustic Signal
Sampling in Time Frames (“windows”)
- frequency = 0-crossings per time frame
  - e.g. 2 crossings/second is 1 Hz (1 wave)
  - e.g. 10 kHz needs sampling rate 20 kHz
- measure amplitudes of signal in time frame
  - digitized wave form
- separate different frequency components
  - FFT (Fast Fourier Transform)
  - spectrogram
- other frequency based representations
  - LPC (linear predictive coding)
  - Cepstrum
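The zero-crossing rule, the sampling-rate rule, and the frequency decomposition can be checked on a synthetic tone. The sketch below uses only the standard library and a naive discrete Fourier transform in place of an FFT (same result, slower). The 5 Hz test tone, the 200 Hz sampling rate, the phase offset, and all function names are assumptions chosen for the example.

```python
import cmath
import math


def sine_samples(freq_hz, rate_hz, duration_s, phase=0.5):
    """Sample a sine wave; the phase offset keeps samples off the exact zeros."""
    n = int(rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz + phase) for i in range(n)]


def zero_crossing_frequency(samples, rate_hz):
    """Estimate frequency from sign changes: 2 crossings per second equal 1 Hz."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration_s = len(samples) / rate_hz
    return crossings / (2 * duration_s)


def dft_peak_hz(samples, rate_hz):
    """Return the frequency of the strongest component (naive O(n^2) DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitudes.append(abs(coeff))
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * rate_hz / n


def nyquist_rate(max_freq_hz):
    """Minimum sampling rate needed to represent frequencies up to max_freq_hz."""
    return 2 * max_freq_hz


samples = sine_samples(freq_hz=5, rate_hz=200, duration_s=1.0)
print(zero_crossing_frequency(samples, 200))
print(dft_peak_hz(samples, 200))
print(nyquist_rate(10_000))
```

Zero-crossing counting only works for a single dominant frequency; speech mixes many components, which is why the slide moves on to FFT-based spectrograms. A spectrogram applies this kind of transform to successive short windows rather than to the whole signal.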

Waveform and Spectrogram

Phoneme Recognition
Recognition Process based on
- features extracted from spectral analysis
- phonological rules
- statistical properties of language / pronunciation
Recognition Methods
- Hidden Markov Models
- Neural Networks
- Pattern Classification in general

Formants

Pronunciation Networks / Word Models as Probabilistic FAs (HMMs)
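A word model of this kind can be sketched as a probabilistic finite automaton whose states each emit one phoneme. Everything in the example is hypothetical: the word ("tomato"), the phoneme labels, and the transition probabilities are invented for illustration. Because each state emits exactly one phoneme, this is a simplified special case of an HMM, scored here with the forward algorithm.

```python
# Each state emits exactly one phoneme (a simplification of a full HMM).
EMITS = {
    "s1": "t", "s2": "ah", "s3": "ow", "s4": "m",
    "s5": "ey", "s6": "aa", "s7": "t", "s8": "ow",
}

# Invented transition probabilities; the branches model pronunciation variants.
TRANS = {
    "start": {"s1": 1.0},
    "s1": {"s2": 0.6, "s3": 0.4},
    "s2": {"s4": 1.0},
    "s3": {"s4": 1.0},
    "s4": {"s5": 0.5, "s6": 0.5},
    "s5": {"s7": 1.0},
    "s6": {"s7": 1.0},
    "s7": {"s8": 1.0},
    "s8": {"end": 1.0},
}


def sequence_probability(phonemes):
    """Forward algorithm: probability that the network produces the phoneme sequence."""
    alpha = {"start": 1.0}  # probability mass currently sitting in each state
    for phoneme in phonemes:
        following = {}
        for state, prob in alpha.items():
            for successor, trans_prob in TRANS.get(state, {}).items():
                if EMITS.get(successor) == phoneme:
                    following[successor] = following.get(successor, 0.0) + prob * trans_prob
        alpha = following
    # Only paths that can move on to the end state count as complete words.
    return sum(prob * TRANS.get(state, {}).get("end", 0.0)
               for state, prob in alpha.items())


print(sequence_probability("t ah m ey t ow".split()))
print(sequence_probability("t ow m aa t ow".split()))
```

In a recognizer the phoneme sequence is itself uncertain, so emission probabilities over acoustic features take the place of the deterministic `EMITS` table.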

Speech Recognizer Architecture

Spoken Language

Spoken Language
- Output of Speech Recognition System as input "text".
- Can be associated with probabilities for different word sequences.
- Contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.

Spoken Language - Examples
1. no [s-] straight southwest
2. right to [my] my left
3. [that is] that is correct
Robin J. Lickley. HCRC Disfluency Coding Manual.

Spoken Language - Disfluency
Reparandum and Repair
  [come to]... walk right to [the]... the right-hand side of the page
The bracketed material is the reparandum; the material that follows and replaces it is the repair.

Spoken Language - Example
1. we're going to [g-- ]... turn straight back around for testing.
2. [come to]... walk right to the... right-hand side of the page.
3. right [up... past]... up on the left of the... white mountain walk... right up past.
4. [i'm still]... i've still gone halfway back round the lake again.

Spoken Language - Example
1. [I’d] [d if] I need to go
2. [it’s basi--] see if you go over the old mill
3. [you are going] make a gradual slope … to your right
4. [I’ve got one] I don’t realize why it is there
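When the reparandum is already marked, as in these transcripts, cleaning an utterance for further processing amounts to deleting the bracketed material and the pause markers. The sketch below assumes the bracket and "..." conventions seen in the examples; the function name `remove_disfluencies` is hypothetical. Detecting an unmarked reparandum in raw recognizer output is a much harder problem and is not attempted here.

```python
import re


def remove_disfluencies(utterance):
    """Delete bracketed reparanda and pause markers, then normalize whitespace."""
    text = re.sub(r"\[[^\]]*\]", " ", utterance)  # drop each [reparandum]
    text = text.replace("...", " ").replace("…", " ")  # drop pause markers
    return " ".join(text.split())


print(remove_disfluencies(
    "[come to]... walk right to [the]... the right-hand side of the page"))
print(remove_disfluencies("right to [my] my left"))
```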
