CMSC 723 / LING 645: Intro to Computational Linguistics January 28, 2004 Lecture 1 (Dorr): Overview, History, Goals, Problems, Techniques; Intro to MT Prof. Bonnie J. Dorr Dr. Nizar Habash TA: TBA

Administrivia IMPORTANT: For Today: Chapters 1 and 21 For Next Time: Chapter 2

CL vs NLP Why “Computational Linguistics (CL)” rather than “Natural Language Processing” (NLP)? Computational Linguistics — Computers dealing with language — Modeling what people do Natural Language Processing — Applications on the computer side

Relation of CL to Other Disciplines: CL sits at the hub of Artificial Intelligence (AI) (notions of representation, search, etc.), Machine Learning (particularly probabilistic or statistical ML techniques), Linguistics (syntax, semantics, etc.), Psychology, Electrical Engineering (EE) (optical character recognition), Philosophy of Language and Formal Logic, Information Retrieval, Theory of Computation, and Human-Computer Interaction (HCI)

A Sampling of “Other Disciplines”  Linguistics: formal grammars, abstract characterization of what is to be learned.  Computer Science: algorithms for efficient learning or online deployment of these systems in automata.  Engineering: stochastic techniques for characterizing regular patterns for learning and ambiguity resolution.  Psychology: Insights into what linguistic constructions are easy or difficult for people to learn or to use

History: 1940’s-1950’s  Development of formal language theory (Chomsky, Kleene, Backus). –Formal characterization of classes of grammar (context-free, regular) –Association with relevant automata  Probability theory: language understanding as decoding through a noisy channel (Shannon) –Use of information-theoretic concepts like entropy to measure the success of language models.

Symbolic vs. Stochastic  Symbolic –Use of formal grammars as basis for natural language processing and learning systems. (Chomsky, Harris) –Use of logic and logic based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira) –First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd, Colmerauer) –Discourse Processing: Role of Intention, Focus (Grosz, Sidner, Hobbs)  Stochastic Modeling –Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)

1983-1993: Return of Empiricism  Use of stochastic techniques for part-of-speech tagging, parsing, word sense disambiguation, etc.  Comparison of stochastic, symbolic, and more or less powerful models for language understanding and learning tasks.

1993-Present  Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.  Stochastic and symbolic methods combine for real-world applications.

Language and Intelligence: Turing Test  Turing test: –machine, human, and human judge  Judge asks questions of computer and human. –Machine’s job is to act like a human; the human’s job is to convince the judge that he’s not the machine. –Machine judged “intelligent” if it can fool the judge.  Judgment of “intelligence” is linked to appropriate answers to questions from the system.

ELIZA  Remarkably simple “Rogerian Psychologist”  Uses Pattern Matching to carry on a limited form of conversation.  Seems to “Pass the Turing Test!” (McCorduck, 1979)  Eliza Demo:

What’s involved in an “intelligent” Answer? Analysis: Decomposition of the signal (spoken or written) eventually into meaningful units. This involves …

Speech/Character Recognition  Decomposition into words, segmentation of words into appropriate phones or letters  Requires knowledge of phonological patterns: –I’m enormously proud. –I mean to make you proud.

Morphological Analysis  Inflectional –duck + s = [N duck] + [plural s] –duck + s = [V duck] + [3rd person s]  Derivational –kind, kindness  Spelling changes –drop, dropping –hide, hiding
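A minimal sketch of how such inflectional, derivational, and spelling-change rules could be applied against a small lexicon (Python). The rule set and the analyze function are illustrative assumptions covering only the toy examples on this slide, not a real morphological analyzer:

# Toy rule-based morphological analysis; handles only the slide's examples.
def analyze(word, lexicon):
    analyses = []
    if word in lexicon:
        analyses.append((word, 'base form'))
    if word.endswith('s') and word[:-1] in lexicon:
        # ducks -> duck + s: plural noun OR 3rd-person-singular verb
        analyses.append((word[:-1], 'N+plural or V+3sg'))
    if word.endswith('ing'):
        stem = word[:-3]
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[:-1] in lexicon:
            analyses.append((stem[:-1], 'V+ing, undo consonant doubling'))  # dropping -> drop
        elif stem + 'e' in lexicon:
            analyses.append((stem + 'e', 'V+ing, restore silent e'))        # hiding -> hide
        elif stem in lexicon:
            analyses.append((stem, 'V+ing'))
    if word.endswith('ness') and word[:-4] in lexicon:
        analyses.append((word[:-4], 'Adj+ness, derivational'))              # kindness -> kind
    return analyses

lexicon = {'duck', 'drop', 'hide', 'kind'}
for w in ['ducks', 'dropping', 'hiding', 'kindness']:
    print(w, '->', analyze(w, lexicon))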

Syntactic Analysis  Associate constituent structure with the string  Prepare for semantic interpretation  Constituent tree for “I watched the terrapin”: [S [NP I] [VP [V watched] [NP [Det the] [N terrapin]]]]  OR a grammatical-relation structure: watch(Subject: I, Object: terrapin(Det: the))

Semantics  A way of representing meaning  Abstracts away from syntactic structure  Example: –First-Order Logic: watch(I,terrapin) –Can be: “I watched the terrapin” or “The terrapin was watched by me”  Real language is complex: –Who did I watch?
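A minimal illustration of abstracting away from syntactic structure (Python). The mini parse format and the two rules below are assumptions made only for the slide's two example sentences:

def to_fol(parse):
    # parse is (voice, first_np, verb, second_np) -- a hypothetical mini-format
    voice, a, verb, b = parse
    if voice == 'active':        # "I watched the terrapin"
        return f"{verb}({a},{b})"
    else:                        # "The terrapin was watched by me" -> same meaning
        return f"{verb}({b},{a})"

print(to_fol(('active', 'I', 'watch', 'terrapin')))    # watch(I,terrapin)
print(to_fol(('passive', 'terrapin', 'watch', 'I')))   # watch(I,terrapin)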

Lexical Semantics  The Terrapin is who I watched.  Watch the Terrapin is what I do best.  *Terrapin is what I watched the.  Roles: I = experiencer; watch the Terrapin = predicate; the Terrapin = patient

Compositional Semantics  Association of parts of a proposition with semantic roles  Scoping  Diagram: a Proposition node whose Experiencer is “I” (1st pers, sg), whose Predicate is “saw” (Be, perceptual), and whose patient is “the Terrapin”

Word-Governed Semantics  Any verb can add “able” to form an adjective. –I taught the class. The class is teachable. –I rejected the idea. The idea is rejectable.  Association of particular words with specific semantic forms. –John (masculine) –The boys (masculine, plural, human)

Pragmatics  Real-world knowledge, speaker intention, goal of utterance.  Related to sociology.  Example 1: –Could you turn in your assignments now? (command) –Could you finish the homework? (question, command)  Example 2: –I couldn’t decide how to catch the crook. Then I decided to spy on the crook with binoculars. –To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars. Attachment: [the crook [with binoculars]] vs. [the crook] [with binoculars]

Discourse Analysis  Discourse: How propositions fit together in a conversation—multi-sentence processing. –Pronoun reference: The professor told the student to finish the assignment. He was pretty aggravated at how long it was taking to pass it in. –Multiple reference to same entity: George W. Bush, president of the U.S. –Relation between sentences: John hit the man. He had stolen his bicycle

NLP Pipeline: speech → Phonetic Analysis; text → OCR/Tokenization; then Morphological Analysis → Syntactic Analysis → Semantic Interpretation → Discourse Processing
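A minimal sketch of the text side of this pipeline as function composition (Python). Every stage below is a hypothetical placeholder standing in for a real analyzer, included only to show how the stages feed one another:

def tokenize(text):            # OCR/Tokenization stage
    return text.split()

def morphology(tokens):        # toy lemmatizer: strip a plural -s
    return [t.lower().rstrip('s') if t.endswith('s') else t.lower() for t in tokens]

def syntax(lemmas):            # stand-in for a parser: flat "tree"
    return ('S', lemmas)

def semantics(tree):           # stand-in semantic interpretation: pred(arg1, arg2)
    _, lemmas = tree
    return {'pred': lemmas[1], 'args': [lemmas[0], lemmas[-1]]}

def discourse(meaning, ctx):   # discourse processing: accumulate propositions
    return ctx + [meaning]

context = []
meaning = semantics(syntax(morphology(tokenize("I watched the terrapins"))))
context = discourse(meaning, context)
print(meaning)   # {'pred': 'watched', 'args': ['i', 'terrapin']}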

Relation to Machine Translation: input → analysis (Morphological analysis → Syntactic analysis → Semantic Interpretation) → Interlingua → generation (Lexical selection → Syntactic realization → Morphological synthesis) → output

Ambiguity  “I made her duck” can mean: –I made duckling for her –I made the duckling belonging to her –I created the duck she owns –I forced her to lower her head –By magic, I changed her into a duck

Syntactic Disambiguation  Structural ambiguity: two parses of “I made her duck” –[S [NP I] [VP [V made] [NP her] [VP [V duck]]]] (I caused her to duck) –[S [NP I] [VP [V made] [NP [det her] [N duck]]]] (I made the duck belonging to her)  A toy grammar that produces both parses is sketched below.
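A minimal sketch of that structural ambiguity using NLTK's chart parser (assumption: NLTK is installed; the toy grammar below is ours for illustration, not the lecture's grammar):

import nltk

# Toy grammar in which "her" is either a pronoun or a possessive determiner,
# and "duck" is either a noun or a verb, so the sentence gets two parses.
grammar = nltk.CFG.fromstring("""
S    -> NP VP
VP   -> V NP | V NP VP | V
NP   -> PRON | DET N
PRON -> 'I' | 'her'
DET  -> 'her'
N    -> 'duck'
V    -> 'made' | 'duck'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I made her duck".split()):
    print(tree)
# Two parses, e.g.:
# (S (NP (PRON I)) (VP (V made) (NP (DET her) (N duck))))        "her duckling"
# (S (NP (PRON I)) (VP (V made) (NP (PRON her)) (VP (V duck))))  "caused her to duck"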

Part of Speech Tagging and Word Sense Disambiguation  [verb Duck!] vs. [noun Duck] is delicious for dinner  I went to the bank to deposit my check. I went to the bank to look out at the river. I went to the bank of windows and chose the one dealing with last names beginning with “d”.
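A small illustration of the tagging side of this slide with NLTK's off-the-shelf tagger (assumptions: NLTK plus its 'punkt' and 'averaged_perceptron_tagger' data are installed; the exact tags depend on the trained model, so the point is the task, not a guaranteed output):

import nltk
# One-time setup (assumed done): nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

for sent in ["Duck!", "Duck is delicious for dinner."]:
    tokens = nltk.word_tokenize(sent)
    # A statistical tagger chooses a tag for "Duck" from its context;
    # the output depends on the model.
    print(nltk.pos_tag(tokens))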

Resources for NLP Systems  Dictionary  Morphology and Spelling Rules  Grammar Rules  Semantic Interpretation Rules  Discourse Interpretation  Natural language processing involves (1) learning or fashioning the rules for each component, (2) embedding the rules in the relevant automaton, and (3) using the automaton to efficiently process the input.

Some NLP Applications  Machine Translation—Babelfish (Alta Vista):  Question Answering—Ask Jeeves (Ask Jeeves):  Language Summarization—MEAD (U. Michigan):  Spoken Language Recognition— EduSpeak (SRI):  Automatic Essay evaluation—E-Rater (ETS):  Information Retrieval and Extraction—NetOwl (SRA):

What is MT?  Definition: Translation from one natural language to another by means of a computerized system  Early failures  Later: varying degrees of success

An Old Example  “The spirit is willing but the flesh is weak” → “The vodka is good but the meat is rotten”

Machine Translation History  1950’s: Intensive research activity in MT  1960’s: Direct word-for-word replacement  1966 (ALPAC): NRC Report on MT  Conclusion: MT no longer worthy of serious scientific investigation.  1966-1975: ‘Recovery period’  1975-1985: Resurgence (Europe, Japan)  1985-present: Resurgence (US)

What happened between ALPAC and Now?  Need for MT and other NLP applications confirmed  Change in expectations  Computers have become faster, more powerful  WWW  Political state of the world  Maturation of Linguistics  Development of hybrid statistical/symbolic approaches

Three MT Approaches: Direct, Transfer, Interlingual  Diagram: analysis climbs from the Source Text through Morphological Analysis (to Word Structure), Syntactic Analysis (to Syntactic Structure), and Semantic Analysis and Semantic Composition (to Semantic Structure), up to the Interlingua; generation descends symmetrically through Semantic Decomposition, Semantic Generation, Syntactic Generation, and Morphological Generation to the Target Text. Direct translation maps across at the word-structure level; Syntactic Transfer and Semantic Transfer map across at the intermediate levels; the Interlingua is the single shared representation at the top.

Examples of Three Approaches  Direct: –I checked his answers against those of the teacher → Yo comparé sus respuestas a las de la profesora –Rule: [check X against Y] → [comparar X a Y]  Transfer: –Ich habe ihn gesehen → I have seen him –Rule: [clause agt aux obj pred] → [clause agt aux pred obj]  Interlingual: –I like Mary → Mary me gusta a mí –Rep: [Be Ident (I [AT Ident (I, Mary)] Like+ingly)]
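A minimal sketch of the Direct approach using only the phrase rule shown above (Python regular expressions; the rule table and the fall-through behavior are illustrative assumptions, not an actual direct-MT system):

import re

# One direct-replacement rule: [check X against Y] -> [comparar X a Y]
RULES = [
    (r'^I checked (?P<X>.+) against (?P<Y>.+)$', r'Yo comparé \g<X> a \g<Y>'),
]

def direct_translate(sentence):
    for pattern, template in RULES:
        match = re.match(pattern, sentence)
        if match:
            return match.expand(template)
    return sentence   # no rule fired: pass the input through unchanged

print(direct_translate("I checked his answers against those of the teacher"))
# -> "Yo comparé his answers a those of the teacher": the untranslated NPs show why
#    direct MT needs a word-for-word rule for everything (rule proliferation).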

MT Systems:  Direct: GAT [Georgetown, 1964], TAUM-METEO [Colmerauer et al. 1971]  Transfer: GETA/ARIANE [Boitet, 1978] LMT [McCord, 1989], METAL [Thurmair, 1990], MiMo [Arnold & Sadler, 1990], …  Interlingual: MOPTRANS [Schank, 1974], KBMT [Nirenburg et al, 1992], UNITRAN [Dorr, 1990]

Statistical MT and Hybrid Symbolic/Stats MT: 1990-Present Candide [Brown, 1990, 1992]; Halo/Nitrogen [Langkilde and Knight, 1998], [Yamada and Knight, 2002]; GHMT [Dorr and Habash, 2002]; DUSTer [Dorr et al. 2002]

Direct MT: Pros and Cons  Pros –Fast –Simple –Inexpensive  Cons –Unreliable –Not powerful –Rule proliferation –Requires too much context –Major restructuring after lexical substitution

Transfer MT: Pros and Cons  Pros –Don’t need to find language-neutral rep –No translation rules hidden in lexicon –Relatively fast  Cons –N² sets of transfer rules: difficult to extend –Proliferation of language-specific rules in lexicon and syntax –Cross-language generalizations lost

Interlingual MT: Pros and Cons  Pros –Portable (avoids the N² problem) –Lexical rules and structural transformations stated more simply on normalized representation –Explanatory Adequacy  Cons –Difficult to deal with terms on primitive level: universals? –Must decompose and reassemble concepts –Useful information lost (paraphrase)

Approximate IL Approach  Tap into richness of TL resources  Use some, but not all, components of IL representation  Generate multiple sentences that are statistically pared down

Approximating IL: Handling Divergences  Primitives  Semantic Relations  Lexical Information

Interlingual vs. Approximate IL  Interlingual MT: –primitives & relations –bi-directional lexicons –analysis: compose IL –generation: decompose IL  Approximate IL –hybrid symbolic/statistical design –overgeneration with statistical ranking –uses dependency rep input and structural expansion for “deeper” overgeneration

Mapping from Input Dependency to English Dependency Tree  Knowledge resources in English only (LVD; Dorr, 2001)  Example: Mary le dio patadas a John → Mary kicked John  Input dependency: GIVE-V [CAUSE GO] with Agent = MARY, Theme = KICK-N, Goal = JOHN; English dependency: KICK-V [CAUSE GO] with Agent = MARY, Goal = JOHN

Statistical Extraction Mary kicked John. [ ] Mary gave a kick at John. [ ] Mary gave the kick at John. [ ] Mary gave an kick at John. [ ] Mary gave a kick by John. [ ] Mary gave a kick to John. [ ] Mary gave a kick into John. [ ] Mary gave a kick through John. [ ] Mary gave a foot wound by John. [ ] Mary gave John a foot wound. [ ]

Benefits of Approximate IL Approach  Explaining behaviors that appear to be statistical in nature  “Re-sourceability”: Re-use of already existing components for MT from new languages.  Application to monolingual alternations

What Resources are Required?  Deep TL resources  Requires SL parser and tralex (translation lexicon)  TL resources are richer: LVD representations, CatVar database  Constrained overgeneration

MT Challenges: Ambiguity  Syntactic Ambiguity: I saw the man on the hill with the telescope  Lexical Ambiguity: E: book = S: libro, reservar  Semantic Ambiguity –Homography: ball(E) = pelota, baile(S) –Polysemy: kill(E) = matar, acabar(S) –Semantic granularity: esperar(S) = wait, expect, hope (E); be(E) = ser, estar(S); fish(E) = pez, pescado(S)

How do we evaluate MT?  Human-based Metrics –Semantic Invariance –Pragmatic Invariance –Lexical Invariance –Structural Invariance –Spatial Invariance –Fluency –Accuracy –“Do you get it?”  Automatic Metrics: Bleu

BiLingual Evaluation Understudy (BLEU; Papineni, 2001)  Automatic Technique, but…  Requires the pre-existence of Human (Reference) Translations  Approach: –Produce corpus of high-quality human translations –Judge “closeness” numerically (word-error rate) –Compare n-gram matches between candidate translation and 1 or more reference translations

Bleu Comparison Chinese-English Translation Example: Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party. Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct. Reference 1: It is a guide to action that ensures that the military will forever heed Party commands. Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party. Reference 3: It is the practical guide for the army always to heed the directions of the party.

How Do We Compute Bleu Scores?  Key idea: a reference word should be considered exhausted after a matching candidate word is identified.  For each candidate word compute: (1) its count in the candidate, (2) its maximum count in any single reference.  Add up, over candidate words, the lower of the two numbers.  Divide by the total number of candidate words.
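A minimal sketch of that clipped (modified) n-gram precision in Python, with a toy lowercasing tokenizer. It computes only the per-order precision fraction used on the next few slides, not the full BLEU score (no brevity penalty, no geometric mean over n-gram orders):

import re
from collections import Counter

def tokens(text):
    # Toy tokenizer: lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())

def ngrams(toks, n):
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def modified_precision(candidate, references, n=1):
    # Each candidate n-gram is credited at most as many times as it occurs
    # in the single reference where it occurs most often (the "max ref count").
    cand_counts = Counter(ngrams(tokens(candidate), n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(tokens(ref), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

refs = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
cand1 = "It is a guide to action which ensures that the military always obeys the commands of the party."
print(modified_precision(cand1, refs, n=1))   # (17, 18)
print(modified_precision(cand1, refs, n=2))   # (10, 17)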

Modified Unigram Precision: Candidate #1 Reference 1: It is a guide to action that ensures that the military will forever heed Party commands. Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party. Reference 3: It is the practical guide for the army always to heed the directions of the party. It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1) What’s the answer? 17/18

Modified Unigram Precision: Candidate #2 It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0) What’s the answer? 8/14 Reference 1: It is a guide to action that ensures that the military will forever heed Party commands. Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party. Reference 3: It is the practical guide for the army always to heed the directions of the party.

Modified Bigram Precision: Candidate #1 It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1) What’s the answer? 10/17 Reference 1: It is a guide to action that ensures that the military will forever heed Party commands. Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party. Reference 3: It is the practical guide for the army always to heed the directions of the party.

Modified Bigram Precision: Candidate #2 Reference 1: It is a guide to action that ensures that the military will forever heed Party commands. Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party. Reference 3: It is the practical guide for the army always to heed the directions of the party. It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0) What’s the answer? 1/13

Catching Cheaters  Reference 1: The cat is on the mat  Reference 2: There is a cat on the mat  Candidate: the the the the the the the (“the” has a maximum reference count of 2, so only 2 of the 7 candidate words are credited)  What’s the unigram answer? 2/7  What’s the bigram answer? 0/6 (“the the” never appears in either reference)
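Running the modified_precision sketch from the earlier BLEU slide on this degenerate candidate reproduces the clipping effect:

cheat_refs = ["The cat is on the mat", "There is a cat on the mat"]
cheat = "the the the the the the the"
print(modified_precision(cheat, cheat_refs, n=1))   # (2, 7): only two "the" tokens are credited
print(modified_precision(cheat, cheat_refs, n=2))   # (0, 6): "the the" never occurs in a reference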