Week 9: resources for globalisation Finish spell checkers Machine Translation (MT) The ‘decoding’ paradigm Ambiguity Translation models Interlingua and.

Presentation transcript:

Week 9: resources for globalisation Finish spell checkers Machine Translation (MT) The ‘decoding’ paradigm Ambiguity Translation models Interlingua and First Order Predicate Calculus Human involvement Historical note

Spelling dictionaries
Implementing a spelling identification and correction algorithm
STAGE 1: compare each string in the document with a list of legal strings; if the list has no corresponding string, mark it as misspelled
STAGE 2: generate a list of candidates
  Apply any single transformation to the typo string
  Filter the list by checking against a dictionary
STAGE 3: assign a probability value to each candidate in the list
STAGE 4: select the best candidate
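Stages 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the slides do not say which transformations count, so the sketch assumes the usual four (deletion, transposition, substitution, insertion), and `DICTIONARY` is a hypothetical six-word stand-in for a real list of legal strings.

```python
import string

# Hypothetical toy word list standing in for the list of legal strings.
DICTIONARY = {"across", "actress", "acres", "access", "caress", "cress"}


def is_misspelled(word):
    """STAGE 1: a string with no match in the list is marked as misspelled."""
    return word not in DICTIONARY


def single_edits(typo):
    """STAGE 2a: every string exactly one transformation away from the typo.

    Assumed transformations: delete, transpose, substitute, insert.
    """
    letters = string.ascii_lowercase
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = [left + right[1:] for left, right in splits if right]
    transpositions = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    substitutions = [
        left + ch + right[1:] for left, right in splits if right for ch in letters
    ]
    insertions = [left + ch + right for left, right in splits for ch in letters]
    return set(deletions + transpositions + substitutions + insertions)


def candidates(typo):
    """STAGE 2b: keep only the generated strings that are in the dictionary."""
    return sorted(single_edits(typo) & DICTIONARY)


print(is_misspelled("acress"))
print(candidates("acress"))
```

With this toy dictionary the typo "acress" is flagged as misspelled and all six dictionary words survive the filter, each reachable by one transformation.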

Spelling dictionaries
STAGE 3
prior probability: given all the words in English, is this candidate more likely to be what the typist meant than that candidate?
  P(c) = count(c)/N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in the corpus
likelihood: given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo?
  P(t|c), calculated using a corpus of errors, or transformations
Bayes' rule: take the product of the prior probability and the likelihood, P(c) × P(t|c)
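The ranking step can be sketched as follows. Every number here is invented purely for illustration and does not come from a real corpus or error collection; `WORD_COUNTS`, `CORPUS_SIZE` and `ERROR_LIKELIHOOD` are hypothetical names.

```python
# All figures below are invented for illustration only.
CORPUS_SIZE = 1_000_000  # N: number of words in a hypothetical corpus
WORD_COUNTS = {"actress": 30, "across": 200, "acres": 60}

# Hypothetical likelihoods: probability that the transformation which turns
# each candidate into the typo "acress" occurs (would come from an error corpus).
ERROR_LIKELIHOOD = {"actress": 0.0001, "across": 0.00001, "acres": 0.00003}


def prior(candidate):
    """Prior probability: relative frequency of the candidate in the corpus."""
    return WORD_COUNTS[candidate] / CORPUS_SIZE


def score(candidate):
    """Product of the prior probability and the likelihood of the typo."""
    return prior(candidate) * ERROR_LIKELIHOOD[candidate]


def best_candidate(candidate_list):
    """STAGE 4: select the candidate with the highest score."""
    return max(candidate_list, key=score)


print(best_candidate(["across", "actress", "acres"]))
```

With these made-up figures "across" has the highest prior but "actress" wins overall, because the transformation that produces the typo from it is assumed to be more common. The point of multiplying the two terms is exactly this trade-off.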

Spelling dictionaries: non-word errors
Implementing a spelling identification and correction algorithm
STAGE 1: identify misspelled words
STAGE 2: generate list of candidates
STAGE 3a: rank candidates by probability
STAGE 3b: select best candidate
Implement: noisy channel model, Bayes' rule

Resources for Globalisation: Machine translation

Resources for Globalisation: Machine translation
The ‘decoding’ paradigm
Assumes a one-to-one relation between source symbol and target symbol
  one-to-many (homonymy)
  one-to-many (hypernym → hyponyms)
  many-to-one (hyponyms → hypernym)

Machine translation The ‘decoding’ paradigm one-to-many (homonymy) bank → Ufer, Bank (German)

Machine translation The ‘decoding’ paradigm one-to-many (homonymy) one-to-many (hypernym → hyponyms): brother → otooto, oniisan (Japanese) blue → синий, голубой (Russian) many-to-one (hyponyms → hypernym)

Machine translation The ‘decoding’ paradigm one-to-many (homonymy) one-to-many (hypernym → hyponyms): many-to-one (hyponyms → hypernym) hill, mountain → Berg (German) learn, teach → leren (Dutch)
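A short sketch shows why symbol-for-symbol decoding breaks down. The toy `LEXICON` below contains only word pairs from the slides; the function names are hypothetical.

```python
# Toy bilingual lexicon built only from the examples on the slides.
LEXICON = {
    "bank": ["Ufer", "Bank"],          # one-to-many: homonymy (German)
    "brother": ["otooto", "oniisan"],  # one-to-many: hypernym -> hyponyms (Japanese)
    "hill": ["Berg"],                  # many-to-one (German)
    "mountain": ["Berg"],              # many-to-one (German)
}


def decode(word):
    """Look up every target symbol the source symbol maps to."""
    return LEXICON.get(word, [])


def is_one_to_one(word):
    """True only when decoding yields exactly one target symbol."""
    return len(decode(word)) == 1


def sources_of(target):
    """All source symbols that decode to the given target symbol."""
    return sorted(src for src, targets in LEXICON.items() if target in targets)


print(decode("bank"))
print(is_one_to_one("brother"))
print(sources_of("Berg"))
```

Decoding "bank" returns two targets, so a pure lookup cannot choose without context; "Berg" has two sources, so the distinction between them is lost in translation.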

Machine translation and globalisation Ambiguity ‘I made her duck’ “The possibility of interpreting an expression in two or more distinct ways” Collins English Dictionary

Machine translation
Ambiguity
The challenge of translation depends on the level of ambiguity that arises
This depends on the closeness of the source and target languages with respect to the following:
  vocabulary: homonyms
  grammar: structural ambiguity
  conceptual structure: specificity ambiguity, lexical gaps

Machine translation Pragmatic approach aim for a rough translation, ‘gist’ translation Used for multi-lingual information retrieval involve human translators in the process: computer-aided translation

Machine translation
Translation models
Transfer model
English: ‘the dog bit my friend’
Hindi: kutte-ne mere dost-ko kata
Gloss: dog my friend bit

Machine translation
Translation models
Transfer model
Alter the grammatical structure of the source language to make it adhere to the grammatical structure of the target language
Use transformation rules
  Analysis process (source)
  Transfer process (‘bridge’)
  Generation process (target)
Problem: each source-target pair will need its own unique set of transformation rules
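The three processes can be sketched for the English-to-Hindi example. This is a toy under strong assumptions: `PHRASE_TABLE` and `VERBS` are hypothetical, the phrase segmentation is a guess, and the only transformation rule is a reordering from subject-verb-object to subject-object-verb.

```python
# Toy phrase table using the transliterated Hindi from the slide's example.
# The segmentation into phrases is an assumption made for this sketch.
PHRASE_TABLE = {"the dog": "kutte-ne", "my friend": "mere dost ko", "bit": "kata"}
VERBS = {"bit"}  # hypothetical verb list used by the toy analyser


def analyse(sentence):
    """Analysis (source): split a simple English SVO sentence around its verb."""
    words = sentence.split()
    verb_index = next(i for i, word in enumerate(words) if word in VERBS)
    return {
        "subject": " ".join(words[:verb_index]),
        "verb": words[verb_index],
        "object": " ".join(words[verb_index + 1:]),
    }


def transfer(structure):
    """Transfer ('bridge'): reorder subject-verb-object to subject-object-verb."""
    return [structure["subject"], structure["object"], structure["verb"]]


def generate(ordered_phrases):
    """Generation (target): replace each phrase with its target-language form."""
    return " ".join(PHRASE_TABLE[phrase] for phrase in ordered_phrases)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the dog bit my friend"))
```

The reordering rule in `transfer` is specific to this language pair, which is the problem the slide points out: a different pair would need a different rule set.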

Machine translation Translation models Inter-lingua model Extract the meaning from the source string Give it a language independent representation, i.e. an interlingua Translation process takes the interlingua as its input Multiple translation processes take the same input for multiple target language outputs

Machine translation
Translation models
What is the inter-lingua?
for words, some sort of semantic analysis, e.g.
  Interlingua: (GO, BY-FOOT)  (GO, BY-TRANSPORT)
  Russian:     идти           ехать
  English:     go             go
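A minimal sketch of the word-level case, assuming the interlingua is just a tuple of semantic features. The table names `ANALYSIS` and `GENERATORS` are hypothetical, and only the words from the slide are covered.

```python
# Analysis: source word -> language-independent representation (the interlingua).
ANALYSIS = {
    "идти": ("GO", "BY-FOOT"),
    "ехать": ("GO", "BY-TRANSPORT"),
}

# Generation: one table per target language, all reading the same interlingua.
GENERATORS = {
    "English": {("GO", "BY-FOOT"): "go", ("GO", "BY-TRANSPORT"): "go"},
    "Russian": {("GO", "BY-FOOT"): "идти", ("GO", "BY-TRANSPORT"): "ехать"},
}


def to_interlingua(word):
    """Extract the meaning of the source word."""
    return ANALYSIS[word]


def from_interlingua(meaning, language):
    """Produce the target-language word for a meaning."""
    return GENERATORS[language][meaning]


def translate(word, language):
    return from_interlingua(to_interlingua(word), language)


print(translate("идти", "English"))
print(translate("ехать", "English"))
```

Adding a new target language only means adding one more entry to `GENERATORS`; the analysis side is untouched. Going from English "go" into Russian would need context to decide between BY-FOOT and BY-TRANSPORT, which this sketch does not attempt.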

Machine translation and globalisation Translation models What is the inter-lingua? for sentences, a logical language e.g. First Order Predicate Calculus

Meaning representation
Goal:
1. The semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world
2. The representation must be expressive, i.e. handle different types of data

Meaning representation
First Order Predicate Calculus
  computationally tractable
  objects (terms)
  properties of objects
  relations amongst objects
  predicate-argument structure
  large composite representations
  logical connectives

Meaning representation
First Order Predicate Calculus
Object: referred to uniquely by a term
  constant, e.g. SurreyUniversity
  function, e.g. LocationOf(SurreyUniversity)
  variable

Meaning representation First Order Predicate Calculus Relations amongst objects Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M) Educates(SurreyUniversity, Citizens) two-place predicate

Meaning representation First Order Predicate Calculus Relations amongst objects Predicates: Can specify the category of an object University(SurreyUniversity) one-place predicate

Meaning representation First Order Predicate Calculus properties / parts of objects functions: LocationOf(SurreyUniversity)

Meaning representation First Order Predicate Calculus Composite representations through predicates and functions: Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
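One way to hold such representations in a program is as nested data. The classes below are a hypothetical encoding, not part of the lecture; variables are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Constant:
    """A term that names one object, e.g. SurreyUniversity."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Function:
    """A term built from other terms, e.g. LocationOf(SurreyUniversity)."""
    name: str
    args: Tuple["Term", ...]

    def __str__(self):
        return f"{self.name}({', '.join(str(arg) for arg in self.args)})"


Term = Union[Constant, Function]


@dataclass(frozen=True)
class Predicate:
    """A relation holding among a fixed number of terms."""
    name: str
    args: Tuple[Term, ...]

    @property
    def arity(self):
        return len(self.args)

    def __str__(self):
        return f"{self.name}({', '.join(str(arg) for arg in self.args)})"


surrey = Constant("SurreyUniversity")
cathedral = Constant("Cathedral")

is_university = Predicate("University", (surrey,))  # one-place predicate
near = Predicate(
    "Near",
    (Function("LocationOf", (surrey,)), Function("LocationOf", (cathedral,))),
)  # two-place predicate over two function terms

print(is_university, is_university.arity)
print(near, near.arity)
```

Because functions are terms, they can appear as predicate arguments, which is what makes the composite `Near(...)` representation possible.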

Meaning representation First Order Predicate Calculus Logical connectives combine basic representations to form larger, more complex representations, e.g. the ∧ operator = ‘and’

Meaning representation First Order Predicate Calculus Logical connectives combine basic representations to form larger, more complex representations: Educates(SurreyUniversity, Citizens) ∧ ¬ Remunerates(SurreyUniversity, Staff)
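The truth of a formula built with connectives can be checked against a set of known facts. The sketch below is a hypothetical encoding that assumes a closed world (anything not listed as a fact is false) and covers only conjunction and negation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """A basic representation: a predicate applied to named objects."""
    predicate: str
    args: tuple


@dataclass(frozen=True)
class Not:
    operand: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def holds(formula, facts):
    """Evaluate a formula against a set of atoms taken to be true.

    Closed-world assumption: an atom missing from `facts` counts as false.
    """
    if isinstance(formula, Atom):
        return formula in facts
    if isinstance(formula, Not):
        return not holds(formula.operand, facts)
    if isinstance(formula, And):
        return holds(formula.left, facts) and holds(formula.right, facts)
    raise TypeError(f"unsupported formula: {formula!r}")


educates = Atom("Educates", ("SurreyUniversity", "Citizens"))
remunerates = Atom("Remunerates", ("SurreyUniversity", "Staff"))

# The formula from the slide: Educates(...) and not Remunerates(...)
formula = And(educates, Not(remunerates))

print(holds(formula, {educates}))               # toy world with one fact
print(holds(formula, {educates, remunerates}))  # toy world with both facts
```

In the first toy world the conjunction holds; adding the second fact makes the negated part false and so the whole formula false.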

Machine translation and globalisation: change of priorities
1954: IBM and Georgetown University, first MT demo; goal: ‘perfect’ translation
1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of that goal
Post-ALPAC goal: rough translation, involve human element
Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN, whose goal was rough translation
Journal of Machine Translation

Next week Globalisation as an industry SDL and the SDLX-TRADOS globalisation application