The Chinese Room: Understanding and Correcting Machine Translation

This work has been supported by NSF Grant IIS-0612791.

What is Machine Translation?

Machine Translation (MT) is the process of automatically converting text from one human language to another (e.g., Chinese to English).
MT is performed by algorithms that extract statistical translation rules from millions of human-generated translation pairs (sentences with the same meaning in both Chinese and English).

Uses of MT:
People who want to read text in an unknown foreign language.
People who are barely proficient in a language and can use MT to learn it.
Businesses that want to translate documents into other languages.
We focus on the first case, though our work could easily be extended to the other cases as well.

The Problem With Machine Translation

Machine-translated sentences are often difficult or impossible to understand.
Example machine translation: He utter eyes and not the slightest attention As leakage.
Intended meaning: His eyes were wide apart; nothing in their field of vision escaped.
Errors are caused by the machine's lack of world knowledge and its inability to form coherent sentences or understand ideas.

Related Work

DerivTool: an interface for observing the inner workings of a specific MT system. It requires knowledge of both languages and an in-depth knowledge of how MT works. [DeNeefe et al., 2005]
Design cues were taken from systems such as TreeJuxtaposer [Munzner, 2003] and from Envisioning Information [Tufte, 1990].

Idea

We propose a collaborative approach between users, who have good world knowledge and writing skills, and the machine, which is good at processing large amounts of data into useful linguistic resources. We have created an interactive visualization of these linguistic resources that enables the user to explore alternative translations in order to better understand and correct machine translations.

Conclusions

"The Chinese Room" is an interface that allows users to explore and interact with linguistic resources as they attempt to understand poor automatic translations.
The design was based on iterative improvement with expert users.
A pilot study gave promising preliminary results.
Our tool can manageably expose a variety of resources and a huge amount of data to the user, allowing monolingual speakers to determine the most likely translation without any knowledge of the foreign language.
Many challenges remain, including integrating other forms of information and exploiting uncertain sources of information.

Future Work

Further applications of this basic collaborative approach (language education, end-user understanding, commercial translation processes, MT design, and more).
Extending the tool to other language pairs (shown on the poster working with Arabic).
Further efforts in usability and ease of use could be very beneficial.
Other resources (manually created translation rules, incorporation of translation memory) might be helpful to the user.

Solution: The Chinese Room

Visualization:
Chinese Text: Displays the original characters, automatically segmented into approximate words.
Word Alignments: Displays the mapping (given by the MT system) between Chinese and English words. English words are clustered together based on these alignments.
Chinese Syntax Tree: Shows the automatically generated grammatical structure of the source sentence. Colors correspond to different parts of speech (blue for verbs, red for nouns, etc.).
English Text: Displays the English translation generated by the MT system for the selected Chinese sentence.
Translation Dictionary: Presents definitions for words (first column) and individual Chinese characters (second column). Definitions are aligned horizontally with the word or character that they define.

Interaction:
Clicking on English words allows them to be edited.
Dragging English words allows the user to visually experiment with different word orders.
Mousing over the definitions highlights the corresponding Chinese character or word.
Clicking on the Chinese Syntax Tree lines causes that section of the sentence to collapse (or expand if clicked again later), allowing the user to better focus on difficult parts of the sentence.
Clicking and dragging selects a Chinese phrase (and begins the search for similar example translations).
Clicking on an example search result puts that sentence in the main view for more detailed inspection.
Clicking on the translation tab requests N-Best translations.
Clicking on a sentence in the document view selects it as the current sentence.
Clicking on the edit tab allows the user to type and directly modify the translated text.

Additional Resources:
Other resources are displayed as text in the rightmost pane.
N-Best Re-Translations: A list of candidate English sentences (or phrases) that the Machine Translation system (in this case, Google) was considering for the phrase selected by the user.
Document View: Every sentence in the document can be seen at once, giving a better sense of the meaning in the context of the document.
Edit Area: The English translation can be edited in a small text area so that users can quickly edit and annotate the sentence.
Example Search: Search results are displayed in the rightmost column, with the matches shown in pink, and are sorted by relevance.
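
The example search described above could be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: the poster does not say how relevance is computed, so the sketch assumes relevance is the fraction of the selected Chinese phrase covered by the longest substring it shares with a stored sentence. The names `longest_match` and `example_search` and the toy corpus are hypothetical, and square brackets stand in for the pink highlighting.

```python
from difflib import SequenceMatcher


def longest_match(query, sentence):
    """Return (start, length) of the longest substring of `sentence` shared with `query`."""
    matcher = SequenceMatcher(None, query, sentence, autojunk=False)
    m = matcher.find_longest_match(0, len(query), 0, len(sentence))
    return m.b, m.size


def example_search(query, corpus):
    """Rank (chinese, english) pairs by how much of `query` they contain.

    Assumed relevance: matched length divided by query length.
    Returns a list of (score, marked_chinese, english), best first.
    """
    results = []
    for zh, en in corpus:
        start, size = longest_match(query, zh)
        if size == 0:
            continue  # nothing in common with the selected phrase
        score = size / len(query)
        # Brackets mark the matched span (the tool shows matches in pink).
        marked = zh[:start] + "[" + zh[start:start + size] + "]" + zh[start + size:]
        results.append((score, marked, en))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical toy corpus of (Chinese, English) translation pairs.
    corpus = [
        ("机器学习很有趣", "Machine learning is fun"),
        ("我喜欢机器翻译", "I like machine translation"),
        ("今天天气很好", "The weather is nice today"),
    ]
    for score, zh, en in example_search("机器翻译", corpus):
        print(f"{score:.2f}  {zh}  {en}")
```

A real system would search a much larger parallel corpus and would likely use an index rather than a linear scan; the linear scan here keeps the sketch short.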
By interacting with the various components, users can better understand the original meaning of the Chinese text.

[Figure: Screenshot of the Chinese Room]

Josh Albrecht, Rebecca Hwa, and G. Elisabeta Marai
Department of Computer Science, University of Pittsburgh