Lending a Hand: Sign Language Machine Translation. Sara Morrissey, NCLT Seminar Series, 21st June 2006.

2 Overview: Introduction (what, why, how?); Out with the old… (SL corpora, the system); …in with the new (*new and improved*); Lost in Translation (evaluation issues); Conclusion

3 Introduction. Q: WHAT? A: Sign language, a visually articulated language. Linguistic phenomena prevalent in SLs: ~Classifiers ~Non-manual features (NMFs) ~Discourse mapping and use of signing space

4 Introduction (2). Q: WHY? A: a) Improve communication b) Extend the application of EBMT. Q: HOW? A: Our approach ~Annotated SL corpora ~Example-based MT employing the Marker Hypothesis (Green, 1979)

5 Introduction (3). Other approaches ~Transfer: Grieve-Smith, 1999; Marshall & Sáfár, 2002; Sáfár & Marshall, 2002; Van Zijl & Barker, 2003 ~Interlingua: Veale et al., 1998; Zhao et al., 2000 ~Multi-path: Huenerfauth, 2004, 2005 ~Statistical: Bauer et al., 1999; Bungeroth & Ney, 2004, 2005, 2006

6 Corpora (Out with the old…) Corpora are difficult to find. ECHO project Nederlandse Gebarentaal (NGT, Sign Language of the Netherlands) corpora ~40 minutes of video data ~5 Aesop's fables signed by two signers, plus SL poetry ~Combined corpus of 561 sentences

7 Annotation (Out with the old…) Why annotate? ~No formal written form for SLs ~Linguistic description, including NMFs ~Can include a translation, making the corpus bilingual or trilingual ~Timing information is present for chunking and aligning. ELAN annotation toolkit ~Graphical user interface displaying videos and annotations simultaneously (Fig. 1) ~Time-aligned and non-time-aligned annotations, including NMF description, repetition notation, and notes on indexing and role

8 Figure 1. ELAN interface

9 The System (Out with the old…) Segmentation using the 'Marker Hypothesis' (MH) (Green, 1979) ~Analogous to the system of Way & Gough, 2003; Gough & Way, 2004a/b ~Segments spoken-language sentences according to a set of closed-class words ~Chunks start with closed-class words and usually encapsulate a concept or an attribute of a concept, forming concept chunks, e.g. 'or with tiny curls'
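As a rough illustration of marker-based segmentation, the Python sketch below opens a new chunk at each closed-class (marker) word. The lexicon `MARKERS` is a small hypothetical stand-in for a real closed-class word list, and the rule that a run of consecutive marker words stays in one chunk until a content word appears is an assumed refinement, not necessarily the exact behaviour of the system described here.

```python
# Hypothetical, heavily reduced marker lexicon: closed-class word -> category.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "with": "PREP", "in": "PREP", "on": "PREP", "to": "PREP",
    "and": "CONJ", "or": "CONJ", "but": "CONJ",
    "i": "PRON", "you": "PRON", "he": "PRON", "she": "PRON", "it": "PRON",
}


def marker_chunks(sentence: str) -> list[list[str]]:
    """Split a sentence into chunks that each begin at a marker word.

    A marker only opens a new chunk once the current chunk already holds
    at least one non-marker word, so 'or with' stays together.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    for word in sentence.lower().split():
        has_content = any(w not in MARKERS for w in current)
        if word in MARKERS and has_content:
            chunks.append(current)  # close the finished chunk
            current = []
        current.append(word)
    if current:
        chunks.append(current)
    return chunks
```

For example, `marker_chunks("The mouse ran or with tiny curls")` yields `[["the", "mouse", "ran"], ["or", "with", "tiny", "curls"]]`, the second chunk matching the slide's example.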

10 The System (2) (Out with the old…) The MH is not suitable for the SL side of the corpus owing to the sparseness of closed-class marker items ~The NGT gloss tier is segmented based on the time spans of its annotations; the remaining annotations with the same time span are grouped with the gloss-tier segments, forming concept chunks similar to the English marker chunks ~Despite the different methods, both succeed in forming potentially alignable concept chunks
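The time-span grouping on the NGT side might look roughly like the sketch below. The `Annotation` record, the tier names, and the choice to attach any annotation whose span falls inside a gloss span (rather than requiring identical start and end times) are all assumptions for illustration; reading ELAN's own file format is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    tier: str       # hypothetical tier name, e.g. "Gloss RH" or "Eye Gaze"
    value: str
    start_ms: int
    end_ms: int


def gloss_chunks(annotations: list[Annotation],
                 gloss_tier: str = "Gloss RH") -> list[dict[str, list[str]]]:
    """Build one concept chunk per gloss annotation.

    Annotations on other tiers whose span lies within the gloss span are
    attached to that chunk, keyed by their tier name.
    """
    glosses = sorted((a for a in annotations if a.tier == gloss_tier),
                     key=lambda a: a.start_ms)
    chunks = []
    for g in glosses:
        chunk: dict[str, list[str]] = {gloss_tier: [g.value]}
        for a in annotations:
            if a is g:
                continue  # the gloss itself is already in the chunk
            if a.start_ms >= g.start_ms and a.end_ms <= g.end_ms:
                chunk.setdefault(a.tier, []).append(a.value)
        chunks.append(chunk)
    return chunks
```

With made-up timings for the 'TINY CURLS' example, the gloss, repetition, and eye-gaze annotations sharing one span come out as a single chunk.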

11 The System (3) (Out with the old…) English chunk: 'or with tiny curls'. NGT chunk: (Gloss RH) TINY CURLS (Gloss LH) TINY CURLS (Repetition RH) u (Repetition LH) u (Eye Gaze) l,d

12 The System (4) (Out with the old…) ~Searches for an exact sentence match in the aligned bilingual corpus ~Uses the MH to segment the input and searches for matching or closely matching chunks in the English side of the aligned corpus ~Looks up individual words in the bilingual lexicon
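The three-stage lookup can be sketched as a fallback cascade. This is a hypothetical simplification: `sentence_pairs`, `chunk_pairs`, and `lexicon` are plain dictionaries standing in for the aligned corpus and bilingual lexicon, `segment` is any chunker (a marker-based one, say), and only exact chunk matches are tried, so the close-match step is omitted.

```python
from typing import Callable

Segmenter = Callable[[str], list[list[str]]]


def translate(sentence: str,
              sentence_pairs: dict[str, str],
              chunk_pairs: dict[str, str],
              lexicon: dict[str, str],
              segment: Segmenter) -> str:
    # Stage 1: exact match against whole source sentences.
    if sentence in sentence_pairs:
        return sentence_pairs[sentence]
    output: list[str] = []
    for chunk in segment(sentence):
        key = " ".join(chunk)
        # Stage 2: exact match of the chunk against aligned chunk pairs.
        if key in chunk_pairs:
            output.append(chunk_pairs[key])
        else:
            # Stage 3: word-by-word lexicon lookup; unknown words are dropped.
            output.extend(lexicon[w] for w in chunk if w in lexicon)
    return " ".join(output)
```

Dropping unknown words in stage 3 is one design choice among several; passing them through untranslated would be another.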

13 Experiments (Out with the old…) English and Dutch to NGT (Morrissey & Way, 2005) ~100 sentences ~Annotations are subjective, so evaluation is difficult, but results are promising. NGT to English and Dutch ~Traditional MT evaluation metrics can be applied (SER, WER, PER, BLEU) ~Sparse output and low scores due to the lack of closed-class lexical items in NGT ~Common marker word insertion

14 Experiments (2) (Out with the old…) SER 96%, WER 119%, PER 78%, BLEU 0. Example output: 'mouse promised help'. Reference translation: '"you see" said the mouse, "I promised to help you"'
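The scores above are easier to read with the metric definitions in hand. The sketch below computes WER (word-level edit distance divided by reference length, which is why insertions can push it past 100%), PER using one common bag-of-words formulation, and SER (the share of output sentences that do not exactly match the reference). Tokenisation and the exact PER variant used in the experiments are assumptions; BLEU is left out.

```python
from collections import Counter


def edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # reference word missing from output
                          curr[j - 1] + 1,         # extra word inserted in output
                          prev[j - 1] + (r != h))  # substitution (free if words match)
        prev = curr
    return prev[-1]


def wer(hyp: list[str], ref: list[str]) -> float:
    # Normalised by reference length, so it can exceed 1.0.
    return edit_distance(hyp, ref) / len(ref)


def per(hyp: list[str], ref: list[str]) -> float:
    # Position-independent: compare bags of words, ignoring order.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref)


def ser(hyps: list[list[str]], refs: list[list[str]]) -> float:
    # Fraction of sentences that are not exact matches.
    return sum(h != r for h, r in zip(hyps, refs)) / len(refs)
```

On the lowercased, punctuation-stripped example (10 reference tokens, 3 output tokens that all appear in order), this sketch gives WER 0.7, PER 0.7, and SER 1.0 for that single sentence.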

15 …in with the new. New corpus: ~1400 sentences (SunDial and ATIS corpora) of flight information queries; Irish Sign Language (ISL) signed video version; homespun annotation, designed with the end product in view. New system: OpenLab

16 Evaluation issues (Lost in Translation) Mainstream evaluation techniques ~Exact text matching ~No recognition of synonyms, syntactic structure, or semantics ~No gold standard for SLs. Other possible evaluation metrics ~Number of content words / number of words in the reference translation ~Evaluation of syntactic or semantic relations
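One possible reading of the proposed content-word metric is sketched below: count the output's non-closed-class words that also occur in the reference, then divide by the reference length. The closed-class list `CLOSED_CLASS` and the clipped-count matching are assumptions; the slide specifies neither.

```python
from collections import Counter

# Hypothetical closed-class list; a real one would be far larger.
CLOSED_CLASS = {"the", "a", "an", "to", "with", "in", "on", "and", "or",
                "i", "you", "he", "she", "it"}


def content_word_score(hyp: list[str], ref: list[str],
                       closed_class: set[str] = CLOSED_CLASS) -> float:
    # Keep only content (non-closed-class) words from the system output.
    hyp_content = Counter(w for w in hyp if w not in closed_class)
    # Clip each word's count by its count in the reference.
    matched = sum((hyp_content & Counter(ref)).values())
    return matched / len(ref)
```

On the slide 14 example, 'mouse promised help' scores 3/10 = 0.3 under this reading. That gives partial credit where BLEU gives 0, since none of the output's content words are penalised for the missing closed-class items.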

17 Conclusions ~Basic system ~Corpus problems: a larger corpus, such as the ISL one currently being created, gives more scope for matches; annotations are subjective ~EBMT caters for some SL linguistic phenomena ~Evaluation metrics are unsuitable for oral/non-oral translation

18 Future Work ~Adding NMF information ~Manual analysis ~Language model to improve output ~Suitable evaluation metrics ~Review of other writing systems for SLs ~Avatar…

19 Thank You Questions?