Interlingua Annotation Owen Rambow Advaith Siddharthan Kathleen McKeown

Goal
- Determine feasible deep semantic, language-independent annotation (“interlingua”) for text
- Different from PropBank, FrameNet, WordNet: these projects are language-dependent

Expected Results
- Annotation guidelines (methodology, manual) for annotating language-independent meaning representation on texts in 7 languages
- Methodology for porting to new languages
- Annotated corpora

Methodology
- Use source-language texts and multiple translations into English
- Develop successively more language-independent levels of representation:
  o (deep syntax)
  o language-specific lexical disambiguation and thematic structure (agent, theme, …)
  o language-independent representation

Methodology (2)
- Six sites (CMU, Columbia, ISI, MITRE, NMSU, UMd)
- Each site has one language; Columbia: Hindi
- Closer cooperation between Columbia and UMd on Arabic and Hindi
- Division of tasks and expertise among sites

Methodology (3)
- Use annotators from the beginning to test inter-annotator agreement
- Columbia has hired a native Hindi annotator (near-native English) and an English-language annotator
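The slides do not say which agreement measure the project uses. As one common option, inter-annotator agreement between two annotators can be sketched with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The function name `cohen_kappa` and the example labels are assumptions made for illustration, not project data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative sketch only: the choice of kappa is an assumption,
    not something stated in the slides.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical thematic-role labels from two annotators.
annotator_1 = ["agent", "theme", "agent", "theme"]
annotator_2 = ["agent", "theme", "theme", "theme"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Measuring agreement from the first annotation round, as the slide proposes, lets low scores point to unclear parts of the annotation manual before large amounts of text are annotated.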

Research Issues
- Research: develop annotation scheme(s), methodology, manuals
- Levels (reminder):
  o (deep syntax)
  o language-specific lexical disambiguation
  o language-independent representation
- Questions:
  o Which levels do we annotate explicitly?
  o What is included where?
  o How do we annotate? Using which tools?

Timeline
- January: develop language-specific disambiguation
- February-March: annotate, measure
- April-June: develop language-independent annotation
- July-August: annotate, measure
- Year 2: review results, adjust annotation scheme
- Year 3: annotate

Arabic Dialects
Owen Rambow (Nizar Habash)

Goal
- Investigate representation of linguistic resources for closely related languages/dialects
- Example: Arabic
- Automatically derive NLP tools for cross-dialect MT

Note on Arabic
Interest:
- Only one written dialect: Modern Standard Arabic (MSA), rarely spoken spontaneously
- Many spoken dialects, almost never written
- Dialects are a function of geography, urban/rural, Bedouin/sedentary, sex, religion, …
- Code switching (mainly dialect-MSA): several linguistic systems in the same sentence
- Challenge for traditional NLP approaches!

Expected Results
- Representation of phonology, lexicon, morphology, and syntax for Modern Standard Arabic (MSA) and Egyptian Colloquial Arabic (ECA)
- Tools for converting between MSA and ECA
- Demonstration of tools in several domains (ECA speech recognition, ECA -> English translation)

Methodology
- Use existing scholarly resources to compile sound-change rules, morphological representations, syntactic representations
- Use native speakers to validate and augment the lexicon
- Develop representation
- Develop automatic compilation of NLP tools
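As a rough sketch of what compiled sound-change rules could look like, the snippet below applies an ordered list of rewrite rules to a romanized MSA word. The rule format, the function name `apply_sound_changes`, and the romanization are assumptions for illustration. The two correspondences shown (MSA q to a glottal stop, MSA j to g, as commonly described for Cairene Arabic) are heavily simplified and ignore exceptions; they are not the project's rule set.

```python
import re

# Hypothetical, simplified MSA -> ECA correspondences for illustration only.
# Each rule is (regex pattern, replacement); rules apply in list order.
SOUND_CHANGE_RULES = [
    (r"q", "'"),  # MSA /q/ often corresponds to a glottal stop (written ')
    (r"j", "g"),  # MSA /j/ often corresponds to /g/
]


def apply_sound_changes(word, rules=SOUND_CHANGE_RULES):
    """Apply each rewrite rule in order to a romanized word."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


# Romanized examples: "qalb" (heart) and "jamal" (camel).
converted = [apply_sound_changes(w) for w in ["qalb", "jamal"]]
```

Keeping the rules as data rather than code fits the slide's aim of compiling NLP tools automatically from a declarative representation: native speakers can review or extend the rule table without touching the conversion logic.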

Timeline
- Sep-Dec: start compiling sound change rules, morphological rules, syntax
- Jan-April: develop representations for sound change rules, morphology
- Jan-April: develop conversion rules
- May-August: work on ECA speech recognition application
- Note: also working on MSA syntax
- Year 2: extend to syntax, extend to second dialect (Palestinian? Iraqi?)