Interlingua Annotation Owen Rambow Advaith Siddharthan Kathleen McKeown


1 Interlingua Annotation
Owen Rambow, Advaith Siddharthan, Kathleen McKeown
rambow@cs.columbia.edu

2 Goal
Determine feasible deep semantic, language-independent annotation ("interlingua") for text
Different from PropBank, FrameNet, WordNet: these projects are language-dependent

3 Expected Results
Annotation guidelines (methodology, manual) for annotating language-independent meaning representation on texts in 7 languages
Methodology for porting to new languages
Annotated corpora

4 Methodology
Use source-language texts and multiple translations into English
Develop successively more language-independent levels of representation:
  o (deep syntax)
  o language-specific lexical disambiguation and thematic structure (agent, theme, …)
  o language-independent representation
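
The three levels listed on this slide could be held in one record per token, roughly as sketched below. This is only an illustration: the class name `TokenAnnotation`, its field names, and the sample sense label are hypothetical and are not taken from the project's guidelines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TokenAnnotation:
    """One token with optional annotations at each level (hypothetical layout)."""
    surface: str
    head: Optional[int] = None      # deep syntax: index of the governing token, -1 for the root
    sense: Optional[str] = None     # language-specific lexical disambiguation
    role: Optional[str] = None      # thematic role such as "agent" or "theme"
    concept: Optional[str] = None   # language-independent representation

    def completed_levels(self) -> List[str]:
        """Return the names of the levels that have been filled in so far."""
        done = []
        if self.head is not None:
            done.append("deep_syntax")
        if self.sense is not None and self.role is not None:
            done.append("language_specific")
        if self.concept is not None:
            done.append("language_independent")
        return done


# A token annotated up to the language-specific level only.
token = TokenAnnotation("bank", head=1, sense="bank.financial", role="theme")
print(token.completed_levels())
```

Keeping every level optional mirrors the open question of which levels are annotated explicitly: a record can stop at any level without changing its shape.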

5 Methodology (2)
Six sites (CMU, Columbia, ISI, Mitre, NMSU, UMd)
Each site has one language; Columbia: Hindi
Closer cooperation Columbia-UMd on Arabic and Hindi
Division of tasks and expertise among sites

6 Methodology (3)
Use annotators from beginning to test inter-annotator agreement
Columbia: have hired a native Hindi annotator (near-native English) and an English-language annotator
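
The slides do not say which agreement statistic is used. One common choice for two annotators is Cohen's kappa, sketched below as an assumption; the function name and the sample thematic-role labels are illustrative only.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Two annotators assigning thematic roles to four arguments.
annotator_1 = ["agent", "theme", "agent", "theme"]
annotator_2 = ["agent", "theme", "theme", "theme"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, and so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.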

7 Research Issues
Research: develop annotation scheme(s), methodology, manuals
Levels (reminder):
  o (deep syntax)
  o language-specific lexical disambiguation
  o language-independent representation
Questions:
  o Which levels do we annotate explicitly?
  o What is included where?
  o How do we annotate? Using which tools?

8 Timeline
January: develop language-specific disambiguation
February-March: annotate, measure
April-June: develop language-independent annotation
July-August: annotate, measure
Year 2: review results, adjust annotation scheme
Year 3: annotate

9 Arabic Dialects
Owen Rambow (Nizar Habash)
rambow@cs.columbia.edu

10 Goal
Investigate representation of linguistic resources for closely related languages/dialects
Example: Arabic
Automatically derive NLP tools for cross-dialect MT

11 Note on Arabic
Interest:
  o Only one written dialect: Modern Standard Arabic (MSA), rarely spoken spontaneously
  o Many spoken dialects, almost never written
  o Dialects function of geography, urban/rural, Bedouin/sedentary, sex, religion, …
  o Code switching (mainly dialect-MSA): several linguistic systems in same sentence
Challenge for traditional NLP approaches!

12 Expected Results
Representation of phonology, lexicon, morphology, and syntax for Modern Standard Arabic (MSA) and Egyptian Colloquial Arabic (ECA)
Tools for converting between MSA and ECA
Demonstration of tools in several domains (ECA speech recognition, ECA -> English translation)

13 Methodology
Use existing scholarly resources to compile sound-change rules, morphological representations, syntactic representations
Use native speakers to validate, and augment lexicon
Develop representation
Develop automatic compilation of NLP tools
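
Compiled sound-change rules could, in the simplest case, be applied as an ordered list of rewrites over a phonemic transcription. The sketch below assumes that setup. The rule table, the IPA-style transcription, and the function name are illustrative, not the project's representation. The rules are well-known MSA-to-Cairene tendencies and ignore lexical exceptions, such as learned words that keep the MSA pronunciation.

```python
from typing import List, Tuple

# Illustrative MSA -> Egyptian Colloquial Arabic correspondences (assumed table).
# Each rule rewrites every occurrence of the left-hand segment, in list order.
SOUND_CHANGES: List[Tuple[str, str]] = [
    ("dʒ", "g"),  # jim realized as a hard g
    ("q", "ʔ"),   # qaf realized as a glottal stop
    ("θ", "t"),   # interdental fricative becomes a stop
    ("ð", "d"),   # voiced interdental fricative becomes a stop
]


def apply_sound_changes(form: str, rules: List[Tuple[str, str]] = SOUND_CHANGES) -> str:
    """Apply context-free rewrite rules to a phonemic form, one rule at a time."""
    for source, target in rules:
        form = form.replace(source, target)
    return form


for msa_form in ["qalb", "θalaːθa", "dʒamal"]:
    print(msa_form, "->", apply_sound_changes(msa_form))
```

Context-free replacement is the weakest useful formalism here. Rules that depend on neighboring sounds or on word class would need conditioned rewrites, for example a finite-state transducer.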

14 Timeline
Sep-Dec: start compiling sound change rules, morphological rules, syntax
Jan-April: develop representations for sound change rules, morphology
Jan-Apr: develop conversion rules
May-August: work on ECA speech recognition application
Note: also working on MSA syntax
Year 2: extend to syntax, extend to second dialect (Palestinian? Iraqi?)

